Skip to content

[ExecuTorch][WebGPU] Add slice_copy op (aten.slice_copy.Tensor)#20394

Open
JulianCloudNTH wants to merge 2 commits into
gh/JulianCloudNTH/43/basefrom
gh/JulianCloudNTH/43/head
Open

[ExecuTorch][WebGPU] Add slice_copy op (aten.slice_copy.Tensor)#20394
JulianCloudNTH wants to merge 2 commits into
gh/JulianCloudNTH/43/basefrom
gh/JulianCloudNTH/43/head

Conversation

@JulianCloudNTH

@JulianCloudNTH JulianCloudNTH commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Stack from ghstack (oldest at bottom):

Adds aten.slice_copy.Tensor to the WebGPU delegate as a gather: each output element is mapped back to its source input element along the sliced dim via start + coord * step.

Composition (single compute dispatch):

  • runtime/ops/slice/Slice.cpp — reads args = [self, dim, start, end, step, out] via read_scalar (static Int/Null-sentinel default; throws on dynamic SymInt); normalizes negative dim/start, clamps start to [0, in_size]; builds two TensorMeta UBOs + a SliceParams{dim, start, step} uniform; guards fp32; dispatches over compute_1d_workgroup_count(out.numel) with override wg_size; releases all uniforms after the bind group.
  • runtime/ops/slice/slice.wgsl — delinearizes the output index over the contiguous output strides, maps the sliced-dim coordinate back to the input (start + coord*step), relinearizes over the input strides.
    @exported-using-ghexport

Differential Revision: D108793168

Differential Revision: D108793168

[ghstack-poisoned]
@pytorch-bot

pytorch-bot Bot commented Jun 18, 2026

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20394

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Cancelled Job, 5 Unrelated Failures

As of commit 92bf354 with merge base 0e65ba6 (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions

Copy link
Copy Markdown

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@JulianCloudNTH

Copy link
Copy Markdown
Contributor Author

@claude review

@claude

claude Bot commented Jun 18, 2026

Copy link
Copy Markdown

Claude finished @JulianCloudNTH's task in 2m 27s —— View job


Code Review: slice_copy WebGPU op

  • Read changed files
  • Check conventions vs sibling ops & TensorMeta
  • Analyze correctness, edge cases, style
  • Post review

Reviewed against the sibling select op (runtime/ops/select/), which this closely and correctly mirrors. The gather approach is sound: contiguous-output delinearization → map sliced-dim coord via start + coord*step → relinearize over input strides. The std140 TensorMeta UBO usage, fp32 guard, workgroup sizing, uniform lifetime handling, and bind-group layout all match the established pattern. Nice work — the negative-dim/start normalization and the Int/Null/SymInt sentinel handling in read_scalar are handled carefully.

A few points below; only the first is a real (if low-likelihood) gap.

Suggestions

1. step is not validated — a non-positive value silently misbehaves (Slice.cpp:73, 86-88)
step is read and cast straight to uint32_t. The other parameters get explicit range checks (dim out of range, start clamp), but step doesn't. A step <= 0 would wrap to a huge u32 and produce out-of-bounds reads in the shader (no bounds check on in_bufi). PyTorch/the partitioner guarantee step >= 1 for slice_copy.Tensor, so this is defensive, but it's cheap and matches the surrounding validation style:

const int64_t step = read_scalar(graph, args.at(4), 1, "step");
if (step < 1) {
  throw std::runtime_error("slice: step must be >= 1");
}

Fix this →

2. No bounds check on in_bufi in the shader (slice.wgsl:39)
The computed input index is used to read input[in_bufi] without a guard. This relies entirely on AOT-correct shapes + params (1 above). That's consistent with select.wgsl, so it's acceptable as-is — just noting the invariant the C++ side must uphold.

3. Test coverage and partitioner allowlist appear to be missing from this PR

  • aten.slice_copy.Tensor is not added to WEBGPU_SUPPORTED_OPS in backends/webgpu/test/tester.py (cf. select_copy.int at line 26).
  • No SliceModule/_SLICE_CONFIGS op-test was added — the _fn_config_suite builder in op_tests/cases.py:149 already lists slice in its docstring (view/select/slice), so wiring one up should be a near-copy of the select suite.

If those land in another PR of this ghstack, ignore this — but the runtime op isn't exercised by the op-test framework without them.

Nits

  • The bind-group / pipeline / layout boilerplate (Slice.cpp:106-182) is byte-for-byte identical to Select.cpp. This is the existing per-op convention, so duplicating it is the right call here, but the WebGPU backend may eventually want a shared helper as the op count grows. No action needed.
  • slice_wgsl.h carries a wgsl-sha256 — make sure test_wgsl_codegen.py passes (it should, since the header matches slice.wgsl).

Overall: clean, idiomatic, and consistent with the backend. The only thing I'd actually change before merge is the step guard (#1).
· branch gh/JulianCloudNTH/43/head

[ghstack-poisoned]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant