Skip to content

[ExecuTorch][WebGPU] Add view_copy op (aten.view_copy.default)#20360

Open
JulianCloudNTH wants to merge 3 commits into
gh/JulianCloudNTH/35/basefrom
gh/JulianCloudNTH/35/head
Open

[ExecuTorch][WebGPU] Add view_copy op (aten.view_copy.default)#20360
JulianCloudNTH wants to merge 3 commits into
gh/JulianCloudNTH/35/basefrom
gh/JulianCloudNTH/35/head

Conversation

@JulianCloudNTH

@JulianCloudNTH JulianCloudNTH commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Stack from ghstack (oldest at bottom):

Adds aten.view_copy.default to the WebGPU delegate. A contiguous reshape on a dense row-major buffer backend is a flat copy output[i] = input[i], so the op is a single 1D-dispatch copy kernel.

Composition (single compute dispatch):

  • runtime/ops/view_copy/view_copy.h — declares add_flat_copy(graph, in_id, out_id): fail-loud guards (both tensors, fp32, numel-preserving) + the view_copy.wgsl dispatch over compute_1d_workgroup_count(num_elements) with override wg_size; mirrors Vulkan add_view_copy_node.
  • runtime/ops/view_copy/ViewCopy.cpp — reads args = [self, size, out], ignores the AOT-fixed size value-id (output shape comes from out_tensor.dims), calls add_flat_copy.
  • runtime/ops/view_copy/view_copy.wgsl — guards idx >= num_elements, writes output[idx] = input[idx].
  • add_flat_copy is factored into the header so the stacked squeeze/unsqueeze ops reuse it without a new kernel.
    @exported-using-ghexport

Differential Revision: D108793164

Differential Revision: D108793164

[ghstack-poisoned]
@pytorch-bot

pytorch-bot Bot commented Jun 17, 2026

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20360

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 3 Unrelated Failures

As of commit 867bd73 with merge base 0e65ba6 (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions

Copy link
Copy Markdown

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@JulianCloudNTH JulianCloudNTH requested a review from psiddh June 18, 2026 15:55
[ghstack-poisoned]
@JulianCloudNTH

Copy link
Copy Markdown
Contributor Author

@claude review

@claude

claude Bot commented Jun 18, 2026

Copy link
Copy Markdown

Claude finished @JulianCloudNTH's task in 1m 14s —— View job


Code Review: WebGPU view_copy op

  • Read changed files & sibling ops
  • Check consistency with existing WebGPU op patterns
  • Review correctness, resource handling, edge cases

Clean, well-scoped addition. The structure mirrors mul/BinaryOp.cpp faithfully (workgroup sizing, uniform creation, bind-group layout, release ordering), and factoring add_flat_copy into the header so stacked squeeze/unsqueeze ops can reuse the kernel is the right call. Resource lifetime matches the established convention — shader/BGL/pipeline-layout/uniform released after dispatch, while pipeline and bind group are retained by the graph. No leaks or double-frees.

A few observations, none blocking:

1. The "fp32 guard" is really a 4-byte-alignment guardViewCopy.cpp:44-49

if (in_tensor.nbytes % sizeof(float) != 0 || out_tensor.nbytes % sizeof(float) != 0)

This admits any dtype whose byte count is a multiple of 4 (e.g. an even-numel fp16 tensor), not just fp32. That happens to be safe here — a flat byte copy is dtype-agnostic and the in_numel == num_elements check below is effectively an nbytes-equal check — so the result is still correct. But the error string "expected fp32" overpromises. Compare with mul_impl (mul/BinaryOp.cpp:53-61), which asserts nbytes == numel * 4 against the dims-derived numel for a true dtype check. Since WebGPUTensor carries dims, you could tighten this to match mul's precision, or just soften the comment/message to "operand size not 4-byte aligned" to reflect what's actually enforced.

2. Contiguity assumption is implicit. The flat output[i] = input[i] is only correct when the input is dense row-major — a transposed/strided view would copy in the wrong order. This matches the buffer-backend design (and Vulkan's add_view_copy_node), and the PR body states it, but there's no runtime guard against a non-contiguous input. If non-contiguous inputs are structurally impossible at this point in the WebGPU delegate, a one-line comment in add_flat_copy noting that invariant would save a future reader the trip to the PR description.

3. args.at(args.size() - 1) for outViewCopy.cpp:154. Correct and robust against the AOT size value-id. Minor: a named local (const int out_id = args.at(args.size() - 1);) reads a touch clearer than the inline expression, but this is taste.

4. Generated header has unused constantsview_copy_wgsl.h:39-40 define kViewCopyWorkgroupSizeY/Z, which aren't referenced (only ...SizeX is). This is @generated, so presumably the codegen template emits all three uniformly — fine to leave, just flagging.

Nit: ViewCopyParams::_pad[3] pads the uniform to 16 bytes and the binding uses sizeof(ViewCopyParams) (16) while the WGSL Params is 4 bytes — that's valid (binding size ≥ struct size), just worth a one-word comment that the padding is for the 16-byte uniform alignment requirement (the existing comment at :24 covers this).

No correctness bugs found. The nbytes-based numel check correctly prevents the OOB input[] read, the idx >= num_elements shader guard is present, and the dispatch count goes through compute_1d_workgroup_count's limit check. LGTM with the optional polish above.

[ghstack-poisoned]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants