[ExecuTorch][WebGPU] Add view_copy op (aten.view_copy.default) by JulianCloudNTH · Pull Request #20360 · pytorch/executorch

JulianCloudNTH · 2026-06-17T23:59:38Z

Stack from ghstack (oldest at bottom):

Adds aten.view_copy.default to the WebGPU delegate. A contiguous reshape on a dense row-major buffer backend is a flat copy output[i] = input[i], so the op is a single 1D-dispatch copy kernel.

Composition (single compute dispatch):

runtime/ops/view_copy/view_copy.h — declares add_flat_copy(graph, in_id, out_id): fail-loud guards (both tensors, fp32, numel-preserving) + the view_copy.wgsl dispatch over compute_1d_workgroup_count(num_elements) with override wg_size; mirrors Vulkan add_view_copy_node.
runtime/ops/view_copy/ViewCopy.cpp — reads args = [self, size, out], ignores the AOT-fixed size value-id (output shape comes from out_tensor.dims), calls add_flat_copy.
runtime/ops/view_copy/view_copy.wgsl — guards idx >= num_elements, writes output[idx] = input[idx].
add_flat_copy is factored into the header so the stacked squeeze/unsqueeze ops reuse it without a new kernel.
@exported-using-ghexport

Differential Revision: D108793164

[ghstack-poisoned]

pytorch-bot · 2026-06-17T23:59:41Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20360

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 3 Unrelated Failures

As of commit 867bd73 with merge base 0e65ba6 ():

NEW FAILURES - The following jobs have failed:

pull / unittest / linux / linux-job (gh)
RuntimeError: Command docker exec -t 5343bba6577171397a676a284384ed22d67f533f6fd73aea6a81dbbdceb57e96 /exec failed with exit code 1
pull / unittest-editable / linux / linux-job (gh)
RuntimeError: Command docker exec -t fb0359baa95e8f7195cec09bf28e4608b35bd93d7c7d99600a1348a25ddd94e4 /exec failed with exit code 1

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh) (trunk failure)
pull / unittest / macos / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
pull / unittest-editable / macos / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-18T00:00:47Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

[ghstack-poisoned]

JulianCloudNTH · 2026-06-18T21:50:24Z

@claude review

claude · 2026-06-18T21:51:13Z

Claude finished @JulianCloudNTH's task in 1m 14s —— View job

Code Review: WebGPU `view_copy` op

Read changed files & sibling ops
Check consistency with existing WebGPU op patterns
Review correctness, resource handling, edge cases

Clean, well-scoped addition. The structure mirrors mul/BinaryOp.cpp faithfully (workgroup sizing, uniform creation, bind-group layout, release ordering), and factoring add_flat_copy into the header so stacked squeeze/unsqueeze ops can reuse the kernel is the right call. Resource lifetime matches the established convention — shader/BGL/pipeline-layout/uniform released after dispatch, while pipeline and bind group are retained by the graph. No leaks or double-frees.

A few observations, none blocking:

1. The "fp32 guard" is really a 4-byte-alignment guard — ViewCopy.cpp:44-49

if (in_tensor.nbytes % sizeof(float) != 0 || out_tensor.nbytes % sizeof(float) != 0)

This admits any dtype whose byte count is a multiple of 4 (e.g. an even-numel fp16 tensor), not just fp32. That happens to be safe here — a flat byte copy is dtype-agnostic and the in_numel == num_elements check below is effectively an nbytes-equal check — so the result is still correct. But the error string "expected fp32" overpromises. Compare with mul_impl (mul/BinaryOp.cpp:53-61), which asserts nbytes == numel * 4 against the dims-derived numel for a true dtype check. Since WebGPUTensor carries dims, you could tighten this to match mul's precision, or just soften the comment/message to "operand size not 4-byte aligned" to reflect what's actually enforced.

2. Contiguity assumption is implicit. The flat output[i] = input[i] is only correct when the input is dense row-major — a transposed/strided view would copy in the wrong order. This matches the buffer-backend design (and Vulkan's add_view_copy_node), and the PR body states it, but there's no runtime guard against a non-contiguous input. If non-contiguous inputs are structurally impossible at this point in the WebGPU delegate, a one-line comment in add_flat_copy noting that invariant would save a future reader the trip to the PR description.

3. args.at(args.size() - 1) for out — ViewCopy.cpp:154. Correct and robust against the AOT size value-id. Minor: a named local (const int out_id = args.at(args.size() - 1);) reads a touch clearer than the inline expression, but this is taste.

4. Generated header has unused constants — view_copy_wgsl.h:39-40 define kViewCopyWorkgroupSizeY/Z, which aren't referenced (only ...SizeX is). This is @generated, so presumably the codegen template emits all three uniformly — fine to leave, just flagging.

Nit: ViewCopyParams::_pad[3] pads the uniform to 16 bytes and the binding uses sizeof(ViewCopyParams) (16) while the WGSL Params is 4 bytes — that's valid (binding size ≥ struct size), just worth a one-word comment that the padding is for the 16-byte uniform alignment requirement (the existing comment at :24 covers this).

No correctness bugs found. The nbytes-based numel check correctly prevents the OOB input[] read, the idx >= num_elements shader guard is present, and the dispatch count goes through compute_1d_workgroup_count's limit check. LGTM with the optional polish above.

[ghstack-poisoned]

Update

1175a70

[ghstack-poisoned]

JulianCloudNTH requested review from kirklandsign and larryliu0820 as code owners June 17, 2026 23:59

JulianCloudNTH temporarily deployed to cadence June 17, 2026 23:59 — with GitHub Actions Inactive

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 18, 2026

JulianCloudNTH requested a review from psiddh June 18, 2026 15:55

Update

5dc3de1

[ghstack-poisoned]

JulianCloudNTH had a problem deploying to cadence June 18, 2026 21:36 — with GitHub Actions Error

JulianCloudNTH temporarily deployed to cadence June 18, 2026 21:36 — with GitHub Actions Inactive

meta-codesync Bot added the meta-exported label Jun 18, 2026

Update

867bd73

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 18, 2026 22:27 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ExecuTorch][WebGPU] Add view_copy op (aten.view_copy.default)#20360

[ExecuTorch][WebGPU] Add view_copy op (aten.view_copy.default)#20360
JulianCloudNTH wants to merge 3 commits into
gh/JulianCloudNTH/35/basefrom
gh/JulianCloudNTH/35/head

JulianCloudNTH commented Jun 17, 2026 •

edited

Loading

Uh oh!

pytorch-bot Bot commented Jun 17, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 18, 2026

Uh oh!

JulianCloudNTH commented Jun 18, 2026

Uh oh!

claude Bot commented Jun 18, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JulianCloudNTH commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20360

❌ 2 New Failures, 3 Unrelated Failures

Uh oh!

github-actions Bot commented Jun 18, 2026

This PR needs a release notes: label

Uh oh!

JulianCloudNTH commented Jun 18, 2026

Uh oh!

claude Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: WebGPU view_copy op

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JulianCloudNTH commented Jun 17, 2026 •

edited

Loading

pytorch-bot Bot commented Jun 17, 2026 •

edited

Loading

This PR needs a `release notes:` label

claude Bot commented Jun 18, 2026 •

edited

Loading

Code Review: WebGPU `view_copy` op