[ExecuTorch][WebGPU] Add permute_copy + IntList graph support (aten.permute_copy.default) by JulianCloudNTH · Pull Request #20396 · pytorch/executorch

JulianCloudNTH · 2026-06-18T21:35:50Z

Stack from ghstack (oldest at bottom):

Adds aten.permute_copy.default (a coordinate-reorder gather) to the WebGPU delegate, and the IntList graph value type it needs to read its dims argument.

Composition:

runtime/WebGPUGraph.{h,cpp}: adds ValueType::IntList, backed by std::vector<std::vector<int64_t>> int_lists_ and a get_int_list(int) accessor. build() deserializes vkgraph::GraphTypes::IntList via value_as_IntList()->items() (int64, matching the FlatBuffer [long]), mirroring the existing scalar value plumbing.
runtime/ops/permute/Permute.cpp: reads the permutation via get_int_list, normalizes negative dims, and validates that the result is a permutation of [0, ndim). It then builds two TensorMeta UBOs plus a PermuteParams{perm: vec4<u32>} uniform, rejects anything other than fp32 with rank ≤ 4, dispatches compute_1d_workgroup_count(out.numel) workgroups with wg_size supplied as an override constant, and releases all uniforms once the bind group has been built.
runtime/ops/permute/permute.wgsl: delinearizes the output index over the contiguous output strides and, for each dim d, reads the input using in.strides[perm[d]] (mirrors Vulkan permute_buffer.glsl).
Registers both aten.permute_copy.default and aten.permute.default to the same handler.
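The host-side checks described for Permute.cpp can be illustrated with a minimal sketch. This is not the PR's code: `normalize_permutation` and `kMaxNdim` are hypothetical names, the zero-filling of unused slots is an assumption, and the only behavior taken from the description is the negative-dim wrap, the rank ≤ 4 guard, and the check that the dims form a permutation of [0, ndim).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed to correspond to the rank <= 4 limit mentioned in the PR.
constexpr std::size_t kMaxNdim = 4;

// Hypothetical helper: wraps negative dims into [0, ndim) and rejects any
// input that is not a permutation of that range. The result is padded to
// four entries so it could back a vec4<u32> uniform (padding value assumed).
std::array<std::uint32_t, kMaxNdim> normalize_permutation(
    const std::vector<std::int64_t>& dims, std::size_t ndim) {
  if (ndim > kMaxNdim) {
    throw std::invalid_argument("rank > 4 is not supported");
  }
  if (dims.size() != ndim) {
    throw std::invalid_argument("dims length must equal the tensor rank");
  }

  std::array<std::uint32_t, kMaxNdim> perm{};  // unused slots stay 0
  std::array<bool, kMaxNdim> seen{};           // duplicate detection
  const auto n = static_cast<std::int64_t>(ndim);
  for (std::size_t d = 0; d < ndim; ++d) {
    std::int64_t p = dims[d];
    if (p < 0) {
      p += n;  // e.g. -1 refers to the last dim
    }
    if (p < 0 || p >= n) {
      throw std::out_of_range("permutation entry out of range");
    }
    const auto idx = static_cast<std::size_t>(p);
    if (seen[idx]) {
      throw std::invalid_argument("duplicate entry in permutation");
    }
    seen[idx] = true;
    perm[d] = static_cast<std::uint32_t>(p);
  }
  return perm;
}
```

With this sketch, dims `{-1, 0, 1}` on a rank-3 tensor normalize to `{2, 0, 1}`, while `{0, 0}` on a rank-2 tensor is rejected as a duplicate.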
@exported-using-ghexport

Differential Revision: D108793162

[ghstack-poisoned]

pytorch-bot · 2026-06-18T21:35:54Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20396

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 3 Unrelated Failures

As of commit dcc3a48 with merge base 0e65ba6:

NEW FAILURES - The following jobs have failed:

pull / unittest / linux / linux-job (gh)
RuntimeError: Command docker exec -t 924748f14259c6a7febf75f2aed6850ff6e48e1fc69ec8f8f2a8f732ead4c289 /exec failed with exit code 1
pull / unittest-editable / linux / linux-job (gh)
RuntimeError: Command docker exec -t 37db4da251f56493c8d09470335882c58556de9d396b55f25cf436e928add847 /exec failed with exit code 1

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / unittest / macos / macos-job (gh) (similar failure)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1

BROKEN TRUNK - The following jobs failed, but the same failures were already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh) (trunk failure)
pull / unittest-editable / macos / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-18T21:36:31Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

JulianCloudNTH · 2026-06-18T21:54:22Z

@claude review

claude · 2026-06-18T21:55:03Z

Claude finished @JulianCloudNTH's task in 3m 11s

Code Review — PR #20396: permute_copy + IntList graph support

Overall this is a clean, well-scoped addition that faithfully mirrors the existing per-op pattern (Slice.cpp) and the Vulkan permute_buffer.glsl reference. The IntList plumbing matches the FlatBuffer [long] schema (schema/program.fbs:193-195, Vulkan schema.fbs:88-89) and the existing scalar-value conventions. I verified the WGSL gather math, the bounds/validation logic, and the uniform/bind-group lifetime. A few notes below — nothing blocking.

Correctness ✅

Gather math is correct. For output coord d, in_bufi += coord * in_strides[perm[d]] correctly maps output dim d to input dim dims[d] (PyTorch permute semantics: out.shape[d] == in.shape[dims[d]]). Output delinearization over contiguous out_meta.strides is sound since the AOT output is contiguous.
No division-by-zero in the shader. out_meta.strides[d] for d < ndim is always ≥1 with contiguous strides; an empty tensor (numel == 0) dispatches 0 workgroups.
No OOB on the fixed-size arrays. perm[]/seen[] are sized kTensorMetaMaxNdim (4) and the ndim > 4 guard (Permute.cpp:62) runs before the indexing loop. The permutation-validity check (negative-dim normalize + seen[] dup detection) is correct.
fp32 + rank≤4 guards are present and consistent with Slice.cpp.
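The gather math checked above can be reproduced on the CPU as a reference. The sketch below is not the shader or the PR's code: `contiguous_strides` and `permute_reference` are hypothetical names, and it assumes a contiguous fp32 input whose size matches its shape. Each output index is delinearized over the contiguous output strides, and each coordinate is multiplied by the input stride of dim `perm[d]`.

```cpp
#include <cstddef>
#include <vector>

// Row-major (contiguous) strides for a shape, e.g. {2, 3} -> {3, 1}.
std::vector<std::size_t> contiguous_strides(
    const std::vector<std::size_t>& shape) {
  std::vector<std::size_t> strides(shape.size(), 1);
  for (std::size_t d = shape.size(); d > 1; --d) {
    strides[d - 2] = strides[d - 1] * shape[d - 1];
  }
  return strides;
}

// Hypothetical CPU reference for permute: out.shape[d] == in.shape[perm[d]].
// Assumes `in` is contiguous and `perm` is already a valid permutation.
std::vector<float> permute_reference(const std::vector<float>& in,
                                     const std::vector<std::size_t>& in_shape,
                                     const std::vector<std::size_t>& perm) {
  const std::size_t ndim = in_shape.size();
  std::vector<std::size_t> out_shape(ndim);
  for (std::size_t d = 0; d < ndim; ++d) {
    out_shape[d] = in_shape[perm[d]];
  }
  const auto in_strides = contiguous_strides(in_shape);
  const auto out_strides = contiguous_strides(out_shape);

  std::vector<float> out(in.size());
  // For an empty tensor the loop body never runs, so a zero stride is
  // never used as a divisor.
  for (std::size_t out_idx = 0; out_idx < out.size(); ++out_idx) {
    std::size_t rem = out_idx;
    std::size_t in_idx = 0;
    for (std::size_t d = 0; d < ndim; ++d) {
      const std::size_t coord = rem / out_strides[d];  // output coord d
      rem %= out_strides[d];
      in_idx += coord * in_strides[perm[d]];  // gather from input dim perm[d]
    }
    out[out_idx] = in[in_idx];
  }
  return out;
}
```

For a 2×3 input holding 0 through 5 and `perm = {1, 0}`, this returns the 3×2 transpose `{0, 3, 1, 4, 2, 5}`.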

Suggestions (non-blocking)

Missing op test. Every other buffer op under backends/webgpu/test/ops/ (slice, squeeze, unsqueeze, select, view_copy, …) ships a test module, but there's no test/ops/permute/. Since permute has nontrivial index math, a correctness test across a few rank-2/3/4 permutations (including negative dims and identity) would be valuable and would also exercise the new IntList deserialization path.
Significant boilerplate duplication with Slice.cpp. Permute.cpp:111-187 (shader module → bind-group-layout → pipeline → bind-group → dispatch → release) is ~95% identical to Slice.cpp:106-182, differing only in the params struct and WGSL constant. This is a pre-existing pattern across ops, so following it here is reasonable — but the duplication is now large enough that a small shared helper (e.g. "build a 1-storage-in / 1-storage-out / N-uniform pipeline+dispatch") would pay off across the growing op set. Out of scope for this PR; flagging for the backend maintainers.
kPermuteWorkgroupSizeY / kPermuteWorkgroupSizeZ are unused (permute_wgsl.h:57-58). These appear to be emitted by the WGSL codegen for all shaders, so this matches the generated-header convention and is fine — just noting they're dead for this 1-D dispatch.
Redundant rank guard. fill_tensor_meta (TensorMeta.h:42-44) already throws on ndim > 4, so the explicit check at Permute.cpp:62-64 is belt-and-suspenders. Harmless, and arguably clearer since it guards the array writes directly.

Verified

WGSL SHA d34f5973…00197 matches the embedded value in permute_wgsl.h:16 (generated header is in sync with the .wgsl source). ✅
IntList enum addition and int_lists_ resize in build() correctly mirror the other value types. ✅
Branch: gh/JulianCloudNTH/45/head

[ghstack-poisoned]

Update

5cbf24e

[ghstack-poisoned]

JulianCloudNTH requested review from kirklandsign and larryliu0820 as code owners June 18, 2026 21:35

JulianCloudNTH temporarily deployed to cadence June 18, 2026 21:36 — with GitHub Actions Inactive

JulianCloudNTH mentioned this pull request Jun 18, 2026

[ExecuTorch][WebGPU] cat op test suite (cases.py op-test framework) #20399

Open

meta-cla Bot added the CLA Signed label Jun 18, 2026

Update

dcc3a48

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 18, 2026 22:25 — with GitHub Actions Inactive

meta-codesync Bot added the meta-exported label Jun 18, 2026

JulianCloudNTH wants to merge 2 commits into gh/JulianCloudNTH/45/base from gh/JulianCloudNTH/45/head
