[ExecuTorch][WebGPU] Add permute_copy + IntList graph support (aten.permute_copy.default) #20396
JulianCloudNTH wants to merge 2 commits into
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20396
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 3 Unrelated Failures
As of commit dcc3a48 with merge base 0e65ba6:
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following jobs failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
@claude review
Claude finished @JulianCloudNTH's task in 3m 11s (View job)
Code Review: PR #20396, permute_copy + IntList graph support
Overall this is a clean, well-scoped addition that faithfully mirrors the existing per-op pattern (
Correctness ✅
Suggestions (non-blocking)
Verified
Stack from ghstack (oldest at bottom):
Adds `aten.permute_copy.default` (a coordinate-reorder gather) to the WebGPU delegate, and the `IntList` graph value type it needs to read its `dims` argument.

Composition:
- `runtime/WebGPUGraph.{h,cpp}`: adds `ValueType::IntList` backed by `std::vector<std::vector<int64_t>> int_lists_` + `get_int_list(int)`; `build()` deserializes `vkgraph::GraphTypes::IntList` via `value_as_IntList()->items()` (int64, matching the FlatBuffer `[long]`); mirrors the existing scalar value plumbing.
- `runtime/ops/permute/Permute.cpp`: reads the permutation via `get_int_list`, normalizes negative dims, validates it is a permutation of `[0, ndim)`, builds two `TensorMeta` UBOs + a `PermuteParams{perm: vec4<u32>}` uniform, guards fp32 + rank≤4, dispatches over `compute_1d_workgroup_count(out.numel)` with `override wg_size`; releases all uniforms after the bind group.
- `runtime/ops/permute/permute.wgsl`: delinearizes the output index over the contiguous output strides, reads `input` at `in.strides[perm[d]]` per dim (mirrors Vulkan `permute_buffer.glsl`).
- Maps `aten.permute_copy.default` and `aten.permute.default` to the same handler.

@exported-using-ghexport
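
For reviewers who want to sanity-check the indexing, here is a minimal CPU-side sketch of the two steps described above: normalizing and validating the permutation, then the gather that delinearizes each output index and reads the input through `in_strides[perm[d]]`. It is not the code in this PR. The function names `normalize_perm`, `contiguous_strides`, and `permute_reference` are hypothetical, and it assumes a contiguous row-major fp32 input.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not the PR's code): wrap negative dims into [0, ndim)
// and check that the result is a permutation. Returns nullopt if invalid.
std::optional<std::vector<int64_t>> normalize_perm(
    const std::vector<int64_t>& dims, int64_t ndim) {
  if (static_cast<int64_t>(dims.size()) != ndim) return std::nullopt;
  std::vector<bool> seen(ndim, false);
  std::vector<int64_t> perm(ndim);
  for (int64_t d = 0; d < ndim; ++d) {
    const int64_t p = dims[d] < 0 ? dims[d] + ndim : dims[d];
    if (p < 0 || p >= ndim || seen[p]) return std::nullopt;  // out of range or repeated
    seen[p] = true;
    perm[d] = p;
  }
  return perm;
}

// Row-major (contiguous) strides for the given sizes.
std::vector<int64_t> contiguous_strides(const std::vector<int64_t>& sizes) {
  std::vector<int64_t> strides(sizes.size(), 1);
  for (int64_t d = static_cast<int64_t>(sizes.size()) - 2; d >= 0; --d) {
    strides[d] = strides[d + 1] * sizes[d + 1];
  }
  return strides;
}

// CPU reference of the gather: each loop iteration plays the role of one
// shader invocation handling one output element. Assumes perm is already
// validated and the input is contiguous.
std::vector<float> permute_reference(const std::vector<float>& in,
                                     const std::vector<int64_t>& in_sizes,
                                     const std::vector<int64_t>& perm) {
  const std::size_t ndim = in_sizes.size();
  const std::vector<int64_t> in_strides = contiguous_strides(in_sizes);
  std::vector<int64_t> out_sizes(ndim);
  for (std::size_t d = 0; d < ndim; ++d) out_sizes[d] = in_sizes[perm[d]];
  const std::vector<int64_t> out_strides = contiguous_strides(out_sizes);

  std::vector<float> out(in.size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    int64_t rem = static_cast<int64_t>(i);
    int64_t in_off = 0;
    for (std::size_t d = 0; d < ndim; ++d) {
      const int64_t coord = rem / out_strides[d];  // output coordinate along dim d
      rem %= out_strides[d];
      in_off += coord * in_strides[perm[d]];       // output dim d reads input dim perm[d]
    }
    out[i] = in[in_off];
  }
  return out;
}
```

A reference like this could serve as a host-side oracle when comparing delegate output in tests; the rank≤4 and fp32 guards from `Permute.cpp` are deliberately left out of the sketch.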
Differential Revision: D108793162