Qualcomm AI Engine Direct - Adding QNN backend support for the addmm core ATen op #20355
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20355

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Unrelated Failure, 1 Unclassified Failure
As of commit a126ccb with merge base ef5c8a7:

NEW FAILURES - The following jobs have failed:

UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:

BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "release notes: qualcomm"
@claude Review this PR
Claude finished @psiddh's task in 4m 55s.

PR Review: QNN
Summary
Added full support for the `aten.addmm` core ATen op via a two-pass decomposition strategy:

1. `AddmmToLinearTransform` (ExecuTorch shared pass): converts the common `nn.Linear` decomposition pattern (`addmm(bias, input, weight.T)`) back to `aten.linear`, which maps to QNN's fused `FullyConnected` op for optimal performance.
2. `DecomposeAddmm` (new pass): handles the remaining standalone `addmm` nodes by decomposing them into `mm + add`. Because `addmm(input, mat1, mat2, beta, alpha)` computes `beta * input + alpha * (mat1 @ mat2)`, non-unit `alpha`/`beta` scalars are supported via additional `mul` nodes.

`AddmmToLinearTransform` alone is not sufficient because it only handles the subset of `addmm` nodes that match the `nn.Linear` decomposition pattern, specifically where `args[2]` is a transposed weight (`t_copy` or `permute_copy`).
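The routing between the two passes can be illustrated with a minimal pure-Python sketch. It is not the ExecuTorch or QNN pass code. The sketch models each `addmm` as plain nested lists plus a tag naming the op that produced the second matrix. That tuple representation, the helper names (`lower_addmm`, `addmm_reference`, `add_row_bias`, and so on), and the rule that the linear route is only taken when `alpha == beta == 1` are hypothetical assumptions made for illustration. Both routes are checked against the reference `addmm` semantics `beta * bias + alpha * (mat1 @ mat2)`.

```python
# Toy model of the two-pass addmm lowering, written in plain Python (no torch).
# Hypothetical assumptions: operands are nested lists, the second matrix is tagged
# with the name of the op that produced it, and the linear route requires
# alpha == beta == 1. This is NOT the ExecuTorch/QNN pass implementation.
from typing import List, Tuple

Matrix = List[List[float]]
Operand = Tuple[str, Matrix]  # (producer op name, value), e.g. ("t_copy", W^T)


def mm(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product, standing in for aten.mm."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def scale(m: Matrix, s: float) -> Matrix:
    """Multiply every element by a scalar, standing in for aten.mul."""
    return [[s * v for v in row] for row in m]


def add_row_bias(m: Matrix, bias: List[float]) -> Matrix:
    """Broadcast a 1-D bias over every row, standing in for aten.add."""
    return [[v + b for v, b in zip(row, bias)] for row in m]


def addmm_reference(bias: List[float], a: Matrix, b: Matrix,
                    beta: float = 1.0, alpha: float = 1.0) -> Matrix:
    """addmm semantics: beta * bias + alpha * (a @ b)."""
    return add_row_bias(scale(mm(a, b), alpha), [beta * x for x in bias])


def linear(x: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """linear semantics: x @ weight^T + bias."""
    return add_row_bias(mm(x, transpose(weight)), bias)


def lower_addmm(bias: List[float], a: Matrix, b_operand: Operand,
                beta: float = 1.0, alpha: float = 1.0) -> Tuple[str, Matrix]:
    producer, b = b_operand
    # Pass 1 (AddmmToLinearTransform-like): only matches a transposed weight.
    if producer in ("t_copy", "permute_copy") and alpha == 1.0 and beta == 1.0:
        weight = transpose(b)  # undo the transpose to recover the original weight
        return "linear", linear(a, weight, bias)
    # Pass 2 (DecomposeAddmm-like fallback): mm + add, with mul only for non-unit scalars.
    out = mm(a, b)
    if alpha != 1.0:
        out = scale(out, alpha)
    scaled_bias = bias if beta == 1.0 else [beta * x for x in bias]
    return "mm+add", add_row_bias(out, scaled_bias)


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # nn.Linear-style weight, shape (3, 2)
    print(lower_addmm([0.5, 0.5, 0.5], [[1.0, 1.0]], ("t_copy", transpose(W))))
    # prints ('linear', [[3.5, 7.5, 11.5]])
    B = [[1.0, 0.0], [0.0, 2.0]]  # not transposed, so pass 1 skips it
    print(lower_addmm([1.0, 1.0], [[1.0, 2.0]], ("placeholder", B), beta=2.0, alpha=3.0))
    # prints ('mm+add', [[5.0, 14.0]])
```

In this sketch the `mul` steps are emitted only for non-unit scalars, so the common `alpha == beta == 1` fallback lowers to exactly one `mm` and one `add`.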
Standalone `addmm(bias, A, B)` calls where `B` is not transposed are explicitly skipped by that pass, so `DecomposeAddmm` serves as the fallback for these cases.

Also made some small improvements to the `new_op_development` skill based on recent learnings.

Test plan