Skip to content

Qualcomm AI Engine Direct - Adding QNN backend support for the addmm coreATen op#20355

Merged
psiddh merged 1 commit into
pytorch:mainfrom
CodeLinaro:addmm
Jun 18, 2026
Merged

Qualcomm AI Engine Direct - Adding QNN backend support for the addmm coreATen op#20355
psiddh merged 1 commit into
pytorch:mainfrom
CodeLinaro:addmm

Conversation

@qti-horodnic

Copy link
Copy Markdown
Contributor

Summary

Added full support for the aten.addmm core ATen op via a two-pass decomposition strategy:

  1. AddmmToLinearTransform (ExecuTorch shared pass): Converts the common nn.Linear decomposition pattern (addmm(bias, input, weight.T)) back to aten.linear, mapping to QNN's fused FullyConnected op for optimal performance.

  2. DecomposeAddmm (new pass): Handles remaining standalone addmm nodes by decomposing them into mm + add. Supports non-unit alpha/beta scalars via additional mul nodes.

AddmmToLinearTransform alone is not sufficient because it only handles the subset of addmm nodes that match the nn.Linear decomposition pattern, specifically where args[2] is a transposed weight (t_copy or permute_copy).
Standalone addmm(bias, A, B) calls where B is not transposed are explicitly skipped by that pass. DecomposeAddmm serves as the fallback for these cases.

Also made some small improvements to the new_op_development skill based on recent learnings.

Test plan

python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedOperator.test_qnn_backend_addmm --model SM8750 --host aisw-vm15-labsd --device 545ee4aa --build_folder build-android

python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNFloatingPointOperator.test_qnn_backend_addmm --model SM8750 --host aisw-vm15-labsd --device 545ee4aa --build_folder build-android

@pytorch-bot

pytorch-bot Bot commented Jun 17, 2026

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20355

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Unrelated Failure, 1 Unclassified Failure

As of commit a126ccb with merge base ef5c8a7 (image):

NEW FAILURES - The following jobs have failed:

UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:

  • periodic (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 17, 2026
@qti-horodnic

Copy link
Copy Markdown
Contributor Author

@pytorchbot label "release notes: qualcomm"

@pytorch-bot pytorch-bot Bot added the release notes: qualcomm Changes to the Qualcomm backend delegate label Jun 17, 2026
@psiddh

psiddh commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

@claude Review this PR

@claude

claude Bot commented Jun 18, 2026

Copy link
Copy Markdown

Claude finished @psiddh's task in 4m 55s —— View job


PR Review: QNN addmm support

I reviewed the decomposition pass, pipeline wiring, annotator changes, and tests. Overall this is a clean, well-scoped addition that follows existing QNN decompose-pass conventions. A few things worth addressing below.

🐛 Correctness: wrong meta["val"] shape on the scaled-bias node

In backends/qualcomm/_passes/decompose_addmm.py, every intermediate node copies node.meta (the addmm output meta), but bias_scaled does not have the addmm output shape:

meta = node.meta            # addmm output, e.g. (4, 8)
...
bias_scaled = graph.create_node("call_function", mul_op, (bias_node, beta_const))
bias_scaled.meta = copy_meta(meta)   # <-- claims (4, 8), but bias is (8,)

bias * beta has the bias shape ((8,) in the test), not the addmm output shape (4, 8). So bias_scaled.meta["val"] is inconsistent with the actual tensor. Downstream passes that read meta["val"].shape (e.g. ExpandBroadcastTensorShape, LayoutTransform) can misbehave on shapes where the bias isn't already the full output shape. The test happens to pass because the subsequent add broadcasts correctly at runtime regardless of the intermediate meta, but the metadata is still wrong.

mm_node / mm_scaled are fine (mm output equals addmm output shape), so only the bias branch needs the source node's meta:

bias_scaled.meta = copy_meta(bias_node.meta)

Fix this →

❓ PR description vs. actual wiring: AddmmToLinearTransform

The summary describes a "two-pass strategy" where AddmmToLinearTransform first converts the nn.Linear pattern to aten.linear. However, I don't see AddmmToLinearTransform wired into the QNN pipeline anywhere — backends/transforms/addmm_mm_to_linear.py is only referenced by the Samsung and Vulkan backends:

backends/samsung/enn_preprocess.py:67:    AddmmToLinearTransform(),
backends/vulkan/vulkan_preprocess.py:166:  AddmmToLinearTransform(),

So in this PR, all addmm nodes (including those from nn.Linear) fall through to DecomposeAddmmmm + add, rather than mapping to QNN's fused FullyConnected. That's functionally correct, but it means the "optimal FullyConnected" path from the description isn't actually exercised. Could you clarify: is AddmmToLinearTransform expected to be added in a follow-up, or does QNN's existing linear handling already prevent nn.Linear from lowering to addmm (e.g. via the decomposition table)? If the latter, the description is just slightly misleading; if the former, the perf claim doesn't hold yet.

🔎 Minor

  • Pipeline registration is broad. DecomposeAddmm is added to both get_annotation_passes() and get_default_pass_activations() (capture/to-edge) but not get_export_passes(). This is consistent with your updated skill doc (annotation handles the quantized path; capture handles the pure-float path where the annotation pipeline doesn't run), so this looks intentional — just confirming it's deliberate.
  • DCE style. This pass uses graph.eliminate_dead_code() + graph_module.recompile(), whereas sibling passes (decompose_binary_alpha.py) use dead_code_elimination_pass(graph_module). Both work; matching the helper would be marginally more consistent.
  • Test coverage. The two test cases (alpha=beta=1 and alpha=2, beta=3) use a non-transposed mat2 (3, 8), which correctly exercises the standalone DecomposeAddmm path. Good. If AddmmToLinearTransform does get wired in later, consider also adding a case with a transposed weight so both branches are covered. The new mm entry in the MatMul annotator (htp/lpai) is the right enabler for the quantized path — nice.

✅ Looks good

  • mm builder already exists (op_matmul.py targets aten.mm.default), so the decomposition outputs are buildable.
  • addmm is not in to_be_implemented_operator, so no partition cleanup needed.
  • Keyword-only alpha/beta are correctly read from kwargs with default 1, and != 1 short-circuits the extra mul nodes for the common case.
  • Edge vs. ATen op-target handling via is_edge is correct for both pipelines.

Nice, focused PR — the main item to address is the bias_scaled meta shape; the rest is clarification/polish.
· branch addmm

@psiddh psiddh merged commit 66884b4 into pytorch:main Jun 18, 2026
177 of 185 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. release notes: qualcomm Changes to the Qualcomm backend delegate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants