Qualcomm AI Engine Direct - Adding QNN backend support for the addmm coreATen op by qti-horodnic · Pull Request #20355 · pytorch/executorch

qti-horodnic · 2026-06-17T21:48:02Z

Summary

Added full support for the aten.addmm core ATen op via a two-pass decomposition strategy:

AddmmToLinearTransform (ExecuTorch shared pass): Converts the common nn.Linear decomposition pattern (addmm(bias, input, weight.T)) back to aten.linear, mapping to QNN's fused FullyConnected op for optimal performance.
DecomposeAddmm (new pass): Handles remaining standalone addmm nodes by decomposing them into mm + add. Supports non-unit alpha/beta scalars via additional mul nodes.

AddmmToLinearTransform alone is not sufficient because it only handles the subset of addmm nodes that match the nn.Linear decomposition pattern, specifically where args[2] is a transposed weight (t_copy or permute_copy).
Standalone addmm(bias, A, B) calls where B is not transposed are explicitly skipped by that pass. DecomposeAddmm serves as the fallback for these cases.

Also made some small improvements to the new_op_development skill based on recent learnings.

Test plan

python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedOperator.test_qnn_backend_addmm --model SM8750 --host aisw-vm15-labsd --device 545ee4aa --build_folder build-android

python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNFloatingPointOperator.test_qnn_backend_addmm --model SM8750 --host aisw-vm15-labsd --device 545ee4aa --build_folder build-android

… ATen op

pytorch-bot · 2026-06-17T21:48:06Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20355

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Unrelated Failure, 1 Unclassified Failure

As of commit a126ccb with merge base ef5c8a7 ():

NEW FAILURES - The following jobs have failed:

Cadence Build & Test / hifi-build / hifi4 (gh)
Input required and not supplied: aws-region
Cadence Build & Test / vision-build / vision (gh)
Input required and not supplied: aws-region
pull / test-arm-backend-no-driver (test_pytest_models_tosa) / linux-job (gh)
RuntimeError: Command docker exec -t 3baabb4690645daa3372cf8f1722ba96703a2af498ef2cf1c8ebea3313b64670 /exec failed with exit code 1
pull / test-phi-3-mini-runner-linux / linux-job (gh)
RuntimeError: Command docker exec -t 054c8c5e1ca4edd074ec1c3aa02085d5ac19a329427f29c5abcd53922ef5d866 /exec failed with exit code 1

UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:

periodic (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh) (trunk failure)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

qti-horodnic · 2026-06-17T21:48:19Z

@pytorchbot label "release notes: qualcomm"

psiddh · 2026-06-18T07:43:25Z

@claude Review this PR

claude · 2026-06-18T07:44:14Z

Claude finished @psiddh's task in 4m 55s —— View job

PR Review: QNN `addmm` support

I reviewed the decomposition pass, pipeline wiring, annotator changes, and tests. Overall this is a clean, well-scoped addition that follows existing QNN decompose-pass conventions. A few things worth addressing below.

🐛 Correctness: wrong `meta["val"]` shape on the scaled-bias node

In backends/qualcomm/_passes/decompose_addmm.py, every intermediate node copies node.meta (the addmm output meta), but bias_scaled does not have the addmm output shape:

meta = node.meta            # addmm output, e.g. (4, 8)
...
bias_scaled = graph.create_node("call_function", mul_op, (bias_node, beta_const))
bias_scaled.meta = copy_meta(meta)   # <-- claims (4, 8), but bias is (8,)

bias * beta has the bias shape ((8,) in the test), not the addmm output shape (4, 8). So bias_scaled.meta["val"] is inconsistent with the actual tensor. Downstream passes that read meta["val"].shape (e.g. ExpandBroadcastTensorShape, LayoutTransform) can misbehave on shapes where the bias isn't already the full output shape. The test happens to pass because the subsequent add broadcasts correctly at runtime regardless of the intermediate meta, but the metadata is still wrong.

mm_node / mm_scaled are fine (mm output equals addmm output shape), so only the bias branch needs the source node's meta:

bias_scaled.meta = copy_meta(bias_node.meta)

Fix this →

❓ PR description vs. actual wiring: `AddmmToLinearTransform`

The summary describes a "two-pass strategy" where AddmmToLinearTransform first converts the nn.Linear pattern to aten.linear. However, I don't see AddmmToLinearTransform wired into the QNN pipeline anywhere — backends/transforms/addmm_mm_to_linear.py is only referenced by the Samsung and Vulkan backends:

backends/samsung/enn_preprocess.py:67:    AddmmToLinearTransform(),
backends/vulkan/vulkan_preprocess.py:166:  AddmmToLinearTransform(),

So in this PR, all addmm nodes (including those from nn.Linear) fall through to DecomposeAddmm → mm + add, rather than mapping to QNN's fused FullyConnected. That's functionally correct, but it means the "optimal FullyConnected" path from the description isn't actually exercised. Could you clarify: is AddmmToLinearTransform expected to be added in a follow-up, or does QNN's existing linear handling already prevent nn.Linear from lowering to addmm (e.g. via the decomposition table)? If the latter, the description is just slightly misleading; if the former, the perf claim doesn't hold yet.

🔎 Minor

Pipeline registration is broad. DecomposeAddmm is added to both get_annotation_passes() and get_default_pass_activations() (capture/to-edge) but not get_export_passes(). This is consistent with your updated skill doc (annotation handles the quantized path; capture handles the pure-float path where the annotation pipeline doesn't run), so this looks intentional — just confirming it's deliberate.
DCE style. This pass uses graph.eliminate_dead_code() + graph_module.recompile(), whereas sibling passes (decompose_binary_alpha.py) use dead_code_elimination_pass(graph_module). Both work; matching the helper would be marginally more consistent.
Test coverage. The two test cases (alpha=beta=1 and alpha=2, beta=3) use a non-transposed mat2 (3, 8), which correctly exercises the standalone DecomposeAddmm path. Good. If AddmmToLinearTransform does get wired in later, consider also adding a case with a transposed weight so both branches are covered. The new mm entry in the MatMul annotator (htp/lpai) is the right enabler for the quantized path — nice.

✅ Looks good

mm builder already exists (op_matmul.py targets aten.mm.default), so the decomposition outputs are buildable.
addmm is not in to_be_implemented_operator, so no partition cleanup needed.
Keyword-only alpha/beta are correctly read from kwargs with default 1, and != 1 short-circuits the extra mul nodes for the common case.
Edge vs. ATen op-target handling via is_edge is correct for both pipelines.

Nice, focused PR — the main item to address is the bias_scaled meta shape; the rest is clarification/polish.
· branch addmm

Qualcomm AI Engine Direct - Adding QNN backend support for addmm core…

a126ccb

… ATen op

qti-horodnic requested review from abhinaykukkadapu and psiddh as code owners June 17, 2026 21:48

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 17, 2026

pytorch-bot Bot added the release notes: qualcomm Changes to the Qualcomm backend delegate label Jun 17, 2026

qti-horodnic had a problem deploying to cadence June 18, 2026 07:43 — with GitHub Actions Failure

psiddh approved these changes Jun 18, 2026

View reviewed changes

psiddh merged commit 66884b4 into pytorch:main Jun 18, 2026
177 of 185 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Qualcomm AI Engine Direct - Adding QNN backend support for the addmm coreATen op#20355

Qualcomm AI Engine Direct - Adding QNN backend support for the addmm coreATen op#20355
psiddh merged 1 commit into
pytorch:mainfrom
CodeLinaro:addmm

qti-horodnic commented Jun 17, 2026

Uh oh!

pytorch-bot Bot commented Jun 17, 2026 •

edited

Loading

Uh oh!

qti-horodnic commented Jun 17, 2026

Uh oh!

psiddh commented Jun 18, 2026

Uh oh!

claude Bot commented Jun 18, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

qti-horodnic commented Jun 17, 2026

Summary

Test plan

Uh oh!

pytorch-bot Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20355

❌ 4 New Failures, 1 Unrelated Failure, 1 Unclassified Failure

Uh oh!

qti-horodnic commented Jun 17, 2026

Uh oh!

psiddh commented Jun 18, 2026

Uh oh!

claude Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: QNN addmm support

🐛 Correctness: wrong meta["val"] shape on the scaled-bias node

❓ PR description vs. actual wiring: AddmmToLinearTransform

🔎 Minor

✅ Looks good

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pytorch-bot Bot commented Jun 17, 2026 •

edited

Loading

claude Bot commented Jun 18, 2026 •

edited

Loading

PR Review: QNN `addmm` support

🐛 Correctness: wrong `meta["val"]` shape on the scaled-bias node

❓ PR description vs. actual wiring: `AddmmToLinearTransform`