Graph Safe Current Scaling Support for GroupedLinear Module/Ops by vthumbe1503 · Pull Request #3143 · NVIDIA/TransformerEngine

vthumbe1503 · 2026-06-25T00:51:38Z

Description

Please include a brief summary of the changes, relevant motivation and context.

Fixes # (issue)

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Please list the changes introduced in this PR:

Change A
Change B

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

for more information, see https://pre-commit.ci

Removed details about FP8 current scaling methods. Signed-off-by: vthumbe1503 <vthumbe@nvidia.com>

vthumbe1503 · 2026-06-25T00:57:35Z

/te-ci pytorch

greptile-apps · 2026-06-25T01:01:21Z

Greptile Summary

This PR extends the CUDA-graph-safe grouped-GEMM path in both the module (_GroupedLinear) and ops (GroupedLinear) layers to support FP8 per-tensor current scaling, which is backed by tex.group_quantize and cuBLASLt grouped GEMM and runs on Hopper (CC 9.0) as well as Blackwell.

Core logic change: Float8CurrentScalingQuantizer is now detected before the Blackwell-only CC check in both _is_grouped_tensor_path_supported (module) and _is_graph_safe_path_supported (ops), enabling an early return True on Hopper.
Bug fix carried forward: The guard that frees grouped_x.rowwise_data/scale_inv before save_for_backward is now gated on grouped_x.columnwise_data is not None, preventing null-activation corruption for per-tensor FP8. The single_grouped_weight restriction for Float8CurrentScaling in make_grouped_weights is also removed, and _is_graph_safe_path_supported gains a single_grouped_weight parameter to properly restrict only NVFP4.
Tests: New fp8_current_scaling parametrize cases are added to test_grouped_linear.py with correctly relaxed Hopper skip logic; test_grouped_mlp.py adds fp8_current_scaling/nvfp4_rht to the cuda-graph-safe test suite with appropriate per-recipe skip guards.
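The dispatch order described in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the quantizer classes are empty stand-ins, `is_graph_safe_path_supported` is a hypothetical helper modelled on the `_is_graph_safe_path_supported` name mentioned above, and the Blackwell threshold of compute capability (10, 0) is an assumption.

```python
from typing import Tuple

# Empty stand-ins for the real quantizer classes (assumption: the real
# classes live in Transformer Engine and carry actual state).
class Float8CurrentScalingQuantizer: ...
class NVFP4Quantizer: ...
class OtherQuantizer: ...

BLACKWELL_CC = (10, 0)  # assumed threshold for the Blackwell-only check


def is_graph_safe_path_supported(
    quantizer: object,
    compute_capability: Tuple[int, int],
    single_grouped_weight: bool,
) -> bool:
    """Hypothetical sketch of the check order described in the summary."""
    # 1. Per-tensor current scaling is detected first, so it returns
    #    before the Blackwell-only check and therefore passes on Hopper.
    if isinstance(quantizer, Float8CurrentScalingQuantizer):
        return True
    # 2. The single_grouped_weight restriction is scoped to NVFP4 only.
    if isinstance(quantizer, NVFP4Quantizer) and single_grouped_weight:
        return False
    # 3. Everything else still needs the Blackwell compute capability.
    return compute_capability >= BLACKWELL_CC
```

Placing the current-scaling check first is what makes the Hopper case reachable; if it came after the compute-capability check, the function would already have returned False on CC 9.0.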

Confidence Score: 5/5

The changes are safe to merge — the core guard fixes are correct in both code paths and the new Float8CurrentScaling early-return is logically sound.

Both the module and ops layers correctly gate rowwise_data cleanup on columnwise_data is not None, preventing null-activation corruption for per-tensor FP8. The new dispatch is consistent between paths, and the single_grouped_weight restriction is properly scoped to NVFP4 only.
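As a rough illustration of the guard, the sketch below uses a hypothetical `FakeGroupedTensor` dataclass in place of the real grouped tensor type; only the attribute names `rowwise_data`, `scale_inv`, and `columnwise_data` come from the summary above, and the helper name is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeGroupedTensor:
    """Stand-in for a grouped quantized tensor (hypothetical type)."""
    rowwise_data: Optional[bytes]
    scale_inv: Optional[bytes]
    columnwise_data: Optional[bytes] = None


def release_rowwise_before_save(grouped_x: FakeGroupedTensor) -> FakeGroupedTensor:
    """Free row-wise buffers only if a column-wise copy exists.

    Without the guard, a tensor that only has row-wise data (the
    per-tensor FP8 case described above) would be saved for backward
    with no activation data at all.
    """
    if grouped_x.columnwise_data is not None:
        grouped_x.rowwise_data = None
        grouped_x.scale_inv = None
    return grouped_x
```

With this guard, a tensor that carries only row-wise data passes through unchanged, while a tensor that also has column-wise data drops its row-wise buffers before being saved.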

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| transformer_engine/pytorch/module/grouped_linear.py | Adds Float8CurrentScalingQuantizer early-return in `_is_grouped_tensor_path_supported` and fixes the backward-pass guard so `rowwise_data`/`scale_inv` are only cleared when `columnwise_data` is present. |
| transformer_engine/pytorch/ops/basic/grouped_linear.py | Imports Float8CurrentScalingQuantizer, threads `single_grouped_weight` into `_is_graph_safe_path_supported`, removes the erroneous `float8_current_scaling` restriction in `make_grouped_weights`, and applies the `columnwise_data is not None` guard before freeing `rowwise_data`. |
| tests/pytorch/test_grouped_linear.py | Adds `fp8_current_scaling` parametrize cases and correctly relaxes the Blackwell-only skip for that recipe. |
| tests/pytorch/test_grouped_mlp.py | Adds `fp8_current_scaling` and `nvfp4_rht` to the cuda-graph-safe test; the skip reason string for the delayed-scaling guard is stale. |
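The relaxed skip logic for the tests might look something like the sketch below. It is not taken from the test files: `graph_safe_skip_reason` is a hypothetical helper, the recipe names are plain strings borrowed from the table, and the compute-capability thresholds (9, 0) for Hopper and (10, 0) for Blackwell are assumptions.

```python
from typing import Optional, Tuple

HOPPER_CC = (9, 0)      # assumed compute capability for Hopper
BLACKWELL_CC = (10, 0)  # assumed compute capability for Blackwell


def graph_safe_skip_reason(
    recipe_name: str, compute_capability: Tuple[int, int]
) -> Optional[str]:
    """Return a skip reason, or None if the test case should run."""
    if recipe_name == "fp8_current_scaling":
        # Relaxed guard: this recipe is allowed on Hopper as well.
        if compute_capability < HOPPER_CC:
            return "fp8_current_scaling requires Hopper or newer"
        return None
    # Other recipes keep the Blackwell-only guard in this sketch.
    if compute_capability < BLACKWELL_CC:
        return f"{recipe_name} graph-safe path requires Blackwell"
    return None
```

In a pytest suite, a helper like this would typically be called at the top of the test body and its result passed to `pytest.skip` when it is not None.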

Reviews (4): Last reviewed commit: "Merge branch 'nvfp4_and_fp8_current_scal..."

vthumbe1503 and others added 3 commits June 25, 2026 00:40

support in grouped linear and relevant tests

bad0b2c

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

9af6df6

for more information, see https://pre-commit.ci

Unecessary details remove

bd0832c

Removed details about FP8 current scaling methods. Signed-off-by: vthumbe1503 <vthumbe@nvidia.com>

vthumbe1503 marked this pull request as ready for review June 25, 2026 00:57

vthumbe1503 requested review from ksivaman and timmoon10 as code owners June 25, 2026 00:57

greptile-apps Bot reviewed Jun 25, 2026

Comment thread transformer_engine/pytorch/module/grouped_linear.py Outdated

vthumbe1503 and others added 5 commits June 26, 2026 17:26

fix grouped linear module's grouped tensor path

a163af3

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

allow more current scaling use-cases.. block nvfp4+rht+single grouped…

971160f

… weight being cuda graphable Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

cb57694

for more information, see https://pre-commit.ci

some minor comment fixing

f97ef20

Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>

Merge branch 'nvfp4_and_fp8_current_scaling' of github.com:vthumbe150…

b195152

…3/TransformerEngine into nvfp4_and_fp8_current_scaling

vthumbe1503 wants to merge 8 commits into NVIDIA:main from vthumbe1503:nvfp4_and_fp8_current_scaling

1 participant