Graph Safe Current Scaling Support for GroupedLinear Module/Ops #3143
vthumbe1503 wants to merge 8 commits into
Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
for more information, see https://pre-commit.ci
Removed details about FP8 current scaling methods. Signed-off-by: vthumbe1503 <vthumbe@nvidia.com>
/te-ci pytorch
Greptile Summary
This PR extends the CUDA-graph-safe grouped-GEMM path in both the module (
Confidence Score: 5/5
The changes are safe to merge: the core guard fixes are correct in both code paths and the new Float8CurrentScaling early-return is logically sound. Both the module and ops layers correctly gate
No files require special attention.
Important Files Changed
Reviews (4): Last reviewed commit: "Merge branch 'nvfp4_and_fp8_current_scal..."
Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
… weight being cuda graphable Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
for more information, see https://pre-commit.ci
Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
…3/TransformerEngine into nvfp4_and_fp8_current_scaling
Description
Please include a brief summary of the changes, relevant motivation and context.
Fixes # (issue)
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: