Skip to content

[Common] Update NCCL submodule to have the fix for MAX_SUPPORTED_TOKENS_PER_RANK#3150

Open
phu0ngng wants to merge 1 commit into
NVIDIA:mainfrom
phu0ngng:update_nccl
Open

[Common] Update NCCL submodule to have the fix for MAX_SUPPORTED_TOKENS_PER_RANK#3150
phu0ngng wants to merge 1 commit into
NVIDIA:mainfrom
phu0ngng:update_nccl

Conversation

@phu0ngng

Copy link
Copy Markdown
Collaborator

Description

Update NCCL submodule to have the fix for MAX_SUPPORTED_TOKENS_PER_RANK

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
@greptile-apps

greptile-apps Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR bumps the 3rdparty/nccl submodule from commit 808d2433 to 696e971f to pull in an upstream NCCL fix for MAX_SUPPORTED_TOKENS_PER_RANK. No TransformerEngine source, build, or test files are changed.

  • The only file modified is the submodule pointer in 3rdparty/nccl; all TransformerEngine code is untouched.
  • The PR checklist item for adding tests is left unchecked, and the "Type of change" section has no box ticked — while acceptable for a pure submodule bump, documenting the type of change (e.g., Bug fix) and noting how the fix can be validated would improve traceability.

Confidence Score: 4/5

This PR touches only the NCCL submodule pointer; all TransformerEngine code is unchanged, so the blast radius is limited to whatever the new NCCL commit introduces.

The change is a single-line submodule bump to an upstream NCCL fix. No TransformerEngine logic is modified, reducing the chance of regressions in the main codebase. The main uncertainty is that the new NCCL commit cannot be verified locally (submodule not initialized), and no tests specifically exercising the MAX_SUPPORTED_TOKENS_PER_RANK path are included or referenced.

3rdparty/nccl — the submodule points to a new NCCL commit that is not reachable from the local checkout, so the actual code delta is not directly reviewable here.

Important Files Changed

Filename Overview
3rdparty/nccl Submodule pointer updated from 808d2433 to 696e971f to incorporate a fix for MAX_SUPPORTED_TOKENS_PER_RANK in NVIDIA's NCCL library. No TransformerEngine source files were modified.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant TE as TransformerEngine
    participant NCCLSub as 3rdparty/nccl (submodule)
    participant NCCLUpstream as NVIDIA/nccl (upstream)

    Note over NCCLUpstream: Commit 808d2433<br/>(before fix)
    TE->>NCCLSub: references old commit (808d2433)
    NCCLSub-->>TE: MAX_SUPPORTED_TOKENS_PER_RANK bug present

    Note over NCCLUpstream: Commit 696e971f<br/>(after fix)
    TE->>NCCLSub: updated pointer to 696e971f
    NCCLSub-->>TE: MAX_SUPPORTED_TOKENS_PER_RANK fix applied
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant TE as TransformerEngine
    participant NCCLSub as 3rdparty/nccl (submodule)
    participant NCCLUpstream as NVIDIA/nccl (upstream)

    Note over NCCLUpstream: Commit 808d2433<br/>(before fix)
    TE->>NCCLSub: references old commit (808d2433)
    NCCLSub-->>TE: MAX_SUPPORTED_TOKENS_PER_RANK bug present

    Note over NCCLUpstream: Commit 696e971f<br/>(after fix)
    TE->>NCCLSub: updated pointer to 696e971f
    NCCLSub-->>TE: MAX_SUPPORTED_TOKENS_PER_RANK fix applied
Loading

Reviews (1): Last reviewed commit: "update nccl" | Re-trigger Greptile

Comment thread 3rdparty/nccl
@@ -1 +1 @@
Subproject commit 808d2433dda3cccc80f8172a94a6b117359e7102
Subproject commit 696e971fb092a26f81a8c24b432beecdbbe3064e

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No "Type of change" ticked and test checkbox unchecked

The PR template's "Type of change" section has no box checked, and the tests checkbox is left unchecked. Even for a submodule bump, recording the change category (e.g., "Bug fix") and noting how the fix can be exercised (e.g., pointing to an existing collective-communication test or a reproduction script) helps future reviewers and release-note authors trace the fix back to this PR.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@phu0ngng

Copy link
Copy Markdown
Collaborator Author

/te-ci L1

@phu0ngng phu0ngng added the 2.17 label Jun 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant