
[Proposal] Add Nemotron-H hybrid Mamba2-Transformer adapter #1402

@jlarson4

Proposal

Add a TransformerBridge adapter for NemotronHForCausalLM (NVIDIA Nemotron-H), a hybrid that mixes Mamba-2 and attention layers.

Motivation

Nemotron-H keeps only a small fraction of its layers as attention (around 8%) and replaces the rest with Mamba-2. That makes the few attention layers a clean target for interpretability: researchers can ask what those layers do that the state-space layers cannot. NVIDIA continues to release new models in the family, the line is widely adopted, and it complements the existing Mamba and Mamba2 support.

Our current support for Mamba layers is limited, and this work would also open new avenues for extending it.

Gap scan (2026-06-18): 53 models and ~4.99M downloads, making this the highest-ranked hybrid state-space gap.

Pitch

Build on the existing Mamba2 components for the state-space layers and standard attention hooks for the interleaved attention layers. A tiny test checkpoint (trl-internal-testing/tiny-NemotronHForCausalLM-nano) keeps CI cheap.
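A minimal sketch of the per-layer dispatch this implies, assuming layer types are read from a one-character-per-layer string like the Hugging Face NemotronH config's `hybrid_override_pattern` ("M" = Mamba-2, "*" = attention, "-" = MLP). The component names are hypothetical placeholders, not existing TransformerBridge classes.

```python
from collections import Counter

# Assumption: one character per layer, as in the NemotronH config's
# `hybrid_override_pattern`. Component names are hypothetical placeholders.
LAYER_COMPONENTS = {
    "M": "Mamba2MixerBridge",  # hypothetical: reuse existing Mamba2 support
    "*": "AttentionBridge",    # hypothetical: standard attention hooks
    "-": "MLPBridge",          # hypothetical: plain MLP block
}


def plan_layers(pattern: str) -> list[str]:
    """Map each layer character to the component that would wrap it."""
    try:
        return [LAYER_COMPONENTS[ch] for ch in pattern]
    except KeyError as err:
        raise ValueError(f"unknown layer type {err.args[0]!r} in pattern") from None


def attention_layer_indices(pattern: str) -> list[int]:
    """Indices of the attention layers, i.e. the interpretability targets."""
    return [i for i, ch in enumerate(pattern) if ch == "*"]


def layer_type_counts(pattern: str) -> Counter:
    """Count how many layers of each type the pattern contains."""
    return Counter(pattern)
```

For a short pattern such as `"M-M*-M-"`, `attention_layer_indices` returns `[3]`, so only layer 3 would get attention hooks while the Mamba-2 layers go through the existing Mamba2 path.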

  • Claude Code users can scaffold the adapter with /add-model-support nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16.
  • Register the adapter at the four sites listed in contributing.md.
  • Verify smallest-first: trl-internal-testing/tiny-NemotronHForCausalLM-nano, then nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16.
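The smallest-first order can be sketched as a loop that stops at the first failing checkpoint, so the 4B model is never loaded while the tiny one still fails. The checkpoint IDs come from the list above; `verify` is a hypothetical stand-in for whatever comparison is actually run.

```python
from typing import Callable

# Checkpoint IDs from the proposal, ordered smallest first.
CHECKPOINTS = [
    "trl-internal-testing/tiny-NemotronHForCausalLM-nano",
    "nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16",
]


def verify_smallest_first(
    checkpoints: list[str], verify: Callable[[str], bool]
) -> list[str]:
    """Run the hypothetical `verify` check on each checkpoint in order.

    Stops at the first failure and returns the checkpoints that passed,
    so larger models are only loaded once the smaller ones succeed.
    """
    passed = []
    for name in checkpoints:
        if not verify(name):
            break
        passed.append(name)
    return passed
```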

Additional context

Checklist

  • I have checked that there is no similar issue in the repo (required)

Metadata

Labels: TransformerBridge, complexity-high, help wanted, low-priority, new-architecture