[Proposal] Add Nemotron-H hybrid Mamba2-Transformer adapter #1402
Open
Labels
TransformerBridge: Bug specific to the new TransformerBridge system
complexity-high: Very complicated changes for people to address who are quite familiar with the code
help wanted: Extra attention is needed
low-priority: Maintainers are not prioritising this work currently.
new-architecture: This card involves adding a new architecture.
Proposal
Add a TransformerBridge adapter for NemotronHForCausalLM (NVIDIA Nemotron-H), a hybrid that mixes Mamba-2 and attention layers.
Motivation
Nemotron-H keeps only a small fraction of attention layers (around 8%) and replaces the rest with Mamba-2. That makes the few attention layers a clean target for interpretability: researchers can ask what those layers do that the state-space layers cannot. The model family has strong, ongoing NVIDIA releases and wide adoption, and an adapter for it would complement the existing Mamba and Mamba2 support.
Our support for Mamba layers is currently limited, and this work would also open new avenues for further work on those layers.
Gap scan (2026-06-18): 53 models, ~4.99M downloads, the highest-ranked hybrid state-space gap.
Pitch
Build on the existing Mamba2 components for the state-space layers and standard attention hooks for the interleaved attention layers. A tiny test checkpoint (trl-internal-testing/tiny-NemotronHForCausalLM-nano) keeps CI cheap.
/add-model-support nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
trl-internal-testing/tiny-NemotronHForCausalLM-nano, then nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16.
Additional context
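For discussion, here is a rough sketch of how an adapter might decide which existing component each layer gets, as described in the pitch. It assumes the layer layout is available as a pattern string with one character per layer, where "M" marks a Mamba-2 block, "*" an attention block, and "-" an MLP block. That encoding is an assumption and should be checked against the checkpoint's config. The function names are hypothetical and are not part of TransformerBridge.

```python
from collections import Counter

# Assumed symbol-to-block mapping; verify against the real checkpoint config.
LAYER_KINDS = {"M": "mamba2", "*": "attention", "-": "mlp"}


def classify_layers(pattern: str) -> list[str]:
    """Map each character of a layer pattern to a block kind."""
    kinds = []
    for index, symbol in enumerate(pattern):
        if symbol not in LAYER_KINDS:
            raise ValueError(f"unknown layer symbol {symbol!r} at index {index}")
        kinds.append(LAYER_KINDS[symbol])
    return kinds


def attention_layer_indices(pattern: str) -> list[int]:
    """Return the indices of layers that would get standard attention hooks."""
    return [i for i, kind in enumerate(classify_layers(pattern)) if kind == "attention"]


def summarize(pattern: str) -> dict[str, int]:
    """Count how many layers of each kind the pattern contains."""
    return dict(Counter(classify_layers(pattern)))


if __name__ == "__main__":
    toy_pattern = "M-M-M*-M-M-"  # made-up pattern, not taken from a real model
    print("attention layers:", attention_layer_indices(toy_pattern))
    print("layer counts:", summarize(toy_pattern))
```

Keeping the dispatch as a plain lookup would let the Mamba-2 and attention paths reuse the existing components unchanged, with only the per-layer routing being new.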
hf_scraper architecture-gaps pass (2026-06-18).
Checklist