[megatron] feat: support gpt-oss #4323
Conversation
Code Review
This pull request adds support for gpt-oss by introducing a bshd data format to handle models that do not support the existing thd format. The changes are spread across model forwarding logic, configuration files, and utility functions. A new test script for gpt-oss is also included.
My review has identified a critical bug in the model_forward.py file where a hardcoded sequence_parallel flag can lead to runtime errors due to shape mismatches. Additionally, there's an inconsistency in the new test script where the default model path does not match the path where the model is saved, which would cause the script to fail. I've provided suggestions to fix both issues.
verl/models/mcore/model_forward.py (outdated)
)
if post_process and logits_processor is not None:
    args = {
        k: preprocess_bshd(v, attention_mask, position_ids, sequence_parallel=True, pre_process=True)[0]
The sequence_parallel argument for preprocess_bshd is hardcoded to True when processing logits_processor_args. This is inconsistent with the processing of input_ids, which uses the sp variable. If sequence parallelism is disabled (sp=False), this will lead to a shape mismatch between the model's output and the logits_processor arguments, causing a runtime error. The sp variable should be used for consistency.
Suggested change:
-        k: preprocess_bshd(v, attention_mask, position_ids, sequence_parallel=True, pre_process=True)[0]
+        k: preprocess_bshd(v, attention_mask, position_ids, sequence_parallel=sp, pre_process=True)[0]
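To see why the two flags must agree, here is a minimal sketch in plain PyTorch (independent of Megatron and of the actual `preprocess_bshd` implementation; `split_along_seq`, `tp_size`, and the tensor shapes are illustrative assumptions): with sequence parallelism, preprocessing splits tensors along the sequence dimension, so hardcoding `sequence_parallel=True` shrinks the logits-processor args while a model run with `sp=False` still produces outputs of the full sequence length.

```python
import torch

def split_along_seq(x: torch.Tensor, tp_size: int, rank: int) -> torch.Tensor:
    # Hypothetical stand-in for sequence-parallel preprocessing:
    # the tensor is chunked along the sequence dimension (dim=1 in bshd layout).
    return x.chunk(tp_size, dim=1)[rank]

batch, seqlen, tp_size, rank = 2, 8, 2, 0
model_output = torch.randn(batch, seqlen, 4)        # sp=False: output keeps full seqlen
labels = torch.randint(0, 10, (batch, seqlen))      # a typical logits_processor arg

# Hardcoded sequence_parallel=True splits the logits-processor args regardless:
split_labels = split_along_seq(labels, tp_size, rank)

print(model_output.shape, split_labels.shape)
# torch.Size([2, 8, 4]) vs torch.Size([2, 4]) -> any per-token op now fails, e.g.
# torch.gather(model_output, 1, split_labels.unsqueeze(-1))  # RuntimeError: size mismatch
```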
test_dapo_gptoss_20b_megatron.sh

NNODES=${NNODES:-1}
# Paths
RAY_DATA_HOME=${RAY_DATA_HOME:-"${HOME}/verl"}
MODEL_PATH=${MODEL_PATH:-"${RAY_DATA_HOME}/models/gpt-oss-20b"}
The default MODEL_PATH is inconsistent with the output_dir in the get_model.py script. The get_model.py script saves the model to "$HOME/models/gpt-oss-20b-bf16", but MODEL_PATH defaults to "${RAY_DATA_HOME}/models/gpt-oss-20b", which resolves to "${HOME}/verl/models/gpt-oss-20b". This will cause the script to fail with a "model not found" error unless MODEL_PATH is explicitly set. To ensure consistency, the default MODEL_PATH should point to the correct directory.
Suggested change:
-MODEL_PATH=${MODEL_PATH:-"${RAY_DATA_HOME}/models/gpt-oss-20b"}
+MODEL_PATH=${MODEL_PATH:-"$HOME/models/gpt-oss-20b-bf16"}
For now (latest TE = 2.10), gpt-oss's optimized attention kernel does not support the thd format, so we use the bshd format here. When the bshd format is used, input_ids must be padded to the longest sequence length in the batch, so we recommend disabling dynamic batch size and setting the micro batch size to 1 to avoid padding, although it is fine to try micro_batch_size > 1. See `test_dapo_gptoss_20b_megatron.sh` for an example.

<img width="1299" height="867" alt="image" src="https://github.com/user-attachments/assets/b166a4b7-9c3a-4840-84c1-e8de02b506db" />

The training crashes with a mismatch; further experiments with MIS/TIS or fp16 are needed.
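As a rough illustration of the padding behaviour described above (a minimal sketch, not the actual verl data path; `pad_to_longest` and the pad id are hypothetical), bshd requires every sequence in a micro batch to be right-padded to the longest one, whereas the thd format would pack the same tokens into one row and only track sequence boundaries:

```python
import torch

def pad_to_longest(sequences: list[list[int]], pad_id: int = 0):
    """Right-pad variable-length sequences to the longest one in the batch (bshd-style).

    In the thd (packed) format the same tokens would be concatenated into a single
    row with cu_seqlens marking the boundaries, so no padding would be needed.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids = torch.full((len(sequences), max_len), pad_id, dtype=torch.long)
    attention_mask = torch.zeros((len(sequences), max_len), dtype=torch.long)
    for i, seq in enumerate(sequences):
        input_ids[i, : len(seq)] = torch.tensor(seq, dtype=torch.long)
        attention_mask[i, : len(seq)] = 1
    return input_ids, attention_mask

# With micro_batch_size=1 each micro batch holds a single sequence, so max_len equals
# that sequence's own length and no pad tokens are wasted; with micro_batch_size>1 the
# shorter sequences are padded up to the longest one in the batch.
ids, mask = pad_to_longest([[5, 6, 7, 8, 9], [5, 6]], pad_id=0)
print(ids)   # tensor([[5, 6, 7, 8, 9], [5, 6, 0, 0, 0]])
print(mask)  # tensor([[1, 1, 1, 1, 1], [1, 1, 0, 0, 0]])
```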