Add --first-last-layers-bf16 #614

Open
zianglih wants to merge 1 commit into radixark:main from zianglih:mxfp8-last-n

Conversation


@zianglih zianglih commented Feb 17, 2026

@HumansAnd

Motivation

As mentioned in https://arxiv.org/abs/2509.25149, keeping "a few sensitive linear layers in higher precision" is helpful for training convergence.

Dependency

sgl-project/sglang#18742 for serving mixed-precision checkpoints

@gemini-code-assist

Summary of Changes

Hello @zianglih, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the model quantization pipeline by introducing the ability to perform partial MXFP8 quantization. Specifically, it allows users to designate a certain number of the last decoder layers to retain their original BF16 precision, while the rest of the model is converted to MXFP8. This feature provides finer-grained control over model precision, potentially balancing performance and accuracy by preserving higher precision in critical layers.

Highlights

  • Partial Quantization Control: Introduced a new mechanism to selectively skip MXFP8 quantization for a specified number of final decoder layers, allowing them to remain in BF16 precision.
  • New Argument --first-last-layers-bf16: Added the --first-last-layers-bf16 flag and a corresponding --num-layers-at-end-in-bf16 argument to control this behavior in the model conversion and execution scripts.
  • Dynamic Skip Logic: Implemented dynamic calculation of layer prefixes to skip during quantization based on the total number of hidden layers and the specified number of BF16 layers at the end.
  • Improved Conversion Script: The convert_hf_to_mxfp8.py script was updated to read model configuration (like num_hidden_layers) and pass layer-specific parameters to the quantization logic.
Changelog
  • miles/backends/megatron_utils/megatron_to_hf/processors/quantizer_mxfp8.py
    • Implemented conditional logic to skip MXFP8 quantization for main decoder layers based on first_last_layers_bf16 and num_layers_at_end_in_bf16 arguments.
    • Added assertions to ensure num_layers is set and positive when first_last_layers_bf16 is enabled.
  • scripts/run_qwen3_30b_a3b.py
    • Introduced num_layers_at_end_in_bf16 as a new ScriptArgs parameter with a default of 0.
    • Added validation to ensure num_layers_at_end_in_bf16 is only used when rollout_mxfp8 is enabled.
    • Modified the convert_hf_to_mxfp8.py command to pass the num_layers_at_end_in_bf16 argument.
    • Added --first-last-layers-bf16 and related arguments to the misc_args for the execution command when train_fp8 or train_mxfp8 is enabled and num_layers_at_end_in_bf16 is greater than 0.
  • tools/convert_hf_to_mxfp8.py
    • Imported the re module for regular expression operations.
    • Modified the should_quantize function to accept an optional skip_weight_substrings argument, allowing dynamic control over which weights to skip.
    • Updated the process_file function signature to accept num_hidden_layers and num_layers_at_end_in_bf16.
    • Implemented logic within process_file to dynamically determine layer prefixes to skip based on num_layers_at_end_in_bf16 and passed these to should_quantize.
    • Modified the convert_mxfp8 function signature to accept num_layers_at_end_in_bf16 and read num_hidden_layers from config.json.
    • Updated the call to process_file within convert_mxfp8 to pass the new layer-related arguments.
    • Added a natural_key function for sorting modules_to_not_convert to ensure numerical order.
    • Added a command-line argument --num-layers-at-end-in-bf16 to the main function's argument parser.
    • Passed the num_layers_at_end_in_bf16 argument from the command line to the convert_mxfp8 function.
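The `natural_key` change matters because plain lexicographic sorting puts `layers.10` before `layers.2`. A minimal sketch of such a key (illustrative; the PR's exact implementation may differ):

```python
import re


def natural_key(s: str):
    """Split into digit and non-digit runs so embedded numbers sort numerically."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", s)]


modules = ["model.layers.10.", "model.layers.2.", "model.layers.1."]
print(sorted(modules, key=natural_key))
# → ['model.layers.1.', 'model.layers.2.', 'model.layers.10.']
```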
Activity
  • The pull request was opened by zianglih.
  • The motivation for this change is to enable more flexible quantization strategies.
  • A dependency on sglang/pull/18742 was noted.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a feature to skip quantization for the last N layers of a model, keeping them in BF16. The changes span the model conversion and training execution scripts, primarily for MXFP8 quantization. My review identifies a logical contradiction in the feature's activation condition, a magic number that hurts maintainability, and some redundant, non-robust file handling. Addressing these points will improve the code's correctness and quality.

Comment on lines +190 to +195

```python
if (args.train_fp8 or args.train_mxfp8) and args.num_layers_at_end_in_bf16 > 0:
    misc_args += (
        "--first-last-layers-bf16 "
        "--num-layers-at-start-in-bf16 0 "
        f"--num-layers-at-end-in-bf16 {args.num_layers_at_end_in_bf16} "
    )
```

high

There's a logical contradiction regarding when this feature is enabled. The __post_init__ check on line 40 asserts that num_layers_at_end_in_bf16 is only supported when rollout_mxfp8 is enabled. However, this block enables the feature for both train_fp8 and train_mxfp8. If train_fp8 is used (which usually implies rollout_fp8 and not rollout_mxfp8), the assertion on line 40 will fail, making the feature unusable with train_fp8.

Given that other changes in this PR are specific to mxfp8, it seems this feature is intended only for mxfp8. If so, the condition should be narrowed to resolve the contradiction.

Suggested change

```diff
-if (args.train_fp8 or args.train_mxfp8) and args.num_layers_at_end_in_bf16 > 0:
+if args.train_mxfp8 and args.num_layers_at_end_in_bf16 > 0:
     misc_args += (
         "--first-last-layers-bf16 "
         "--num-layers-at-start-in-bf16 0 "
         f"--num-layers-at-end-in-bf16 {args.num_layers_at_end_in_bf16} "
     )
```

Comment on lines +117 to +120

```python
num_maybe_mtp_layers = 1
dynamic_skip_layer_prefixes: set[str] = {
    f"model.layers.{i}." for i in range(tail_start_idx, num_hidden_layers + num_maybe_mtp_layers)
}
```

medium

The use of the magic number 1 for num_maybe_mtp_layers makes the code less readable and harder to maintain. It's not immediately clear why this value is 1 and if it's model-specific.

To improve clarity and maintainability, please define this as a named constant with a comment explaining its purpose. For example:

```python
# Number of MTP (Multi-Token Prediction) layers to account for, which might not be
# included in `num_hidden_layers`. This can be model-specific.
NUM_MAYBE_MTP_LAYERS = 1
# ...
# ... range(tail_start_idx, num_hidden_layers + NUM_MAYBE_MTP_LAYERS)
```

Comment on lines +154 to +157

```python
config_path = os.path.join(input_path, "config.json")
with open(config_path) as f:
    cfg = json.load(f)
num_hidden_layers = int(cfg["num_hidden_layers"])
```

medium

This block reads config.json, but the same file is read again at line 198, which is redundant. There are also inconsistencies in file handling throughout the function:

  • The open call here at line 155 is not protected by os.path.exists, which will cause a crash if the file is missing. The later read at line 198 is protected.
  • File operations at lines 198 and 200 do not use a with statement, which is not best practice.

Please consider refactoring to load the config once at the start of the function, and use with statements for all file I/O to ensure robustness and proper resource management.
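A sketch of the suggested refactor (the helper name `load_model_config` is hypothetical; only the config.json path and the load-once pattern follow the review comment):

```python
import json
import os


def load_model_config(input_path: str) -> dict:
    """Read config.json exactly once, failing with a clear error if it is absent."""
    config_path = os.path.join(input_path, "config.json")
    if not os.path.exists(config_path):
        raise FileNotFoundError(f"Missing model config: {config_path}")
    with open(config_path) as f:  # `with` closes the handle even on error
        return json.load(f)


# Load once at the top of the conversion function and reuse everywhere:
# cfg = load_model_config(input_path)
# num_hidden_layers = int(cfg["num_hidden_layers"])
```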
