Even rounding for mxfp4 #65
Conversation
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Summary of Changes
Hello @yiliu30, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces an "even rounding" method for MXFP4 (Microscaling FP4) quantization, aiming to improve numerical precision. It also refactors the project's dependency management by removing explicit PyTorch and related library version constraints from various requirements.txt files and pyproject.toml, likely to allow more flexible or system-managed installations. Additionally, several example scripts have been updated, apparently to exercise the new quantization method and other model configurations.
Highlights
- MXFP4 Even Rounding: Implemented an "even rounding" method for MXFP4 quantization, which can be enabled via the VLLM_MXFP4_EVEN_ROUNDING environment variable (see the sketch after this list).
- Dependency Management Refactoring: Removed explicit torch, torchaudio, torchvision, and xformers version pins from pyproject.toml and the various requirements/*.txt files across hardware platforms (CPU, CUDA, ROCm, Neuron, TPU, XPU).
- Example Script Updates: Updated example inference scripts (basic_hpu.py, start_vllm.sh) to include new model paths, adjust memory utilization settings, increase the maximum sequence length, and modify lm_eval parameters.
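For context on what "even rounding" means here, the sketch below illustrates round-to-nearest with ties broken toward the even quantization level, applied to the MXFP4 (E2M1) magnitude grid {0, 0.5, 1, 1.5, 2, 3, 4, 6}. The helper name and the assumption that inputs are already divided by their block scale are illustrative only; the actual implementation gated by VLLM_MXFP4_EVEN_ROUNDING lives in mxfp4_emulation_utils.py and may differ in detail.

```python
import torch

# MXFP4 (E2M1) representable magnitudes per the OCP MX spec.
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def round_to_fp4_even(x: torch.Tensor) -> torch.Tensor:
    """Map already block-scaled values onto the FP4 grid, breaking exact ties
    toward the even grid index (banker's rounding on the quantization level)."""
    grid = FP4_GRID.to(dtype=x.dtype, device=x.device)
    sign = torch.sign(x)
    mag = x.abs().clamp(max=grid[-1])
    # Index of the closest grid value at or below `mag`, and its upper neighbour.
    lo = (torch.searchsorted(grid, mag, right=True) - 1).clamp(min=0)
    hi = (lo + 1).clamp(max=grid.numel() - 1)
    d_lo = mag - grid[lo]
    d_hi = grid[hi] - mag
    # Take the upper neighbour only when it is strictly closer, or on an exact
    # tie when its index is even.
    pick_hi = (d_hi < d_lo) | ((d_hi == d_lo) & (hi % 2 == 0))
    idx = torch.where(pick_hi, hi, lo)
    return sign * grid[idx]


# Example: 0.25 (halfway between 0.0 and 0.5) rounds to 0.0, and 2.5 (halfway
# between 2.0 and 3.0) rounds to 2.0, both landing on even grid indices.
print(round_to_fp4_even(torch.tensor([0.25, 2.5, -5.1])))
```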
Code Review
This pull request introduces an alternative "even rounding" implementation for MXFP4 quantization, controlled by the VLLM_MXFP4_EVEN_ROUNDING environment variable. The PR also contains substantial changes to dependency management, removing pinned versions of torch and its related packages from all requirements files. This is a major change that could impact build stability and reproducibility, and its motivation should be clarified in the PR description. Additionally, several example and testing scripts have been modified with hardcoded paths and commented-out code, which appear to be remnants of local development and should be cleaned up before merging. A critical syntax error was found in the new quantization utility function.
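For reviewers who want to exercise the new path, a minimal way to opt in is sketched below. It assumes the flag is read from the process environment before quantization runs; the accepted values and the exact point at which it is consulted are not confirmed by this PR, and the model path is just one of the paths appearing in the example scripts.

```python
import os

# Assumed usage: enable the even-rounding MXFP4 path via the environment
# variable named in this PR before vLLM reads its configuration.
os.environ["VLLM_MXFP4_EVEN_ROUNDING"] = "1"

from vllm import LLM  # noqa: E402  (import after setting the flag)

# Model path taken from the example scripts touched in this PR.
llm = LLM(model="/software/users/yiliu4/HF_HOME/Yi30/Llama-3.2-1B-Instruct-MXFP4-llmc")
print(llm.generate("Hello, world!"))
```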
vllm/model_executor/layers/quantization/utils/mxfp4_emulation_utils.py (review thread resolved; comment outdated)
| # echo "Stopping vLLM server" | ||
| #kill ${pid} | ||
| #echo "Script execution completed" | ||
| #sleep 10 |
The server cleanup logic (kill ${pid}) is commented out, which will leave the vLLM server process running after the script completes and can consume resources unnecessarily. If this was done for debugging, please re-enable it before merging.
| # echo "Stopping vLLM server" | |
| #kill ${pid} | |
| #echo "Script execution completed" | |
| #sleep 10 | |
| echo "Stopping vLLM server" | |
| kill ${pid} | |
| echo "Script execution completed" | |
| sleep 10 |
| model_path = "/software/users/yiliu4/HF_HOME/weiweiz1/DeepSeek-V2-Lite-NVFP4-autoround" | ||
| # model_path = "/software/users/yiliu4/deepseek-ai/DeepSeek-R1-NVFP4-OFFLINE" | ||
| model_path = "/software/users/yiliu4/HF_HOME/weiweiz1/DeepSeek-R1-NVFP4-RTN" | ||
|
|
||
| model_path = "/software/users/yiliu4/HF_HOME/weiweiz1/DeepSeek-R1-NVFP4-autoround" | ||
| model_path = "/software/users/yiliu4/HF_HOME/Yi30/Llama-3.2-1B-Instruct-MXFP4-llmc" | ||
| # model_path = "/software/users/yiliu4/HF_HOME/Yi30/DeepSeek-V2-Lite-NVFP4-W4A4-RTN-GLOBAL-SCALE-WW" | ||
|
|
||
| model_path = "/software/users/yiliu4/HF_HOME/weiweiz1/DeepSeek-R1-MXFP4-RTN" |
This block contains multiple assignments to model_path, many of which are immediately overwritten. This appears to be for local testing and should be cleaned up. Please consolidate this to a single default model_path and rely on the command-line argument --model_path to specify different models for testing.
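As a concrete illustration of that suggestion, a single default plus a command-line override could look like the following. The default path is simply one of the paths already in the diff, and the argument name mirrors the existing --model_path flag; this is a sketch, not the script's actual argument parsing.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--model_path",
    default="/software/users/yiliu4/HF_HOME/weiweiz1/DeepSeek-R1-MXFP4-RTN",
    help="Checkpoint to load; override on the command line for other models.",
)
args = parser.parse_args()
model_path = args.model_path  # single source of truth instead of repeated assignments
```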
nvfp4_model_path=/software/users/yiliu4/HF_HOME/weiweiz1/DeepSeek-R1-NVFP4-autoround/
nvfp4_model_path="/software/users/yiliu4/deepseek-ai/DeepSeek-R1-nvfp4-fix-723"
nvfp4_model_path="/software/users/yiliu4/deepseek-ai/DeepSeek-R1-nvfp4-fix-723-skip-atten"
nvfp4_model_path=/software/users/yiliu4/deepseek-ai/DeepSeek-R1-NVFP4-OFFLINE
nvfp4_model_path="/software/users/yiliu4/HF_HOME/weiweiz1/DeepSeek-R1-NVFP4-RTN"
nvfp4_model_path="/software/users/yiliu4/HF_HOME/weiweiz1/DeepSeek-R1-NVFP4-RTN"
nvfp4_model_path="/software/users/yiliu4/HF_HOME/weiweiz1/DeepSeek-R1-NVFP4-autoround"
model_base_name=$(basename $model_path)
EVAL_LOG_NAME="mxfp8_${model_base_name}_lm_eval_output_${task_name}_bs${batch_size}__${timestamp}"
EVAL_LOG_NAME="mxfp8_${model_base_name}_lm_eval_output__bs${batch_size}__${timestamp}"
The task_name variable is no longer included in EVAL_LOG_NAME, but it is still used in the echo command on line 297 and the lm_eval command on line 303. This creates a mismatch between the log message and the actual log file name. For better traceability, consider adding task_name back to the log file name.
| EVAL_LOG_NAME="mxfp8_${model_base_name}_lm_eval_output__bs${batch_size}__${timestamp}" | |
| EVAL_LOG_NAME="mxfp8_${model_base_name}_lm_eval_output_${task_name}_bs${batch_size}__${timestamp}" |
…utils.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Essential Elements of an Effective PR Description Checklist
Purpose
Test Plan
Test Result