Conversation

@yiliu30 (Owner) commented Aug 1, 2025

Reverts #61

@yiliu30 merged commit 95135b6 into hpu-mxfp8-moe on Aug 1, 2025

@gemini-code-assist (bot) left a comment

Summary of Changes

Hello @yiliu30, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request reverts previous changes that introduced a "static global scale" mechanism for NVFP4 quantization, particularly for input tensors. My aim is to simplify the quantization-dequantization (QDQ) process by ensuring that the global scale for inputs is dynamically derived from the input tensor itself, rather than being provided externally or statically.

Highlights

  • NVFP4 Quantization Simplification: I have removed the concept of an explicit input_global_scale from NVFP4 quantization functions, which streamlines the quantize-dequantize (QDQ) process.
  • Function Signature Updates: I've updated the signatures of functions like nvfp4_unpacked_weight_gemm and related calls to no longer accept an input_global_scale parameter, aligning with the simplified QDQ approach.
  • Removal of _with_gs Variants: I've eliminated the qdq_nvfp4_with_gs and to_nvfp4_with_gs functions, consolidating QDQ logic into a single qdq_nvfp4 function that now derives its own global scale internally.
  • Example Code Cleanup: I've cleaned up an HPU example by removing a commented-out model path and a debug print statement, which were likely related to the original "static global scale" feature being reverted.
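
As a rough illustration of what "deriving the global scale internally" can look like, here is a minimal pure-Python sketch of an NVFP4-style quantize-dequantize pass. It is not the repository's code: the function name qdq_nvfp4_sketch, the block size of 16, the scale formula, and the choice to skip FP8 rounding of the per-block scale are all assumptions made for readability. The point is only that the caller passes no input_global_scale; the scale is computed from the tensor's own absolute maximum.

```python
# Illustrative sketch only: names, block size, and constants are assumptions,
# not the repository's actual implementation.
FP4_E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # FP4 (E2M1) magnitudes
FP4_MAX = 6.0
FP8_E4M3_MAX = 448.0
BLOCK_SIZE = 16  # assumed number of elements sharing one block scale


def round_to_fp4(x):
    """Round x to the nearest FP4 (E2M1) magnitude, keeping the sign.

    Exact ties resolve to the smaller magnitude, which is a simplification.
    """
    mag = min(abs(x), FP4_MAX)
    nearest = min(FP4_E2M1_VALUES, key=lambda v: abs(v - mag))
    return -nearest if x < 0 else nearest


def qdq_nvfp4_sketch(values):
    """Quantize-dequantize a flat list of floats with a dynamically derived global scale."""
    amax = max((abs(v) for v in values), default=0.0)
    if amax == 0.0:
        return [0.0] * len(values)

    # Dynamic global scale: computed from the input itself, not passed in.
    global_scale = (FP8_E4M3_MAX * FP4_MAX) / amax

    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        block_amax = max(abs(v) for v in block)
        # Per-block scale in the global-scale domain. A real kernel would store
        # this in FP8; that rounding step is skipped here for brevity.
        block_scale = global_scale * block_amax / FP4_MAX
        if block_scale == 0.0:
            out.extend(0.0 for _ in block)
            continue
        effective = block_scale / global_scale  # real-valued scale for this block
        out.extend(round_to_fp4(v / effective) * effective for v in block)
    return out
```

Calling qdq_nvfp4_sketch(x) takes only the data, which mirrors the single-function shape described in the highlights above; a static-scale variant would instead accept the global scale as an extra argument.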

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot) left a comment

Code Review

This pull request reverts changes related to 'Nvfp4 static gs', which simplifies the NVFP4 quantization logic by removing static global scaling. The changes are consistent across multiple files, removing functions like qdq_nvfp4_with_gs and to_nvfp4_with_gs, and updating function signatures and calls accordingly. A notable improvement is the removal of a duplicated qdq_nvfp4 function, which enhances code clarity and maintainability. The review focuses on this cleanup.

2 participants