
Handle module names from Dynamo compiler in FP8 Quantizer #2223


Open
wants to merge 1 commit into master

Conversation

sandeep-maddipatla

Type of Change

  • Bug fix, with no API change.

Description

  • The Measure component records stats under the model's original module names, whereas torch.compile references the same layers through altered module names in the compiled model.
  • The quantizer must look up the measured stat dumps by the original module names. This PR makes that change in the Quantizer; see the sketch below for the naming mismatch.
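For context, torch.compile wraps the model in a Dynamo OptimizedModule that holds the original model under an `_orig_mod` attribute, so every submodule name gains an `_orig_mod.` prefix. A minimal sketch of the mismatch (the Tiny module and its layer name are made up for illustration):

import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.x_embedder = nn.Linear(8, 8)

    def forward(self, x):
        return self.x_embedder(x)

model = Tiny()
compiled = torch.compile(model)

# Names the Measure component records (uncompiled model):
print([name for name, _ in model.named_modules()])
# ['', 'x_embedder']

# Names the quantizer sees after torch.compile:
print([name for name, _ in compiled.named_modules()])
# ['', '_orig_mod', '_orig_mod.x_embedder']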

Expected Behavior & Potential Risk

Currently, the quantizer searches for the stats using the altered module names from the compiled model and fails with an exception such as the one below:

Exception: Error - Layer '_orig_mod.x_embedder' was called but was not quantized because no measures were supplied.

With this PR, this error is no longer generated.
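The fix is to normalize the compiled names back to the original ones before the stats lookup. A minimal sketch of the idea, assuming a simple prefix strip (the helper name is hypothetical, not the actual code in this PR):

# Hypothetical helper illustrating the normalization; the real change
# lives in the FP8 Quantizer's stats lookup path.
DYNAMO_PREFIX = "_orig_mod."

def normalize_module_name(name: str) -> str:
    # torch.compile nests the original model under `_orig_mod`, so a
    # compiled layer shows up as e.g. '_orig_mod.x_embedder'. Strip the
    # prefix (repeatedly, in case of nested compiles) to recover the
    # name the measured stats were dumped under.
    while name.startswith(DYNAMO_PREFIX):
        name = name[len(DYNAMO_PREFIX):]
    return name

assert normalize_module_name("_orig_mod.x_embedder") == "x_embedder"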

How has this PR been tested?

The test script below reproduces the problem. Re-run it with the PR in place to verify that the error no longer occurs.

# Install dependencies and fetch the quantization configs from optimum-habana
pip install optimum-habana sentencepiece
git clone https://github.com/huggingface/optimum-habana
cd /path/to/working-dir
# Copy the INC measure/quantize configs next to the test script
cp -r /path/to/optimum-habana/examples/stable-diffusion/quantization .
# Needed to download the gated FLUX.1-dev model
huggingface-cli login --token YourHFTokenGoesHere

Save the test script below as reproducer.py in the working directory.

import os
import torch
from optimum.habana.diffusers import GaudiFlowMatchEulerDiscreteScheduler, GaudiFluxPipeline

mode = os.environ.get('MODE', 'quant')

# load model
model_name = "black-forest-labs/FLUX.1-dev"
scheduler = GaudiFlowMatchEulerDiscreteScheduler.from_pretrained(
    model_name,
    subfolder="scheduler"
)
pipe = GaudiFluxPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,
    use_hpu_graphs=False,
    gaudi_config="Habana/stable-diffusion",
    bf16_full_eval=True,
    torch_dtype=torch.bfloat16
)

if mode == 'measure':
    # dump measure stats through INC
    os.environ["QUANT_CONFIG"] = "quantization/flux/measure_config.json"
    pipe(
        prompt="A picture of sks dog in a bucket",
        quant_mode="measure",
    )
    print('Measurement step done')
elif mode == 'quant':
    # quantize with INC (from measured stats)
    os.environ["QUANT_CONFIG"] = "quantization/flux/quantize_config.json"
    pipe.transformer = torch.compile(pipe.transformer, backend="hpu_backend")
    image = pipe(
        prompt="A picture of sks dog in a bucket",
        quant_mode="quantize"
    ).images[0]
    image.save(f"output_image.png")
    print('Quant Step done')
else:
    print(f'Unrecognized setting for MODE={mode}')

Run the two-step quantization with the commands below: the measure step first, then the quant step.

MODE=measure PT_HPU_LAZY_MODE=0 python reproducer.py
MODE=quant PT_HPU_LAZY_MODE=0 python reproducer.py

Dependency Change?

No library or dependency changes.

Commit: Quantizer equivalent for how the measure component handles the same scenario
@skaulintel

looks good to me.

thuang6 requested review from xin3he and yiliu30 on June 16, 2025 at 03:08
@xin3he
Contributor

xin3he commented Jun 16, 2025

I'd like to get comments from the Habana team. @ulivne and @linoybu, please take a look~
