Conversation

@yiliu30 (Contributor) commented on Nov 12, 2025

User description

Signed-off-by: yiliu30 <yi4.liu@intel.com>


PR Type

Enhancement


Description

  • Added DS/QWEN quantization examples

  • Included quantization scripts for different schemes

  • Added generation script using vLLM


Diagram Walkthrough

```mermaid
flowchart LR
  A["Add quantize.py"] -- "DS/QWEN quantization" --> B["Add generate.py"]
  B -- "vLLM integration" --> C["Add run scripts"]
```

File Walkthrough

Relevant files

Enhancement (2 files)

| File | Description | Changes |
|------|-------------|---------|
| quantize.py | Added quantization script for DS/QWEN | +149/-0 |
| generate.py | Added generation script using vLLM | +73/-0 |

Additional files (5 files)

| File | Changes |
|------|---------|
| README.md | +51/-0 |
| run_eval.sh | +105/-0 |
| run_gen.sh | +80/-0 |
| run_generate.sh | +115/-0 |
| run_quant.sh | +36/-0 |

Signed-off-by: yiliu30 <yi4.liu@intel.com>
@yiliu30 changed the title from Add DS/QWEN Ex.am.ples to Add DS/QWEN Examples on Nov 12, 2025
@PRAgent4INC (Collaborator)

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 Security concerns

Trust Remote Code:
The use of trust_remote_code=True when loading models and tokenizers can expose the system to security risks if the source is not trusted. Ensure that the models and tokenizers are from a reliable source.

⚡ Recommended focus areas for review

Hardcoded Iterations

The iters parameter is hardcoded to 0 for most configurations in topologies_config. It should be configurable via a command-line argument or a similar mechanism to allow flexibility; a sketch of one approach follows the excerpt below.

    "scheme": "MXFP8",
    "fp_layers": "lm_head",
    "iters": 0,
},
"ds_mxfp4": {
    "scheme": "MXFP4",
    "fp_layers": "lm_head,self_attn",
    "iters": 0,
},
"qwen_mxfp8": {
    "scheme": "MXFP8",
    "fp_layers": "lm_head,mlp.gate",
    "iters": 0,
},
"qwen_mxfp4": {
    "scheme": "MXFP4",
    "fp_layers": "lm_head,mlp.gate,self_attn",
    "iters": 0,  # TODO: set to 200 before merge
Unused Argument

The --skip_attn argument is defined but not used in the provided code. It should either be utilized or removed to avoid confusion.

    help="Skip quantize attention layers.",
)
parser.add_argument(
Trust Remote Code

The trust_remote_code=True flag is used when loading models and tokenizers. This can pose a security risk if the source of the model or tokenizer is not trusted. Consider adding a warning or making this configurable.

```python
fp32_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="cpu",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
```
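A minimal sketch of making the flag an explicit opt-in instead of an unconditional True; the --trust-remote-code argument is hypothetical, and parser and model_name are assumed to exist in the surrounding script:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical opt-in: remote code only runs when the user asks for it.
parser.add_argument("--trust-remote-code", action="store_true",
                    help="Allow executing code shipped with the model repository; "
                         "enable only for trusted sources.")
args = parser.parse_args()

fp32_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="cpu",
    trust_remote_code=args.trust_remote_code,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=args.trust_remote_code,
)
```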

@PRAgent4INC (Collaborator)

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General
Remove unused argument

This argument is not used in the current code. Either remove it or implement its
functionality.

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/quantize.py [130-133]

```diff
-    "--skip_attn",
-    action="store_true",
-    help="Skip quantize attention layers.",
```
Suggestion importance[1-10]: 8

Why: The suggestion identifies an unused argument and proposes either removal or implementation. This is important for maintaining clean and functional code.

Impact: Medium
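If the flag is kept rather than removed, a minimal sketch of one way to honor it, assuming fp_layers is the comma-separated string from topologies_config; the merge logic itself is hypothetical:

```python
# Hypothetical wiring for --skip_attn: keep attention layers in full
# precision by folding "self_attn" into the fp_layers list.
config = topologies_config[args.topology]
if args.skip_attn and "self_attn" not in config["fp_layers"].split(","):
    config["fp_layers"] += ",self_attn"
```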
Remove redundant argument

The iters argument is already defined in topologies_config. Consider removing this
argument to avoid redundancy.

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/quantize.py [135-139]

```diff
-    "--iters",
-    type=int,
-    default=0,
-    help="Number of iterations for quantization.",
```
Suggestion importance[1-10]: 8

Why: The suggestion correctly identifies redundancy in the iters argument and proposes its removal, which avoids confusion and keeps the configuration in one place.

Impact: Medium
Set iters value directly

Remove the TODO comment and set the value directly if it's confirmed.

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/quantize.py [30]

-"iters": 0,  # TODO: set to 200 before merge
+"iters": 200,
Suggestion importance[1-10]: 7

Why: The suggestion resolves the TODO by setting the value directly, but the value of 200 should be confirmed with the author before merging.

Impact: Medium
Use logging for messages

Consider logging the message instead of printing it for better control over output
verbosity.

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/generate.py [5-8]

```diff
 try:
     from auto_round_extension.vllm_ext import apply as apply_auto_round_extension
     apply_auto_round_extension()
 except ImportError:
-    print("auto_round_extension.vllm_ext not found, proceeding without auto-round extension.")
+    import logging
+    logging.warning("auto_round_extension.vllm_ext not found, proceeding without auto-round extension.")
```
Suggestion importance[1-10]: 7

Why: Logging provides better control over output verbosity and is generally preferred over print statements in production code.

Impact: Medium
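The same idea with a named module logger rather than the root logger; the basicConfig call is an assumption about how the script would configure logging, not code from this PR:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

try:
    from auto_round_extension.vllm_ext import apply as apply_auto_round_extension
    apply_auto_round_extension()
except ImportError:
    logger.warning("auto_round_extension.vllm_ext not found, proceeding without auto-round extension.")
```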
Externalize prompts

Externalize the prompts to a configuration file or environment variable for
flexibility.

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/generate.py [54-58]

```diff
-prompts = [
-    "Hello, my name is",
-    "The president of the United States is",
-    "The capital of France is",
-    "The future of AI is",
-]
+import json
+import os
+
+prompts = json.loads(os.getenv(
+    "PROMPTS",
+    '["Hello, my name is", "The president of the United States is", '
+    '"The capital of France is", "The future of AI is"]',
+))
```
Suggestion importance[1-10]: 6

Why: Externalizing prompts can make the script more flexible and easier to configure without modifying the code.

Impact: Low
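With this change the prompts can be supplied at launch without editing the script, e.g. `PROMPTS='["Prompt one", "Prompt two"]' python generate.py` (the prompt strings here are illustrative). Note that json.loads is used above instead of eval, so a malicious value in the environment variable cannot execute arbitrary code.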
Use JSON for output

Format the output using a more structured approach, such as JSON, for easier
parsing.

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/generate.py [60-66]

```diff
+import json
 outputs = llm.generate(prompts, sampling_params)
 # Print the outputs.
-print("-" * 50)
-for output in outputs:
-    prompt = output.prompt
-    generated_text = output.outputs[0].text
-    print(f"Prompt: {prompt!r}\nGenerated text: {generated_text!r}")
-    print("-" * 50)
+print(json.dumps([{"prompt": output.prompt, "generated_text": output.outputs[0].text} for output in outputs], indent=4))
```
Suggestion importance[1-10]: 6

Why: Using JSON for output makes it easier to parse and integrate with other systems, but it may not be necessary for simple scripts.

Impact: Low
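A sketch of a downstream consumer, under the assumption that generate.py then writes only the JSON array to stdout; the consumer script itself is hypothetical:

```python
import json
import subprocess

# Run the generation script and parse its structured output.
result = subprocess.run(
    ["python", "generate.py"],
    capture_output=True, text=True, check=True,
)
for record in json.loads(result.stdout):
    print(record["prompt"], "->", record["generated_text"][:40])
```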
Eliminate duplicate assignments

Remove duplicate assignments to model_path.

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/run_eval.sh [6-9]

```diff
-model_path="/storage/yiliu7/quantized_model_ds_mxfp4"
-model_path="/storage/yiliu7/quantized_model_ds_mxfp4"
-model_path="/storage/yiliu7/quantized_model_qwen_mxfp4"
 model_path="/storage/yiliu7/quantized_model_qwen_mxfp8"
```
Suggestion importance[1-10]: 6

Why: The suggestion correctly identifies and removes duplicate assignments to model_path, improving code clarity and maintainability.

Impact: Low
