Conversation

@xin3he (Contributor) commented Dec 9, 2025

PR Type

Bug fix, Enhancement, Documentation


Description

  • Added gpu_memory_utilization parameter to prevent OOM (see the sketch after this list)

  • Removed lm_head quantization to support vLLM inference

  • Updated README with notes on quantization and accuracy
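
For context, a minimal sketch of how this knob flows through lm_eval's vLLM backend. The 0.8 value and the MODEL_PATH/tasks/BATCH_SIZE names are taken from the diff quoted in the review below; the trimmed-down argument list is illustrative, not the full command:

    # Cap vLLM's GPU memory pool at 80% of the device so KV-cache and
    # activation growth during evaluation has headroom instead of OOMing.
    lm_eval --model vllm \
        --model_args pretrained="$MODEL_PATH",gpu_memory_utilization=0.8 \
        --tasks $tasks \
        --batch_size $BATCH_SIZE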


Diagram Walkthrough

flowchart LR
  A["Add gpu_memory_utilization"] -- "Prevent OOM" --> B["Update run_benchmark.sh"]
  C["Remove lm_head quantization"] -- "Support vLLM inference" --> D["Update run_quant.sh"]
  E["Add notes on quantization"] -- "Update README.md" --> F["Document changes"]

File Walkthrough

Relevant files

Enhancement: run_benchmark.sh (+2/-2)
Add gpu_memory_utilization parameter
examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/llama3/run_benchmark.sh

  • Added gpu_memory_utilization=0.8 to model_args

Bug fix: run_quant.sh (+3/-6)
Remove lm_head quantization
examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/llama3/run_quant.sh

  • Removed --quant_lm_head from quantization commands

Documentation: README.md (+8/-0)
Update README with quantization notes
examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/llama3/README.md

  • Added notes on quantization accuracy and lm_head support

Signed-off-by: He, Xin3 <xin3.he@intel.com>
@xin3he changed the title from "fix OOM issue and lm_head unsupport issue" to "fix llama3 OOM issue and lm_head unsupport issue" on Dec 9, 2025
@PRAgent4INC (Collaborator)

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

Removing --quant_lm_head changes what gets quantized: any configuration or model that relied on a quantized lm_head will now export it in full precision, so confirm no recipe still depends on that flag.

CMD="python quantize.py --model_name_or_path \"$INPUT_MODEL\" $COMMON_ARGS --dtype MXFP8 --iters 0 --export_path \"$OUTPUT_MODEL\""
echo "Executing command: $CMD"
python quantize.py \
    --model_name_or_path "$INPUT_MODEL" \
    $COMMON_ARGS \
    --dtype MXFP8 \
    --iters 0 \
    --export_path "$OUTPUT_MODEL"
;;
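
If some recipe does still need a quantized lm_head, one possible follow-up is an opt-in guard around the removed flag. This is only a sketch: ENABLE_QUANT_LM_HEAD is a hypothetical variable, not something this PR introduces.

    # Hypothetical opt-in: pass --quant_lm_head only on explicit request,
    # since vLLM cannot load checkpoints with a quantized lm_head.
    QUANT_LM_HEAD_ARG=""
    if [[ "${ENABLE_QUANT_LM_HEAD:-0}" == "1" ]]; then
        QUANT_LM_HEAD_ARG="--quant_lm_head"
    fi
    python quantize.py \
        --model_name_or_path "$INPUT_MODEL" \
        $COMMON_ARGS \
        $QUANT_LM_HEAD_ARG \
        --dtype MXFP8 \
        --iters 0 \
        --export_path "$OUTPUT_MODEL"
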
Hardcoded Value

The gpu_memory_utilization value is hardcoded to 0.8. This might not be suitable for all environments and could lead to suboptimal performance or OOM issues in different setups.

local cmd="lm_eval --model vllm --model_args pretrained=\"$MODEL_PATH\",add_bos_token=$add_bos_token,tensor_parallel_size=$TENSOR_PARALLEL_SIZE,gpu_memory_utilization=0.8,data_parallel_size=1 --tasks $tasks --batch_size $BATCH_SIZE"
echo "Executing command: $cmd"

lm_eval --model vllm \
    --model_args pretrained="$MODEL_PATH",add_bos_token=$add_bos_token,tensor_parallel_size=$TENSOR_PARALLEL_SIZE,gpu_memory_utilization=0.8,data_parallel_size=1 \
    --tasks $tasks \
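
A low-risk way to address the hardcoding is to read the value from the environment with 0.8 as the fallback. A sketch only: GPU_MEMORY_UTILIZATION is a hypothetical variable name, not part of this PR.

    # Hypothetical: let callers tune the cap per environment, keeping the
    # PR's 0.8 as the default.
    GPU_MEMORY_UTILIZATION="${GPU_MEMORY_UTILIZATION:-0.8}"
    lm_eval --model vllm \
        --model_args pretrained="$MODEL_PATH",add_bos_token=$add_bos_token,tensor_parallel_size=$TENSOR_PARALLEL_SIZE,gpu_memory_utilization=$GPU_MEMORY_UTILIZATION,data_parallel_size=1 \
        --tasks $tasks \
        --batch_size $BATCH_SIZE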

@PRAgent4INC (Collaborator)

PR Code Suggestions ✨

Signed-off-by: He, Xin3 <xin3.he@intel.com>
