EXAQ: Exponent Aware Quantization For LLMs Acceleration

Overview

An implementation of the exponent-aware quantization (EXAQ) algorithm. EXAQ is a pioneering approach to quantizing the input of the exponent operation, based on an analytical model that strategically shifts the focus towards minimizing the quantization error measured after the exponent operation.

A substantial portion of the code was copied from the https://github.com/EleutherAI/lm-evaluation-harness repository; the main logic of the EXAQ algorithm is concentrated in lm_eval/experimental/utils.py.
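
To give a rough sense of the idea, the sketch below quantizes a (shifted) softmax input to a given bitwidth with a clipping threshold and measures the error after the exponent. This is only an illustrative sketch, not the implementation in lm_eval/experimental/utils.py; the quantizer, the clipping range, and the error metric are assumptions made for the example.

import torch

def quantize_exp_input(x: torch.Tensor, bitwidth: int, clip: float) -> torch.Tensor:
    # Illustrative uniform quantizer for the input of exp() in softmax.
    # The input is clipped to [-clip, 0] (softmax inputs are shifted so the
    # row maximum is 0) and mapped onto 2**bitwidth levels.
    levels = 2 ** bitwidth - 1
    x = x.clamp(min=-clip, max=0.0)
    scale = clip / levels
    return torch.round(x / scale) * scale

x = torch.randn(4, 16)
x = x - x.max(dim=-1, keepdim=True).values   # shift so the row maximum is 0
x_q = quantize_exp_input(x, bitwidth=4, clip=8.0)

# EXAQ's analytical model targets the error after the exponent rather than
# the error on the raw input, so that is what is measured here.
post_exp_error = (torch.exp(x) - torch.exp(x_q)).pow(2).mean()
print(post_exp_error.item())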

Preparation Before Evaluation

The code was mainly tested on the nvcr.io/nvidia/pytorch:24.03-py3 image.

Before usage, install all dependencies:

pip install -r requirements.txt
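
For reference, one possible way to reproduce the tested environment, assuming Docker with the NVIDIA Container Toolkit is installed (the mount path is illustrative):

docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/pytorch:24.03-py3
# inside the container:
pip install -r requirements.txt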

Evaluation

Basic evaluation command:

PYTHONPATH=${path_to_current_repository} \
python __main__.py \
--model hf \
--model_args pretrained=${model} \
--tasks ${task} \
--device cuda:0 \
--batch_size 4 \
--dtype bfloat16 \
--replace-sdpa \
--quantize \
--cast-dtype float32 \
--bitwidth ${bitwidth} \
--clip-type ${clip_type} \
--calibrate

where:

  • model is one of the LLaMA models, any version and any size (example: huggyllama/llama-7b)
  • task is one of the evaluation tasks: boolq, piqa, hellaswag, winogrande, arc_challenge, arc_easy, openbookqa
  • bitwidth is one of the following: 2, 3, 4
  • clip_type is one of the following: NONE, GAUSS
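
For example, evaluating huggyllama/llama-7b on piqa with 4-bit quantization and Gaussian clipping might look like:

PYTHONPATH=${path_to_current_repository} \
python __main__.py \
--model hf \
--model_args pretrained=huggyllama/llama-7b \
--tasks piqa \
--device cuda:0 \
--batch_size 4 \
--dtype bfloat16 \
--replace-sdpa \
--quantize \
--cast-dtype float32 \
--bitwidth 4 \
--clip-type GAUSS \
--calibrate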
