
Commit

Update README.md
ankan-ban committed Sep 14, 2023
1 parent dfb2428 commit c247a35
Showing 1 changed file with 3 additions and 3 deletions.
README.md: 3 additions & 3 deletions
@@ -28,7 +28,7 @@ wget https://huggingface.co/abhinavkulkarni/meta-llama-Llama-2-7b-chat-hf-w4-g12
 python3 convert_awq_to_bin.py pytorch_model.bin output
 ./weight_packer config.json output llama2-7b-awq-q4.bin
-./llama2_q4 llama2-7b-awq-q4.bin 256 "write an essay about GPUs"
+./llama2_q4 llama2-7b-awq-q4.bin -n 256 -i "write an essay about GPUs"
 ```

And here are the commands for the 13B model:
@@ -49,7 +49,7 @@ python3 convert_awq_to_bin.py pytorch_model-00003-of-00003.bin output
 ./weight_packer config.json output llama2-13b-awq-q4.bin
-./llama2_q4 llama2-13b-awq-q4.bin 256 "write an essay about GPUs"
+./llama2_q4 llama2-13b-awq-q4.bin -n 256 -i "write an essay about GPUs"
 ```


@@ -78,7 +78,7 @@ python3 convert_awq_to_bin.py awq_weights/llama2-7b-awq.pt output
 We get ~200 tokens per second with RTX 4090 for 7b parameter models:

 ```
-llama2_q4_opt.exe llama2-7b-awq-q4.bin 256 "write an essay about GPUs"
+llama2_q4_opt.exe llama2-7b-awq-q4.bin -n 256 -i "write an essay about GPUs"
 Model params:-
 dim: 4096

