8bit support #295

Closed · mymusise opened this issue Jun 28, 2023 · 12 comments
@mymusise

Hi, will vLLM support 8-bit quantization, like https://github.com/TimDettmers/bitsandbytes?

In HF, we can run a 13B LLM on a 24 GB GPU with load_in_8bit=True.

Although PagedAttention can save 25% of GPU memory, we still need at least a 26 GB GPU to deploy a 13B LLM (13B parameters × 2 bytes in fp16 ≈ 26 GB for the weights alone).

In the cloud, a V100-32G is more expensive than an A5000-24G 😭

Is there any way to reduce GPU memory usage? 😭
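
(For reference, 8-bit loading on the HF side looks roughly like the sketch below; the model name is just an example, and newer transformers versions prefer passing a BitsAndBytesConfig instead.)

```python
# Minimal sketch of 8-bit loading in HF transformers (requires bitsandbytes).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lmsys/vicuna-13b-v1.5"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # weights quantized to int8 at load time
    device_map="auto",   # spread layers across available devices
)
```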

@mymusise (Author)

Same as #214.

@gururise

gururise commented Jul 7, 2023

Would love to see bitsandbytes integration to load models in 8- and 4-bit quantized modes.
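
(For context, this is roughly what the requested 4-bit loading looks like on the HF/bitsandbytes side; a hedged sketch, with the model name as an example.)

```python
# Minimal sketch of 4-bit (NF4) loading in HF transformers via bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # normal-float 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
)
model = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-13b-v1.5",  # example model
    quantization_config=bnb_config,
    device_map="auto",
)
```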

@generalsvr

Quantization support is crucial; 8- and 4-bit support is a must.

@cabbagetalk

Why does inference on vicuna-13B with fastchat-vllm take 75 GB of video memory (A800, 80 GB)? @mymusise

@mymusise (Author)

mymusise commented Jul 20, 2023

> Why does inference on vicuna-13B with fastchat-vllm take 75 GB of video memory (A800, 80 GB)? @mymusise

@cabbagetalk you can pass gpu_memory_utilization=0.4 to reduce the fraction of GPU memory vLLM pre-allocates.
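
(A minimal sketch of that suggestion; the model name is just an example.)

```python
# Cap vLLM's GPU memory pre-allocation. vLLM reserves
# gpu_memory_utilization × total GPU memory for weights plus
# KV cache; the default is 0.9.
from vllm import LLM

llm = LLM(
    model="lmsys/vicuna-13b-v1.5",  # example model
    gpu_memory_utilization=0.4,     # use at most ~40% of GPU memory
)
outputs = llm.generate("Hello, my name is")
```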

@proceduralia

This would be especially useful for running the new meta-llama/Llama-2-70b-chat-hf models!

@boxter007

> Hi, will vLLM support 8-bit quantization, like https://github.com/TimDettmers/bitsandbytes? […]

Seeing a familiar face here!

@yhyu13

yhyu13 commented Dec 21, 2023

Does vLLM support it now?

@louis-csm

Any fix for this issue?

@warvyvr

warvyvr commented Jan 26, 2024

Hi guys, do you have a plan to support it?

@hmellor closed this as not planned on Mar 20, 2024
@qashzar

qashzar commented May 6, 2024

Any fix for integrating bitsandbytes?

@hmellor (Collaborator)

hmellor commented May 20, 2024

No fix, but the feature request is #4033
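
(For readers arriving later: once #4033-style support lands, in-flight bitsandbytes quantization in vLLM looks roughly like the sketch below. The exact flag values are an assumption based on later vLLM releases; check the docs for your version.)

```python
# Assumed sketch of vLLM's bitsandbytes support (post-#4033). The
# quantization/load_format values are based on later vLLM releases;
# verify against the docs for the version you run.
from vllm import LLM

llm = LLM(
    model="huggyllama/llama-7b",   # example model
    quantization="bitsandbytes",   # quantize weights in-flight with bnb
    load_format="bitsandbytes",
)
```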
