The chatglm2-6b-int4 model is only about 4 GB, but I see that the LinkSoul/Chinese-Llama-2-7b-4bit model is around 13 GB. Why the difference?
The quantization methods are different. You could consider using the GGML version, which can be run with llama.cpp.
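A rough size check helps explain the gap. The parameter counts below are approximate, and this is only back-of-the-envelope arithmetic, not a claim about how either repository actually stores its weights: a ~13 GB checkpoint is close to what plain fp16 weights for a 7B model occupy, whereas truly 4-bit weights for the same model would be closer to 3.5 GB.

```python
# Back-of-the-envelope checkpoint sizes. Parameter counts are
# approximations (assumed: ~7e9 for Llama-2-7b, ~6.2e9 for chatglm2-6b).
def model_size_gb(n_params, bits_per_param):
    """Approximate on-disk size in GB for a dense weight checkpoint."""
    return n_params * bits_per_param / 8 / 1e9

fp16_7b = model_size_gb(7e9, 16)    # ~14 GB: full fp16 weights
int4_7b = model_size_gb(7e9, 4)     # ~3.5 GB: genuinely 4-bit weights
int4_6b = model_size_gb(6.2e9, 4)   # ~3.1 GB; extra fp16 layers push the
                                    # real chatglm2-6b-int4 file to ~4 GB
print(f"{fp16_7b:.1f} {int4_7b:.1f} {int4_6b:.1f}")
```

So a 13 GB "4bit" checkpoint most likely stores the weights at higher precision and quantizes them at load time, while a ~4 GB file stores them already quantized.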
When loading the model locally, how can it be loaded in a distributed way? With multiple GPUs, will the model be sharded across them automatically? A single GPU can't hold it.
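In the Hugging Face stack, passing `device_map="auto"` to `from_pretrained` (backed by the accelerate library) shards the model's layers across all visible GPUs automatically, spilling to CPU/disk if needed. As a toy illustration of the idea, here is a simplified sketch of the greedy layer-to-device assignment; the real accelerate implementation is more involved (it also handles tied weights, buffers, and offloading), so treat this only as a conceptual model:

```python
# Toy sketch of greedy layer-to-GPU assignment, as performed conceptually
# by device_map="auto". Sizes and capacities are in GB; names are made up.
def greedy_device_map(layer_sizes_gb, gpu_capacities_gb):
    device_map, gpu, used = {}, 0, 0.0
    for i, size in enumerate(layer_sizes_gb):
        if used + size > gpu_capacities_gb[gpu]:
            gpu += 1      # current GPU is full: start filling the next one
            used = 0.0
        device_map[f"layers.{i}"] = gpu
        used += size
    return device_map

# Eight 2 GB layers across two GPUs with 8 GB free each:
print(greedy_device_map([2] * 8, [8, 8]))
# → layers 0-3 land on GPU 0, layers 4-7 on GPU 1
```

In practice you would not write this yourself; something like `AutoModelForCausalLM.from_pretrained(local_path, device_map="auto", torch_dtype=torch.float16)` is enough, and `model.hf_device_map` shows the resulting placement.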