
[Feature]: BitsandBytes quantization with TP>1 #8197

Open · 1 task done

jvlinsta opened this issue Sep 5, 2024 · 1 comment


jvlinsta commented Sep 5, 2024

🚀 The feature, motivation and pitch

Any QLoRA adapters trained on large checkpoints (e.g., 70B) are currently unusable, because bitsandbytes-quantized models cannot be loaded with TP>1 to shard the model across multiple GPUs. Resolving this would make it possible to serve models that were trained with quantization, rather than having to rely on GPTQ or AWQ, which are only applied after training.

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@copasseron

Feature added in v0.6.2 (#8434).
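
For reference, a minimal sketch of how the bitsandbytes + TP>1 combination might be used with vLLM ≥ 0.6.2. The model name, GPU count, and prompt are placeholders, and the exact set of arguments may differ between vLLM versions:

```python
# Minimal sketch, assuming vLLM >= 0.6.2 on a 2-GPU node.
# The checkpoint name and prompt are placeholders, not part of the original issue.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-hf",   # placeholder 70B checkpoint
    quantization="bitsandbytes",          # in-flight 4-bit quantization via bitsandbytes
    load_format="bitsandbytes",
    tensor_parallel_size=2,               # TP>1, the combination requested in this issue
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```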
