Running two different models on one GPU #1046
Unanswered
daytonturner asked this question in Q&A
I have an A6000 48GB, and I'd like to serve both a quantized Llama-2 and WizardCoder, which can easily fit together inside the 48GB available, but I'm unsure of the best way to go about this, or whether it's a bad idea for some reason.

Initially, I thought simply running two TGI instances, each pointing to the respective model, would be a reasonable approach, but are my assumptions correct? Any thoughts?

Replies: 1 comment · 1 reply

-

This is the correct way to go about it. Use |

1 reply
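For context, here is a minimal sketch of what that two-instance setup might look like, assuming TGI's `text-generation-launcher` is on the PATH and that your TGI version supports the `--cuda-memory-fraction` flag; the model IDs, ports, and memory fractions below are placeholders, not values from this thread:

```python
import subprocess

# Placeholder quantized checkpoints; substitute the models you actually serve.
INSTANCES = [
    # (model id, port, fraction of GPU memory the instance may claim)
    ("TheBloke/Llama-2-13B-chat-GPTQ", 8080, 0.5),
    ("TheBloke/WizardCoder-Python-13B-V1.0-GPTQ", 8081, 0.5),
]

procs = []
for model_id, port, mem_fraction in INSTANCES:
    procs.append(subprocess.Popen([
        "text-generation-launcher",
        "--model-id", model_id,
        "--port", str(port),
        "--quantize", "gptq",
        # Cap each server's share of VRAM so the two instances
        # don't both try to pre-allocate the whole 48GB.
        "--cuda-memory-fraction", str(mem_fraction),
    ]))

# Block until the servers exit.
for p in procs:
    p.wait()
```

The same effect can be had by running the two launcher commands in separate shells or Docker containers; the key points are distinct ports and a per-instance memory cap.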
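Once both instances are up, clients simply target different ports. A sketch using TGI's `/generate` REST endpoint, with ports matching the launcher sketch above:

```python
import requests

def generate(port: int, prompt: str, max_new_tokens: int = 128) -> str:
    """Call the /generate endpoint of a local TGI instance."""
    resp = requests.post(
        f"http://localhost:{port}/generate",
        json={"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["generated_text"]

# Route chat prompts to the Llama-2 instance, code prompts to WizardCoder.
print(generate(8080, "Summarize what an RTX A6000 is in one sentence."))
print(generate(8081, "Write a Python function that reverses a string."))
```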