Question regarding Punica integration #107
Hi @psych0v0yager, yes, there are a few ways to run multiple adapters in a single batch:
We have a very simple example of (3) here, but I'll make a note to add more examples of how to do this using the AsyncClient. Hope that answers your question!
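Until those examples exist, here is a minimal sketch of the AsyncClient approach: send the same prompt to several adapters at once so all requests are in flight together, which leaves the server free to batch them across adapters. It assumes the `lorax-client` package's `AsyncClient`, whose `generate(prompt, adapter_id=..., max_new_tokens=...)` coroutine returns a response with a `generated_text` attribute. The function name `generate_for_adapters` is hypothetical, and the client is passed in so the sketch does not depend on a running server.

```python
import asyncio
from typing import Dict, List


async def generate_for_adapters(
    client,
    prompt: str,
    adapter_ids: List[str],
    max_new_tokens: int = 64,
) -> Dict[str, str]:
    """Send the same prompt to several adapters concurrently.

    `client` is assumed to behave like `lorax.AsyncClient`: an async
    `generate(prompt, adapter_id=..., max_new_tokens=...)` method whose
    result has a `generated_text` attribute.
    """
    # Create one coroutine per adapter; nothing is awaited yet.
    requests = [
        client.generate(prompt, adapter_id=adapter_id, max_new_tokens=max_new_tokens)
        for adapter_id in adapter_ids
    ]
    # Await them together so every request is in flight at the same time.
    # gather() returns results in the same order as its arguments.
    responses = await asyncio.gather(*requests)
    return {
        adapter_id: response.generated_text
        for adapter_id, response in zip(adapter_ids, responses)
    }
```

Against a live server, this would be called as `asyncio.run(generate_for_adapters(AsyncClient("http://127.0.0.1:8080"), prompt, ["adapter-a", "adapter-b"]))` after `from lorax import AsyncClient`, where the URL and adapter IDs are placeholders.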
Thanks for the fast reply and the multiple solutions! I'll be sure to check out your example for number 3, and I look forward to seeing more documentation on the AsyncClient. IMO, the AsyncClient seems like the most convenient option for a MoE-type situation.
Awesome! @psych0v0yager, to help me understand your use case a little better: for the MoE situation you're describing, are you interested in generating a different sequence for each adapter and then combining them, or in mixing multiple adapters for the same request and generating a single sequence? It sounds like the first one (generating a different sequence for each adapter), but I wanted to confirm, as we want to support both use cases.
@tgaddair, thanks for the reply! I was interested in the first one (generating a different sequence for each adapter). Specifically, I was imagining running 5 adapters concurrently, each generating a different sequence. Once the batch of 5 is done, I want to feed all 5 sequences to a 6th adapter that is fine-tuned to select the best one.
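The two-stage flow described here (N candidate adapters in parallel, then one selector adapter) could be sketched as follows. This is a minimal sketch, assuming a client shaped like `lorax.AsyncClient` (an async `generate(prompt, adapter_id=..., max_new_tokens=...)` returning an object with `generated_text`). The helper names, the numbered-candidate prompt template, and the rule of falling back to the first candidate when the selector's reply contains no valid number are all hypothetical; a real selector adapter would need the template it was fine-tuned on.

```python
import asyncio
import re
from typing import List, Tuple


def build_selector_prompt(task: str, candidates: List[str]) -> str:
    """Format the task and numbered candidates for the selector adapter.

    Hypothetical template: replace with the format the selector was trained on.
    """
    lines = [f"Task: {task}", "", "Candidate answers:"]
    for i, text in enumerate(candidates, start=1):
        lines.append(f"[{i}] {text}")
    lines.append("")
    lines.append("Reply with the number of the best answer.")
    return "\n".join(lines)


def parse_choice(selector_output: str, num_candidates: int) -> int:
    """Return a 0-based index from the first in-range number in the reply.

    Falls back to the first candidate if no valid number is found.
    """
    for match in re.finditer(r"\d+", selector_output):
        choice = int(match.group())
        if 1 <= choice <= num_candidates:
            return choice - 1
    return 0


async def best_of_adapters(
    client,
    task: str,
    candidate_adapters: List[str],
    selector_adapter: str,
    max_new_tokens: int = 128,
) -> Tuple[str, str]:
    """Return (winning adapter id, winning text)."""
    # Stage 1: one candidate per adapter, all requests in flight at once.
    responses = await asyncio.gather(*[
        client.generate(task, adapter_id=adapter_id, max_new_tokens=max_new_tokens)
        for adapter_id in candidate_adapters
    ])
    candidates = [response.generated_text for response in responses]

    # Stage 2: the selector only needs to emit a short number.
    verdict = await client.generate(
        build_selector_prompt(task, candidates),
        adapter_id=selector_adapter,
        max_new_tokens=8,
    )
    index = parse_choice(verdict.generated_text, len(candidates))
    return candidate_adapters[index], candidates[index]
```

Keeping prompt construction and reply parsing in plain functions means they can be tested without a server, and only `best_of_adapters` touches the client.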
The acknowledgements of this project mention the SGMV kernels created by the Punica project. Is there a way to run multiple adapters simultaneously with LoRAX, similar to the Punica example? Can this be done via the AsyncClient?