[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server #10546
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Hey @khluu, this is my first time opening a PR to vLLM. Do I need to pass all the tests in BuildKite for the code to be reviewed and merged? I also sent you an email with the subject Thanks for the help!
IIUC, although this PR is related to LoRA loading, it seems you haven't touched the underlying LoRA logic. What you might need is to add unit tests similar to #6566.
Oh, if it's just for merging, you can wait until your PR is reviewed and approved; then your PR reviewer can run CI for you. Fastcheck is more like a subset of CI: it runs short tests and gives you the flexibility to run extra specific tests if needed (which you need to be in the Buildkite org to do).
@youkaichao do you think we can make this a plugin?
This pull request has merge conflicts that must be resolved before it can be merged.
I would love to see this merged in!
For LoRA-specific code, @jeejeelee has more context than me to review the code. From the integration perspective, I think it's okay to create a lora resolver registry, and expose a function to register new lora resolvers. We don't need a new plugin category. General plugin should be enough to call the function to register new lora resolvers. One technical detail is, we should collect all registered resolvers, and try them one by one to see which resolver succeeds. We don't need to let users select the resolver.
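The registry design described above (collect all registered resolvers, then try each in order until one succeeds) could be sketched roughly as follows. All names here (`LoRAResolver`, `register_lora_resolver`, `resolve`, `DictResolver`) are illustrative placeholders, not vLLM's actual API:

```python
# Sketch of a LoRA resolver registry: plugins register resolvers, and
# resolution tries each one in order until a resolver succeeds.
from abc import ABC, abstractmethod
from typing import Optional


class LoRAResolver(ABC):
    """Base class: a resolver maps a LoRA name to a local path, or None."""

    @abstractmethod
    def resolve_lora(self, lora_name: str) -> Optional[str]:
        ...


# Module-level registry populated by plugins at startup.
_resolvers: list = []


def register_lora_resolver(resolver: LoRAResolver) -> None:
    """Exposed so general plugins can add new resolvers."""
    _resolvers.append(resolver)


def resolve(lora_name: str) -> Optional[str]:
    """Try every registered resolver; the first non-None result wins."""
    for resolver in _resolvers:
        path = resolver.resolve_lora(lora_name)
        if path is not None:
            return path
    return None


class DictResolver(LoRAResolver):
    """Toy resolver backed by an in-memory table, for demonstration."""

    def __init__(self, table: dict):
        self.table = table

    def resolve_lora(self, lora_name: str) -> Optional[str]:
        return self.table.get(lora_name)
```

Because resolution walks the list and takes the first success, users never have to select a resolver explicitly; an unresolvable name simply falls through to `None`.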
Thanks for the feedback, @youkaichao. From what I understand so far, there are three main areas I'd need to modify for LoRA plugin support:
Does this sound like a reasonable approach, @youkaichao @jeejeelee? Thanks in advance!
@joerunde WDYT?
I’ve made the concrete change here: #15733 |
Force-pushed from 7260dd4 to 397ed64 (Signed-off-by: Angky William)
Updated the code with:
@joerunde I’ve added the test for the LoRA loading flow via LoRAResolver, along with the corresponding documentation.
LGTM!
Thanks for the plugin support, @jeejeelee @youkaichao what do y'all think about getting this merged?
Can you add an example in the documentation for such a plugin? That way we have examples people can leverage to learn how to use it.
@simon-mo I’ve added an example of the
Also, if you want to see what a resolver implementation would look like, I have my PR converted to a plugin here: https://github.com/jberkhahn/filesystem_resolver
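For readers following along, a resolver plugin like the one linked above would plausibly hook into vLLM's general plugin mechanism through a packaging entry point. The entry-point group name, module paths, and `LoRAResolverRegistry.register_resolver` call below are assumptions for illustration; check the vLLM plugin documentation for the real names:

```python
# Sketch of a plugin package's registration hook. In the plugin's
# pyproject.toml you would declare something like:
#
#   [project.entry-points."vllm.general_plugins"]
#   filesystem_resolver = "filesystem_resolver:register"
#
# vLLM would then import and call register() at startup, which is where
# the plugin registers its resolver with the registry.

def register():
    # Hypothetical imports; the real module paths may differ.
    from filesystem_resolver.resolver import FilesystemResolver
    from vllm.lora.resolver import LoRAResolverRegistry

    LoRAResolverRegistry.register_resolver(
        "filesystem", FilesystemResolver("/path/to/loras"))
```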
@angkywilliam Hey! I've been trying to use the LoRA resolver on a local file system with no luck. I've detailed my attempt here: #18630. Any help is welcome :) Once it works locally, I want to try loading from S3.
In the current implementation, vLLM only supports loading LoRA from local storage.
At OpenPipe, we are extending the serving engine's capabilities by introducing a LoRAResolver. LoRAResolver enables vLLM users to implement custom logic for fetching LoRA files from remote servers.
For example, in OpenPipe's case, we dynamically load LoRAs for our customers from S3.
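A remote resolver of the kind described above would check a local cache and otherwise download the adapter files before handing a local path back to the engine. The sketch below is illustrative only (not the merged vLLM API): the class name and `resolve_lora` signature are assumptions, and the S3 client is faked with an in-memory dict so the example is self-contained.

```python
# Hypothetical remote LoRAResolver: fetch adapter files keyed by
# "loras/<name>/..." from a remote store into a local cache directory.
import os
import tempfile
from typing import Optional

# Stand-in for an S3 bucket: maps object keys to file contents.
FAKE_BUCKET = {
    "loras/customer-a/adapter_config.json": b"{}",
    "loras/customer-a/adapter_model.bin": b"\x00\x01",
}


class S3LoRAResolver:
    """Downloads LoRA artifacts for a given name into a cache dir."""

    def __init__(self, cache_dir: Optional[str] = None):
        self.cache_dir = cache_dir or tempfile.mkdtemp(prefix="lora_cache_")

    def resolve_lora(self, lora_name: str) -> Optional[str]:
        prefix = f"loras/{lora_name}/"
        keys = [k for k in FAKE_BUCKET if k.startswith(prefix)]
        if not keys:
            return None  # this resolver cannot serve the request
        local_dir = os.path.join(self.cache_dir, lora_name)
        os.makedirs(local_dir, exist_ok=True)
        for key in keys:
            # With a real client this would be a download call, e.g.
            # boto3's s3.download_file(bucket, key, dest).
            dest = os.path.join(local_dir, os.path.basename(key))
            with open(dest, "wb") as f:
                f.write(FAKE_BUCKET[key])
        return local_dir
```

Returning `None` for unknown names lets the registry fall through to other resolvers, matching the try-one-by-one design discussed earlier in the thread.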