[ray.llm][Batch] Support LoRA #50804
Conversation
Awesome!! Do you want to update the docs/guides in the same PR for LoRA?
```diff
@@ -183,6 +191,27 @@ def _prepare_llm_request(self, row: Dict[str, Any]) -> vLLMEngineRequest:
         else:
             image = []
 
+        # The request is for LoRA.
+        lora_request = None
+        if "model" in row and row["model"] != self.model:
```
Add a tiny comment here that says the model is a LoRA adapter if and only if the model ID in the row differs from the one in `engine_kwargs`.
We also need to document in our docs that this is how the preprocessor should be constructed for LoRA.
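To illustrate the convention being discussed, here is a hedged sketch (not code from this PR) of a preprocessor that marks a row as a LoRA request simply by setting `model` to something other than the base model configured in `engine_kwargs`. The adapter ID is the one referenced later in this thread; the prompt and sampling fields are assumptions.

```python
def preprocess(row: dict) -> dict:
    # The row targets a LoRA adapter iff "model" differs from the base
    # model in engine_kwargs; rows without "model" use the base model.
    return dict(
        model="EdBergJr/Llama32_Baha_3",  # LoRA adapter ID (or a local path)
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.3, max_tokens=250),
    )
```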
Yes, we should document it somewhere later.
Yes, especially if it's going to be a local path.
```diff
+        with self.lora_lock:
+            # Make sure no other thread has loaded the same LoRA adapter.
+            if lora_name not in self.lora_name_to_request:
+                lora_request = vllm.lora.request.LoRARequest(
```
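To make the truncated snippet above concrete, here is a hedged sketch of the full double-checked locking pattern it suggests. The `LoRARequest` fields follow vLLM's public API, but the class shape and helper name are assumptions, not the PR's exact code.

```python
import threading

import vllm


class _LoRAStage:
    """Sketch: cache one LoRARequest per adapter name, thread-safely."""

    def __init__(self):
        self.lora_lock = threading.Lock()
        self.lora_name_to_request: dict[str, vllm.lora.request.LoRARequest] = {}

    def get_lora_request(self, lora_name: str, lora_path: str):
        # Fast path: the adapter is already registered, no lock needed.
        if lora_name in self.lora_name_to_request:
            return self.lora_name_to_request[lora_name]
        with self.lora_lock:
            # Double-check: another thread may have loaded the same
            # adapter while we were waiting on the lock.
            if lora_name not in self.lora_name_to_request:
                self.lora_name_to_request[lora_name] = vllm.lora.request.LoRARequest(
                    lora_name=lora_name,
                    # LoRA int IDs must be unique and >= 1 in vLLM.
                    lora_int_id=len(self.lora_name_to_request) + 1,
                    lora_path=lora_path,
                )
            return self.lora_name_to_request[lora_name]
```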
Big question: who actually downloads the LoRA adapter weights? Are they stored in S3, or locally? That's the part of the LoRA support I don't understand.
For the base model, the vLLM engine does the downloading, usually from Hugging Face. What about LoRA here?
You can download LoRA adapters from HF as well. For example: https://huggingface.co/EdBergJr/Llama32_Baha_3. If the LoRA weights aren't available on HF, users have to download them first and specify a local path (on a shared file system). Later we could support loading LoRA weights directly from S3 in vLLM.
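For reference, a minimal sketch of pre-downloading adapter weights to a shared path with `huggingface_hub` (the repo ID comes from the example above; the target directory is an assumption):

```python
from huggingface_hub import snapshot_download

# Fetch the LoRA adapter once onto a shared filesystem, then pass this
# path as the "model" for LoRA rows in the preprocessor.
lora_path = snapshot_download(
    repo_id="EdBergJr/Llama32_Baha_3",
    local_dir="/mnt/shared/loras/llama32-baha",  # assumed shared mount
)
print(lora_path)
```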
We should document these in the follow-up. I think there are so many gotchas here. I know because we solved all of these issues for online serving and it's non-trivial, especially around loading stuff from S3.
In online serving we have to have a LoRA-aware router (which we already do through the Serve multiplex router). Here in Ray Data, I don't know what that looks like. Maybe we should do a thread-safe LoRA download per node, or something similar. It needs a bit more thinking (non-blocking for this PR, obviously).
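As a hedged sketch of the "thread-safe LoRA download per node" idea, reusing the double-checked locking pattern from the earlier sketch but applied to fetching weights onto a node's local disk (all names here are hypothetical, nothing below is in the PR):

```python
import threading

from huggingface_hub import snapshot_download

_download_lock = threading.Lock()
_local_paths: dict[str, str] = {}  # adapter ID -> local path on this node


def ensure_lora_on_node(adapter_id: str) -> str:
    """Download an adapter's weights at most once per node."""
    if adapter_id in _local_paths:
        return _local_paths[adapter_id]
    with _download_lock:
        if adapter_id not in _local_paths:
            _local_paths[adapter_id] = snapshot_download(repo_id=adapter_id)
        return _local_paths[adapter_id]
```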
@gvspraveen @richardliaw the example is up. Please review and comment: https://anyscale-ray--50804.com.readthedocs.build/en/50804/llm/examples/batch/vllm-with-lora.html
Looks great, can you also link the example back into this section: https://anyscale-ray--50804.com.readthedocs.build/en/50804/data/working-with-llms.html#configure-vllm-for-llm-inference
@kouroshHakha @GeneDer could either of you stamp?
Why are these changes needed?
Example usage:
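A hedged sketch of what usage might look like, assuming the `ray.data.llm` processor API from the linked docs; the base model, adapter ID, and parameters below are illustrative assumptions, not the PR's verbatim example:

```python
import ray
from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

config = vLLMEngineProcessorConfig(
    model="meta-llama/Llama-3.2-1B-Instruct",  # assumed base model
    engine_kwargs=dict(
        enable_lora=True,  # required for vLLM to accept LoRA requests
        max_lora_rank=32,
        max_loras=1,
    ),
    concurrency=1,
    batch_size=32,
)

processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        # Setting "model" to a non-base model marks the row as a LoRA request.
        model="EdBergJr/Llama32_Baha_3",
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.3, max_tokens=250),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "What is the capital of France?"}])
ds = processor(ds)
print(ds.take_all())
```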
cc @gvspraveen @kouroshHakha
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.