
[ray.llm][Batch] Support LoRA #50804

Merged
13 commits merged into ray-project:master on Feb 24, 2025
Conversation

comaniac (Collaborator)

Why are these changes needed?

  1. Support using LoRA in the vLLM v0 engine.
  2. Users have to enable LoRA mode in the vLLM engine config and specify the LoRA adapter in the input request; the adapter is then dynamically loaded into the vLLM engine.
  3. The LoRA adapter has to be available on the HF Hub, or it must be downloaded to the local file system in advance.

Example usage:

import ray
from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

processor_config = vLLMEngineProcessorConfig(
    model="meta-llama/Llama-3.2-1B-Instruct",
    engine_kwargs=dict(
        # Enable LoRA in the vLLM engine.
        enable_lora=True,
        max_lora_rank=32,
    ),
    batch_size=16,
    concurrency=1,
)

processor = build_llm_processor(
    processor_config,
    preprocess=lambda row: dict(
        # Specify LoRA adapter path in the request.
        model="EdBergJr/Llama32_Baha_3",
        messages=[
            {"role": "system", "content": "You are a calculator"},
            {"role": "user", "content": f"{row['id']} ** 3 = ?"},
        ],
        sampling_params=dict(
            temperature=0.3,
            max_tokens=50,
            detokenize=False,
        ),
    ),
    postprocess=lambda row: {
        "resp": row["generated_text"],
    },
)

ds = ray.data.range(60)
ds = ds.map(
    lambda x: {"id": x["id"], "val": x["id"] + 5}
)
ds = processor(ds)
ds = ds.materialize()
for out in ds.take_all():
    print(out)
    print("==========")

cc @gvspraveen @kouroshHakha

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
@comaniac comaniac requested a review from a team as a code owner February 21, 2025 20:13
@comaniac comaniac added the go add ONLY when ready to merge, run all tests label Feb 21, 2025
gvspraveen (Contributor)

Awesome!!

Do you want to update the docs/guides in the same PR for LoRA?

@@ -183,6 +191,27 @@ def _prepare_llm_request(self, row: Dict[str, Any]) -> vLLMEngineRequest:
        else:
            image = []

        # The request is for LoRA.
        lora_request = None
        if "model" in row and row["model"] != self.model:
Contributor
Add a tiny comment here that says the model is a LoRA adapter if and only if the model id present in the row is different from the one in engine_kwargs.

We also need to document in our docs that this is how the preprocessor should be constructed for LoRA.
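For illustration only, a minimal standalone sketch of that rule; the function name resolve_lora_name and the example values are not part of this PR (in the PR the check lives inside _prepare_llm_request as shown in the diff above):

from typing import Any, Dict, Optional

def resolve_lora_name(row: Dict[str, Any], base_model: str) -> Optional[str]:
    # A row targets a LoRA adapter if and only if it carries a "model" field
    # that differs from the base model configured in engine_kwargs.
    model = row.get("model")
    if model is not None and model != base_model:
        return model
    return None

# This row targets a LoRA adapter because its model id differs from the base model.
assert resolve_lora_name(
    {"model": "EdBergJr/Llama32_Baha_3"}, "meta-llama/Llama-3.2-1B-Instruct"
) == "EdBergJr/Llama32_Baha_3"
# A row that names the base model (or omits "model") is a plain request, not LoRA.
assert resolve_lora_name(
    {"model": "meta-llama/Llama-3.2-1B-Instruct"}, "meta-llama/Llama-3.2-1B-Instruct"
) is None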

comaniac (Collaborator, Author)
Yes we should document it somewhere later.

Contributor
Yes, especially if it's going to be a local path.

with self.lora_lock:
    # Make sure no other thread has loaded the same LoRA adapter.
    if lora_name not in self.lora_name_to_request:
        lora_request = vllm.lora.request.LoRARequest(
Contributor
Big question: who actually downloads the LoRA adapter weights? Are those stored in S3? Locally? That's the part of the LoRA support I don't understand.

Contributor

For the base model, the vLLM engine does the downloading, usually from Hugging Face. What about LoRA here?

comaniac (Collaborator, Author)

You can download LoRA adapters from HF as well. For example: https://huggingface.co/EdBergJr/Llama32_Baha_3. If the LoRA weights are not available on HF, users have to download them first and specify a local path (in a shared file system). Later we could support loading LoRA weights from S3 directly in vLLM.
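For the local-path case, a hedged sketch assuming the adapter was already downloaded to a shared filesystem; the path below is purely hypothetical and the config mirrors the example in the PR description:

from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

# Hypothetical shared-filesystem location of a pre-downloaded LoRA adapter;
# it must be visible to every worker node (e.g. NFS or a mounted volume).
LOCAL_LORA_PATH = "/mnt/shared/loras/my_adapter"

config = vLLMEngineProcessorConfig(
    model="meta-llama/Llama-3.2-1B-Instruct",
    engine_kwargs=dict(enable_lora=True, max_lora_rank=32),
    batch_size=16,
    concurrency=1,
)

processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        # A local adapter directory is used the same way as an HF repo id:
        # any "model" different from the base model is treated as a LoRA adapter.
        model=LOCAL_LORA_PATH,
        messages=[{"role": "user", "content": f"{row['id']} ** 3 = ?"}],
        sampling_params=dict(temperature=0.3, max_tokens=50),
    ),
    postprocess=lambda row: {"resp": row["generated_text"]},
)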

Contributor
We should document these in the follow-up. I think there are so many gotchas here. I know because we have solved all of these issues for online serving and it's non-trivial, especially around loading stuff from S3.

In online serving we have to have a LoRA-aware router (which we already do through the Serve multiplex router). Here in Ray Data, I don't know what that looks like yet. Maybe we should do a thread-safe LoRA download per node, or something similar. It needs a bit more thinking (non-blocking to this PR, obviously).
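For context, a minimal sketch of the kind of thread-safe caching being discussed: double-checked locking around a name-to-request map, with the actual request construction (and any download) left to a caller-supplied function, since LoRARequest constructor arguments vary across vLLM versions. This is an illustrative pattern, not the PR's implementation:

import threading
from typing import Any, Callable, Dict

class LoRARequestCache:
    # Creates each LoRA request at most once per process, even when many
    # threads race on the same adapter name (double-checked locking).
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._name_to_request: Dict[str, Any] = {}

    def get_or_create(self, lora_name: str, create_fn: Callable[[str], Any]) -> Any:
        # Fast path: reuse an already-created request without taking the lock.
        request = self._name_to_request.get(lora_name)
        if request is not None:
            return request
        with self._lock:
            # Re-check under the lock: another thread may have won the race.
            if lora_name not in self._name_to_request:
                # create_fn would construct e.g. a vLLM LoRARequest and may
                # download adapter weights; doing it under the lock prevents
                # concurrent downloads of the same adapter in this process.
                self._name_to_request[lora_name] = create_fn(lora_name)
            return self._name_to_request[lora_name]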

@comaniac comaniac requested review from a team as code owners February 22, 2025 01:28
richardliaw (Contributor)
Looks great, can you also link the example back into this section: https://anyscale-ray--50804.com.readthedocs.build/en/50804/data/working-with-llms.html#configure-vllm-for-llm-inference

pcmoritz (Contributor) commented Feb 24, 2025 via email

@richardliaw richardliaw enabled auto-merge (squash) February 24, 2025 22:14
comaniac (Collaborator, Author)

@kouroshHakha @GeneDer could any of you stamp?

@richardliaw richardliaw merged commit e49a260 into ray-project:master Feb 24, 2025
6 checks passed
@comaniac comaniac deleted the lora branch February 24, 2025 22:48