Create TorchaxMergedColumnParallelLinearWithLoRA lora wrapper for single chip #496
base: main
Conversation
Force-pushed from 39f220e to 0b0c7d6.
Also cc @kyuyeunk. This is the PR that I'm working on. I think your refactoring main...support_fp8_quant should be able to play well with my PR.
Force-pushed from a1a77dd to 0dad381.
Thanks @vanbasten23! This is the proper PR for the refactoring work: #512. As mentioned in the description, my PR removes all the custom torchax layers (like …).
I think so. All this PR needs is for this line to work, i.e. this forward pass.
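For readers without the linked code handy, here is a minimal, self-contained PyTorch sketch of what a merged-column-parallel LoRA forward pass conceptually does (base projection plus a shrink/expand LoRA delta per merged slice). All names and shapes are illustrative assumptions, not the actual vLLM or tpu_commons code:

import torch

def merged_column_lora_forward(x, base_weight, loras):
    # x: [num_tokens, in_dim]; base_weight: [sum(out_dims), in_dim]
    # loras: one (lora_a [rank, in_dim], lora_b [out_dim_i, rank]) pair per
    # merged slice (e.g. gate_proj and up_proj of a MergedColumnParallelLinear)
    base_out = x @ base_weight.T
    outputs = []
    offset = 0
    for lora_a, lora_b in loras:
        out_dim = lora_b.shape[0]
        delta = (x @ lora_a.T) @ lora_b.T  # LoRA shrink, then expand
        outputs.append(base_out[:, offset:offset + out_dim] + delta)
        offset += out_dim
    return torch.cat(outputs, dim=-1)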
@classmethod
def get_punica_wrapper(cls) -> str:
-    return "vllm.lora.punica_wrapper.punica_tpu.PunicaWrapperTPU"
+    return "tpu_commons.lora.torch_punica_tpu.PunicaWrapperTPU"
Maybe make it reusable for JAX as well.
Maybe check if MODEL_IMPL_TYPE=vllm here.
Great callout. If we add LoRA to JAX, then we'll check MODEL_IMPL_TYPE=vllm here.
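To make the suggestion above concrete, a rough sketch of how such a check could look. The host class name and the non-vllm branch are assumptions for illustration; only the returned wrapper path comes from the diff above:

import os

class TpuPlatform:  # hypothetical host class, for illustration only
    @classmethod
    def get_punica_wrapper(cls) -> str:
        impl = os.environ.get("MODEL_IMPL_TYPE", "vllm").lower()
        if impl == "vllm":
            # Torchax path exercised by this PR.
            return "tpu_commons.lora.torch_punica_tpu.PunicaWrapperTPU"
        # Placeholder until LoRA lands for the JAX model implementation.
        raise NotImplementedError(
            f"LoRA is not supported for MODEL_IMPL_TYPE={impl}")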
device=device)
lora_mapping = LoRAMapping(index_mapping, prompt_mapping, is_prefill=stage)

with torchax.default_env(), jax.default_device(jax.devices("tpu")[0]):
Do you need to set jax.default_device here?
punica_wrapper.move_to_device(mesh)

jax_inputs = []
with torchax.default_env(), jax.default_device(jax.devices("tpu")[0]):
ditto
# https://github.com/vllm-project/vllm/blob/279a5f31b3faa6f40759516efa5c742f637ab8b7/tests/lora/utils.py
class DummyLoRAManager:
Is there any difference from the one in the vLLM main repo? I wonder if it is possible to just use the one in vLLM.
vllm_config.model_config.model,
vllm_config.scheduler_config.max_num_batched_tokens,
vllm_config.parallel_config.tensor_parallel_size,
"MergedColumnParallelLinear")
Why does it need to get fuse_matmuls here, instead of just using the pre-calculated value of fuse_matmuls as shard_merged_column_parallel_linear.fuse_matmuls?
Description
This PR creates a TorchaxMergedColumnParallelLinearWithLoRA LoRA wrapper, as we discussed in the design. This LoRA wrapper resembles MergedColumnParallelLinearWithLoRA in vLLM.
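For context, a hypothetical skeleton of what such a wrapper could look like, assuming it subclasses vLLM's MergedColumnParallelLinearWithLoRA (import path per recent vLLM) and simply runs the forward inside the torchax environment; this is an illustrative sketch, not the PR's actual implementation:

import torchax
from vllm.lora.layers import MergedColumnParallelLinearWithLoRA

class TorchaxMergedColumnParallelLinearWithLoRA(MergedColumnParallelLinearWithLoRA):
    """Single-chip torchax variant of vLLM's merged-column-parallel LoRA layer."""

    def forward(self, x):
        # Run vLLM's forward under the torchax environment so the torch ops
        # are traced and lowered to JAX/XLA on the TPU device.
        with torchax.default_env():
            return super().forward(x)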
Tests
MODEL_IMPL_TYPE=vllm TPU_BACKEND_TYPE=jax pytest -rs -vv tests/lora/test_layers.py -k test_column_parallel_packed
Checklist
Before submitting this PR, please make sure: