
[Hardware][TPU] Implement tensor parallelism with Ray #5871

Merged Jul 27, 2024 (72 commits)
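This PR lets vLLM shard a model across multiple TPU chips, with Ray coordinating one worker per chip. A minimal usage sketch, assuming a TPU host with torch_xla and Ray installed; the model name and parallel degree below are illustrative, not taken from the PR:

```python
from vllm import LLM, SamplingParams

# Illustrative values: shard the model across 4 TPU chips. With
# tensor_parallel_size > 1 on TPU, the Ray-based executor added in this
# PR launches and coordinates one worker process per chip.
llm = LLM(model="meta-llama/Llama-2-7b-hf", tensor_parallel_size=4)

params = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate(["Tensor parallelism works by"], params)
print(outputs[0].outputs[0].text)
```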
Commits (all authored by WoosukKwon)
76fc072  Jun 24, 2024  Add & warnings
27a5ad8  Jun 24, 2024  Add in dummy_run
5ab6f65  Jun 24, 2024  Add is_driver_worker
c4e79a0  Jun 24, 2024  Make TPUExecutor similar to GPUExecutor
ff81993  Jun 24, 2024  Add multiprocessing-based TPU executor
16e80b2  Jun 24, 2024  Use TPU to initialize Ray cluster
05884ce  Jun 24, 2024  Add pjrt proc init
20d23eb  Jun 24, 2024  Add Ray TPU executor
5d4df21  Jun 24, 2024  Use Ray TPU executor for tp
6b2c76c  Jun 24, 2024  Minor
d91446b  Jun 24, 2024  Fix TPUWorker.execute_model
ab1595d  Jun 24, 2024  Add is_driver_worker & input broadcast
4b45393  Jun 24, 2024  Call xm._init_world_size_ordinal
86451a2  Jun 24, 2024  Bug fix on vocab
0539299  Jun 24, 2024  Use all gather for TPU
b35917c  Jun 24, 2024  Support TPU in GroupCoordinator
b9a84bc  Jun 25, 2024  Delete multiproc TPU executor
c756b76  Jun 25, 2024  Minor
16e9934  Jun 26, 2024  [Bugfix][TPU] Fix CPU cache allocation & swapping
e25f470  Jun 26, 2024  Merge branch 'fix-tpu-swpa' into tpu-n
ca6d1d6  Jun 26, 2024  yapf
cd4f68d  Jun 26, 2024  Add Ray to TPU dependency
5df4164  Jun 26, 2024  Merge branch 'main' into tpu-n
546987a  Jun 26, 2024  Fix
330be6e  Jun 26, 2024  Fix
b45ed24  Jun 29, 2024  Merge branch 'main' into tpu-n
8fab9fd  Jun 29, 2024  Add use_all_gather to LoRA
c4cbe9f  Jun 29, 2024  Fix
2871c7c  Jun 30, 2024  Merge branch 'main' into tpu-n
db7adc7  Jun 30, 2024  Add an assert for dim == -1
696790d  Jun 30, 2024  is_tpu -> use_xla
8a08896  Jun 30, 2024  Merge branch 'main' into tpu-n
36f9070  Jul 1, 2024   Merge branch 'main' into tpu-n
28afe56  Jul 2, 2024   yapf
60bf64d  Jul 2, 2024   Add hack in vocab
0fbb050  Jul 7, 2024   Merge branch 'main' into tpu-n
ddf4cbe  Jul 7, 2024   Merge branch 'main' into tpu-n
cd4842d  Jul 9, 2024   Fix multi-modal support
54e637b  Jul 9, 2024   Merge branch 'main' into tpu-n
73ed611  Jul 10, 2024  Merge branch 'main' into tpu-n
717b3fa  Jul 15, 2024  Merge branch 'main' into tpu-n
6b0c35d  Jul 17, 2024  Merge branch 'main' into tpu-n
7f583ba  Jul 18, 2024  Merge branch 'main' into tpu-n
106864d  Jul 18, 2024  Remove unused
223661f  Jul 18, 2024  Minor
5bd67bc  Jul 21, 2024  Merge branch 'main' into tpu-n
ab7cccf  Jul 21, 2024  Fix comm error
4e0c90a  Jul 21, 2024  Use custom inference_mode
a2358ed  Jul 21, 2024  Remove hack in vocab embedding
ac21351  Jul 21, 2024  Use patch
ba76d9e  Jul 21, 2024  Update inference_mode
452c321  Jul 21, 2024  use_all_gather -> use_gather
dcb63b7  Jul 21, 2024  Fix patch
825cc44  Jul 21, 2024  Fix typo
f27ef99  Jul 22, 2024  Merge branch 'main' into tpu-n
9730288  Jul 22, 2024  Remove inference_mode
631b08b  Jul 23, 2024  Add no_grad
d65a7d0  Jul 23, 2024  Merge branch 'main' into tpu-n
755fe0b  Jul 24, 2024  Merge branch 'main' into tpu-n
d5fadfd  Jul 26, 2024  Merge branch 'main' into tpu-n
af3a259  Jul 26, 2024  [TPU] Support collective communications in XLA devices
0f2abea  Jul 26, 2024  Use current_platform
8ebea7e  Jul 26, 2024  is_xla -> is_tpu
782b182  Jul 26, 2024  Define TPU communicator
76fd300  Jul 26, 2024  Merge branch 'main' into tpu-n
75f842b  Jul 26, 2024  Merge branch 'add-xla-comm' into tpu-n
8087227  Jul 26, 2024  Fix
f04e179  Jul 26, 2024  Address comments
f493c89  Jul 26, 2024  Device init
f14b085  Jul 26, 2024  Fix patch
1668582  Jul 26, 2024  Merge branch 'add-xla-comm' into tpu-n
a05cf0f  Jul 27, 2024  Merge branch 'main' into tpu-n
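Several commits above (0539299 "Use all gather for TPU", db7adc7 "Add an assert for dim == -1", 782b182 "Define TPU communicator") route vLLM's collectives through PyTorch/XLA instead of NCCL when running on TPU. A minimal sketch of that idea, using real torch_xla collective ops; the class name and exact structure here are illustrative, not the PR's actual code:

```python
import torch
import torch_xla.core.xla_model as xm


class TpuCommunicatorSketch:
    """Illustrative stand-in (not the PR's actual class) for a TPU
    communicator: the group coordinator dispatches collectives here so
    they run as PyTorch/XLA ops instead of NCCL calls."""

    def all_reduce(self, x: torch.Tensor) -> torch.Tensor:
        # Sum-reduce x across all TPU devices in the replica group.
        return xm.all_reduce(xm.REDUCE_SUM, x)

    def all_gather(self, x: torch.Tensor, dim: int = -1) -> torch.Tensor:
        # Concatenate each device's shard along the last dimension,
        # mirroring the "assert for dim == -1" commit above.
        assert dim == -1, "only the last dim is supported in this sketch"
        return xm.all_gather(x, dim=dim)
```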
Changes from 1 commit:

Commit 330be6e43d82abdd42797c12f58e532fc963a0fd ("Fix")
WoosukKwon committed Jun 26, 2024

1 change (1 addition, 0 deletions): vllm/attention/backends/pallas.py
```diff
@@ -28,6 +28,7 @@ def get_kv_cache_shape(
     ) -> Tuple[int, ...]:
         return (num_kv_heads, num_blocks, block_size, head_size)
 
+    @torch.compile(backend="openxla")
     @staticmethod
     def swap_blocks(
         src_kv_cache: Tuple[torch.Tensor, torch.Tensor],
```
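The added decorator asks Dynamo to trace swap_blocks and lower it through the "openxla" backend registered by torch_xla, so the KV-cache swap runs as a compiled XLA computation rather than eager ops. A self-contained illustration of the same backend on a toy function; the function itself is made up for demonstration:

```python
import torch
import torch_xla.core.xla_model as xm


@torch.compile(backend="openxla")
def scale_and_add(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Dynamo traces this function and the "openxla" backend (provided
    # by torch_xla) lowers it to a compiled XLA computation.
    return 2 * a + b


x = torch.ones(4, device=xm.xla_device())
print(scale_and_add(x, x))  # tensor([3., 3., 3., 3.], device='xla:0')
```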