
[WIP] [Core][P/D] CPU connector for PD disagg #18332


Open: wants to merge 29 commits into base: main
29 commits

6a9eee3 [Add] cpu kv sender interfaces (ApostaC, May 16, 2025)
ad7d852 [WIP] Rewrite the whole implementation for layer-wise pipeline (ApostaC, May 19, 2025)
b1e003e [Add] adding tests for cpu kv pd (ApostaC, May 19, 2025)
ccc8c1d [Add] adding nixl transfer impl WIP (ApostaC, May 20, 2025)
050cfe6 [Add] tests for nixl (ApostaC, May 20, 2025)
e5034b0 Passed the nixl protocol unit tests (ApostaC, May 20, 2025)
81e31b8 [Add] correct nixl data plane functionality (ApostaC, May 20, 2025)
a6ffb26 [Add] sender to receiver data plane finished (ApostaC, May 21, 2025)
9b0c66b [add] NixlPrefillManager and NixlDecodeManager and test example (ApostaC, May 22, 2025)
2647ce5 [Add] unit tests for more functionalities (ApostaC, May 23, 2025)
96cb2b5 [Ckpt] everything is functional (ApostaC, May 23, 2025)
005d5c1 [Add] remove the hard-coded host and port (ApostaC, May 27, 2025)
44b36be [Add] precommit fixes (ApostaC, May 27, 2025)
2e2937f [fix] ruff errors (ApostaC, May 27, 2025)
242098b [fix] format checker issue for tests (ApostaC, May 27, 2025)
76e1473 [remove] outdated tests (ApostaC, May 28, 2025)
01de06a [remove] hardcodes and fix precommit issues (ApostaC, May 28, 2025)
104418e [remove] previous debug codes (ApostaC, May 28, 2025)
b4994f0 [Add] bug fix for TP and correctly shutdown (ApostaC, May 28, 2025)
1f01921 [fix] concurrency bug in TP > 1 (ApostaC, May 29, 2025)
af03fd5 [Add] nsys analysis and add potential optimizations (ApostaC, May 29, 2025)
483ed5a [Add] small fixes for corner cases (ApostaC, May 30, 2025)
b27c101 temp fix for pending request ids (ApostaC, May 30, 2025)
78e6c02 [fix] online problems (ApostaC, May 30, 2025)
32dc419 [Add] passed the initial benchmark test (ApostaC, May 31, 2025)
9683b48 Address the review comments (ApostaC, Jun 15, 2025)
60c43be Merge branch 'main' into local-dev/cpu-kv (ApostaC, Jun 15, 2025)
3720cf8 [fix] crash problem (ApostaC, Jun 24, 2025)
7d40a1d fix the hang problem (ApostaC, Jun 26, 2025)
2 changes: 2 additions & 0 deletions tests/v1/kv_connector/cpu_kv_integration/__init__.py
@@ -0,0 +1,2 @@
# SPDX-License-Identifier: Apache-2.0
# Empty init file to mark directory as Python package
51 changes: 51 additions & 0 deletions tests/v1/kv_connector/cpu_kv_integration/online_test.sh
@@ -0,0 +1,51 @@
#!/bin/bash
# Launch one side of a CPUConnector prefill/decode (P/D) pair.
# Usage: online_test.sh <prefiller | decoder> [model]

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

if [[ $# -lt 1 ]]; then
    echo "Usage: $0 <prefiller | decoder> [model]"
    exit 1
fi

if [[ $# -eq 1 ]]; then
    MODEL="meta-llama/Llama-3.1-8B-Instruct"
    echo "Using default model: $MODEL"
else
    MODEL="$2"
    echo "Using model: $MODEL"
fi

if [[ $1 == "prefiller" ]]; then
    # Prefiller (kv_producer) listens on port 8100 and runs on GPU 0
    #UCX_TLS=cuda_ipc,cuda_copy,tcp \
    VLLM_ENABLE_V1_MULTIPROCESSING=1 \
    VLLM_WORKER_MULTIPROC_METHOD=spawn \
    CUDA_VISIBLE_DEVICES=0 \
    vllm serve "$MODEL" \
        --port 8100 \
        --disable-log-requests \
        --enforce-eager \
        --kv-transfer-config \
        '{"kv_connector":"CPUConnector","kv_role":"kv_producer","kv_connector_extra_config": {"host": "localhost", "port": "54321", "size": 40}}'

elif [[ $1 == "decoder" ]]; then
    # Decoder (kv_consumer) listens on port 8200 and runs on GPU 1
    #UCX_TLS=cuda_ipc,cuda_copy,tcp \
    VLLM_ENABLE_V1_MULTIPROCESSING=1 \
    VLLM_WORKER_MULTIPROC_METHOD=spawn \
    CUDA_VISIBLE_DEVICES=1 \
    vllm serve "$MODEL" \
        --port 8200 \
        --disable-log-requests \
        --enforce-eager \
        --kv-transfer-config \
        '{"kv_connector":"CPUConnector","kv_role":"kv_consumer","kv_connector_extra_config": {"host": "localhost", "port": "54321", "size": 40}}'

else
    echo "Invalid role: $1"
    echo "Should be either prefiller or decoder"
    exit 1
fi
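The two branches of online_test.sh differ only in `kv_role`, the HTTP port, and the GPU; the `kv_connector_extra_config` block (`host`, `port`, `size`) is identical on both sides. A minimal sketch, assuming you want to generate these command lines from Python instead of editing two JSON literals by hand: the helper names `kv_transfer_config` and `serve_argv` are hypothetical, and `size` is passed through as an opaque value because the script does not say what it measures.

```python
import json
import shlex
from typing import List, Optional

# Values copied from online_test.sh. "size" is undocumented there, so it is
# forwarded unchanged rather than interpreted.
DEFAULT_EXTRA = {"host": "localhost", "port": "54321", "size": 40}
HTTP_PORTS = {"kv_producer": 8100, "kv_consumer": 8200}


def kv_transfer_config(role: str, extra: Optional[dict] = None) -> str:
    """Return the JSON string for `vllm serve --kv-transfer-config`."""
    if role not in HTTP_PORTS:
        raise ValueError(f"unsupported kv_role: {role!r}")
    extra = DEFAULT_EXTRA if extra is None else extra
    return json.dumps({
        "kv_connector": "CPUConnector",
        "kv_role": role,
        "kv_connector_extra_config": dict(extra),
    })


def serve_argv(role: str, model: str) -> List[str]:
    """Assemble the `vllm serve` argument list used by online_test.sh."""
    return [
        "vllm", "serve", model,
        "--port", str(HTTP_PORTS[role]),
        "--disable-log-requests",
        "--enforce-eager",
        "--kv-transfer-config", kv_transfer_config(role),
    ]


model = "meta-llama/Llama-3.1-8B-Instruct"
for role in ("kv_producer", "kv_consumer"):
    # shlex.join quotes the JSON argument so the line can be pasted into a shell.
    print(shlex.join(serve_argv(role, model)))

# As in the script, both sides carry the same connector endpoint settings.
p = json.loads(kv_transfer_config("kv_producer"))
d = json.loads(kv_transfer_config("kv_consumer"))
assert p["kv_connector_extra_config"] == d["kv_connector_extra_config"]
```

Keeping the shared settings in one dictionary means a changed host or port cannot drift between the prefiller and decoder invocations. The environment variables (`VLLM_ENABLE_V1_MULTIPROCESSING`, `VLLM_WORKER_MULTIPROC_METHOD`, `CUDA_VISIBLE_DEVICES`) would still need to be set by the caller.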
4 changes: 4 additions & 0 deletions tests/v1/kv_connector/cpu_kv_integration/output.txt

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions tests/v1/kv_connector/cpu_kv_integration/output_decode.txt
@@ -0,0 +1,4 @@
Hi Hi Hi Hi Hello, my name is [Your Name] and I am a [Your
Hi Hi The capital of France is Paris. The capital of France is Paris. The
Hello Hello Hello Your name is not in the list. Please check your email for
ow How The capital of China is Beijing. Beijing is a city in northern China.
24 changes: 24 additions & 0 deletions tests/v1/kv_connector/cpu_kv_integration/run_nsys.sh
@@ -0,0 +1,24 @@
#!/bin/bash
# Profile one side of the toy P/D example under Nsight Systems.
# Any argument other than "decoder" profiles the prefiller.

if [[ $1 == "decoder" ]]; then
    echo "Running decoder"
    CUDA_VISIBLE_DEVICES=7 nsys profile \
        --trace=cuda,nvtx,osrt \
        --gpu-metrics-devices=cuda-visible \
        --python-sampling=true \
        --trace-fork-before-exec=true \
        --output=decoder \
        --force-overwrite=true \
        python3 toy_decode.py

else
    echo "Running prefiller"
    CUDA_VISIBLE_DEVICES=6 nsys profile \
        --trace=cuda,nvtx,osrt \
        --gpu-metrics-devices=cuda-visible \
        --python-sampling=true \
        --trace-fork-before-exec=true \
        --output=prefiller \
        --force-overwrite=true \
        python3 toy_example.py
fi
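The two branches of run_nsys.sh repeat the same `nsys profile` flags and differ only in GPU index, output name, and driver script. A minimal sketch, assuming you want to drive both profiles from a Python harness: the hypothetical `nsys_command` helper only builds the environment and argument list and does not launch anything, so it runs without Nsight Systems installed.

```python
from __future__ import annotations

import os
import shlex

# Per-role settings copied from run_nsys.sh: (GPU index, output name, driver script).
ROLE_SETTINGS = {
    "decoder": ("7", "decoder", "toy_decode.py"),
    "prefiller": ("6", "prefiller", "toy_example.py"),
}

# Flags shared by both branches of the shell script.
TRACE_FLAGS = [
    "--trace=cuda,nvtx,osrt",
    "--gpu-metrics-devices=cuda-visible",
    "--python-sampling=true",
    "--trace-fork-before-exec=true",
]


def nsys_command(role: str) -> tuple[dict[str, str], list[str]]:
    """Return (env, argv) for profiling one side; nothing is executed."""
    # Mirror the shell script: any role other than "decoder" means prefiller.
    key = "decoder" if role == "decoder" else "prefiller"
    gpu, output, script = ROLE_SETTINGS[key]
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": gpu}
    argv = ["nsys", "profile", *TRACE_FLAGS,
            f"--output={output}", "--force-overwrite=true",
            "python3", script]
    return env, argv


env, argv = nsys_command("decoder")
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], shlex.join(argv))
```

To actually profile, pass the result to `subprocess.run(argv, env=env, check=True)` from the directory containing `toy_decode.py` and `toy_example.py`, since the script names are relative paths.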
7 changes: 7 additions & 0 deletions tests/v1/kv_connector/cpu_kv_integration/temptest.py
@@ -0,0 +1,7 @@
# SPDX-License-Identifier: Apache-2.0
from vllm.distributed.kv_transfer.kv_connector.v1.nixl_cpu_utils import (
    NixlKVSender)

sender = NixlKVSender(1024 * 1024 * 1024)

sender.close()
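temptest.py only checks that a `NixlKVSender` can be constructed and closed; if anything raised between those two calls, `close()` would be skipped. A minimal sketch, assuming `close()` is the only teardown the sender needs, wraps it in the standard-library `contextlib.closing`. To stay runnable without vLLM or NIXL installed it uses a hypothetical `StandInSender`, and it treats the constructor argument as a byte count, which the source does not confirm.

```python
import contextlib

BUFFER_BYTES = 1024 * 1024 * 1024  # 1 GiB: the value temptest.py passes


class StandInSender:
    """Hypothetical stand-in for NixlKVSender, modelling only the two calls
    temptest.py makes: construction with an integer and close()."""

    def __init__(self, nbytes: int) -> None:
        self.nbytes = nbytes
        self.closed = False

    def close(self) -> None:
        self.closed = True


# Normal path: close() runs when the with-block exits.
with contextlib.closing(StandInSender(BUFFER_BYTES)) as sender:
    print("sender created with", sender.nbytes, "bytes")

# Error path: close() still runs before the exception propagates.
try:
    with contextlib.closing(StandInSender(BUFFER_BYTES)) as failing:
        raise RuntimeError("simulated transfer failure")
except RuntimeError:
    pass

assert sender.closed and failing.closed
```

In the real test, replacing `StandInSender(BUFFER_BYTES)` with `NixlKVSender(1024 * 1024 * 1024)` would keep the same guarantee, because `contextlib.closing` only requires the wrapped object to have a `close()` method.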