Sync Upstream vllm repository #3

Draft: wants to merge 99 commits into base: torchao

99 commits
a3691b6
[Core][Frontend] Add Support for Inference Time mm_processor_kwargs (…
alex-jw-brooks Oct 8, 2024
069d3bd
[Frontend] Add Early Validation For Chat Template / Tool Call Parser …
alex-jw-brooks Oct 8, 2024
cfba685
[CI/Build] Add examples folder into Docker image so that we can lever…
panpan0000 Oct 8, 2024
9a94ca4
[Bugfix] fix OpenAI API server startup with --disable-frontend-multip…
dtrifiro Oct 8, 2024
1874c6a
[Doc] Update vlm.rst to include an example on videos (#9155)
sayakpaul Oct 8, 2024
de24046
[Doc] Improve contributing and installation documentation (#9132)
rafvasq Oct 8, 2024
bd37b9f
[Bugfix] Try to handle older versions of pytorch (#9086)
bnellnm Oct 8, 2024
2a13196
mypy: check additional directories (#9162)
russellb Oct 8, 2024
9ba0bd6
Add `lm-eval` directly to requirements-test.txt (#9161)
mgoin Oct 9, 2024
2f4117c
support bitsandbytes quantization with more models (#9148)
chenqianfzh Oct 9, 2024
ffc4b27
Add classifiers in setup.py (#9171)
terrytangyuan Oct 9, 2024
acce763
Update link to KServe deployment guide (#9173)
terrytangyuan Oct 9, 2024
480b7f4
[Misc] Improve validation errors around best_of and n (#9167)
tjohnson31415 Oct 9, 2024
7627172
[Bugfix][Doc] Report neuron error in output (#9159)
joerowell Oct 9, 2024
cdc72e3
[Model] Remap FP8 kv_scale in CommandR and DBRX (#9174)
hliuca Oct 9, 2024
0b5b5d7
[Frontend] Log the maximum supported concurrency (#8831)
AlpinDale Oct 9, 2024
8bfaa4e
[Bugfix] fix composite weight loading and EAGLE weight loading (#9160)
DarkLight1337 Oct 9, 2024
c8627cd
[ci][test] use load dummy for testing (#9165)
youkaichao Oct 9, 2024
dc4aea6
[Doc] Fix VLM prompt placeholder sample bug (#9170)
ycool Oct 9, 2024
21906a6
[Bugfix] Fix lora loading for Compressed Tensors in #9120 (#9179)
fahadh4ilyas Oct 9, 2024
cfaa600
[Bugfix] Access `get_vocab` instead of `vocab` in tool parsers (#9188)
DarkLight1337 Oct 9, 2024
7dea289
Add Dependabot configuration for GitHub Actions updates (#1217)
EwoutH Oct 9, 2024
ca77dd7
[Hardware][CPU] Support AWQ for CPU backend (#7515)
bigPYJ1151 Oct 9, 2024
cdca899
[CI/Build] mypy: check vllm/entrypoints (#9194)
russellb Oct 9, 2024
d5fbb87
[CI/Build] Update Dockerfile install+deploy image to ubuntu 22.04 (#9…
mgoin Oct 9, 2024
cf25b93
[Core] Fix invalid args to _process_request (#9201)
russellb Oct 10, 2024
de895f1
[misc] improve model support check in another process (#9208)
youkaichao Oct 10, 2024
ce00231
[Bugfix] Fix Weight Loading Multiple GPU Test - Large Models (#9213)
mgoin Oct 10, 2024
a64e7b9
[Bugfix] Machete garbage results for some models (large K dim) (#9212)
LucasWilkinson Oct 10, 2024
f3a507f
[Core] Add an environment variable which needs to be set explicitly t…
sroy745 Oct 10, 2024
07c11cf
[Bugfix] Fix lm_head weights tying with lora for llama (#9227)
Isotr0py Oct 10, 2024
04de905
[Model] support input image embedding for minicpmv (#9237)
whyiug Oct 10, 2024
83ea5c7
[OpenVINO] Use torch 2.4.0 and newer optimim version (#9121)
ilya-lavrenov Oct 10, 2024
18511ae
[Bugfix] Fix Machete unittests failing with `NotImplementedError` (#9…
LucasWilkinson Oct 10, 2024
055f327
[Doc] Improve debugging documentation (#9204)
rafvasq Oct 10, 2024
21efb60
[CI/Build] Make the `Dockerfile.cpu` file's `PIP_EXTRA_INDEX_URL` Co…
jyono Oct 10, 2024
78c0b41
Suggest codeowners for the core componenets (#9210)
simon-mo Oct 10, 2024
e4d652e
[torch.compile] integration with compilation control (#9058)
youkaichao Oct 10, 2024
9cc811c
Bump actions/github-script from 6 to 7 (#9197)
dependabot[bot] Oct 10, 2024
270953b
Bump actions/checkout from 3 to 4 (#9196)
dependabot[bot] Oct 10, 2024
fb870fd
Bump actions/setup-python from 3 to 5 (#9195)
dependabot[bot] Oct 10, 2024
a78c6ba
[ci/build] Add placeholder command for custom models test (#9262)
khluu Oct 10, 2024
e00c094
[torch.compile] generic decorators (#9258)
youkaichao Oct 10, 2024
f990bab
[Doc][Neuron] add note to neuron documentation about resolving triton…
omrishiv Oct 10, 2024
94bf9ae
[Misc] Fix sampling from sonnet for long context case (#9235)
Imss27 Oct 11, 2024
cbc2ef5
[misc] hide best_of from engine (#9261)
youkaichao Oct 11, 2024
e808156
[Misc] Collect model support info in a single process per model (#9233)
DarkLight1337 Oct 11, 2024
36ea790
[Misc][LoRA] Support loading LoRA weights for target_modules in reg f…
jeejeelee Oct 11, 2024
df3dcdf
[Bugfix] Fix priority in multiprocessing engine (#9277)
schoennenbeck Oct 11, 2024
7342a7d
[Model] Support Mamba (#6484)
tlrmchlsmth Oct 11, 2024
f710090
[Kernel] adding fused moe kernel config for L40S TP4 (#9245)
bringlein Oct 11, 2024
6cf1167
[Model] Add GLM-4v support and meet vllm==0.6.2 (#9242)
sixsixcoder Oct 11, 2024
1a18238
[Doc] Remove outdated comment to avoid misunderstanding (#9287)
homeffjy Oct 11, 2024
8baf85e
[Doc] Compatibility matrix for mutual exclusive features (#8512)
wallashss Oct 11, 2024
de9fb4b
[Bugfix][CI/Build] Fix docker build where CUDA archs < 7.0 are being …
LucasWilkinson Oct 11, 2024
c6cf929
[Bugfix] Sets `is_first_step_output` for TPUModelRunner (#9202)
Oct 11, 2024
d11b46f
[bugfix] fix f-string for error (#9295)
prashantgupta24 Oct 12, 2024
ec10cb8
[BugFix] Fix tool call finish reason in streaming case (#9209)
maxdebayser Oct 12, 2024
89feb4c
[SpecDec] Remove Batch Expansion (2/3) (#9298)
LiuXiaoxuanPKU Oct 12, 2024
00298e0
[Bugfix] Fix bug of xformer prefill for encoder-decoder (#9026)
xiangxu-google Oct 12, 2024
2b184dd
[Misc][Installation] Improve source installation script and doc (#9309)
cermeng Oct 12, 2024
250e26a
[Bugfix]Fix MiniCPM's LoRA bug (#9286)
jeejeelee Oct 12, 2024
f519902
[CI] Fix merge conflict (#9317)
LiuXiaoxuanPKU Oct 13, 2024
16b24e7
[Bugfix] Bandaid fix for speculative decoding tests (#9327)
tlrmchlsmth Oct 13, 2024
dfe43a2
[Model] Molmo vLLM Integration (#9016)
mrsalehi Oct 14, 2024
4141608
[Hardware][intel GPU] add async output process for xpu (#8897)
jikunshang Oct 14, 2024
203ab8f
[CI/Build] setuptools-scm fixes (#8900)
dtrifiro Oct 14, 2024
fd47e57
[Docs] Remove PDF build from Readtehdocs (#9347)
simon-mo Oct 14, 2024
473e7b3
[TPU] Fix TPU SMEM OOM by Pallas paged attention kernel (#9350)
WoosukKwon Oct 14, 2024
4d31cd4
[Frontend] merge beam search implementations (#9296)
LunrEclipse Oct 14, 2024
f0fe4fe
[Model] Make llama3.2 support multiple and interleaved images (#9095)
xiangxu-google Oct 14, 2024
169b530
[Bugfix] Clean up some cruft in mamba.py (#9343)
tlrmchlsmth Oct 15, 2024
44eaa5a
[Frontend] Clarify model_type error messages (#9345)
stevegrubb Oct 15, 2024
8e836d9
[Doc] Fix code formatting in spec_decode.rst (#9348)
mgoin Oct 15, 2024
55e081f
[Bugfix] Update InternVL input mapper to support image embeds (#9351)
hhzhang16 Oct 15, 2024
e9d517f
[BugFix] Fix chat API continuous usage stats (#9357)
njhill Oct 15, 2024
5d264f4
pass ignore_eos parameter to all benchmark_serving calls (#9349)
gracehonv Oct 15, 2024
22f8a69
[Misc] Directly use compressed-tensors for checkpoint definitions (#8…
mgoin Oct 15, 2024
ba30942
[Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with emp…
CatherineSue Oct 15, 2024
717a5f8
[Bugfix][CI/Build] Fix CUDA 11.8 Build (#9386)
LucasWilkinson Oct 16, 2024
ed92013
[Bugfix] Molmo text-only input bug fix (#9397)
mrsalehi Oct 16, 2024
7e7eae3
[Misc] Standardize RoPE handling for Qwen2-VL (#9250)
DarkLight1337 Oct 16, 2024
7abba39
[Model] VLM2Vec, the first multimodal embedding model in vLLM (#9303)
DarkLight1337 Oct 16, 2024
1de76a0
[CI/Build] Test VLM embeddings (#9406)
DarkLight1337 Oct 16, 2024
cee711f
[Core] Rename input data types (#8688)
DarkLight1337 Oct 16, 2024
59230ef
[Misc] Consolidate example usage of OpenAI client for multimodal mode…
ywang96 Oct 16, 2024
cf1d62a
[Model] Support SDPA attention for Molmo vision backbone (#9410)
Isotr0py Oct 16, 2024
415f76a
Support mistral interleaved attn (#9414)
patrickvonplaten Oct 16, 2024
fb60ae9
[Kernel][Model] Improve continuous batching for Jamba and Mamba (#9189)
mzusman Oct 16, 2024
5b8a1fd
[Model][Bugfix] Add FATReLU activation and support for openbmb/MiniCP…
0xjunhao Oct 16, 2024
8345045
[Performance][Spec Decode] Optimize ngram lookup performance (#9333)
LiuXiaoxuanPKU Oct 16, 2024
776dbd7
[CI/Build] mypy: Resolve some errors from checking vllm/engine (#9267)
russellb Oct 16, 2024
c3fab5f
[Bugfix][Kernel] Prevent integer overflow in fp8 dynamic per-token qu…
tlrmchlsmth Oct 16, 2024
92d86da
[BugFix] [Kernel] Fix GPU SEGV occurring in int8 kernels (#9391)
rasmith Oct 17, 2024
dbfa8d3
Add notes on the use of Slack (#9442)
terrytangyuan Oct 17, 2024
e312e52
[Kernel] Add Exllama as a backend for compressed-tensors (#9395)
LucasWilkinson Oct 17, 2024
390be74
[Misc] Print stack trace using `logger.exception` (#9461)
DarkLight1337 Oct 17, 2024
9d30a05
[misc] CUDA Time Layerwise Profiler (#8337)
LucasWilkinson Oct 17, 2024
5e443b5
[Bugfix] Allow prefill of assistant response when using `mistral_comm…
sasha0552 Oct 17, 2024

Files changed

2 changes: 1 addition & 1 deletion .buildkite/lm-eval-harness/run-lm-eval-gsm-hf-baseline.sh
@@ -2,7 +2,7 @@
# We can use this script to compute baseline accuracy on GSM for transformers.
#
# Make sure you have lm-eval-harness installed:
- # pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git@9516087b81a61d0e220b22cc1b75be76de23bc10
+ # pip install lm-eval==0.4.4

usage() {
echo``

(next file; file name not captured in this view)
@@ -3,7 +3,7 @@
# We use this for fp8, which HF does not support.
#
# Make sure you have lm-eval-harness installed:
- # pip install lm-eval==0.4.3
+ # pip install lm-eval==0.4.4

usage() {
echo``
4 changes: 2 additions & 2 deletions .buildkite/release-pipeline.yaml
@@ -3,7 +3,7 @@ steps:
agents:
queue: cpu_queue
commands:
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg buildkite_commit=$BUILDKITE_COMMIT --build-arg USE_SCCACHE=1 --build-arg CUDA_VERSION=12.1.0 --tag vllm-ci:build-image --target build --progress plain ."
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg CUDA_VERSION=12.1.0 --tag vllm-ci:build-image --target build --progress plain ."
- "mkdir artifacts"
- "docker run --rm -v $(pwd)/artifacts:/artifacts_host vllm-ci:build-image bash -c 'cp -r dist /artifacts_host && chmod -R a+rw /artifacts_host'"
# rename the files to change linux -> manylinux1
@@ -22,7 +22,7 @@ steps:
agents:
queue: cpu_queue
commands:
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg buildkite_commit=$BUILDKITE_COMMIT --build-arg USE_SCCACHE=1 --build-arg CUDA_VERSION=11.8.0 --tag vllm-ci:build-image --target build --progress plain ."
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg CUDA_VERSION=11.8.0 --tag vllm-ci:build-image --target build --progress plain ."
- "mkdir artifacts"
- "docker run --rm -v $(pwd)/artifacts:/artifacts_host vllm-ci:build-image bash -c 'cp -r dist /artifacts_host && chmod -R a+rw /artifacts_host'"
# rename the files to change linux -> manylinux1
8 changes: 7 additions & 1 deletion .buildkite/run-cpu-test-ppc64le.sh
@@ -18,7 +18,13 @@ docker run -itd --entrypoint /bin/bash -v ~/.cache/huggingface:/root/.cache/hugg
# Run basic model test
docker exec cpu-test bash -c "
pip install pytest matplotlib einops transformers_stream_generator
- pytest -v -s tests/models -m \"not vlm\" --ignore=tests/models/test_embedding.py --ignore=tests/models/test_oot_registration.py --ignore=tests/models/test_registry.py --ignore=tests/models/test_jamba.py --ignore=tests/models/test_danube3_4b.py" # Mamba and Danube3-4B on CPU is not supported
+ pytest -v -s tests/models -m \"not vlm\" \
+ --ignore=tests/models/test_embedding.py \
+ --ignore=tests/models/test_oot_registration.py \
+ --ignore=tests/models/test_registry.py \
+ --ignore=tests/models/test_jamba.py \
+ --ignore=tests/models/test_mamba.py \
+ --ignore=tests/models/test_danube3_4b.py" # Mamba kernels and Danube3-4B on CPU is not supported

# online inference
docker exec cpu-test bash -c "
11 changes: 9 additions & 2 deletions .buildkite/run-cpu-test.sh
@@ -27,13 +27,20 @@ docker exec cpu-test bash -c "
pytest -v -s tests/models/decoder_only/language \
--ignore=tests/models/test_fp8.py \
--ignore=tests/models/decoder_only/language/test_jamba.py \
+ --ignore=tests/models/decoder_only/language/test_mamba.py \
--ignore=tests/models/decoder_only/language/test_granitemoe.py \
--ignore=tests/models/decoder_only/language/test_danube3_4b.py" # Mamba and Danube3-4B on CPU is not supported

# Run compressed-tensor test
+ # docker exec cpu-test bash -c "
+ #   pytest -s -v \
+ #   tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup \
+ #   tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynanmic_per_token"
+
+ # Run AWQ test
docker exec cpu-test bash -c "
pytest -s -v \
- tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup \
- tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynanmic_per_token"
+ tests/quantization/test_ipex_quant.py"

# online inference
docker exec cpu-test bash -c "
49 changes: 30 additions & 19 deletions .buildkite/test-pipeline.yaml
@@ -77,8 +77,8 @@ steps:
- vllm/
- tests/basic_correctness/test_chunked_prefill
commands:
- - VLLM_ATTENTION_BACKEND=XFORMERS pytest -v -s basic_correctness/test_chunked_prefill.py
- - VLLM_ATTENTION_BACKEND=FLASH_ATTN pytest -v -s basic_correctness/test_chunked_prefill.py
+ - VLLM_ATTENTION_BACKEND=XFORMERS VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s basic_correctness/test_chunked_prefill.py
+ - VLLM_ATTENTION_BACKEND=FLASH_ATTN VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s basic_correctness/test_chunked_prefill.py

- label: Core Test # 10min
mirror_hardwares: [amd]
@@ -88,7 +88,11 @@ steps:
- vllm/distributed
- tests/core
commands:
- - pytest -v -s core
+ - VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s core/test_scheduler.py
+ - VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s core core/test_chunked_prefill_scheduler.py
+ - VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s core core/block/e2e/test_correctness.py
+ - VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s core core/block/e2e/test_correctness_sliding_window.py
+ - pytest -v -s core --ignore=core/block/e2e/test_correctness.py --ignore=core/test_scheduler.py --ignore=core/test_chunked_prefill_scheduler.py --ignore=core/block/e2e/test_correctness.py --ignore=core/block/e2e/test_correctness_sliding_window.py

- label: Entrypoints Test # 40min
working_dir: "/vllm-workspace/tests"
@@ -98,7 +102,6 @@ steps:
- vllm/
commands:
- pip install -e ./plugins/vllm_add_dummy_model
- - pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git@a4987bba6e9e9b3f22bd3a6c1ecf0abd04fd5622#egg=lm_eval[api]
- pytest -v -s entrypoints/llm --ignore=entrypoints/llm/test_lazy_outlines.py --ignore=entrypoints/llm/test_generate.py --ignore=entrypoints/llm/test_generate_multiple_loras.py --ignore=entrypoints/llm/test_guided_generate.py
- pytest -v -s entrypoints/llm/test_lazy_outlines.py # it needs a clean process
- pytest -v -s entrypoints/llm/test_generate.py # it needs a clean process
@@ -118,7 +121,9 @@ steps:
- vllm/core/
- tests/distributed
- tests/spec_decode/e2e/test_integration_dist_tp4
+ - tests/compile
commands:
+ - pytest -v -s compile/test_basic_correctness.py
- pytest -v -s distributed/test_pynccl.py
- pytest -v -s spec_decode/e2e/test_integration_dist_tp4.py

@@ -179,14 +184,16 @@ steps:
- python3 offline_inference_vision_language_multi_image.py
- python3 tensorize_vllm_model.py --model facebook/opt-125m serialize --serialized-directory /tmp/ --suffix v1 && python3 tensorize_vllm_model.py --model facebook/opt-125m deserialize --path-to-tensors /tmp/vllm/facebook/opt-125m/v1/model.tensors
- python3 offline_inference_encoder_decoder.py
+ - python3 offline_profile.py --model facebook/opt-125m

- label: Prefix Caching Test # 9min
#mirror_hardwares: [amd]
source_file_dependencies:
- vllm/
- tests/prefix_caching
commands:
- - pytest -v -s prefix_caching
+ - VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s prefix_caching/test_prefix_caching.py
+ - pytest -v -s prefix_caching --ignore=prefix_caching/test_prefix_caching.py

- label: Samplers Test # 36min
source_file_dependencies:
@@ -210,7 +217,8 @@ steps:
- tests/spec_decode
commands:
- pytest -v -s spec_decode/e2e/test_multistep_correctness.py
- - VLLM_ATTENTION_BACKEND=FLASH_ATTN pytest -v -s spec_decode --ignore=spec_decode/e2e/test_multistep_correctness.py
+ - VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s spec_decode/e2e/test_compatibility.py
+ - VLLM_ATTENTION_BACKEND=FLASH_ATTN pytest -v -s spec_decode --ignore=spec_decode/e2e/test_multistep_correctness.py --ignore=spec_decode/e2e/test_compatibility.py

- label: LoRA Test %N # 15min each
mirror_hardwares: [amd]
@@ -226,14 +234,16 @@ steps:
- vllm/
- tests/compile
commands:
- - pytest -v -s compile/test_full_graph_smoke.py
+ - pytest -v -s compile/test_basic_correctness.py

- label: "PyTorch Fullgraph Test" # 18min
source_file_dependencies:
- vllm/
- tests/compile
commands:
- pytest -v -s compile/test_full_graph.py
# TODO: re-write in comparison tests, and fix symbolic shape
# for quantization ops.
# - label: "PyTorch Fullgraph Test" # 18min
# source_file_dependencies:
# - vllm/
# - tests/compile
# commands:
# - pytest -v -s compile/test_full_graph.py

- label: Kernels Test %N # 1h each
mirror_hardwares: [amd]
@@ -270,15 +280,14 @@ steps:
- csrc/
- vllm/model_executor/layers/quantization
- tests/quantization
- command: pytest -v -s quantization
+ command: VLLM_TEST_FORCE_LOAD_FORMAT=auto pytest -v -s quantization

- label: LM Eval Small Models # 53min
working_dir: "/vllm-workspace/.buildkite/lm-eval-harness"
source_file_dependencies:
- csrc/
- vllm/model_executor/layers/quantization
commands:
- - pip install lm-eval
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- bash ./run-tests.sh -c configs/models-small.txt -t 1

@@ -332,17 +341,20 @@ steps:
source_file_dependencies:
- vllm/
- tests/models/embedding/language
+ - tests/models/embedding/vision_language
- tests/models/encoder_decoder/language
+ - tests/models/encoder_decoder/vision_language
commands:
- pytest -v -s models/embedding/language
+ - pytest -v -s models/embedding/vision_language
- pytest -v -s models/encoder_decoder/language
+ - pytest -v -s models/encoder_decoder/vision_language

# This test is used only in PR development phase to test individual models and should never run on main
- label: Custom Models Test
#mirror_hardwares: [amd]
optional: true
commands:
- echo 'Testing custom models...'
# PR authors can temporarily add commands below to test individual models
# e.g. pytest -v -s models/encoder_decoder/vision_language/test_mllama.py
# *To avoid merge conflicts, remember to REMOVE (not just comment out) them before merging the PR*
@@ -390,10 +402,10 @@ steps:
- tests/distributed/
- vllm/compilation
commands:
- - pytest -v -s ./compile/test_full_graph_multi_gpu.py
+ - pytest -v -s ./compile/test_basic_correctness.py
- pytest -v -s ./compile/test_wrapper.py
- VLLM_TEST_SAME_HOST=1 torchrun --nproc-per-node=4 distributed/test_same_node.py | grep -q 'Same node test passed'
- - TARGET_TEST_SUITE=L4 pytest basic_correctness/ -v -s -m distributed_2_gpus
+ - TARGET_TEST_SUITE=L4 VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest basic_correctness/ -v -s -m distributed_2_gpus
# Avoid importing model tests that cause CUDA reinitialization error
- pytest models/encoder_decoder/language/test_bart.py -v -s -m distributed_2_gpus
- pytest models/encoder_decoder/vision_language/test_broadcast.py -v -s -m distributed_2_gpus
@@ -492,6 +504,5 @@ steps:
- csrc/
- vllm/model_executor/layers/quantization
commands:
- - pip install lm-eval
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- bash ./run-tests.sh -c configs/models-large.txt -t 4
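
Several pipeline steps above now thread VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 through individual pytest invocations. As a minimal sketch of reproducing one gated run locally, mirroring the Core Test commands in this file (the tests/ working directory and an installed vLLM dev checkout are assumptions, not part of this diff):

```bash
# Sketch: run one gated core test locally, mirroring the pipeline commands.
cd tests

# This file exercises the deprecated v1 block manager and now requires
# an explicit opt-in via the environment variable.
VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s core/test_scheduler.py

# The remaining core tests run without the flag, skipping the gated files.
pytest -v -s core \
  --ignore=core/test_scheduler.py \
  --ignore=core/test_chunked_prefill_scheduler.py \
  --ignore=core/block/e2e/test_correctness.py \
  --ignore=core/block/e2e/test_correctness_sliding_window.py
```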
30 changes: 29 additions & 1 deletion .dockerignore
@@ -2,5 +2,33 @@
/.venv
/build
dist
Dockerfile*
vllm/*.so

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

.mypy_cache

# Distribution / packaging
.Python
/build/
cmake-build-*/
CMakeUserPresets.json
develop-eggs/
/dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
21 changes: 16 additions & 5 deletions .github/CODEOWNERS
@@ -1,19 +1,30 @@
# See https://help.github.com/articles/about-codeowners/
# for more info about CODEOWNERS file

+ # This lists cover the "core" components of vLLM that require careful review
+ /vllm/attention/backends/abstract.py @WoosukKwon @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
+ /vllm/core @WoosukKwon @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
+ /vllm/engine/llm_engine.py @WoosukKwon @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
+ /vllm/executor/executor_base.py @WoosukKwon @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
+ /vllm/worker/worker_base.py @WoosukKwon @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
+ /vllm/worker/worker.py @WoosukKwon @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
+ /vllm/model_executor/layers/sampler.py @WoosukKwon @zhuohan123 @youkaichao @alexm-neuralmagic @comaniac @njhill
+ CMakeLists.txt @tlrmchlsmth @WoosukKwon

# Test ownership
/tests/async_engine @njhill @robertgshaw2-neuralmagic @simon-mo
/tests/test_inputs.py @DarkLight1337 @ywang96
/tests/entrypoints @DarkLight1337 @robertgshaw2-neuralmagic @simon-mo
/tests/models @DarkLight1337 @ywang96
/tests/multimodal @DarkLight1337 @ywang96
/tests/prefix_caching @comaniac @KuntaiDu
/tests/spec_decode @njhill @LiuXiaoxuanPKU
/tests/kernels @tlrmchlsmth @WoosukKwon
/tests/quantization @mgoin @robertgshaw2-neuralmagic
/.buildkite/lm-eval-harness @mgoin @simon-mo
/tests/distributed/test_multi_node_assignment.py @youkaichao
/tests/distributed/test_pipeline_parallel.py @youkaichao
/tests/distributed/test_same_node.py @youkaichao
- /tests/multi_step @alexm-neuralmagic @SolitaryThinker @comaniac
+ /tests/multi_step @alexm-neuralmagic @comaniac
/tests/weight_loading @mgoin @youkaichao
/tests/basic_correctness/test_chunked_prefill @rkooo567 @comaniac
7 changes: 7 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,7 @@
version: 2
updates:
# Maintain dependencies for GitHub Actions
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "weekly"
2 changes: 1 addition & 1 deletion .github/workflows/actionlint.yml
@@ -28,7 +28,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: "Checkout"
- uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
+ uses: actions/checkout@eef61447b9ff4aafe5dcd4e0bbf5d482be7e7871 # v4.2.1
with:
fetch-depth: 0

2 changes: 1 addition & 1 deletion .github/workflows/add_label_automerge.yml
@@ -8,7 +8,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Add label
- uses: actions/github-script@v6
+ uses: actions/github-script@v7
with:
script: |
github.rest.issues.addLabels({
4 changes: 2 additions & 2 deletions .github/workflows/clang-format.yml
@@ -17,9 +17,9 @@ jobs:
matrix:
python-version: ["3.11"]
steps:
- - uses: actions/checkout@v3
+ - uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
- uses: actions/setup-python@v3
+ uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
19 changes: 4 additions & 15 deletions .github/workflows/mypy.yaml
@@ -11,15 +11,15 @@ on:
- main

jobs:
- ruff:
+ mypy:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
steps:
- - uses: actions/checkout@v3
+ - uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
- uses: actions/setup-python@v3
+ uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
@@ -32,15 +32,4 @@ jobs:
pip install types-setuptools
- name: Mypy
run: |
- mypy
- mypy tests --follow-imports skip
- mypy vllm/attention --follow-imports skip
- mypy vllm/distributed --follow-imports skip
- mypy vllm/engine --follow-imports skip
- mypy vllm/executor --follow-imports skip
- mypy vllm/lora --follow-imports skip
- mypy vllm/model_executor --follow-imports skip
- mypy vllm/prompt_adapter --follow-imports skip
- mypy vllm/spec_decode --follow-imports skip
- mypy vllm/worker --follow-imports skip
+ tools/mypy.sh
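
The per-directory mypy commands removed above are consolidated into tools/mypy.sh. That script's contents are not part of this diff; a plausible sketch, assuming it simply wraps the deleted invocations:

```bash
#!/bin/bash
# Hypothetical reconstruction of tools/mypy.sh based on the invocations
# deleted from this workflow; the actual script may differ.
set -e

# Top-level run uses mypy's default import following.
mypy

# Per-directory runs skip following imports to keep checks fast and scoped.
for dir in tests vllm/attention vllm/distributed vllm/engine vllm/executor \
           vllm/lora vllm/model_executor vllm/prompt_adapter \
           vllm/spec_decode vllm/worker; do
    echo "mypy: checking ${dir}"
    mypy "${dir}" --follow-imports skip
done
```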