[Build] Add initial conditional testing spec #6841

Merged (8 commits, Aug 6, 2024)

Changes from 2 commits
123 changes: 88 additions & 35 deletions .buildkite/test-pipeline.yaml
@@ -5,11 +5,27 @@
# https://github.com/vllm-project/buildkite-ci/blob/main/scripts/test-template-aws.j2
# to generate the final pipeline yaml file.

# Documentation
# label(str): the name of the test. emoji allowed.
# fast_check(bool): whether to run this on each commit by default, without the /ready tag.
Collaborator: whether to run this on each commit on fastcheck pipeline

# fast_check_only(bool): whether to skip this test on full suite.
Collaborator: Run this test on fastcheck pipeline only

# command(str): the single command to run for the test. incompatible with commands.
# commands(list): the list of commands to run for the test. incompatible with command.
# mirror_hardwares(list): the list of additional hardware platforms to run the test on. currently only supports [amd]
# gpu(str): override the GPU selection for the test. defaults to L4 GPUs. currently only supports a100
# num_gpus(int): override the number of GPUs for the test. defaults to 1 GPU. currently supports 2 and 4.
# num_nodes(int): whether to simulate a multi-node setup by launching multiple containers on one host,
#   in this case, commands must be specified. the first command runs on the first host, the second
#   command runs on the second host.
# working_dir(str): override the place where the command executes. defaults to "/vllm-workspace/tests".
Collaborator: Specify the place where command should execute, default to /vllm-workspace/tests

# source_file_dependencies(list): the list of path prefixes to opt the test in for; if empty, the test will always run.
Collaborator (author): @khluu can you review some of the docs here? I'm afraid I might have misinterpreted some fields.

Collaborator: done!
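
To make the fields above concrete, here is a minimal sketch of a step that uses most of them. The label, dependency prefixes, and pytest target are hypothetical examples, not steps from this PR:

- label: Example Conditional Test            # hypothetical step, for illustration only
  fast_check: true                           # also run on the fastcheck pipeline
  working_dir: "/vllm-workspace/tests"       # the default working directory, shown explicitly
  num_gpus: 2                                # override the default of 1 GPU
  source_file_dependencies:                  # run only when a changed file matches one of these prefixes
    - vllm/example_module                    # hypothetical prefix
    - tests/example                          # hypothetical prefix
  commands:
    - pytest -v -s example                   # hypothetical test target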


steps:
- label: Async Engine, Inputs, Utils, Worker Test
fast_check: true
fast_check_only: true
source_file_dependencies:
- vllm/
commands:
- pytest -v -s async_engine # Async Engine
- pytest -v -s test_inputs.py
@@ -19,7 +35,8 @@ steps:

- label: Metrics, Tracing Test
Collaborator (author): moved to later section

fast_check: true
fast_check_only: true
source_file_dependencies:
- vllm/
commands:
- pytest -v -s metrics # Metrics
- "pip install \
@@ -31,17 +48,17 @@

- label: Regression Test
mirror_hardwares: [amd]
fast_check: true
fast_check: false
source_file_dependencies:
- vllm/
command: pytest -v -s test_regression.py
working_dir: "/vllm-workspace/tests" # optional

- label: AsyncEngine Test
Collaborator (author): removed because it is covered in Async Engine, Inputs, Utils, Worker Test

#mirror_hardwares: [amd]
command: pytest -v -s async_engine

- label: Basic Correctness Test
mirror_hardwares: [amd]
fast_check: true
source_file_dependencies:
- vllm/
commands:
# This flashinfer installation will fail on AMD ROCm, so it is set as optional.
- pip install https://github.com/flashinfer-ai/flashinfer/releases/download/v0.0.8/flashinfer-0.0.8+cu121torch2.3-cp310-cp310-linux_x86_64.whl || true
@@ -54,14 +71,18 @@
- label: Core Test
mirror_hardwares: [amd]
fast_check: true
source_file_dependencies:
Collaborator: Besides the source file dependencies, I think we should add the tests themselves as dependencies too. Like if tests/core/ changes, this test should also run. Maybe we can rename source_file_dependencies to dependencies to be inclusive? (See the sketch after this step.)

- vllm/core
- vllm/distributed
commands:
- pytest -v -s core
- pytest -v -s distributed/test_parallel_state.py
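
A minimal sketch of the reviewer's suggestion above (hypothetical, not part of this PR): the Core Test dependency list could also include the test directories, so that a change under tests/core also triggers the step.

- label: Core Test
  source_file_dependencies:                  # hypothetical: also opt in when the tests themselves change
    - vllm/core
    - vllm/distributed
    - tests/core
    - tests/distributed/test_parallel_state.py
  commands:
    - pytest -v -s core
    - pytest -v -s distributed/test_parallel_state.py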

- label: Distributed Comm Ops Test
#mirror_hardwares: [amd]
working_dir: "/vllm-workspace/tests"
num_gpus: 2
source_file_dependencies:
- vllm/distributed
commands:
- pytest -v -s distributed/test_comm_ops.py
- pytest -v -s distributed/test_shm_broadcast.py
@@ -70,6 +91,8 @@
working_dir: "/vllm-workspace/tests"
num_gpus: 2
num_nodes: 2
source_file_dependencies:
- vllm/distributed
commands:
- # the following commands are for the first node, with ip 192.168.10.10 (ray environment already set up)
- VLLM_TEST_SAME_HOST=0 torchrun --nnodes 2 --nproc-per-node=2 --rdzv_backend=c10d --rdzv_endpoint=192.168.10.10 distributed/test_same_node.py
@@ -81,6 +104,8 @@
mirror_hardwares: [amd]
working_dir: "/vllm-workspace/tests"
num_gpus: 2
source_file_dependencies:
- vllm/
commands:
- VLLM_TEST_SAME_HOST=1 torchrun --nproc-per-node=4 distributed/test_same_node.py
- TEST_DIST_MODEL=facebook/opt-125m DISTRIBUTED_EXECUTOR_BACKEND=ray pytest -v -s distributed/test_basic_distributed_correctness.py
@@ -102,10 +127,11 @@
- CUDA_VISIBLE_DEVICES=0,1 pytest -v -s distributed/test_utils.py

- label: Distributed Tests (4 GPUs)
#mirror_hardwares: [amd]
working_dir: "/vllm-workspace/tests"
num_gpus: 4
fast_check: true
source_file_dependencies:
- vllm/
commands:
- pytest -v -s distributed/test_pynccl.py
# We want to test that models which use 2 GPUs work with 4 GPUs, which is why we duplicate them here.
@@ -118,11 +144,15 @@
- label: Pipeline Parallelism Test
working_dir: "/vllm-workspace/tests"
num_gpus: 4
source_file_dependencies:
- vllm/
commands:
- pytest -v -s distributed/test_pipeline_parallel.py

- label: Engine Test
mirror_hardwares: [amd]
source_file_dependencies:
- vllm/
commands:
- pytest -v -s engine test_sequence.py test_config.py test_logger.py
# OOM in the CI unless we run this separately
@@ -131,18 +161,20 @@
- label: Entrypoints Test
fast_check: true
mirror_hardwares: [amd]

source_file_dependencies:
- vllm/entrypoints
commands:
- pytest -v -s entrypoints/llm
- pytest -v -s entrypoints/openai

- label: Examples Test
working_dir: "/vllm-workspace/examples"
mirror_hardwares: [amd]
source_file_dependencies:
- vllm/entrypoints
- examples/
commands:
# install aws cli for llava_example.py
# install tensorizer for tensorize_vllm_model.py
- pip install awscli tensorizer
- pip install awscli tensorizer # for llava example and tensorizer test
- python3 offline_inference.py
- python3 cpu_offload.py
- python3 offline_inference_with_prefix.py
@@ -151,108 +183,123 @@
- python3 tensorize_vllm_model.py --model facebook/opt-125m serialize --serialized-directory /tmp/ --suffix v1 && python3 tensorize_vllm_model.py --model facebook/opt-125m deserialize --path-to-tensors /tmp/vllm/facebook/opt-125m/v1/model.tensors

- label: Inputs Test
#mirror_hardwares: [amd]
source_file_dependencies:
- vllm/
commands:
- pytest -v -s test_inputs.py
- pytest -v -s multimodal

- label: Kernels Test %N
#mirror_hardwares: [amd]
source_file_dependencies:
- csrc/
commands:
- pip install https://github.com/flashinfer-ai/flashinfer/releases/download/v0.0.8/flashinfer-0.0.8+cu121torch2.3-cp310-cp310-linux_x86_64.whl
- pytest -v -s kernels --shard-id=$$BUILDKITE_PARALLEL_JOB --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT
parallelism: 4

- label: Models Test
#mirror_hardwares: [amd]
source_file_dependencies:
- vllm/
commands:
- pip install https://github.com/flashinfer-ai/flashinfer/releases/download/v0.0.8/flashinfer-0.0.8+cu121torch2.3-cp310-cp310-linux_x86_64.whl
- pytest -v -s models -m \"not vlm\"

- label: Vision Language Models Test
mirror_hardwares: [amd]
source_file_dependencies:
- vllm/
commands:
- pytest -v -s models -m vlm

- label: Prefix Caching Test
mirror_hardwares: [amd]
source_file_dependencies:
- vllm/
commands:
- pytest -v -s prefix_caching

- label: Samplers Test
#mirror_hardwares: [amd]
source_file_dependencies:
- vllm/model_executor/layers
- vllm/sampling_metadata.py
command: pytest -v -s samplers

- label: LogitsProcessor Test
mirror_hardwares: [amd]
source_file_dependencies:
- vllm/model_executor/layers
command: pytest -v -s test_logits_processor.py

- label: Utils Test
source_file_dependencies:
- vllm/
commands:
- pytest -v -s test_utils.py
- pytest -v -s test_embedded_commit.py

- label: Worker Test
mirror_hardwares: [amd]
source_file_dependencies:
- vllm/worker
command: pytest -v -s worker

- label: Speculative decoding tests
#mirror_hardwares: [amd]
source_file_dependencies:
- vllm/spec_decode
commands:
# See https://github.com/vllm-project/vllm/issues/5152
- export VLLM_ATTENTION_BACKEND=XFORMERS
- pytest -v -s spec_decode

- label: LoRA Test %N
#mirror_hardwares: [amd]
source_file_dependencies:
- vllm/lora
- csrc/punica
command: pytest -v -s lora --shard-id=$$BUILDKITE_PARALLEL_JOB --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT --ignore=lora/test_long_context.py
parallelism: 4

- label: LoRA Long Context (Distributed)
#mirror_hardwares: [amd]
num_gpus: 4
# This test runs llama 13B, so it is required to run on 4 GPUs.
num_gpus: 4
source_file_dependencies:
- vllm/lora
- csrc/punica
commands:
# FIXIT: find out which code initialize cuda before running the test
# before the fix, we need to use spawn to test it
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s -x lora/test_long_context.py

- label: Tensorizer Test
#mirror_hardwares: [amd]
soft_fail: true
fast_check: true
source_file_dependencies:
- vllm/model_executor/model_loader
commands:
- apt-get install -y curl libsodium23
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s tensorizer_loader

- label: Metrics Test
Collaborator (author): covered in fastcheck that combined metrics and tracing

mirror_hardwares: [amd]
command: pytest -v -s metrics

- label: Quantization Test
#mirror_hardwares: [amd]
source_file_dependencies:
- csrc/
- vllm/model_executor/layers/quantization
command: pytest -v -s quantization

- label: Tracing Test
commands:
- "pip install \
opentelemetry-sdk \
opentelemetry-api \
opentelemetry-exporter-otlp \
opentelemetry-semantic-conventions-ai"
- pytest -v -s tracing
Collaborator (author), commenting on removed lines 219 to 226: covered in fastcheck that combined metrics and tracing


- label: Benchmarks
working_dir: "/vllm-workspace/.buildkite"
mirror_hardwares: [amd]
source_file_dependencies:
- benchmarks/
commands:
- pip install aiohttp
- bash run-benchmarks.sh

- label: LM Eval Small Models
working_dir: "/vllm-workspace/.buildkite/lm-eval-harness"
source_file_dependencies:
- csrc/
- vllm/
commands:
- pip install lm-eval
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
@@ -262,6 +309,9 @@
gpu: a100
num_gpus: 4
working_dir: "/vllm-workspace/.buildkite/lm-eval-harness"
source_file_dependencies:
- csrc/
- vllm/
commands:
- pip install lm-eval
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
@@ -271,13 +321,16 @@
working_dir: "/vllm-workspace/test_docs/docs"
fast_check: true
no_gpu: True
source_file_dependencies: [] # always run
commands:
- pip install -r requirements-docs.txt
- SPHINXOPTS=\"-W\" make html

- label: Distributed Tests (A100)
gpu: a100
num_gpus: 4
source_file_dependencies:
- vllm/
commands:
# NOTE: don't test llama model here, it seems hf implementation is buggy
# see https://github.com/vllm-project/vllm/pull/5689 for details