Currently vLLM's build uses PyTorch's extension builders (torch.utils.cpp_extension), which call Ninja under the hood. This works okay but has the following issues (a minimal sketch of the current pattern is shown after this list):
- Only supports NVIDIA and AMD GPUs.
- Slow sequential builds. This is amplified by adding quantization kernels and LoRA kernels.
- No caching or incremental builds.
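For context, the current approach roughly follows PyTorch's standard extension pattern. The sketch below is illustrative only (the file names and package name are placeholders, not vLLM's actual layout): a single `CUDAExtension` compiled through `BuildExtension`, with `MAX_JOBS` and `NVCC_THREADS` used to keep compiler memory in check.

```python
# Hedged sketch of the current pattern, not vLLM's actual setup.py.
# File names and package name below are placeholders.
import os
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

# PyTorch's builder reads MAX_JOBS to cap parallel compile jobs; passing
# --threads to nvcc trades memory for per-file parallelism.
os.environ.setdefault("MAX_JOBS", "4")
nvcc_threads = int(os.environ.get("NVCC_THREADS", "1"))

setup(
    name="toy_ext",  # placeholder
    ext_modules=[
        CUDAExtension(
            name="toy_ext._C",
            sources=["csrc/ops.cpp", "csrc/kernels.cu"],  # illustrative paths
            extra_compile_args={
                "cxx": ["-O3"],
                "nvcc": ["-O3", f"--threads={nvcc_threads}"],
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```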
We would like to ask for the community's help in recommending a technology, prototyping it, and implementing it. Ideally something like CMake or Bazel would work, but it requires some careful thinking (a rough sketch of one possible CMake-wrapper shape is included at the end of this issue).
The requirements:
- Must support multiple hardware architectures (NVIDIA, AMD, Intel, etc.).
- Must support incremental builds, which also implies caching.
- Must support parallel builds.
- Good to have: editor support (by generating a compilation database).
- Ideally it would not OOM like the current setup. Currently, due to the rigid structure, we have to carefully set `MAX_JOBS` and `NVCC_THREADS` to work around the compiler running out of memory. I think this is because nvcc spawns threads for each SM architecture we are compiling for.
- Vaguely, "future proof".
Currently, the "build system" is all in here https://github.com/vllm-project/vllm/blob/main/setup.py