Description
To make the CMake build more readable, stable, and easy to use, there are a few tasks we'd like to work on. This issue tracks that progress. Planned changes and investigations:
- Have vllm-flash-attn use `ExternalProject`. Currently vllm-flash-attn uses the parent CMake scope, which creates many footguns since it lives in a separate repo. Using `ExternalProject` means vllm-flash-attn will run in a separate CMake scope/process.
- Warn that PTX builds are not currently supported (post [CI/Build] Per file CUDA Archs (improve wheel size and dev build times) #8845). Currently, if there is a `+PTX` in `TORCH_CUDA_ARCH_LIST`, it is ignored; we should warn when this is the case. Alternatively, we can add support for PTX builds, although this is generally not desirable since PTX increases the wheel size by quite a bit (PTX is larger than SASS) and we already build for all currently supported arches.
- Rename `define_gpu_extension_target`. It is currently used for CPU extensions too, so the name is now misleading.
- Potentially build both C++ and CUDA extensions when building for CUDA, and use the torch dispatcher to dispatch between the two ([Kernel] Factor registrations #8424).
- Look into removing early returns in CMakeLists.txt (potentially move backends into its own files)
- Add a CI test of local builds, i.e. `pip install -e .`
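
For the PTX item above, the warning could look roughly like the sketch below. This is only an illustration, not the actual change: it assumes the arch list is available to CMake through the `TORCH_CUDA_ARCH_LIST` environment variable, and the `_vllm_arch_list` variable name and the message wording are made up for the example.

```cmake
# Sketch only: warn when the requested arch list asks for PTX.
# Assumption: TORCH_CUDA_ARCH_LIST is visible as an environment variable
# at configure time. _vllm_arch_list is a hypothetical helper variable.
if(DEFINED ENV{TORCH_CUDA_ARCH_LIST})
  set(_vllm_arch_list "$ENV{TORCH_CUDA_ARCH_LIST}")
  # "\\+PTX" reaches the regex engine as \+PTX, i.e. a literal "+PTX".
  if(_vllm_arch_list MATCHES "\\+PTX")
    message(WARNING
      "TORCH_CUDA_ARCH_LIST contains '+PTX' (${_vllm_arch_list}), but PTX "
      "builds are not currently supported; the '+PTX' suffix is ignored.")
  endif()
endif()
```

Using `message(WARNING ...)` rather than `FATAL_ERROR` keeps existing build setups working while still making it visible that `+PTX` has no effect.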