Add flash-attn #26239
Commits on May 4, 2024
- `07ec11e` Flash Attention: Fast and Memory-Efficient Exact Attention! Repo at https://github.com/Dao-AILab/flash-attention
- `a4def75` Attempt to fix `OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root`
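The error above comes from PyTorch's extension builder, which needs to locate the CUDA toolkit root. A minimal sketch of how a build script might address it; the fallback path is an assumption, not necessarily what this recipe does:

```shell
# Hypothetical sketch: ensure CUDA_HOME points at the CUDA toolkit root
# before building. The fallback path /usr/local/cuda is an assumption;
# on conda-forge the toolkit may live elsewhere (e.g. under $BUILD_PREFIX).
if [ -z "${CUDA_HOME:-}" ]; then
    export CUDA_HOME="/usr/local/cuda"
fi
echo "CUDA_HOME=${CUDA_HOME}"
```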
- `06713e3`
- `51d7d74`
- `aa17a2c` Skip build on non-CUDA platforms
  The flash-attn library appears to run only on Linux with CUDA GPUs, so the build is skipped everywhere else.
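In conda-build, skipping a platform is expressed with a `skip` entry and a preprocessing selector in `meta.yaml`. A sketch of what this might look like; the exact selector used by the recipe is an assumption:

```yaml
build:
  # Hypothetical sketch: build only on Linux when a CUDA compiler is
  # configured; skip everywhere else.
  skip: true  # [not linux]
  skip: true  # [cuda_compiler_version in (undefined, "None")]
```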
- `4d8b37c` Add libcublas-dev, libcusolver-dev, libcusparse-dev to host deps
  Needed to compile flash-attn on CUDA 12.0 in conda-forge.
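These compile-time CUDA math libraries would go in the `host` section of the recipe's `meta.yaml`. A sketch; the surrounding entries are assumptions, not the recipe's actual requirements list:

```yaml
requirements:
  host:
    - python       # assumed; typical for a Python extension recipe
    - pip          # assumed
    # CUDA math libraries needed at compile time on CUDA 12.0:
    - libcublas-dev
    - libcusolver-dev
    - libcusparse-dev
```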
- `04a346a`
- `16414ff`
- `2d2212d`
Commits on May 5, 2024
- `2cde3c1`
Commits on May 6, 2024
- `501aa9d`
- `a1b1faa` Set TORCH_CUDA_ARCH_LIST to 8.0 and above
  Compile only for Compute Capability 8.0 and above, i.e. NVIDIA Ampere generation devices or newer; see https://developer.nvidia.com/cuda-gpus.
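`TORCH_CUDA_ARCH_LIST` tells PyTorch's extension builder which compute capabilities to generate code for. A sketch of restricting it to 8.0 and above; the specific architecture list is an assumption, not necessarily the one the commit uses:

```shell
# Hypothetical sketch: compile only for Compute Capability 8.0 and above.
# "+PTX" on the last entry also embeds PTX for forward compatibility
# with newer GPUs.
export TORCH_CUDA_ARCH_LIST="8.0;8.6;9.0+PTX"
echo "Building for: ${TORCH_CUDA_ARCH_LIST}"
```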
- `5235314`
- `ef03f90`
- `ebae578`
- `460eeb2`
- `317646a` BLD: Replace setup script with a simpler one
  The simpler script has no unused features and doesn't set -O3, because the channel's compiler defaults are -O2.
- `0b81f6f` Update recipes/flash-attn/meta.yaml
  Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
- `96e817a`
- `0733767`
Commits on May 7, 2024
- `37e676a` ignore_run_exports_from libcublas-dev, libcusolver-dev, libcusparse-dev
  Silences warnings like:

  ```
  WARNING (flash-attn): dso library package conda-forge/linux-64::libcublas==12.0.1.189=hd3aeb46_3 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)
  WARNING (flash-attn): dso library package conda-forge/linux-64::libcusparse==12.0.0.76=hd3aeb46_2 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)
  WARNING (flash-attn): dso library package conda-forge/linux-64::libcusolver==11.4.2.57=hd3aeb46_2 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)
  ```
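`ignore_run_exports_from` lives in the `build` section of `meta.yaml` and suppresses the run exports contributed by the named host dependencies. A sketch of the corresponding fragment:

```yaml
build:
  # The -dev packages are only needed at compile time; without this,
  # their run exports would inject the runtime libraries into the
  # run requirements, triggering the overdepending warnings above.
  ignore_run_exports_from:
    - libcublas-dev
    - libcusolver-dev
    - libcusparse-dev
```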
- `fc2fc76` Temporarily set TORCH_CUDA_ARCH_LIST=8.6+PTX and MAX_JOBS=1
  Trying to reduce CPU load on Azure CI while debugging the build.
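`MAX_JOBS` caps the number of parallel compile jobs launched by PyTorch's build helper, which can otherwise exhaust memory and CPU on small CI workers. A sketch using the values from the commit message; the build-script context around them is an assumption:

```shell
# Debugging sketch: single target architecture, single compile job,
# to keep resource usage on the CI worker minimal.
export TORCH_CUDA_ARCH_LIST="8.6+PTX"
export MAX_JOBS=1
echo "jobs=${MAX_JOBS} archs=${TORCH_CUDA_ARCH_LIST}"
```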
- `63dcb65`