Add flash-attn #26239

Merged
merged 23 commits into from
May 7, 2024

weiji14
Member

@weiji14 weiji14 commented May 4, 2024

Flash Attention: Fast and Memory-Efficient Exact Attention! Repo at https://github.com/Dao-AILab/flash-attention

Packaging flash-attn, so that I can package transformer-engine later (edit: see #26296)

Checklist

  • Title of this PR is meaningful: e.g. "Adding my_nifty_package", not "updated meta.yaml".
  • License file is packaged (see here for an example).
  • Source is from official source.
  • Package does not vendor other packages. (If a package uses the source of another package, they should be separate packages or the licenses of all packages need to be packaged).
  • If static libraries are linked in, the license of the static library is packaged.
  • Package does not ship static libraries. If static libraries are needed, follow CFEP-18.
  • Build number is 0.
  • A tarball (url) rather than a repo (e.g. git_url) is used in your recipe (see here for more details).
  • GitHub users listed in the maintainer section have posted a comment confirming they are willing to be listed there.
  • When in trouble, please check our knowledge base documentation before pinging a team.

@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/flash-attn) and found it was in an excellent condition.

To try and fix `OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root`
@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/flash-attn) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipes/flash-attn:

This flash-attn library only runs on Linux with CUDA GPUs if I'm not mistaken.
Needed to compile flash-attn on CUDA 12.0 in conda-forge.
@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipes/flash-attn) and found some lint.

Here's what I've got...

For recipes/flash-attn:

  • noarch packages can't have skips with selectors. If the selectors are necessary, please remove noarch: python.

@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipes/flash-attn) and found some lint.

Here's what I've got...

For recipes/flash-attn:

  • Non noarch packages should have python requirement without any version constraints.
  • Non noarch packages should have python requirement without any version constraints.

@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/flash-attn) and found it was in an excellent condition.

number: 0
script: {{ PYTHON }} -m pip install . -vvv --no-deps --no-build-isolation
script_env:
- FLASH_ATTENTION_FORCE_BUILD=TRUE
Member

If you don't set FLASH_ATTENTION_FORCE_BUILD=TRUE, the package tries to download prebuilt binaries instead of building them. Prebuilt binaries are not allowed on our channel; all binaries must be compiled with our toolchains.
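
A minimal sketch of the kind of switch described above, assuming the build script reads the variable as a plain string and treats only the exact value `TRUE` as a forced source build. The helper name `should_build_from_source` is hypothetical and is not taken from the flash-attn sources.

```python
import os


def should_build_from_source(environ=None):
    """Return True when the CUDA extension must be compiled locally.

    Assumption: only the literal string "TRUE" enables the forced build;
    any other value (or an unset variable) leaves the download path open.
    """
    environ = os.environ if environ is None else environ
    return environ.get("FLASH_ATTENTION_FORCE_BUILD", "FALSE") == "TRUE"


if __name__ == "__main__":
    # With the recipe's script_env entry, the flag is set to TRUE.
    print(should_build_from_source({"FLASH_ATTENTION_FORCE_BUILD": "TRUE"}))
    # Without it, a build script like this would be free to fetch a prebuilt wheel.
    print(should_build_from_source({}))
```

Passing the environment as a dictionary keeps the check easy to exercise without touching the real process environment.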

Member Author

@weiji14 weiji14 left a comment

Thanks @carterbox for helping out and pushing that super helpful patch! I've kinda just started from the output of `grayskull pypi flash-attn`, and hoped that it would work almost out of the box (after adding in the correct cuda-related dependencies).

It seems like the linux-cuda builds timed out, so I'll kickstart that again, and maybe do some testing locally to make sure it works.

@carterbox
Member

carterbox commented May 5, 2024

> It seems like the linux-cuda builds timed out, so I'll kickstart that again, and maybe do some testing locally to make sure it works.

Because this package takes so long to compile, it probably exceeds the CI limits here. Try adjusting the Azure timeout to 6 hours. I believe that is the longest allowed.

https://conda-forge.org/docs/maintainer/conda_forge_yml/#timeout-minutes

You can also debug locally. Set TORCH_CUDA_ARCH_LIST to only one arch to reduce compile times.

@weiji14
Member Author

weiji14 commented May 6, 2024

> > It seems like the linux-cuda builds timed out, so I'll kickstart that again, and maybe do some testing locally to make sure it works.
>
> Because this package takes so long to compile, it probably exceeds the CI limits here. Try adjusting the Azure timeout to 6 hours. I believe that is the longest allowed.
>
> https://conda-forge.org/docs/maintainer/conda_forge_yml/#timeout-minutes

Ah, didn't realize we could add a conda-forge.yml file in staged-recipes! Done in 501aa9d

> You can also debug locally. Set TORCH_CUDA_ARCH_LIST to only one arch to reduce compile times.

Good tip. Actually, I think FlashAttention-2 only works on NVIDIA Ampere generation GPUs or newer according to https://github.com/Dao-AILab/flash-attention/tree/v2.5.8?tab=readme-ov-file#installation-and-features, so I've set TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0+PTX at commit a1b1faa to allow compute capability 8.0 or above only.
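
To make the arch-list format concrete, here is a rough sketch of how a `TORCH_CUDA_ARCH_LIST`-style string could be turned into nvcc `-gencode` flags while dropping anything below compute capability 8.0. The helper names `parse_arch_list` and `gencode_flags` are hypothetical, and this is not the code PyTorch or flash-attn runs. The local build log later in this thread shows only `sm_80` and `sm_90` flags, so the package's own build script may choose its flags independently of this variable.

```python
def parse_arch_list(arch_list):
    """Split an arch list such as "8.0;8.6;9.0+PTX" into (major, minor, ptx) tuples."""
    archs = []
    # Entries may be separated by semicolons or spaces.
    for entry in arch_list.replace(" ", ";").split(";"):
        if not entry:
            continue
        ptx = entry.endswith("+PTX")
        version = entry[: -len("+PTX")] if ptx else entry
        major, minor = version.split(".")
        archs.append((int(major), int(minor), ptx))
    return archs


def gencode_flags(arch_list, minimum=(8, 0)):
    """Build nvcc -gencode flags, skipping architectures below `minimum`."""
    flags = []
    for major, minor, ptx in parse_arch_list(arch_list):
        if (major, minor) < minimum:
            continue
        num = f"{major}{minor}"
        flags.append(f"-gencode arch=compute_{num},code=sm_{num}")
        if ptx:
            # +PTX also embeds PTX so newer GPUs can JIT-compile the kernels.
            flags.append(f"-gencode arch=compute_{num},code=compute_{num}")
    return flags


if __name__ == "__main__":
    for flag in gencode_flags("8.0;8.6;8.9;9.0+PTX"):
        print(flag)
```

Restricting the list to a single entry, for example `8.0`, yields one flag and therefore one compilation pass per kernel, which is why it shortens local debug builds.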

Comment on lines +1 to +2
azure:
  timeout_minutes: 360
Member Author

Member

🤦🏼 You're right. But it also looks like the timeout is already set to 360 minutes in staged-recipes, so the builds are probably failing for other reasons. Perhaps the worker is crashing after running out of RAM or disk space? Let's try reducing the compute load as much as possible.
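
One possible way to reduce the load, sketched under assumptions: cap the number of parallel compile jobs by the memory available on the worker. `MAX_JOBS` is the environment variable that PyTorch's extension builder reads to limit ninja parallelism. The helper name `pick_max_jobs`, the 4 GiB per-job figure, and the 7 GiB worker size are illustrative guesses, not measurements of flash-attn or of the CI machines.

```python
import os


def pick_max_jobs(total_ram_gib, cpu_count, ram_per_job_gib=4.0):
    """Choose a parallel job count that fits in memory (never below 1).

    Assumption: each compile job peaks at roughly `ram_per_job_gib` GiB.
    """
    by_memory = int(total_ram_gib // ram_per_job_gib)
    return max(1, min(cpu_count, by_memory))


if __name__ == "__main__":
    # Hypothetical worker with 7 GiB of RAM.
    jobs = pick_max_jobs(total_ram_gib=7, cpu_count=os.cpu_count() or 1)
    # Make the cap visible to a build started from this process.
    os.environ["MAX_JOBS"] = str(jobs)
    print(f"MAX_JOBS={jobs}")
```

Fewer parallel jobs trade a longer wall-clock build for a lower peak memory footprint, so this has to be balanced against the 360-minute limit mentioned above.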

Member

Keep this file because we need to have it in the feedstock.

@weiji14
Member Author

weiji14 commented May 6, 2024

Ah, my local build on Linux CUDA 12.0 finally completed. Posting the tail end of the logs for reference:

[48/49] /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/bin/nvcc  -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/csrc/flash_attn -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/csrc/flash_attn/src -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/csrc/cutlass/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/TH -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/THC -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include/python3.11 -c -c 
/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/csrc/flash_attn/src/flash_fwd_split_hdim96_bf16_sm80.cu -o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim96_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1017"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1 -ccbin /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/bin/x86_64-conda-linux-gnu-cc
nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
[49/49] /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/bin/nvcc  -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/csrc/flash_attn -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/csrc/flash_attn/src -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/csrc/cutlass/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/TH -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/THC -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include/python3.11 -c -c 
/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/csrc/flash_attn/src/flash_fwd_split_hdim96_fp16_sm80.cu -o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim96_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1017"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1 -ccbin /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/bin/x86_64-conda-linux-gnu-cc
nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/bin/x86_64-conda-linux-gnu-c++ -shared -Wl,--allow-shlib-undefined -Wl,-rpath,/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -Wl,-rpath-link,/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -Wl,--allow-shlib-undefined -Wl,-rpath,/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -Wl,-rpath-link,/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined 
-Wl,-rpath,/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -Wl,-rpath-link,/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib/stubs -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/targets/x86_64-linux/lib/stubs -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include 
-fdebug-prefix-map=/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work=/usr/local/src/conda/flash-attn-2.5.8 -fdebug-prefix-map=/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_=/usr/local/src/conda-prefix -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/targets/x86_64-linux/include -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib/stubs -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/targets/x86_64-linux/lib/stubs -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include 
-I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/targets/x86_64-linux/include -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib/stubs -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/targets/x86_64-linux/lib/stubs /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/flash_api.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.o 
/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim192_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim224_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim224_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim256_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim256_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim32_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim32_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim64_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim64_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim96_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim96_fp16_sm80.o 
/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim128_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim128_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim160_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim160_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim192_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim192_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim224_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim224_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim256_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim256_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim32_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim32_fp16_sm80.o 
/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim64_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim64_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim96_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim96_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim128_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim160_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim192_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim192_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim224_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim224_fp16_sm80.o 
/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim256_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim256_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim32_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim32_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim64_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim64_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim96_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim96_fp16_sm80.o -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-311/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so
/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/flash_attn
creating build/bdist.linux-x86_64/wheel/flash_attn/modules
copying build/lib.linux-x86_64-cpython-311/flash_attn/modules/mha.py -> build/bdist.linux-x86_64/wheel/flash_attn/modules
copying build/lib.linux-x86_64-cpython-311/flash_attn/modules/mlp.py -> build/bdist.linux-x86_64/wheel/flash_attn/modules
copying build/lib.linux-x86_64-cpython-311/flash_attn/modules/embedding.py -> build/bdist.linux-x86_64/wheel/flash_attn/modules
copying build/lib.linux-x86_64-cpython-311/flash_attn/modules/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn/modules
copying build/lib.linux-x86_64-cpython-311/flash_attn/modules/block.py -> build/bdist.linux-x86_64/wheel/flash_attn/modules
creating build/bdist.linux-x86_64/wheel/flash_attn/losses
copying build/lib.linux-x86_64-cpython-311/flash_attn/losses/cross_entropy.py -> build/bdist.linux-x86_64/wheel/flash_attn/losses
copying build/lib.linux-x86_64-cpython-311/flash_attn/losses/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn/losses
copying build/lib.linux-x86_64-cpython-311/flash_attn/fused_softmax.py -> build/bdist.linux-x86_64/wheel/flash_attn
creating build/bdist.linux-x86_64/wheel/flash_attn/ops
creating build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton/rotary.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton/linear.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton/cross_entropy.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton/mlp.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton/k_activations.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton/layer_norm.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops
copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/fused_dense.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops
copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/activations.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops
copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/layer_norm.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops
copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/rms_norm.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops
copying build/lib.linux-x86_64-cpython-311/flash_attn/flash_attn_triton_og.py -> build/bdist.linux-x86_64/wheel/flash_attn
copying build/lib.linux-x86_64-cpython-311/flash_attn/bert_padding.py -> build/bdist.linux-x86_64/wheel/flash_attn
copying build/lib.linux-x86_64-cpython-311/flash_attn/flash_blocksparse_attention.py -> build/bdist.linux-x86_64/wheel/flash_attn
copying build/lib.linux-x86_64-cpython-311/flash_attn/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn
copying build/lib.linux-x86_64-cpython-311/flash_attn/flash_attn_triton.py -> build/bdist.linux-x86_64/wheel/flash_attn
creating build/bdist.linux-x86_64/wheel/flash_attn/models
copying build/lib.linux-x86_64-cpython-311/flash_attn/models/btlm.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
copying build/lib.linux-x86_64-cpython-311/flash_attn/models/gptj.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
copying build/lib.linux-x86_64-cpython-311/flash_attn/models/baichuan.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
copying build/lib.linux-x86_64-cpython-311/flash_attn/models/opt.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
copying build/lib.linux-x86_64-cpython-311/flash_attn/models/gpt_neox.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
copying build/lib.linux-x86_64-cpython-311/flash_attn/models/bert.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
copying build/lib.linux-x86_64-cpython-311/flash_attn/models/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
copying build/lib.linux-x86_64-cpython-311/flash_attn/models/gpt.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
copying build/lib.linux-x86_64-cpython-311/flash_attn/models/bigcode.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
copying build/lib.linux-x86_64-cpython-311/flash_attn/models/falcon.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
copying build/lib.linux-x86_64-cpython-311/flash_attn/models/llama.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
copying build/lib.linux-x86_64-cpython-311/flash_attn/models/vit.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
copying build/lib.linux-x86_64-cpython-311/flash_attn/flash_attn_interface.py -> build/bdist.linux-x86_64/wheel/flash_attn
creating build/bdist.linux-x86_64/wheel/flash_attn/layers
copying build/lib.linux-x86_64-cpython-311/flash_attn/layers/rotary.py -> build/bdist.linux-x86_64/wheel/flash_attn/layers
copying build/lib.linux-x86_64-cpython-311/flash_attn/layers/patch_embed.py -> build/bdist.linux-x86_64/wheel/flash_attn/layers
copying build/lib.linux-x86_64-cpython-311/flash_attn/layers/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn/layers
creating build/bdist.linux-x86_64/wheel/flash_attn/utils
copying build/lib.linux-x86_64-cpython-311/flash_attn/utils/generation.py -> build/bdist.linux-x86_64/wheel/flash_attn/utils
copying build/lib.linux-x86_64-cpython-311/flash_attn/utils/distributed.py -> build/bdist.linux-x86_64/wheel/flash_attn/utils
copying build/lib.linux-x86_64-cpython-311/flash_attn/utils/pretrained.py -> build/bdist.linux-x86_64/wheel/flash_attn/utils
copying build/lib.linux-x86_64-cpython-311/flash_attn/utils/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn/utils
copying build/lib.linux-x86_64-cpython-311/flash_attn/utils/benchmark.py -> build/bdist.linux-x86_64/wheel/flash_attn/utils
copying build/lib.linux-x86_64-cpython-311/flash_attn/flash_blocksparse_attn_interface.py -> build/bdist.linux-x86_64/wheel/flash_attn
copying build/lib.linux-x86_64-cpython-311/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
running install_egg_info
running egg_info
writing flash_attn.egg-info/PKG-INFO
writing dependency_links to flash_attn.egg-info/dependency_links.txt
writing requirements to flash_attn.egg-info/requires.txt
writing top-level names to flash_attn.egg-info/top_level.txt
reading manifest file 'flash_attn.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.cu' under directory 'flash_attn'
warning: no files found matching '*.h' under directory 'flash_attn'
warning: no files found matching '*.cuh' under directory 'flash_attn'
warning: no files found matching '*.cpp' under directory 'flash_attn'
warning: no files found matching '*.hpp' under directory 'flash_attn'
adding license file 'LICENSE'
adding license file 'AUTHORS'
writing manifest file 'flash_attn.egg-info/SOURCES.txt'
Copying flash_attn.egg-info to build/bdist.linux-x86_64/wheel/flash_attn-2.5.8-py3.11.egg-info
running install_scripts
creating build/bdist.linux-x86_64/wheel/flash_attn-2.5.8.dist-info/WHEEL
creating '/tmp/pip-wheel-x3dm7157/flash_attn-2.5.8-cp311-cp311-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so'
adding 'flash_attn/__init__.py'
adding 'flash_attn/bert_padding.py'
adding 'flash_attn/flash_attn_interface.py'
adding 'flash_attn/flash_attn_triton.py'
adding 'flash_attn/flash_attn_triton_og.py'
adding 'flash_attn/flash_blocksparse_attention.py'
adding 'flash_attn/flash_blocksparse_attn_interface.py'
adding 'flash_attn/fused_softmax.py'
adding 'flash_attn/layers/__init__.py'
adding 'flash_attn/layers/patch_embed.py'
adding 'flash_attn/layers/rotary.py'
adding 'flash_attn/losses/__init__.py'
adding 'flash_attn/losses/cross_entropy.py'
adding 'flash_attn/models/__init__.py'
adding 'flash_attn/models/baichuan.py'
adding 'flash_attn/models/bert.py'
adding 'flash_attn/models/bigcode.py'
adding 'flash_attn/models/btlm.py'
adding 'flash_attn/models/falcon.py'
adding 'flash_attn/models/gpt.py'
adding 'flash_attn/models/gpt_neox.py'
adding 'flash_attn/models/gptj.py'
adding 'flash_attn/models/llama.py'
adding 'flash_attn/models/opt.py'
adding 'flash_attn/models/vit.py'
adding 'flash_attn/modules/__init__.py'
adding 'flash_attn/modules/block.py'
adding 'flash_attn/modules/embedding.py'
adding 'flash_attn/modules/mha.py'
adding 'flash_attn/modules/mlp.py'
adding 'flash_attn/ops/__init__.py'
adding 'flash_attn/ops/activations.py'
adding 'flash_attn/ops/fused_dense.py'
adding 'flash_attn/ops/layer_norm.py'
adding 'flash_attn/ops/rms_norm.py'
adding 'flash_attn/ops/triton/__init__.py'
adding 'flash_attn/ops/triton/cross_entropy.py'
adding 'flash_attn/ops/triton/k_activations.py'
adding 'flash_attn/ops/triton/layer_norm.py'
adding 'flash_attn/ops/triton/linear.py'
adding 'flash_attn/ops/triton/mlp.py'
adding 'flash_attn/ops/triton/rotary.py'
adding 'flash_attn/utils/__init__.py'
adding 'flash_attn/utils/benchmark.py'
adding 'flash_attn/utils/distributed.py'
adding 'flash_attn/utils/generation.py'
adding 'flash_attn/utils/pretrained.py'
adding 'flash_attn-2.5.8.dist-info/AUTHORS'
adding 'flash_attn-2.5.8.dist-info/LICENSE'
adding 'flash_attn-2.5.8.dist-info/METADATA'
adding 'flash_attn-2.5.8.dist-info/WHEEL'
adding 'flash_attn-2.5.8.dist-info/top_level.txt'
adding 'flash_attn-2.5.8.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
Building wheel for flash_attn (setup.py): finished with status 'done'
Created wheel for flash_attn: filename=flash_attn-2.5.8-cp311-cp311-linux_x86_64.whl size=118037430 sha256=98dcb93971fcf325e6e445753d4bde87c83b2f106ae7d9623f41fbaaf887ed32
Stored in directory: /tmp/pip-ephem-wheel-cache-wunavxq9/wheels/f9/2f/9f/b8e4397695654fd6038ef99f6fc6a1e126be3c8b23e8ee6855
Successfully built flash_attn
Installing collected packages: flash_attn

Successfully installed flash_attn-2.5.8
Removed build tracker: '/tmp/pip-build-tracker-fy6htau6'

Resource usage statistics from building flash-attn:
 Process count: 20
 CPU time: Sys=0:02:37.8, User=4:35:12.0
 Memory: 22.5G
 Disk usage: 1.1M
 Time elapsed: 1:12:55.7


Packaging flash-attn
/opt/conda/lib/python3.10/site-packages/conda_build/environ.py:558: UserWarning: The environment variable 'FLASH_ATTENTION_FORCE_BUILD' is being passed through with value 'TRUE'.  If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/conda_build/environ.py:558: UserWarning: The environment variable 'FLASH_ATTENTION_SKIP_CUDA_BUILD' is being passed through with value 'FALSE'.  If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/conda_build/environ.py:558: UserWarning: The environment variable 'FLASH_ATTENTION_FORCE_CXX11_ABI' is being passed through with value 'FALSE'.  If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/conda_build/environ.py:558: UserWarning: The environment variable 'MAX_JOBS' is being passed through with value '$CPU_COUNT'.  If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/conda_build/environ.py:558: UserWarning: The environment variable 'TORCH_CUDA_ARCH_LIST' is being passed through with value '"8.6+PTX"'.  If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.
warnings.warn(
Packaging flash-attn-2.5.8-py311h379968c_0
compiling .pyc files...
number of files: 104
Warning: rpath /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/lib is outside prefix /home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_ (removing it)
 INFO: sysroot: '/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/x86_64-conda-linux-gnu/sysroot/' files: '['/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/x86_64-conda-linux-gnu/sysroot/usr/share/zoneinfo/zone1970.tab', '/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/x86_64-conda-linux-gnu/sysroot/usr/share/zoneinfo/zone.tab', '/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/x86_64-conda-linux-gnu/sysroot/usr/share/zoneinfo/tzdata.zi', '/home/conda/staged-recipes/build_artifacts/flash-attn_1714961048474/_build_env/x86_64-conda-linux-gnu/sysroot/usr/share/zoneinfo/right/Zulu']'
 INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO lib/libc10.so found in conda-forge/linux-64::libtorch==2.1.2=cuda120_h2aa5df7_303
 INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO lib/libtorch_cpu.so found in conda-forge/linux-64::libtorch==2.1.2=cuda120_h2aa5df7_303
ERROR (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): $RPATH/libtorch_python.so not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
ERROR (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO lib/libcudart.so.12 found in ['conda-forge/linux-64::cuda-cudart==12.0.107=hd3aeb46_8']
ERROR (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): .. but ['conda-forge/linux-64::cuda-cudart==12.0.107=hd3aeb46_8'] not in reqs/run, (i.e. it is overlinking) (likely) or a missing dependency (less likely)
 INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO lib/libc10_cuda.so found in conda-forge/linux-64::libtorch==2.1.2=cuda120_h2aa5df7_303
 INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO lib/libtorch_cuda.so found in conda-forge/linux-64::libtorch==2.1.2=cuda120_h2aa5df7_303
 INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO lib/libstdc++.so.6 found in conda-forge/linux-64::libstdcxx-ng==13.2.0=hc0a3c3a_6
 INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO lib/libgcc_s.so.1 found in conda-forge/linux-64::libgcc-ng==13.2.0=h77fa898_6
 INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO x86_64-conda-linux-gnu/sysroot/lib64/libc.so.6 found in CDT/compiler package conda-forge/noarch::sysroot_linux-64==2.17=h4a8ded7_14
WARNING (flash-attn): dso library package conda-forge/linux-64::libcublas==12.0.1.189=hd3aeb46_3 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)
WARNING (flash-attn): run-exports library package conda-forge/linux-64::pytorch==2.1.2=cuda120_py311h25b6552_303 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)
WARNING (flash-attn): dso library package conda-forge/linux-64::libcusolver==11.4.2.57=hd3aeb46_2 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)
WARNING (flash-attn): interpreter (Python) package conda-forge/linux-64::python==3.11.9=hb806964_0_cpython in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)
WARNING (flash-attn): dso library package conda-forge/linux-64::libcusparse==12.0.0.76=hd3aeb46_2 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)
Traceback (most recent call last):
File "/home/conda/staged-recipes-copy/.ci_support/build_all.py", line 261, in <module>
  build_all(os.path.join(root_dir, "recipes"), args.arch)
File "/home/conda/staged-recipes-copy/.ci_support/build_all.py", line 151, in build_all
  build_folders(recipes_dir, folders, arch, channel_urls)
File "/home/conda/staged-recipes-copy/.ci_support/build_all.py", line 207, in build_folders
  conda_build.api.build([recipe], config=get_config(arch, channel_urls))
File "/opt/conda/lib/python3.10/site-packages/conda_build/api.py", line 250, in build
  return build_tree(
File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 3762, in build_tree
  packages_from_this = build(
File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 2839, in build
  newly_built_packages = bundlers[pkg_type](output_d, m, env, stats)
File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 1974, in bundle_conda
  files = post_process_files(metadata, initial_files)
File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 1782, in post_process_files
  post_build(m, new_files, build_python=python)
File "/opt/conda/lib/python3.10/site-packages/conda_build/post.py", line 1729, in post_build
  check_overlinking(m, files, host_prefix)
File "/opt/conda/lib/python3.10/site-packages/conda_build/post.py", line 1554, in check_overlinking
  return check_overlinking_impl(
File "/opt/conda/lib/python3.10/site-packages/conda_build/post.py", line 1531, in check_overlinking_impl
  raise OverLinkingError(overlinking_errors)
conda_build.exceptions.OverLinkingError: overlinking check failed 
['  ERROR (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): $RPATH/libtorch_python.so not found in packages, sysroot(s) nor the missing_dso_whitelist.\n.. is this binary repackaging?', "  ERROR (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): .. but ['conda-forge/linux-64::cuda-cudart==12.0.107=hd3aeb46_8'] not in reqs/run, (i.e. it is overlinking) (likely) or a missing dependency (less likely)"]
Traceback (most recent call last):
File "/home/user/projects/staged-recipes/build-locally.py", line 101, in <module>
  main()
File "/home/user/projects/staged-recipes/build-locally.py", line 95, in main
  run_docker_build(ns)
File "/home/user/projects/staged-recipes/build-locally.py", line 33, in run_docker_build
  subprocess.check_call([script])
File "/home/user/mambaforge/envs/condalock/lib/python3.11/subprocess.py", line 413, in check_call
  raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['.scripts/run_docker_build.sh']' returned non-zero exit status 1.

Looks like we might need to add a `build/ignore_run_exports` section to handle the `conda_build.exceptions.OverLinkingError: overlinking check failed` error?
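To make it easier to see which findings end up in the exception, here is a minimal sketch that groups the overlinking-check lines by severity. It is not part of conda-build: the helper name `group_by_severity` and the regular expression are assumptions inferred from the line shapes in the log above, and the sample paths are abbreviated. In this log, the `OverLinkingError` lists only the `ERROR` entries, while the `ignore_run_exports` hint appears in the `WARNING` lines, so that setting may address the warnings rather than the two errors.

```python
import re
from collections import defaultdict

# Assumed line shape, inferred from the log above:
#   "<SEVERITY> (<package>[,<file>]): <message>"
LINE_RE = re.compile(r"^\s*(ERROR|WARNING|INFO) \(([^,)]+)(?:,([^)]+))?\): (.*)$")


def group_by_severity(log_text):
    """Group overlinking-check messages by severity (hypothetical helper)."""
    groups = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            severity, _package, _path, message = match.groups()
            groups[severity].append(message)
    return dict(groups)


# Abbreviated sample lines modelled on the log above (paths shortened).
sample = """\
 INFO (flash-attn,lib/x.so): Needed DSO lib/libc10.so found in conda-forge/linux-64::libtorch==2.1.2=cuda120_h2aa5df7_303
ERROR (flash-attn,lib/x.so): $RPATH/libtorch_python.so not found in packages, sysroot(s) nor the missing_dso_whitelist.
WARNING (flash-attn): dso library package conda-forge/linux-64::libcublas==12.0.1.189=hd3aeb46_3 in requirements/run but it is not used
"""

groups = group_by_severity(sample)
for severity in ("ERROR", "WARNING", "INFO"):
    print(severity, len(groups.get(severity, [])))
```

Running this over the full log would show which messages are hard failures and which are only advisory, which helps decide what to fix first.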

Comment on lines +39 to +41
- libcublas-dev # [(cuda_compiler_version or "").startswith("12")]
- libcusolver-dev # [(cuda_compiler_version or "").startswith("12")]
- libcusparse-dev # [(cuda_compiler_version or "").startswith("12")]
Member

Why are these deps listed if conda-build cannot detect any links to these libraries? It also doesn't look like the upstream library passes any linking flags for them to `CUDAExtension` in its setup script. If they are needed, then the recipe needs a patch to switch from static to dynamic linking... but these packages don't contain the static libraries, so I'm not sure how static linking could be happening.
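One way to double-check the linking question independently of conda-build is to list the `DT_NEEDED` entries of the built extension and compare them with the run requirements. The following is a minimal sketch, assuming binutils' `readelf` is on `PATH`; the helper names are hypothetical, and the sample text is illustrative rather than output from the real `flash_attn_2_cuda` binary.

```python
import re
import subprocess

# Matches dynamic-section lines such as:
#   0x0000000000000001 (NEEDED)   Shared library: [libc10.so]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def parse_needed(readelf_output):
    """Return DT_NEEDED library names from `readelf -d` text output."""
    return NEEDED_RE.findall(readelf_output)


def needed_libraries(path):
    """Run `readelf -d` on an ELF file and list its DT_NEEDED entries."""
    result = subprocess.run(
        ["readelf", "-d", path], check=True, capture_output=True, text=True
    )
    return parse_needed(result.stdout)


def unused_run_deps(needed, candidates):
    """Return candidate name prefixes that match none of the needed libraries.

    Prefix matching is deliberately loose: "libcublas" also matches
    "libcublasLt.so.12".
    """
    return [c for c in candidates if not any(n.startswith(c) for n in needed)]


# Illustrative input only; not taken from the actual extension module.
sample = """
 0x0000000000000001 (NEEDED)             Shared library: [libc10.so]
 0x0000000000000001 (NEEDED)             Shared library: [libcudart.so.12]
 0x000000000000000e (SONAME)             Library soname: [example.so]
"""
needed = parse_needed(sample)
print(unused_run_deps(needed, ["libcublas", "libcusolver", "libcusparse", "libcudart"]))
```

On a real build, `needed_libraries()` would be pointed at the `.so` in `site-packages`. Any candidate it reports as unused is likely only required for its headers at build time.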

Member Author

Added these because I was getting errors like the following:

[1/49] /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/bin/nvcc  -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn/src -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/cutlass/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/TH -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/THC -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include/python3.11 -c -c 
/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu -o /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1017"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1 -ccbin /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/bin/x86_64-conda-linux-gnu-cc
FAILED: /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o
/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/bin/nvcc  -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn/src -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/cutlass/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/TH -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/THC -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include/python3.11 -c -c 
/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu -o /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1017"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1 -ccbin /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/bin/x86_64-conda-linux-gnu-cc
nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
In file included from /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn/src/flash_bwd_launch_template.h:7,
                 from /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu:5:
/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
    6 | #include <cusparse.h>
      |          ^~~~~~~~~~~~
compilation terminated.
In file included from /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn/src/flash_bwd_launch_template.h:7,
                 from /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu:5:
/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
    6 | #include <cusparse.h>
      |          ^~~~~~~~~~~~
compilation terminated.
In file included from /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn/src/flash_bwd_launch_template.h:7,
                 from /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu:5:
/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
    6 | #include <cusparse.h>
      |          ^~~~~~~~~~~~
compilation terminated.
[2/49] /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/bin/x86_64-conda-linux-gnu-c++ -MMD -MF /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/flash_api.o.d -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include -fPIC -O2 -isystem /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include -fdebug-prefix-map=/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work=/usr/local/src/conda/flash-attn-2.5.8 -fdebug-prefix-map=/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_=/usr/local/src/conda-prefix -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/include 
-I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/targets/x86_64-linux/include -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib/stubs -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/targets/x86_64-linux/lib/stubs -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/targets/x86_64-linux/include -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib 
-L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib/stubs -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/targets/x86_64-linux/lib/stubs -fPIC -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn/src -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/cutlass/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/TH 
-I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/THC -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include/python3.11 -c -c /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn/flash_api.cpp -o /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/flash_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1017"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1
FAILED: /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/flash_api.o
/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/bin/x86_64-conda-linux-gnu-c++ -MMD -MF /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/flash_api.o.d -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include -fPIC -O2 -isystem /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include -fdebug-prefix-map=/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work=/usr/local/src/conda/flash-attn-2.5.8 -fdebug-prefix-map=/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_=/usr/local/src/conda-prefix -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/include 
-I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/targets/x86_64-linux/include -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib/stubs -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/targets/x86_64-linux/lib/stubs -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/targets/x86_64-linux/include -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib 
-L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib/stubs -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/targets/x86_64-linux/lib/stubs -fPIC -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn/src -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/cutlass/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/TH 
-I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/THC -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include/python3.11 -c -c /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn/flash_api.cpp -o /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/flash_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1017"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1
In file included from /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/csrc/flash_attn/flash_api.cpp:8:
/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
    6 | #include <cusparse.h>
      |          ^~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build
    subprocess.run(
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '2']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "<pip-setuptools-caller>", line 34, in <module>
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/setup.py", line 311, in <module>
    setup(
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/__init__.py", line 104, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 184, in setup
    return run_commands(dist)
           ^^^^^^^^^^^^^^^^^^
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
    dist.run_commands()
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
    super().run_command(command)
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/setup.py", line 266, in run
    return super().run()
           ^^^^^^^^^^^^^
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 368, in run
    self.run_command("build")
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
    self.distribution.run_command(command)
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
    super().run_command(command)
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 132, in run
    self.run_command(cmd_name)
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
    self.distribution.run_command(command)
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
    super().run_command(command)
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 91, in run
    _build_ext.run(self)
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
    self.build_extensions()
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 873, in build_extensions
    build_ext.build_extensions(self)
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 479, in build_extensions
    self._build_extensions_serial()
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 505, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 252, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 560, in build_extension
    objects = self.compiler.compile(
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 686, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/bin/python -u -c '
exec(compile('"'"''"'"''"'"'
# This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
#
# - It imports setuptools before invoking setup.py, to enable projects that directly
#   import from `distutils.core` to work with newer packaging standards.
# - It provides a clear error message when setuptools is not installed.
# - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
#   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
#     manifest_maker: standard file '"'"'-c'"'"' not found".
# - It generates a shim setup.py, for handling setup.cfg-only projects.
import os, sys, tokenize

try:
    import setuptools
except ImportError as error:
    print(
        "ERROR: Can not execute `setup.py` since setuptools is not available in "
        "the build environment.",
        file=sys.stderr,
    )
    sys.exit(1)

__file__ = %r
sys.argv[0] = __file__

if os.path.exists(__file__):
    filename = __file__
    with tokenize.open(__file__) as f:
        setup_py_code = f.read()
else:
    filename = "<auto-generated setuptools caller>"
    setup_py_code = "from setuptools import setup; setup()"

exec(compile(setup_py_code, filename, "exec"))
'"'"''"'"''"'"' % ('"'"'/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' bdist_wheel -d /tmp/pip-wheel-7hn5_c75
cwd: /home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/
Building wheel for flash_attn (setup.py): finished with status 'error'
ERROR: Failed building wheel for flash_attn
Running setup.py clean for flash_attn
Running command python setup.py clean
No CUDA runtime is found, using CUDA_HOME='/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_build_env'
error: pathspec 'csrc/cutlass' did not match any file(s) known to git


torch.__version__  = 2.1.2.post303


/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/setuptools/__init__.py:81: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!

        ********************************************************************************
        Requirements should be satisfied by a PEP 517 installer.
        If you are using pip, you can try `pip install --use-pep517`.
        ********************************************************************************

!!
  dist.fetch_build_eggs(dist.setup_requires)
running clean
removing 'build/temp.linux-x86_64-cpython-311' (and everything under it)
removing 'build/lib.linux-x86_64-cpython-311' (and everything under it)
'build/bdist.linux-x86_64' does not exist -- can't clean it
'build/scripts-3.11' does not exist -- can't clean it
removing 'build'
Failed to build flash_attn
ERROR: Could not build wheels for flash_attn, which is required to install pyproject.toml-based projects
Exception information:
Traceback (most recent call last):
File "$PREFIX/lib/python3.11/site-packages/pip/_internal/cli/base_command.py", line 180, in exc_logging_wrapper
  status = run_func(*args)
           ^^^^^^^^^^^^^^^
File "$PREFIX/lib/python3.11/site-packages/pip/_internal/cli/req_command.py", line 245, in wrapper
  return func(self, options, args)
         ^^^^^^^^^^^^^^^^^^^^^^^^^
File "$PREFIX/lib/python3.11/site-packages/pip/_internal/commands/install.py", line 429, in run
  raise InstallationError(
pip._internal.exceptions.InstallationError: Could not build wheels for flash_attn, which is required to install pyproject.toml-based projects
Removed build tracker: '/tmp/pip-build-tracker-vjr8_wy1'
Traceback (most recent call last):
File "/home/conda/staged-recipes-copy/.ci_support/build_all.py", line 261, in <module>
  build_all(os.path.join(root_dir, "recipes"), args.arch)
File "/home/conda/staged-recipes-copy/.ci_support/build_all.py", line 151, in build_all
  build_folders(recipes_dir, folders, arch, channel_urls)
File "/home/conda/staged-recipes-copy/.ci_support/build_all.py", line 207, in build_folders
  conda_build.api.build([recipe], config=get_config(arch, channel_urls))
File "/opt/conda/lib/python3.10/site-packages/conda_build/api.py", line 250, in build
  return build_tree(
File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 3762, in build_tree
  packages_from_this = build(
File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 2634, in build
  utils.check_call_env(
File "/opt/conda/lib/python3.10/site-packages/conda_build/utils.py", line 408, in check_call_env
  return _func_defaulting_env_to_os_environ("call", *popenargs, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/conda_build/utils.py", line 384, in _func_defaulting_env_to_os_environ
  raise subprocess.CalledProcessError(proc.returncode, _args)
subprocess.CalledProcessError: Command '['/bin/bash', '-o', 'errexit', '/home/conda/staged-recipes/build_artifacts/flash-attn_1715034607333/work/conda_build.sh']' returned non-zero exit status 1.
Traceback (most recent call last):
File "/home/weiji/Documents/github/staged-recipes/build-locally.py", line 101, in <module>
  main()
File "/home/weiji/Documents/github/staged-recipes/build-locally.py", line 95, in main
  run_docker_build(ns)
File "/home/weiji/Documents/github/staged-recipes/build-locally.py", line 33, in run_docker_build
  subprocess.check_call([script])
File "/home/weiji/mambaforge/envs/condalock/lib/python3.11/subprocess.py", line 413, in check_call
  raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['.scripts/run_docker_build.sh']' returned non-zero exit status 1.

Member Author

Oh yep, we're getting that same #include <cusparse.h> ... compilation terminated error after removing libcusparse-dev from host deps at https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=928515&view=logs&j=67448ffb-e003-5bfa-c062-cee3af60fcba&t=818ff20d-11b7-59db-6ce1-bb4df921454a&l=1016
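
Since the failing line is buried in a very long log, a small script can help surface missing-header errors like this one. The following is only a minimal sketch: the helper name `find_missing_headers` and the regex are assumptions based on the GCC message format seen in the log above, and the sample paths are shortened for readability.

```python
import re

# GCC diagnostic shape seen in the log above:
#   <file>:<line>:<col>: fatal error: <header>: No such file or directory
MISSING_HEADER = re.compile(
    r"(?P<file>\S+):(?P<line>\d+):(?P<col>\d+): fatal error: "
    r"(?P<header>\S+): No such file or directory"
)


def find_missing_headers(log_text):
    """Return (including_file, missing_header) pairs found in a build log."""
    hits = []
    for raw in log_text.splitlines():
        match = MISSING_HEADER.search(raw)
        if match:
            hits.append((match.group("file"), match.group("header")))
    return hits


# Shortened sample; the real log uses long build-prefix paths.
sample = """\
In file included from work/csrc/flash_attn/flash_api.cpp:8:
torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
    6 | #include <cusparse.h>
compilation terminated.
"""
print(find_missing_headers(sample))
```

Running this over a saved CI log would point at the header (here `cusparse.h`) and the file that includes it, without scrolling through the compiler command lines.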

Member

@carterbox carterbox May 6, 2024

Ah! It's because PyTorch's ATen headers include the cusparse header (ATen/cuda/CUDAContext.h does #include <cusparse.h>).

OK. Then we need to list these deps in both requirements/host and build/ignore_run_exports_from, because we are not linking to these libs, but we still need their headers to compile against the ATen API.

Member Author

Ok, added build/ignore_run_exports_from in 37e676a. Assuming that only libcublas-dev, libcusolver-dev, and libcusparse-dev need to be added, judging from the warnings at #26239 (comment).
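
The "list it in both places" rule can be checked mechanically. The sketch below uses a hypothetical recipe fragment written as a plain Python dict (the actual meta.yaml from 37e676a is not shown in this thread, and `python` and `pytorch` in the host list are placeholders). The helper name `ignored_but_not_in_host` is made up, and the check is deliberately simplistic: it only looks at host requirements.

```python
# Hypothetical recipe fragment; not the real meta.yaml from this PR.
recipe = {
    "build": {
        "ignore_run_exports_from": [
            "libcublas-dev", "libcusolver-dev", "libcusparse-dev",
        ],
    },
    "requirements": {
        "host": [
            "python", "pytorch",
            "libcublas-dev", "libcusolver-dev", "libcusparse-dev",
        ],
    },
}


def ignored_but_not_in_host(recipe):
    """Names under ignore_run_exports_from that are missing from host."""
    host = set(recipe["requirements"]["host"])
    ignored = recipe["build"]["ignore_run_exports_from"]
    return [name for name in ignored if name not in host]


print(ignored_but_not_in_host(recipe))
```

An empty list means every package whose run exports are ignored is still present at build time, so its headers are available to the compiler.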

carterbox and others added 3 commits May 6, 2024 11:14
This simpler script doesn't have unused features and doesn't set -O3 because our channel defaults are -O2
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>

test:
  imports:
    - flash_attn
Member

This test may need to be commented out because the test runners don't have a GPU, so imports might fail.

Member Author

Imports seem to have worked; see https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=928722&view=logs&j=4f860608-e5f8-5c9c-4eb0-308a99ecb61e&t=02ef1a5c-d960-5c54-fcea-983775f057bb&l=1352

done
export PREFIX=/home/conda/staged-recipes/build_artifacts/flash-attn_1715047997366/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place
export SRC_DIR=/home/conda/staged-recipes/build_artifacts/flash-attn_1715047997366/test_tmp
import: 'flash_attn'
import: 'flash_attn'
+ pip check
No broken requirements found.
+ exit 0

Silence warnings like:

```
WARNING (flash-attn): dso library package conda-forge/linux-64::libcublas==12.0.1.189=hd3aeb46_3 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)
WARNING (flash-attn): dso library package conda-forge/linux-64::libcusparse==12.0.0.76=hd3aeb46_2 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)
WARNING (flash-attn): dso library package conda-forge/linux-64::libcusolver==11.4.2.57=hd3aeb46_2 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)
```
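
To cross-check which packages trigger these warnings, the names can be pulled out of the log mechanically. This is a minimal sketch: the regex is an assumption inferred only from the three warning lines quoted above, the helper name `unused_run_deps` is made up, and the `-dev` suffix simply mirrors the package names mentioned in this thread rather than any general rule.

```python
import re

# Pattern inferred from the conda-build warnings quoted above (an assumption).
OVERDEPENDS = re.compile(
    r"WARNING \((?P<recipe>[^)]+)\): dso library package "
    r"\S+::(?P<name>[^=\s]+)==\S+ in requirements/run but it is not used"
)


def unused_run_deps(log_text):
    """Return sorted package names flagged as unused run requirements."""
    return sorted({m.group("name") for m in OVERDEPENDS.finditer(log_text)})


# Warning text shortened after "not used" for readability.
log = """\
WARNING (flash-attn): dso library package conda-forge/linux-64::libcublas==12.0.1.189=hd3aeb46_3 in requirements/run but it is not used (i.e. ...)
WARNING (flash-attn): dso library package conda-forge/linux-64::libcusparse==12.0.0.76=hd3aeb46_2 in requirements/run but it is not used (i.e. ...)
WARNING (flash-attn): dso library package conda-forge/linux-64::libcusolver==11.4.2.57=hd3aeb46_2 in requirements/run but it is not used (i.e. ...)
"""
names = unused_run_deps(log)
print(names)
print([f"{name}-dev" for name in names])
```

For the three warnings above this lists libcublas, libcusolver, and libcusparse, which correspond to the three `-dev` packages discussed in the review.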
Trying to reduce CPU load on Azure CI to debug build.
Comment on lines 18 to 23
  script_env:
    # Temporarily reduce ARCHs and JOBS to debug build
    # - MAX_JOBS=$CPU_COUNT
    - MAX_JOBS=1
    # - TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0+PTX
    - TORCH_CUDA_ARCH_LIST=8.6+PTX
Member Author

@weiji14 weiji14 May 7, 2024

Set MAX_JOBS=1 again as in ef03f90. Looks like CI can run up to 6 hours now (without crashing due to out of memory), though that's still not enough to finish compiling 😅 See e.g. https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=928208&view=logs&jobId=67448ffb-e003-5bfa-c062-cee3af60fcba&j=4f860608-e5f8-5c9c-4eb0-308a99ecb61e&t=02ef1a5c-d960-5c54-fcea-983775f057bb where the CUDA 11.8 build got as far as 30/49.
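
For context on why trimming TORCH_CUDA_ARCH_LIST shortens the build: each listed architecture adds `-gencode` flags to every nvcc invocation, so each `.cu` file generates device code for every target. The sketch below approximates that expansion. It is a simplified illustration, not PyTorch's actual implementation (which lives in `torch.utils.cpp_extension`), and the helper name `gencode_flags` is made up.

```python
def gencode_flags(arch_list):
    """Expand a TORCH_CUDA_ARCH_LIST-style string into nvcc -gencode flags.

    Simplified: handles only entries like "8.6" or "8.6+PTX" separated by ";".
    """
    flags = set()
    for entry in arch_list.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        want_ptx = entry.endswith("+PTX")
        version = entry[: -len("+PTX")] if want_ptx else entry
        major, minor = version.split(".")
        num = f"{major}{minor}"
        # Native machine code (SASS) for this architecture.
        flags.add(f"-gencode=arch=compute_{num},code=sm_{num}")
        if want_ptx:
            # PTX that newer GPUs can JIT-compile at load time.
            flags.add(f"-gencode=arch=compute_{num},code=compute_{num}")
    return sorted(flags)


print(gencode_flags("8.6+PTX"))
print(len(gencode_flags("8.0;8.6;8.9;9.0+PTX")))
```

With the debug setting `8.6+PTX` this yields two flags (`code=compute_86` and `code=sm_86`), while the commented-out full list yields five, which is roughly why the reduced list compiles faster.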

Member Author

Oo, it looks like the compilation finally finished, at least for a single Python version (Python 3.11). The CI check shows that the job was cancelled after 6 hours, but that's because it went on to build for another Python version (Python 3.10) for some reason (maybe because we're not using noarch).

Logs from https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=928722&view=logs&j=67448ffb-e003-5bfa-c062-cee3af60fcba&t=818ff20d-11b7-59db-6ce1-bb4df921454a&l=1226 showing a successful build:

[49/49] /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/bin/nvcc  -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/csrc/flash_attn -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/csrc/flash_attn/src -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/csrc/cutlass/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/TH -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/include/THC -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include/python3.11 -c -c 
/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/csrc/flash_attn/src/flash_fwd_split_hdim96_fp16_sm80.cu -o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim96_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1017"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -ccbin /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/bin/x86_64-conda-linux-gnu-cc
2024-05-07T07:10:57.5843417Z   nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
2024-05-07T07:10:57.7918559Z   /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/bin/x86_64-conda-linux-gnu-c++ -shared -Wl,--allow-shlib-undefined -Wl,-rpath,/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -Wl,-rpath-link,/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -Wl,--allow-shlib-undefined -Wl,-rpath,/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -Wl,-rpath-link,/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined 
-Wl,-rpath,/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -Wl,-rpath-link,/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib/stubs -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/targets/x86_64-linux/lib/stubs -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include 
-fdebug-prefix-map=/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work=/usr/local/src/conda/flash-attn-2.5.8 -fdebug-prefix-map=/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_=/usr/local/src/conda-prefix -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/targets/x86_64-linux/include -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib/stubs -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/targets/x86_64-linux/lib/stubs -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/include 
-I/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/include -I/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/targets/x86_64-linux/include -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/targets/x86_64-linux/lib/stubs -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/targets/x86_64-linux/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/targets/x86_64-linux/lib/stubs /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/flash_api.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.o 
/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim192_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim224_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim224_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim256_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim256_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim32_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim32_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim64_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim64_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim96_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim96_fp16_sm80.o 
/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim128_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim128_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim160_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim160_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim192_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim192_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim224_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim224_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim256_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim256_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim32_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim32_fp16_sm80.o 
/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim64_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim64_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim96_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_hdim96_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim128_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim160_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim192_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim192_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim224_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim224_fp16_sm80.o 
/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim256_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim256_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim32_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim32_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim64_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim64_fp16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim96_bf16_sm80.o /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim96_fp16_sm80.o -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.11/site-packages/torch/lib -L/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-311/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so
  installing to build/bdist.linux-x86_64/wheel
  running install
  running install_lib
  creating build/bdist.linux-x86_64
  creating build/bdist.linux-x86_64/wheel
  copying build/lib.linux-x86_64-cpython-311/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
  creating build/bdist.linux-x86_64/wheel/flash_attn
  creating build/bdist.linux-x86_64/wheel/flash_attn/ops
  copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/activations.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops
  copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/rms_norm.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops
  copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops
  copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/fused_dense.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops
  copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/layer_norm.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops
  creating build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
  copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton/cross_entropy.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
  copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton/rotary.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
  copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton/mlp.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
  copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
  copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton/k_activations.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
  copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton/layer_norm.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
  copying build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton/linear.py -> build/bdist.linux-x86_64/wheel/flash_attn/ops/triton
  creating build/bdist.linux-x86_64/wheel/flash_attn/modules
  copying build/lib.linux-x86_64-cpython-311/flash_attn/modules/mha.py -> build/bdist.linux-x86_64/wheel/flash_attn/modules
  copying build/lib.linux-x86_64-cpython-311/flash_attn/modules/mlp.py -> build/bdist.linux-x86_64/wheel/flash_attn/modules
  copying build/lib.linux-x86_64-cpython-311/flash_attn/modules/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn/modules
  copying build/lib.linux-x86_64-cpython-311/flash_attn/modules/embedding.py -> build/bdist.linux-x86_64/wheel/flash_attn/modules
  copying build/lib.linux-x86_64-cpython-311/flash_attn/modules/block.py -> build/bdist.linux-x86_64/wheel/flash_attn/modules
  copying build/lib.linux-x86_64-cpython-311/flash_attn/flash_blocksparse_attention.py -> build/bdist.linux-x86_64/wheel/flash_attn
  copying build/lib.linux-x86_64-cpython-311/flash_attn/flash_attn_interface.py -> build/bdist.linux-x86_64/wheel/flash_attn
  copying build/lib.linux-x86_64-cpython-311/flash_attn/flash_attn_triton_og.py -> build/bdist.linux-x86_64/wheel/flash_attn
  copying build/lib.linux-x86_64-cpython-311/flash_attn/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn
  creating build/bdist.linux-x86_64/wheel/flash_attn/utils
  copying build/lib.linux-x86_64-cpython-311/flash_attn/utils/distributed.py -> build/bdist.linux-x86_64/wheel/flash_attn/utils
  copying build/lib.linux-x86_64-cpython-311/flash_attn/utils/benchmark.py -> build/bdist.linux-x86_64/wheel/flash_attn/utils
  copying build/lib.linux-x86_64-cpython-311/flash_attn/utils/pretrained.py -> build/bdist.linux-x86_64/wheel/flash_attn/utils
  copying build/lib.linux-x86_64-cpython-311/flash_attn/utils/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn/utils
  copying build/lib.linux-x86_64-cpython-311/flash_attn/utils/generation.py -> build/bdist.linux-x86_64/wheel/flash_attn/utils
  creating build/bdist.linux-x86_64/wheel/flash_attn/layers
  copying build/lib.linux-x86_64-cpython-311/flash_attn/layers/patch_embed.py -> build/bdist.linux-x86_64/wheel/flash_attn/layers
  copying build/lib.linux-x86_64-cpython-311/flash_attn/layers/rotary.py -> build/bdist.linux-x86_64/wheel/flash_attn/layers
  copying build/lib.linux-x86_64-cpython-311/flash_attn/layers/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn/layers
  copying build/lib.linux-x86_64-cpython-311/flash_attn/flash_blocksparse_attn_interface.py -> build/bdist.linux-x86_64/wheel/flash_attn
  creating build/bdist.linux-x86_64/wheel/flash_attn/losses
  copying build/lib.linux-x86_64-cpython-311/flash_attn/losses/cross_entropy.py -> build/bdist.linux-x86_64/wheel/flash_attn/losses
  copying build/lib.linux-x86_64-cpython-311/flash_attn/losses/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn/losses
  copying build/lib.linux-x86_64-cpython-311/flash_attn/flash_attn_triton.py -> build/bdist.linux-x86_64/wheel/flash_attn
  copying build/lib.linux-x86_64-cpython-311/flash_attn/fused_softmax.py -> build/bdist.linux-x86_64/wheel/flash_attn
  copying build/lib.linux-x86_64-cpython-311/flash_attn/bert_padding.py -> build/bdist.linux-x86_64/wheel/flash_attn
  creating build/bdist.linux-x86_64/wheel/flash_attn/models
  copying build/lib.linux-x86_64-cpython-311/flash_attn/models/falcon.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
  copying build/lib.linux-x86_64-cpython-311/flash_attn/models/btlm.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
  copying build/lib.linux-x86_64-cpython-311/flash_attn/models/vit.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
  copying build/lib.linux-x86_64-cpython-311/flash_attn/models/__init__.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
  copying build/lib.linux-x86_64-cpython-311/flash_attn/models/bert.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
  copying build/lib.linux-x86_64-cpython-311/flash_attn/models/baichuan.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
  copying build/lib.linux-x86_64-cpython-311/flash_attn/models/bigcode.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
  copying build/lib.linux-x86_64-cpython-311/flash_attn/models/opt.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
  copying build/lib.linux-x86_64-cpython-311/flash_attn/models/gpt_neox.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
  copying build/lib.linux-x86_64-cpython-311/flash_attn/models/gpt.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
  copying build/lib.linux-x86_64-cpython-311/flash_attn/models/llama.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
  copying build/lib.linux-x86_64-cpython-311/flash_attn/models/gptj.py -> build/bdist.linux-x86_64/wheel/flash_attn/models
  running install_egg_info
  Copying flash_attn.egg-info to build/bdist.linux-x86_64/wheel/flash_attn-2.5.8-py3.11.egg-info
  running install_scripts
  creating build/bdist.linux-x86_64/wheel/flash_attn-2.5.8.dist-info/WHEEL
  creating '/tmp/pip-wheel-7_xhpf61/.tmp-4qdnj8g3/flash_attn-2.5.8-cp311-cp311-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
  adding 'flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so'
  adding 'flash_attn/__init__.py'
  adding 'flash_attn/bert_padding.py'
  adding 'flash_attn/flash_attn_interface.py'
  adding 'flash_attn/flash_attn_triton.py'
  adding 'flash_attn/flash_attn_triton_og.py'
  adding 'flash_attn/flash_blocksparse_attention.py'
  adding 'flash_attn/flash_blocksparse_attn_interface.py'
  adding 'flash_attn/fused_softmax.py'
  adding 'flash_attn/layers/__init__.py'
  adding 'flash_attn/layers/patch_embed.py'
  adding 'flash_attn/layers/rotary.py'
  adding 'flash_attn/losses/__init__.py'
  adding 'flash_attn/losses/cross_entropy.py'
  adding 'flash_attn/models/__init__.py'
  adding 'flash_attn/models/baichuan.py'
  adding 'flash_attn/models/bert.py'
  adding 'flash_attn/models/bigcode.py'
  adding 'flash_attn/models/btlm.py'
  adding 'flash_attn/models/falcon.py'
  adding 'flash_attn/models/gpt.py'
  adding 'flash_attn/models/gpt_neox.py'
  adding 'flash_attn/models/gptj.py'
  adding 'flash_attn/models/llama.py'
  adding 'flash_attn/models/opt.py'
  adding 'flash_attn/models/vit.py'
  adding 'flash_attn/modules/__init__.py'
  adding 'flash_attn/modules/block.py'
  adding 'flash_attn/modules/embedding.py'
  adding 'flash_attn/modules/mha.py'
  adding 'flash_attn/modules/mlp.py'
  adding 'flash_attn/ops/__init__.py'
  adding 'flash_attn/ops/activations.py'
  adding 'flash_attn/ops/fused_dense.py'
  adding 'flash_attn/ops/layer_norm.py'
  adding 'flash_attn/ops/rms_norm.py'
  adding 'flash_attn/ops/triton/__init__.py'
  adding 'flash_attn/ops/triton/cross_entropy.py'
  adding 'flash_attn/ops/triton/k_activations.py'
  adding 'flash_attn/ops/triton/layer_norm.py'
  adding 'flash_attn/ops/triton/linear.py'
  adding 'flash_attn/ops/triton/mlp.py'
  adding 'flash_attn/ops/triton/rotary.py'
  adding 'flash_attn/utils/__init__.py'
  adding 'flash_attn/utils/benchmark.py'
  adding 'flash_attn/utils/distributed.py'
  adding 'flash_attn/utils/generation.py'
  adding 'flash_attn/utils/pretrained.py'
  adding 'flash_attn-2.5.8.dist-info/AUTHORS'
  adding 'flash_attn-2.5.8.dist-info/LICENSE'
  adding 'flash_attn-2.5.8.dist-info/METADATA'
  adding 'flash_attn-2.5.8.dist-info/WHEEL'
  adding 'flash_attn-2.5.8.dist-info/top_level.txt'
  adding 'flash_attn-2.5.8.dist-info/RECORD'
  removing build/bdist.linux-x86_64/wheel
  Building wheel for flash_attn (pyproject.toml): finished with status 'done'
  Created wheel for flash_attn: filename=flash_attn-2.5.8-cp311-cp311-linux_x86_64.whl size=161615679 sha256=019cabb84a0f37b55ff08e14959ee34184e658cbc5d6edc3622e44779820b595
  Stored in directory: /tmp/pip-ephem-wheel-cache-g138790y/wheels/d8/91/a6/6160216e602ad906106939541f06f84e2dbc50fbc04b44036d
Successfully built flash_attn
Installing collected packages: flash_attn

Successfully installed flash_attn-2.5.8
Removed build tracker: '/tmp/pip-build-tracker-r7fvrno4'

Resource usage statistics from building flash-attn:
   Process count: 8
   CPU time: Sys=0:03:26.0, User=4:47:19.4
   Memory: 5.6G
   Disk usage: 1.1M
   Time elapsed: 4:56:11.4
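
As a rough reading of the statistics above: dividing total CPU time by wall-clock time gives the average number of busy cores during the build. The sketch below does that arithmetic on the three durations copied from this log. The helper name `to_seconds` is only illustrative and is not part of conda-build.

```python
def to_seconds(hms: str) -> float:
    """Convert an 'H:MM:SS.s' duration string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Durations copied from the resource usage statistics in this log.
cpu_sys = to_seconds("0:03:26.0")
cpu_user = to_seconds("4:47:19.4")
elapsed = to_seconds("4:56:11.4")

# A ratio close to 1.0 means roughly one core was busy on average.
utilisation = (cpu_sys + cpu_user) / elapsed
print(f"average busy cores: {utilisation:.2f}")  # prints "average busy cores: 0.98"
```

A value this close to 1 suggests the compilation ran essentially serially, which would be consistent with the `MAX_JOBS` value of `1` reported by conda-build in this log.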


Packaging flash-attn
/opt/conda/lib/python3.10/site-packages/conda_build/environ.py:558: UserWarning: The environment variable 'MAX_JOBS' is being passed through with value '1'.  If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.
  warnings.warn(
/opt/conda/lib/python3.10/site-packages/conda_build/environ.py:558: UserWarning: The environment variable 'TORCH_CUDA_ARCH_LIST' is being passed through with value '8.6+PTX'.  If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.
  warnings.warn(
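
The `TORCH_CUDA_ARCH_LIST` value `8.6+PTX` reported above lines up with the two `-gencode` flags in the nvcc command earlier in this log. The following is a minimal sketch of that correspondence, assuming only numeric entries such as `8.6` with an optional `+PTX` suffix. The function `arch_list_to_gencode` is hypothetical and is not PyTorch's own implementation, which also handles named architectures and validation.

```python
def arch_list_to_gencode(arch_list: str) -> list[str]:
    """Map entries like '8.6+PTX' to nvcc -gencode flags (illustrative only)."""
    flags = []
    # Entries may be separated by spaces or semicolons.
    for entry in arch_list.replace(";", " ").split():
        want_ptx = entry.endswith("+PTX")
        num = entry.removesuffix("+PTX").replace(".", "")
        if want_ptx:
            # PTX target: embeds intermediate code that newer GPUs can JIT-compile.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
        # SASS target: native machine code for this exact architecture.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
    return flags


print(arch_list_to_gencode("8.6+PTX"))
```

The flag order here was chosen only to match the order seen in the nvcc command in this log.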
Packaging flash-attn-2.5.8-py311h379968c_0
compiling .pyc files...
number of files: 104
Warning: rpath /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/lib is outside prefix /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_ (removing it)
   INFO: sysroot: '/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/x86_64-conda-linux-gnu/sysroot/' files: '['/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/x86_64-conda-linux-gnu/sysroot/usr/share/zoneinfo/zone1970.tab', '/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/x86_64-conda-linux-gnu/sysroot/usr/share/zoneinfo/zone.tab', '/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/x86_64-conda-linux-gnu/sysroot/usr/share/zoneinfo/tzdata.zi', '/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_build_env/x86_64-conda-linux-gnu/sysroot/usr/share/zoneinfo/right/Zulu']'
   INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO lib/libc10.so found in conda-forge/linux-64::libtorch==2.1.2=cuda120_h2aa5df7_303
   INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO lib/libtorch_cpu.so found in conda-forge/linux-64::libtorch==2.1.2=cuda120_h2aa5df7_303
   INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO lib/python3.11/site-packages/torch/lib/libtorch_python.so found in conda-forge/linux-64::pytorch==2.1.2=cuda120_py311h25b6552_303
   INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO lib/libcudart.so.12 found in conda-forge/linux-64::cuda-cudart==12.0.107=hd3aeb46_8
   INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO lib/libc10_cuda.so found in conda-forge/linux-64::libtorch==2.1.2=cuda120_h2aa5df7_303
   INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO lib/libtorch_cuda.so found in conda-forge/linux-64::libtorch==2.1.2=cuda120_h2aa5df7_303
   INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO lib/libstdc++.so.6 found in conda-forge/linux-64::libstdcxx-ng==13.2.0=hc0a3c3a_7
   INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO lib/libgcc_s.so.1 found in conda-forge/linux-64::libgcc-ng==13.2.0=h77fa898_7
   INFO (flash-attn,lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so): Needed DSO x86_64-conda-linux-gnu/sysroot/lib64/libc.so.6 found in CDT/compiler package conda-forge/noarch::sysroot_linux-64==2.17=h4a8ded7_14
WARNING (flash-attn): interpreter (Python) package conda-forge/linux-64::python==3.11.9=hb806964_0_cpython in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)
Fixing permissions
Packaged license file/s.
INFO :: Time taken to mark (prefix)
        0 replacements in 0 files was 1.18 seconds
Files containing CONDA_PREFIX
-----------------------------
lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so (binary): Patching
WARNING: Importing conda-verify failed.  Please be sure to test your packages.  conda install conda-verify to make this message go away.
TEST START: /home/conda/staged-recipes/build_artifacts/linux-64/flash-attn-2.5.8-py311h379968c_0.conda
Renaming work directory '/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work' to '/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work_moved_flash-attn-2.5.8-py311h379968c_0_linux-64'
shutil.move(work)=/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work, dest=/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/work_moved_flash-attn-2.5.8-py311h379968c_0_linux-64)
Reloading output folder (local): ...working... done
Solving environment (_test_env): ...working... done

## Package Plan ##

  environment location: /home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place


The following NEW packages will be INSTALLED:

    _libgcc_mutex:        0.1-conda_forge                 conda-forge
    _openmp_mutex:        4.5-2_kmp_llvm                  conda-forge
    bzip2:                1.0.8-hd590300_5                conda-forge
    ca-certificates:      2024.2.2-hbcca054_0             conda-forge
    cuda-cudart:          12.4.127-hd3aeb46_0             conda-forge
    cuda-cudart_linux-64: 12.4.127-h59595ed_0             conda-forge
    cuda-nvrtc:           12.4.127-hd3aeb46_1             conda-forge
    cuda-nvtx:            12.4.127-h59595ed_1             conda-forge
    cuda-version:         12.4-h3060b56_3                 conda-forge
    cudnn:                8.9.7.29-h092f7fd_3             conda-forge
    einops:               0.8.0-pyhd8ed1ab_0              conda-forge
    filelock:             3.14.0-pyhd8ed1ab_0             conda-forge
    flash-attn:           2.5.8-py311h379968c_0           local      
    fsspec:               2024.3.1-pyhca7485f_0           conda-forge
    gmp:                  6.3.0-h59595ed_1                conda-forge
    gmpy2:                2.1.5-py311he48d604_0           conda-forge
    icu:                  73.2-h59595ed_0                 conda-forge
    jinja2:               3.1.3-pyhd8ed1ab_0              conda-forge
    ld_impl_linux-64:     2.40-h55db66e_0                 conda-forge
    libabseil:            20230802.1-cxx17_h59595ed_0     conda-forge
    libblas:              3.9.0-22_linux64_openblas       conda-forge
    libcblas:             3.9.0-22_linux64_openblas       conda-forge
    libcublas:            12.4.5.8-hd3aeb46_1             conda-forge
    libcufft:             11.2.1.3-hd3aeb46_1             conda-forge
    libcurand:            10.3.5.147-hd3aeb46_1           conda-forge
    libcusolver:          11.6.1.9-hd3aeb46_1             conda-forge
    libcusparse:          12.3.1.170-hd3aeb46_1           conda-forge
    libexpat:             2.6.2-h59595ed_0                conda-forge
    libffi:               3.4.2-h7f98852_5                conda-forge
    libgcc-ng:            13.2.0-h77fa898_7               conda-forge
    libgfortran-ng:       13.2.0-h69a702a_7               conda-forge
    libgfortran5:         13.2.0-hca663fb_7               conda-forge
    libhwloc:             2.10.0-default_h2fb2949_1000    conda-forge
    libiconv:             1.17-hd590300_2                 conda-forge
    liblapack:            3.9.0-22_linux64_openblas       conda-forge
    libmagma:             2.7.2-h173bb3b_2                conda-forge
    libmagma_sparse:      2.7.2-h173bb3b_3                conda-forge
    libnsl:               2.0.1-hd590300_0                conda-forge
    libnvjitlink:         12.4.127-hd3aeb46_1             conda-forge
    libopenblas:          0.3.27-pthreads_h413a1c8_0      conda-forge
    libprotobuf:          4.25.1-hf27288f_2               conda-forge
    libsqlite:            3.45.3-h2797004_0               conda-forge
    libstdcxx-ng:         13.2.0-hc0a3c3a_7               conda-forge
    libtorch:             2.1.2-cuda120_h2aa5df7_303      conda-forge
    libuuid:              2.38.1-h0b41bf4_0               conda-forge
    libuv:                1.48.0-hd590300_0               conda-forge
    libxcrypt:            4.4.36-hd590300_1               conda-forge
    libxml2:              2.12.6-h232c23b_2               conda-forge
    libzlib:              1.2.13-hd590300_5               conda-forge
    llvm-openmp:          18.1.5-ha31de31_0               conda-forge
    magma:                2.7.2-h51420fd_3                conda-forge
    markupsafe:           2.1.5-py311h459d7ec_0           conda-forge
    mkl:                  2023.2.0-h84fe81f_50496         conda-forge
    mpc:                  1.3.1-hfe3b2da_0                conda-forge
    mpfr:                 4.2.1-h9458935_1                conda-forge
    mpmath:               1.3.0-pyhd8ed1ab_0              conda-forge
    nccl:                 2.21.5.1-h3a97aeb_0             conda-forge
    ncurses:              6.4.20240210-h59595ed_0         conda-forge
    networkx:             3.3-pyhd8ed1ab_1                conda-forge
    numpy:                1.26.4-py311h64a7726_0          conda-forge
    openssl:              3.3.0-hd590300_0                conda-forge
    pip:                  24.0-pyhd8ed1ab_0               conda-forge
    python:               3.11.9-hb806964_0_cpython       conda-forge
    python_abi:           3.11-4_cp311                    conda-forge
    pytorch:              2.1.2-cuda120_py311h25b6552_303 conda-forge
    readline:             8.2-h8228510_1                  conda-forge
    setuptools:           69.5.1-pyhd8ed1ab_0             conda-forge
    sleef:                3.5.1-h9b69904_2                conda-forge
    sympy:                1.12-pypyh9d50eac_103           conda-forge
    tbb:                  2021.12.0-h00ab1b0_0            conda-forge
    tk:                   8.6.13-noxft_h4845f30_101       conda-forge
    typing_extensions:    4.11.0-pyha770c72_0             conda-forge
    tzdata:               2024a-h0c530f3_0                conda-forge
    wheel:                0.43.0-pyhd8ed1ab_1             conda-forge
    xz:                   5.2.6-h166bdaf_0                conda-forge
    zstd:                 1.5.6-ha6fb4c9_0                conda-forge

Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... By downloading and using the cuDNN conda packages, you accept the terms and conditions of the NVIDIA cuDNN EULA -
  https://docs.nvidia.com/deeplearning/cudnn/sla/index.html

done
export PREFIX=/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place
export SRC_DIR=/home/conda/staged-recipes/build_artifacts/flash-attn_1715047963337/test_tmp
import: 'flash_attn'
import: 'flash_attn'
+ pip check
No broken requirements found.
+ exit 0

Resource usage statistics from testing flash-attn:
   Process count: 4
   CPU time: Sys=0:00:00.3, User=0:00:01.1
   Memory: 337.3M
   Disk usage: 24B
   Time elapsed: 0:00:06.0


TEST END: /home/conda/staged-recipes/build_artifacts/linux-64/flash-attn-2.5.8-py311h379968c_0.conda

@carterbox, did you want to keep using the simplified setup.py and pyproject.toml file, or try reverting back to upstream one?

Member

but it's because it continued to try to build for another Python version (Python 3.10) for some reason (maybe because we're not using noarch).

In staged-recipes, there is only one runner per platform, so every Python variant is built on the same runner.

@carterbox, did you want to keep using the simplified setup.py and pyproject.toml file, or try reverting back to upstream one?

I want to keep the simplified scripts for now. The only drawback is that we have to manually update the source file list, some compile args, and the dependencies. However, I think that is better than working around all of the special operations upstream has added to their build script, which are not compatible with our build environment.

Member

@weiji14, once the feedstock is allocated, please do some experiments to see how much time it takes to run with MAX_JOBS=$CPU_COUNT and how many CUDA archs can be added to the arch list. We want to build for as many of 8.0;8.6;8.9;9.0+PTX as we can.
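
For reference, here is a rough sketch of how an arch list written in that `8.0;8.6;8.9;9.0+PTX` style could expand into `nvcc` `-gencode` flags, which is why each extra arch adds to compile time and binary size. The helper name `gencode_flags` is hypothetical, and whether the recipe's simplified setup.py consumes the list this way is an assumption, not something confirmed in this PR.

```python
def gencode_flags(arch_list: str) -> list[str]:
    """Expand an arch list such as "8.0;8.6;9.0+PTX" into nvcc -gencode flags.

    Hypothetical helper for illustration only; it is not taken from the recipe.
    """
    flags = []
    for entry in arch_list.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing or doubled ';'
        wants_ptx = entry.endswith("+PTX")
        version = entry.removesuffix("+PTX")
        major, minor = version.split(".")  # raises ValueError on malformed input
        num = f"{major}{minor}"
        # Native machine code (SASS) for this specific architecture.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if wants_ptx:
            # Also embed PTX so newer GPUs can JIT-compile the kernels.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


if __name__ == "__main__":
    for flag in gencode_flags("8.0;8.6;8.9;9.0+PTX"):
        print(flag)
```

Each entry yields one set of compiled kernels, and the `+PTX` suffix adds one more embedded target, so the full list above produces five `-gencode` flags.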

Member Author

Awesome, thanks so much @carterbox! The initial feedstock commit actually failed due to running out of disk space 😅 But I've opened a PR now at conda-forge/flash-attn-feedstock#1, so we can continue discussion there.

@carterbox carterbox marked this pull request as ready for review May 7, 2024 21:16
@carterbox carterbox merged commit 16d1974 into conda-forge:main May 7, 2024
1 of 5 checks passed
@weiji14 weiji14 deleted the add-flash-attn branch May 7, 2024 21:46