Description
When cross-compiling for Android with the NDK toolchain, Flash Attention fails to build in CPU-only mode but builds successfully when the Vulkan backend is enabled, even though it is documented as a CPU-only feature.
Environment:
- Android NDK: 28.0.12433566
- Target: arm64-v8a (Android 28)
- Build system: CMake with Ninja
- Host OS: Windows
Build command that fails:
cmake .. -G "Ninja" -DCMAKE_TOOLCHAIN_FILE=D:\Android_Studio_SDK\ndk\28.0.12433566\build\cmake\android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28 -DCMAKE_MAKE_PROGRAM=D:\Android_Studio_SDK\cmake\3.6.4111459\bin\ninja.exe -DSD_BUILD_SHARED_LIBS=ON -DSD_FLASH_ATTN=ON
Error:
D:/Building_test/stable-diffusion.cpp/ggml_extend.hpp:679:31: error: use of undeclared identifier 'ggml_flash_attn'; did you mean 'ggml_hash_set'?
  679 |     struct ggml_tensor* kqv = ggml_flash_attn(ctx, q, k, v, false);
      |                               ^
The same build command succeeds when `-DSD_VULKAN=ON` is added.
Expected behavior: Flash Attention should build successfully in CPU-only mode since it's documented as a CPU-only feature.
Actual behavior: Flash Attention only builds when Vulkan backend is enabled, suggesting the implementation may be incorrectly tied to GPU backend definitions.
EDIT: Never mind, I just came across PR #386 (comment).