Flash Attention CPU-only build fails in cross-compilation for Android, succeeds with Vulkan backend #463

Open
@rmatif

Description

When cross-compiling for Android with the NDK toolchain, Flash Attention fails to build in CPU-only mode but builds successfully when the Vulkan backend is enabled, despite being documented as a CPU-only feature.

Environment:

- Android NDK: 28.0.12433566
- Target: arm64-v8a (Android 28)
- Build system: CMake with Ninja
- Host OS: Windows

Build command that fails:

cmake .. -G "Ninja" -DCMAKE_TOOLCHAIN_FILE=D:\Android_Studio_SDK\ndk\28.0.12433566\build\cmake\android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-28 -DCMAKE_MAKE_PROGRAM=D:\Android_Studio_SDK\cmake\3.6.4111459\bin\ninja.exe -DSD_BUILD_SHARED_LIBS=ON -DSD_FLASH_ATTN=ON

Error:

D:/Building_test/stable-diffusion.cpp/ggml_extend.hpp:679:31: error: use of undeclared identifier 'ggml_flash_attn'; did you mean 'ggml_hash_set'?
  679 |     struct ggml_tensor* kqv = ggml_flash_attn(ctx, q, k, v, false);
      |                               ^

The same build command succeeds when `-DSD_VULKAN=ON` is added.

Expected behavior: Flash Attention should build successfully in CPU-only mode since it's documented as a CPU-only feature.

Actual behavior: Flash Attention only builds when Vulkan backend is enabled, suggesting the implementation may be incorrectly tied to GPU backend definitions.
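
For illustration, one way a backend flag can change whether a line is compiled at all is a preprocessor guard around the call site. The sketch below is only a guess at the general pattern, not the code in ggml_extend.hpp: the macros `DEMO_FLASH_ATTN` and `DEMO_VULKAN` and the function `compiled_attention_path` are hypothetical names invented for this example.

```cpp
#include <string>

// Hypothetical switches standing in for whatever -DSD_FLASH_ATTN=ON and
// -DSD_VULKAN=ON end up defining; these names are NOT taken from the project.
#define DEMO_FLASH_ATTN 1
// #define DEMO_VULKAN 1   // uncomment to mimic enabling a GPU backend

// Returns the name of the attention path that was actually compiled in.
inline std::string compiled_attention_path() {
#if defined(DEMO_FLASH_ATTN) && !defined(DEMO_VULKAN)
    // Only a CPU-only build compiles this branch. If it called a function
    // that the bundled library does not declare, the build would stop here
    // with "use of undeclared identifier".
    return "flash";
#else
    // With a GPU backend defined, the preprocessor discards the branch
    // above before the compiler ever checks it, so the build succeeds.
    return "standard";
#endif
}
```

Under this assumed layout, a CPU-only build is the only configuration that ever compiles the flash attention call, so it is also the only one that can fail on it. Defining `DEMO_VULKAN` makes the function return "standard" and hides any error in the guarded branch.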

EDIT: Never mind, I just came across this in PR #386 (comment)
