
llamafile : tmp disable + build sgemm.o when needed #6716

Merged
ggerganov merged 2 commits into master from gg/sgemm-build on Apr 17, 2024
Conversation

@ggerganov (Member) commented Apr 17, 2024

MoE perplexity (ppl) is currently abnormally high, indicating some issue:

# build without LLAMA_NO_LLAMAFILE, or check out the commit right before this one
make -j perplexity && ./perplexity -f build/wikitext-2-raw/wiki.test.raw -m models/mixtral-8x7b-32k-fast/ggml-model-f16.gguf -ngl 0

...

perplexity: tokenizing the input ..
perplexity: tokenization took 573.843 ms
perplexity: calculating perplexity over 642 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 71.87 seconds per pass - ETA 3 hours 12.23 minutes
[1]923.5180,[2]1248.7441,[3]1137.4388,[4]1598.2650,^C

Need to fix this before re-enabling by default

@cebtenzzre (Collaborator)

This error occurs because vaddvq_f32 is only available on armv8/aarch64, but the build targets armv7 (armv7-none-linux-androideabi33). I doubt this is specific to Android; I'm sure you would hit the same error when building for a 32-bit Raspberry Pi.
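
For reference, a common ARMv7 fallback is to emulate the horizontal sum with pairwise adds. A minimal sketch (the vaddvq_f32_compat helper name is illustrative, not from this PR):

#if defined(__ARM_NEON) && !defined(__aarch64__)
#include <arm_neon.h>
// vaddvq_f32 is ARMv8-only; on ARMv7, sum the four lanes with pairwise adds instead.
static inline float vaddvq_f32_compat(float32x4_t v) {
    float32x2_t t = vadd_f32(vget_low_f32(v), vget_high_f32(v)); // lanes (0+2, 1+3)
    t = vpadd_f32(t, t);                                         // (0+2) + (1+3)
    return vget_lane_f32(t, 0);
}
#endif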

Aren't most Android devices armv8 these days, anyway?

@ggerganov (Member, Author)

Yes, I think so. On armv7, people can always disable the code manually via LLAMA_NO_LLAMAFILE. For now I'm just interested in fixing the CI; I was hoping that CMAKE_SYSTEM_NAME would do the job, but unfortunately it does not.
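
For example (a sketch; the Makefile flag is as above, and I'm assuming the corresponding CMake option is LLAMA_LLAMAFILE):

# Makefile build
make -j LLAMA_NO_LLAMAFILE=1

# CMake build
cmake -B build -DLLAMA_LLAMAFILE=OFF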

@cebtenzzre (Collaborator)

The CI is still failing because the Android build pulls from llama.cpp master: https://github.com/ggerganov/llama.cpp/blob/8dd1ec8b3ffbfa2d26e82e672cea89f5eeb2f141/examples/llama.android/app/src/main/cpp/CMakeLists.txt#L16-L20
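
That block is a FetchContent pin roughly along these lines (paraphrased, not the verbatim file):

include(FetchContent)
FetchContent_Declare(
    llama
    GIT_REPOSITORY https://github.com/ggerganov/llama.cpp
    GIT_TAG        master
)
FetchContent_MakeAvailable(llama)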

When CMAKE_SYSTEM_NAME is Android, you can check CMAKE_ANDROID_ARCH_ABI to determine the target architecture.
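
Something along these lines (a sketch, assuming the build exposes a LLAMA_LLAMAFILE toggle):

if (CMAKE_SYSTEM_NAME STREQUAL "Android" AND CMAKE_ANDROID_ARCH_ABI MATCHES "^armeabi")
    # 32-bit ARM ABIs (e.g. armeabi-v7a) lack vaddvq_f32, so skip the sgemm path
    set(LLAMA_LLAMAFILE OFF)
endif()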

@ggerganov changed the title from "build : sgemm.o only when needed" to "llamafile : tmp disable + build sgemm.o when needed" on Apr 17, 2024
@ggerganov marked this pull request as ready for review on April 17, 2024 at 20:58
@ggerganov merged commit 3b8f1ec into master on Apr 17, 2024
45 of 53 checks passed
@ggerganov deleted the gg/sgemm-build branch on April 17, 2024 at 20:58
tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024
* build : sgemm.o only when needed

ggml-ci

* llamafile : tmp disable due to MoE bug

ggml-ci
jart added a commit to jart/llama.cpp that referenced this pull request Apr 20, 2024
- Re-enable by default
- Fix issue described in ggml-org#6716
- Make code more abstract, elegant, and maintainable
- Faster handling of weirdly shaped `m` and `n` edge cases
ggerganov pushed a commit that referenced this pull request Apr 22, 2024
* llamafile : improve sgemm.cpp

- Re-enable by default
- Fix issue described in #6716
- Make code more abstract, elegant, and maintainable
- Faster handling of weirdly shaped `m` and `n` edge cases

* Address review comments

* Help clang produce fma instructions

* Address review comments