
llamafile : tmp disable + build sgemm.o when needed #6716

Merged
ggerganov merged 2 commits into master from gg/sgemm-build on Apr 17, 2024
Conversation

@ggerganov (Member) commented Apr 17, 2024

MoE perplexity (ppl) is currently abnormally high, indicating some issue:

# build without LLAMA_NO_LLAMAFILE, or check out the commit right before this one
make -j perplexity && ./perplexity -f build/wikitext-2-raw/wiki.test.raw -m models/mixtral-8x7b-32k-fast/ggml-model-f16.gguf -ngl 0

...

perplexity: tokenizing the input ..
perplexity: tokenization took 573.843 ms
perplexity: calculating perplexity over 642 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 71.87 seconds per pass - ETA 3 hours 12.23 minutes
[1]923.5180,[2]1248.7441,[3]1137.4388,[4]1598.2650,^C

Need to fix this before re-enabling by default

@cebtenzzre (Collaborator)

This error occurs because vaddvq_f32 is only available on armv8/aarch64, but the build targets armv7 (armv7-none-linux-androideabi33). I doubt this is specific to Android; I'm sure you would hit the same error when building for a 32-bit Raspberry Pi.
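
For reference, a common ARMv7 fallback is to emulate the horizontal sum with pairwise adds. A minimal sketch (the vaddvq_f32_compat helper name is illustrative, not from this PR):

#if defined(__ARM_NEON) && !defined(__aarch64__)
#include <arm_neon.h>
// vaddvq_f32 is ARMv8-only; on ARMv7, sum the four lanes with pairwise adds instead.
static inline float vaddvq_f32_compat(float32x4_t v) {
    float32x2_t t = vadd_f32(vget_low_f32(v), vget_high_f32(v)); // lanes (0+2, 1+3)
    t = vpadd_f32(t, t);                                         // (0+2) + (1+3)
    return vget_lane_f32(t, 0);
}
#endif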

Aren't most Android devices armv8 these days, anyway?

@ggerganov (Member, Author)

Yes, I think so. On armv7, people can always disable the code manually via LLAMA_NO_LLAMAFILE. For now I'm just interested in fixing the CI; I was hoping that CMAKE_SYSTEM_NAME would do the job, but unfortunately it does not.
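
For example (a sketch; the Makefile flag is as above, and I'm assuming the corresponding CMake option is LLAMA_LLAMAFILE):

# Makefile build
make -j LLAMA_NO_LLAMAFILE=1

# CMake build
cmake -B build -DLLAMA_LLAMAFILE=OFF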

@cebtenzzre (Collaborator)

The CI is still failing because the Android build pulls from llama.cpp master: https://github.com/ggerganov/llama.cpp/blob/8dd1ec8b3ffbfa2d26e82e672cea89f5eeb2f141/examples/llama.android/app/src/main/cpp/CMakeLists.txt#L16-L20
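
That block is a FetchContent pin roughly along these lines (paraphrased, not the verbatim file):

include(FetchContent)
FetchContent_Declare(
    llama
    GIT_REPOSITORY https://github.com/ggerganov/llama.cpp
    GIT_TAG        master
)
FetchContent_MakeAvailable(llama)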

When CMAKE_SYSTEM_NAME is Android, you can check CMAKE_ANDROID_ARCH_ABI to determine the target architecture.
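
Something along these lines (a sketch, assuming the build exposes a LLAMA_LLAMAFILE toggle):

if (CMAKE_SYSTEM_NAME STREQUAL "Android" AND CMAKE_ANDROID_ARCH_ABI MATCHES "^armeabi")
    # 32-bit ARM ABIs (e.g. armeabi-v7a) lack vaddvq_f32, so skip the sgemm path
    set(LLAMA_LLAMAFILE OFF)
endif()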

@ggerganov changed the title from "build : sgemm.o only when needed" to "llamafile : tmp disable + build sgemm.o when needed" on Apr 17, 2024
@ggerganov marked this pull request as ready for review on April 17, 2024 at 20:58
@ggerganov merged commit 3b8f1ec into master on Apr 17, 2024
45 of 53 checks passed
@ggerganov deleted the gg/sgemm-build branch on April 17, 2024 at 20:58
tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024
* build : sgemm.o only when needed

ggml-ci

* llamafile : tmp disable due to MoE bug

ggml-ci
jart added a commit to jart/llama.cpp that referenced this pull request Apr 20, 2024
- Re-enable by default
- Fix issue described in ggml-org#6716
- Make code more abstract, elegant, and maintainable
- Faster handling of weirdly shaped `m` and `n` edge cases
ggerganov pushed a commit that referenced this pull request Apr 22, 2024
* llamafile : improve sgemm.cpp

- Re-enable by default
- Fix issue described in #6716
- Make code more abstract, elegant, and maintainable
- Faster handling of weirdly shaped `m` and `n` edge cases

* Address review comments

* Help clang produce fma instructions

* Address review comments