-
Notifications
You must be signed in to change notification settings - Fork 10.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
k-quants : fix build on armv7 (android only) #2920
Conversation
Ugh, code duplication is annoying. I would like if we made a header file to put all of our shared inline functions into, so we don't have to copy-paste them around. I have a few devices I might be able to get an armv7 toolchain to run on. |
I also found some unused functions we can cleanup in ggml.c, then we only have duplicated
Thanks! |
I get this error compiling on my BeagleBone White, running Arch Linux ARM:
Compiler is And besides that (which can be worked around by replacing the type with uint16_t), I get these:
For each of vld1q_s16_x2, vld1q_u8_x2, vld1q_s8_x2, vld1q_s8_x4, and vld1q_u8_x4. |
Looks like these methods are not in list of arm/arm_neon.h in gcc, but Arm mentioned these support v7. It might be an issue related to toolchain. I think this PR is not fully fixed for #2075, and can confirm it fixed the build of Android armeabi-v7a. |
You say "Android only", but the original issue reporter is running Raspbian, a Debian derivative. I'll try clang as well, and if that doesn't work I have an armv7 Moto X 2014 that's a few years newer. edit: I was able to build successfully with clang. I still get the vzero/mzero warnings, along with a place where the allocator assumes pointers and size_t are 64 bits wide (they are 32 bits on this platform):
main fails with this error at runtime:
|
@slaren 32-bit platforms are currently broken - just try replacing off_t being too small isn't an issue in practice because the program can only address 4 GB of memory anyway. |
@cebtenzzre as you probably noticed, |
Apparently not, unless I need to try a smaller model. I get this with commit 11f3ca0:
|
Yeah I've update the description. I was found the issue on Android early, and found a related issue.
Thanks a lot! I also got a RPi 3B (1gb ram) with That works with a 110m llama2c model in q8_0, unfortunately I haven't been able to convert stories110m to a qk model even with QKK_64. Also, use |
// address and size of the buffer when measuring | ||
// it needs to be large enough to fit all the tensors, but it cannot overlap with other existing buffers | ||
static void * const MEASURE_BASE_ADDR = (void *) 0x1000; | ||
#if defined(__ARM_NEON) && !defined(__aarch64__) | ||
// 32-bit | ||
// TODO: Use for 32-bit x86 as well | ||
static const size_t MEASURE_MAX_SIZE = (1ULL<<32) - 1; // 4 GB | ||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this is going to work for the reasons explained in the comment. ggml-alloc
uses the address and size of the buffer to determine if a tensor was allocated by it, and that also applies when measuring. If you set the measure buffer to the entire address space, it will try to free and reuse the memory of external tensors such as the weights and the KV cache. I will work on a better solution.
@jhen0409 could you please tell me what the appropriate steps are to build for > mkdir build-android
> cd build-android
> export NDK=<your_ndk_directory>
> cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-23 -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod ..
> make I tried this approach here (#4421) but it gave me build errors. |
Fix #2075.(No, see #2920 (comment))Add the missing methods if not defined
__aarch64__
. Exceptvpaddq_s16
, others are fromggml.c
.Tested the build on Android armeabi-v7a.
I don't have a device and a small qk model to test at the moment, so I only tested the functions by remove
__aarch64__
def + rename functions on a M2 Mac.