Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k-quants : fix build on armv7 (android only) #2920

Merged
merged 5 commits into from
Sep 2, 2023

Conversation

jhen0409
Copy link
Collaborator

@jhen0409 jhen0409 commented Aug 31, 2023

Fix #2075. (No, see #2920 (comment))

Add the missing methods if not defined __aarch64__. Except vpaddq_s16, others are from ggml.c.

Tested the build on Android armeabi-v7a.

I don't have a device and a small qk model to test at the moment, so I only tested the functions by remove __aarch64__ def + rename functions on a M2 Mac.

@jhen0409 jhen0409 requested a review from ikawrakow August 31, 2023 01:57
@cebtenzzre
Copy link
Collaborator

Ugh, code duplication is annoying. I would like if we made a header file to put all of our shared inline functions into, so we don't have to copy-paste them around.

I have a few devices I might be able to get an armv7 toolchain to run on.

@jhen0409
Copy link
Collaborator Author

Ugh, code duplication is annoying. I would like if we made a header file to put all of our shared inline functions into, so we don't have to copy-paste them around.

I also found some unused functions we can cleanup in ggml.c, then we only have duplicated vaddvq_s32 for now. Maybe it's not worth enough to have a header file for now.

I have a few devices I might be able to get an armv7 toolchain to run on.

Thanks!

@cebtenzzre
Copy link
Collaborator

cebtenzzre commented Sep 1, 2023

I get this error compiling on my BeagleBone White, running Arch Linux ARM:

In file included from llama.h:4,                                                                                               
                 from llama.cpp:6:                                                                                             
ggml.h:274:13: error: ‘__fp16’ does not name a type; did you mean ‘__bf16’?                                                    
  274 |     typedef __fp16 ggml_fp16_t;                                                                                        
      |             ^~~~~~                                                                                                     
      |             __bf16

Compiler is gcc (GCC) 12.1.0 targeting armv7l-unknown-linux-gnueabihf.
-march=native -mtune=native resolves to -mfloat-abi=hard -mfpu=neon -mtls-dialect=gnu -mtune=cortex-a8 -marm -mlibarch=armv7-a+sec+simd -march=armv7-a+sec+simd.

And besides that (which can be worked around by replacing the type with uint16_t), I get these:

k_quants.c: In function ‘ggml_vec_dot_q2_K_q8_K’:                                                                              
k_quants.c:1350:36: warning: implicit declaration of function ‘vld1q_s16_x2’; did you mean ‘vld1q_s16’? [-Wimplicit-function-de
claration]                                                                                                                     
 1350 |         const int16x8x2_t q8sums = vld1q_s16_x2(y[i].bsums);                                                           
      |                                    ^~~~~~~~~~~~                                                                        
      |                                    vld1q_s16                                                                           
k_quants.c:1350:36: error: invalid initializer

For each of vld1q_s16_x2, vld1q_u8_x2, vld1q_s8_x2, vld1q_s8_x4, and vld1q_u8_x4.
Also a few unused variables: vzero on k_quants.c lines 1329 and 3683 and mzero on line 3119.

@jhen0409
Copy link
Collaborator Author

jhen0409 commented Sep 1, 2023

For each of vld1q_s16_x2, vld1q_u8_x2, vld1q_s8_x2, vld1q_s8_x4, and vld1q_u8_x4. Also a few unused variables: vzero on k_quants.c lines 1329 and 3683 and mzero on line 3119.

Looks like these methods are not in list of arm/arm_neon.h in gcc, but Arm mentioned these support v7. It might be an issue related to toolchain.

I think this PR is not fully fixed for #2075, and can confirm it fixed the build of Android armeabi-v7a.

@jhen0409 jhen0409 changed the title k-quants : fix build on armv7 k-quants : fix build on armv7 (android only) Sep 1, 2023
@cebtenzzre
Copy link
Collaborator

cebtenzzre commented Sep 1, 2023

You say "Android only", but the original issue reporter is running Raspbian, a Debian derivative.

I'll try clang as well, and if that doesn't work I have an armv7 Moto X 2014 that's a few years newer.

edit: I was able to build successfully with clang. I still get the vzero/mzero warnings, along with a place where the allocator assumes pointers and size_t are 64 bits wide (they are 32 bits on this platform):

ggml-alloc.c:287:45: warning: implicit conversion from 'unsigned long long' to 'size_t' (aka 'unsigned int') changes value from 1099511627776 to 0 [-Wconstant-conversion]
static const size_t MEASURE_MAX_SIZE  = 1ULL<<40; // 1 TB
                    ~~~~~~~~~~~~~~~~    ~~~~^~~~

main fails with this error at runtime:

ggml_allocr_alloc: not enough space in the buffer (needed 2048, largest block available 0)
GGML_ASSERT: ggml-alloc.c:144: !"not enough space in the buffer"

@cebtenzzre
Copy link
Collaborator

cebtenzzre commented Sep 1, 2023

@slaren 32-bit platforms are currently broken - just try replacing -march=native -mtune=native with -m32 in the Makefile, and running the 'main' example to generate some text. Make sure you load a model smaller than 2GB so you don't get EOVERFLOW, as off_t is 32 bits on Linux. I'm testing with a Q2_K quant of OpenLLaMA-3B.

off_t being too small isn't an issue in practice because the program can only address 4 GB of memory anyway.

@slaren
Copy link
Collaborator

slaren commented Sep 2, 2023

@cebtenzzre as you probably noticed, ggml-alloc assumes a 64-bit address space. Is that the only issue currently preventing 32-bit builds?

@cebtenzzre
Copy link
Collaborator

cebtenzzre commented Sep 2, 2023

Is that the only issue currently preventing 32-bit builds?

Apparently not, unless I need to try a smaller model. I get this with commit 11f3ca0:

error loading model: overflow multiplying 172800 * 32000

@cebtenzzre
Copy link
Collaborator

@slaren after figuring out how LLAMA_QKK_64 works, and replacing size_t with uint64_t in llama_calc_tensor_size, I get reasonable output from a 32-bit build on x86 with commit 11f3ca0, which is right before ggml-alloc was introduced.

@jhen0409
Copy link
Collaborator Author

jhen0409 commented Sep 2, 2023

You say "Android only", but the original issue reporter is running Raspbian, a Debian derivative.

Yeah I've update the description. I was found the issue on Android early, and found a related issue.

I'll try clang as well, and if that doesn't work I have an armv7 Moto X 2014 that's a few years newer.

edit: I was able to build successfully with clang. I still get the vzero/mzero warnings, along with a place where the allocator assumes pointers and size_t are 64 bits wide (they are 32 bits on this platform):

ggml-alloc.c:287:45: warning: implicit conversion from 'unsigned long long' to 'size_t' (aka 'unsigned int') changes value from 1099511627776 to 0 [-Wconstant-conversion]
static const size_t MEASURE_MAX_SIZE  = 1ULL<<40; // 1 TB
                    ~~~~~~~~~~~~~~~~    ~~~~^~~~

main fails with this error at runtime:

ggml_allocr_alloc: not enough space in the buffer (needed 2048, largest block available 0)
GGML_ASSERT: ggml-alloc.c:144: !"not enough space in the buffer"

Thanks a lot! I also got a RPi 3B (1gb ram) with uname -m: armv7l, and use -mcpu=cortex-a53 with clang, but it not able run the open-llama-3b q2_k even increased swap.

That works with a 110m llama2c model in q8_0, unfortunately I haven't been able to convert stories110m to a qk model even with QKK_64.

Also, use (1ULL<<32) - 1 for MEASURE_MAX_SIZE is works to me, so I think we probably can use it for armv7. I'll leave 32-bit x86 as a TODO for now.

@ggerganov ggerganov merged commit 21f3d1b into ggerganov:master Sep 2, 2023
Comment on lines 284 to +291
// address and size of the buffer when measuring
// it needs to be large enough to fit all the tensors, but it cannot overlap with other existing buffers
static void * const MEASURE_BASE_ADDR = (void *) 0x1000;
#if defined(__ARM_NEON) && !defined(__aarch64__)
// 32-bit
// TODO: Use for 32-bit x86 as well
static const size_t MEASURE_MAX_SIZE = (1ULL<<32) - 1; // 4 GB
#else
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this is going to work for the reasons explained in the comment. ggml-alloc uses the address and size of the buffer to determine if a tensor was allocated by it, and that also applies when measuring. If you set the measure buffer to the entire address space, it will try to free and reuse the memory of external tensors such as the weights and the KV cache. I will work on a better solution.

@itsPreto
Copy link

@jhen0409 could you please tell me what the appropriate steps are to build for armeabi-v7a?

> mkdir build-android
> cd build-android
> export NDK=<your_ndk_directory>
> cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-23 -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod ..
> make

I tried this approach here (#4421) but it gave me build errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

k-quant causes builds to fail on ARM32
6 participants