k-quants : fix build on armv7 (android only) #2920

jhen0409 · 2023-08-31T01:57:03Z

~~Fix #2075.~~ (No, see #2920 (comment))

Add the missing methods if not defined __aarch64__. Except vpaddq_s16, others are from ggml.c.

Tested the build on Android armeabi-v7a.

I don't have a device and a small qk model to test at the moment, so I only tested the functions by remove __aarch64__ def + rename functions on a M2 Mac.

cebtenzzre · 2023-08-31T02:27:40Z

Ugh, code duplication is annoying. I would like if we made a header file to put all of our shared inline functions into, so we don't have to copy-paste them around.

I have a few devices I might be able to get an armv7 toolchain to run on.

jhen0409 · 2023-08-31T03:10:16Z

Ugh, code duplication is annoying. I would like if we made a header file to put all of our shared inline functions into, so we don't have to copy-paste them around.

I also found some unused functions we can cleanup in ggml.c, then we only have duplicated vaddvq_s32 for now. Maybe it's not worth enough to have a header file for now.

I have a few devices I might be able to get an armv7 toolchain to run on.

Thanks!

cebtenzzre · 2023-09-01T04:19:37Z

I get this error compiling on my BeagleBone White, running Arch Linux ARM:

In file included from llama.h:4,                                                                                               
                 from llama.cpp:6:                                                                                             
ggml.h:274:13: error: ‘__fp16’ does not name a type; did you mean ‘__bf16’?                                                    
  274 |     typedef __fp16 ggml_fp16_t;                                                                                        
      |             ^~~~~~                                                                                                     
      |             __bf16

Compiler is gcc (GCC) 12.1.0 targeting armv7l-unknown-linux-gnueabihf.
-march=native -mtune=native resolves to -mfloat-abi=hard -mfpu=neon -mtls-dialect=gnu -mtune=cortex-a8 -marm -mlibarch=armv7-a+sec+simd -march=armv7-a+sec+simd.

And besides that (which can be worked around by replacing the type with uint16_t), I get these:

k_quants.c: In function ‘ggml_vec_dot_q2_K_q8_K’:                                                                              
k_quants.c:1350:36: warning: implicit declaration of function ‘vld1q_s16_x2’; did you mean ‘vld1q_s16’? [-Wimplicit-function-de
claration]                                                                                                                     
 1350 |         const int16x8x2_t q8sums = vld1q_s16_x2(y[i].bsums);                                                           
      |                                    ^~~~~~~~~~~~                                                                        
      |                                    vld1q_s16                                                                           
k_quants.c:1350:36: error: invalid initializer

For each of vld1q_s16_x2, vld1q_u8_x2, vld1q_s8_x2, vld1q_s8_x4, and vld1q_u8_x4.
Also a few unused variables: vzero on k_quants.c lines 1329 and 3683 and mzero on line 3119.

jhen0409 · 2023-09-01T08:30:35Z

For each of vld1q_s16_x2, vld1q_u8_x2, vld1q_s8_x2, vld1q_s8_x4, and vld1q_u8_x4. Also a few unused variables: vzero on k_quants.c lines 1329 and 3683 and mzero on line 3119.

Looks like these methods are not in list of arm/arm_neon.h in gcc, but Arm mentioned these support v7. It might be an issue related to toolchain.

I think this PR is not fully fixed for #2075, and can confirm it fixed the build of Android armeabi-v7a.

cebtenzzre · 2023-09-01T17:50:08Z

You say "Android only", but the original issue reporter is running Raspbian, a Debian derivative.

I'll try clang as well, and if that doesn't work I have an armv7 Moto X 2014 that's a few years newer.

edit: I was able to build successfully with clang. I still get the vzero/mzero warnings, along with a place where the allocator assumes pointers and size_t are 64 bits wide (they are 32 bits on this platform):

ggml-alloc.c:287:45: warning: implicit conversion from 'unsigned long long' to 'size_t' (aka 'unsigned int') changes value from 1099511627776 to 0 [-Wconstant-conversion]
static const size_t MEASURE_MAX_SIZE  = 1ULL<<40; // 1 TB
                    ~~~~~~~~~~~~~~~~    ~~~~^~~~

main fails with this error at runtime:

ggml_allocr_alloc: not enough space in the buffer (needed 2048, largest block available 0)
GGML_ASSERT: ggml-alloc.c:144: !"not enough space in the buffer"

cebtenzzre · 2023-09-01T20:00:36Z

@slaren 32-bit platforms are currently broken - just try replacing -march=native -mtune=native with -m32 in the Makefile, and running the 'main' example to generate some text. Make sure you load a model smaller than 2GB so you don't get EOVERFLOW, as off_t is 32 bits on Linux. I'm testing with a Q2_K quant of OpenLLaMA-3B.

off_t being too small isn't an issue in practice because the program can only address 4 GB of memory anyway.

slaren · 2023-09-02T00:06:32Z

@cebtenzzre as you probably noticed, ggml-alloc assumes a 64-bit address space. Is that the only issue currently preventing 32-bit builds?

cebtenzzre · 2023-09-02T01:55:54Z

Is that the only issue currently preventing 32-bit builds?

Apparently not, unless I need to try a smaller model. I get this with commit 11f3ca0:

error loading model: overflow multiplying 172800 * 32000

cebtenzzre · 2023-09-02T03:08:57Z

@slaren after figuring out how LLAMA_QKK_64 works, and replacing size_t with uint64_t in llama_calc_tensor_size, I get reasonable output from a 32-bit build on x86 with commit 11f3ca0, which is right before ggml-alloc was introduced.

jhen0409 · 2023-09-02T03:25:45Z

You say "Android only", but the original issue reporter is running Raspbian, a Debian derivative.

Yeah I've update the description. I was found the issue on Android early, and found a related issue.

I'll try clang as well, and if that doesn't work I have an armv7 Moto X 2014 that's a few years newer.

edit: I was able to build successfully with clang. I still get the vzero/mzero warnings, along with a place where the allocator assumes pointers and size_t are 64 bits wide (they are 32 bits on this platform):
ggml-alloc.c:287:45: warning: implicit conversion from 'unsigned long long' to 'size_t' (aka 'unsigned int') changes value from 1099511627776 to 0 [-Wconstant-conversion]
static const size_t MEASURE_MAX_SIZE  = 1ULL<<40; // 1 TB
                    ~~~~~~~~~~~~~~~~    ~~~~^~~~
main fails with this error at runtime:
ggml_allocr_alloc: not enough space in the buffer (needed 2048, largest block available 0)
GGML_ASSERT: ggml-alloc.c:144: !"not enough space in the buffer"

Thanks a lot! I also got a RPi 3B (1gb ram) with uname -m: armv7l, and use -mcpu=cortex-a53 with clang, but it not able run the open-llama-3b q2_k even increased swap.

That works with a 110m llama2c model in q8_0, unfortunately I haven't been able to convert stories110m to a qk model even with QKK_64.

Also, use (1ULL<<32) - 1 for MEASURE_MAX_SIZE is works to me, so I think we probably can use it for armv7. I'll leave 32-bit x86 as a TODO for now.

slaren · 2023-09-02T16:43:19Z

ggml-alloc.c

 // address and size of the buffer when measuring
 // it needs to be large enough to fit all the tensors, but it cannot overlap with other existing buffers
 static void * const MEASURE_BASE_ADDR = (void *) 0x1000;
+#if defined(__ARM_NEON) && !defined(__aarch64__)
+// 32-bit
+// TODO: Use for 32-bit x86 as well
+static const size_t MEASURE_MAX_SIZE  = (1ULL<<32) - 1; // 4 GB
+#else


I don't think that this is going to work for the reasons explained in the comment. ggml-alloc uses the address and size of the buffer to determine if a tensor was allocated by it, and that also applies when measuring. If you set the measure buffer to the entire address space, it will try to free and reuse the memory of external tensors such as the weights and the KV cache. I will work on a better solution.

itsPreto · 2023-12-13T16:21:01Z

@jhen0409 could you please tell me what the appropriate steps are to build for armeabi-v7a?

> mkdir build-android
> cd build-android
> export NDK=<your_ndk_directory>
> cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-23 -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod ..
> make

I tried this approach here (#4421) but it gave me build errors.

k-quants : fix build on armv7

ac67192

jhen0409 requested a review from ikawrakow August 31, 2023 01:57

ggml : cleanup unused arm32 specific impl

d5d5c52

ikawrakow approved these changes Aug 31, 2023

View reviewed changes

jhen0409 changed the title ~~k-quants : fix build on armv7~~ k-quants : fix build on armv7 (android only) Sep 1, 2023

jhen0409 added 2 commits September 2, 2023 08:24

Merge branch 'master' into armv7-qk

9c6ea3b

k-quants : avoid some unused vzero / mzero define

6f5dc9b

ggml-alloc : use 4g for MEASURE_MAX_SIZE in 32-bit arm

e5fe497

jhen0409 force-pushed the armv7-qk branch from b93553b to e5fe497 Compare September 2, 2023 03:31

ggerganov approved these changes Sep 2, 2023

View reviewed changes

ggerganov merged commit 21f3d1b into ggerganov:master Sep 2, 2023

slaren reviewed Sep 2, 2023

View reviewed changes

jhen0409 deleted the armv7-qk branch September 3, 2023 23:21

jprendes mentioned this pull request Sep 20, 2023

Spin 1.5.0 fails to build on aarch64+musl with cross fermyon/spin#1786

Closed

BarfingLemurs mentioned this pull request Dec 12, 2023

unable to build for armeabi-v7a #4421

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

k-quants : fix build on armv7 (android only) #2920

k-quants : fix build on armv7 (android only) #2920

jhen0409 commented Aug 31, 2023 •

edited

Loading

cebtenzzre commented Aug 31, 2023

jhen0409 commented Aug 31, 2023

cebtenzzre commented Sep 1, 2023 •

edited

Loading

jhen0409 commented Sep 1, 2023 •

edited

Loading

cebtenzzre commented Sep 1, 2023 •

edited

Loading

cebtenzzre commented Sep 1, 2023 •

edited

Loading

slaren commented Sep 2, 2023

cebtenzzre commented Sep 2, 2023 •

edited

Loading

cebtenzzre commented Sep 2, 2023

jhen0409 commented Sep 2, 2023

slaren Sep 2, 2023

itsPreto commented Dec 13, 2023

k-quants : fix build on armv7 (android only) #2920

k-quants : fix build on armv7 (android only) #2920

Conversation

jhen0409 commented Aug 31, 2023 • edited Loading

cebtenzzre commented Aug 31, 2023

jhen0409 commented Aug 31, 2023

cebtenzzre commented Sep 1, 2023 • edited Loading

jhen0409 commented Sep 1, 2023 • edited Loading

cebtenzzre commented Sep 1, 2023 • edited Loading

cebtenzzre commented Sep 1, 2023 • edited Loading

slaren commented Sep 2, 2023

cebtenzzre commented Sep 2, 2023 • edited Loading

cebtenzzre commented Sep 2, 2023

jhen0409 commented Sep 2, 2023

slaren Sep 2, 2023

Choose a reason for hiding this comment

itsPreto commented Dec 13, 2023

jhen0409 commented Aug 31, 2023 •

edited

Loading

cebtenzzre commented Sep 1, 2023 •

edited

Loading

jhen0409 commented Sep 1, 2023 •

edited

Loading

cebtenzzre commented Sep 1, 2023 •

edited

Loading

cebtenzzre commented Sep 1, 2023 •

edited

Loading

cebtenzzre commented Sep 2, 2023 •

edited

Loading