Fix crash for 65B model after recent pre-allocated memory change #485

Merged 1 commit into ggerganov:master on Mar 25, 2023

Conversation

chriskuehl (Contributor)
This fixes a crash that started with #473 for me when using the 65B model with -c 2048.

On master, I get a crash in the call to std::vector::resize():

(gdb) run -m models/65B/ggml-model-q4_0.bin -t 1 -c 2048 -p Hello
[...]
llama_model_load: loading model part 8/8 from 'models/65B/ggml-model-q4_0.bin.7'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_default_append

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7aad537 in __GI_abort () at abort.c:79
#2  0x00007ffff7e7a7ec in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff7e85966 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff7e859d1 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff7e85c65 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7e7d09a in std::__throw_length_error(char const*) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x0000555555582b01 in std::vector<unsigned char, std::allocator<unsigned char> >::_M_check_len (this=0x555555660998, this=0x555555660998, 
    __s=0x55555559ea14 "vector::_M_default_append", __n=18446744070490423296) at /usr/include/c++/10/bits/stl_vector.h:1759
#8  std::vector<unsigned char, std::allocator<unsigned char> >::_M_default_append (this=0x555555660998, __n=18446744070490423296)
    at /usr/include/c++/10/bits/vector.tcc:634
#9  0x000055555557b564 in std::vector<unsigned char, std::allocator<unsigned char> >::resize (__new_size=<optimized out>, this=0x555555660998)
    at /usr/include/c++/10/bits/stl_vector.h:940
#10 kv_cache_init (hparams=..., hparams=..., n_ctx=<optimized out>, wtype=GGML_TYPE_F16, cache=...) at llama.cpp:242
#11 llama_init_from_file (path_model=<optimized out>, params=...) at llama.cpp:1638
#12 0x000055555555b96f in main (argc=9, argv=0x7fffffffe588) at /usr/include/c++/10/bits/basic_string.h:186

It seems like it is trying to resize the vector to an absurd number of elements:

(gdb) frame 8
#8  std::vector<unsigned char, std::allocator<unsigned char> >::_M_default_append (this=0x555555660998, __n=18446744070490423296) at /usr/include/c++/10/bits/vector.tcc:634
634                     _M_check_len(__n, "vector::_M_default_append");
(gdb) info args
this = 0x555555660998
__n = 18446744070490423296
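
That number is exactly what you get if the buffer-size arithmetic overflows a signed 32-bit int partway through. A minimal sketch of that arithmetic (the 65B hyperparameters n_embd = 8192 and n_layer = 80, the 2-byte F16 element size, and the 1 MiB margin are assumptions for illustration; the exact expression in #473 may differ) reproduces the value:

#include <cstdint>
#include <cstdio>

int main() {
    // Assumed LLaMA 65B hyperparameters, running with -c 2048.
    const int n_ctx   = 2048;
    const int n_layer = 80;
    const int n_embd  = 8192;

    const int n_mem      = n_layer*n_ctx;  // 163840, fits in int
    const int n_elements = n_embd*n_mem;   // 1342177280, still fits in int

    // 2*n_elements is evaluated as a signed 32-bit int: 2684354560 does not
    // fit and wraps to -1610612736. (The unsigned cast makes the wrap
    // well-defined for this demo; in the real code it is signed overflow.)
    int doubled = (int)(2u*(unsigned)n_elements);

    // Converting the negative int to size_t sign-extends it, so the byte
    // count handed to std::vector::resize() becomes enormous.
    const size_t wtype_size = 2;           // size of a ggml F16 element
    const size_t MB = 1024u*1024u;
    size_t buf_size = doubled*wtype_size + 2u*MB;

    printf("%zu\n", buf_size);             // prints 18446744070490423296
    return 0;
}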

After this change, it no longer crashes:

$ ./main -m models/65B/ggml-model-q4_0.bin -t 1 -c 2048 -p Hello
[...]
llama_model_load: loading model part 8/8 from 'models/65B/ggml-model-q4_0.bin.7'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_init_from_file: kv self size  = 5120.00 MB

system_info: n_threads = 1 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 

main: prompt: ' Hello'
main: number of tokens in prompt = 2
     1 -> ''
 15043 -> ' Hello'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000


 Hello and welcome

With a breakpoint right after the successful resize() call, I can inspect the vector's size() and it looks reasonable (about 5 GB):

(gdb) p cache.buf.size()
$5 = 5370806272
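
For reference, the shape of the fix is to widen the intermediates so the size computation happens in 64 bits before resize() is called. A sketch under that assumption (variable names follow kv_cache_init in llama.cpp; see the actual diff for the exact change):

// Before: 32-bit intermediates, so 2*n_elements overflows for 65B at -c 2048.
//     const int n_mem      = n_layer*n_ctx;
//     const int n_elements = n_embd*n_mem;
// After: 64-bit intermediates keep the whole size expression in range.
const int64_t n_mem      = (int64_t)n_layer*n_ctx;
const int64_t n_elements = n_embd*n_mem;

// 2*1342177280*2 + 2 MiB = 5370806272 bytes, matching cache.buf.size() above.
cache.buf.resize(2u*n_elements*ggml_type_size(wtype) + 2u*MB);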

ggerganov merged commit 6f1ee4b into ggerganov:master on Mar 25, 2023