Fix crash for 65B model after recent pre-allocated memory change #485

Merged 1 commit into ggerganov:master on Mar 25, 2023

Conversation

chriskuehl (Contributor)
This fixes a crash that started with #473 for me when using the 65B model with -c 2048.

On master, I get a crash in the call to std::vector::resize():

(gdb) run -m models/65B/ggml-model-q4_0.bin -t 1 -c 2048 -p Hello
[...]
llama_model_load: loading model part 8/8 from 'models/65B/ggml-model-q4_0.bin.7'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_default_append

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7aad537 in __GI_abort () at abort.c:79
#2  0x00007ffff7e7a7ec in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff7e85966 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff7e859d1 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff7e85c65 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7e7d09a in std::__throw_length_error(char const*) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x0000555555582b01 in std::vector<unsigned char, std::allocator<unsigned char> >::_M_check_len (this=0x555555660998, this=0x555555660998, 
    __s=0x55555559ea14 "vector::_M_default_append", __n=18446744070490423296) at /usr/include/c++/10/bits/stl_vector.h:1759
#8  std::vector<unsigned char, std::allocator<unsigned char> >::_M_default_append (this=0x555555660998, __n=18446744070490423296)
    at /usr/include/c++/10/bits/vector.tcc:634
#9  0x000055555557b564 in std::vector<unsigned char, std::allocator<unsigned char> >::resize (__new_size=<optimized out>, this=0x555555660998)
    at /usr/include/c++/10/bits/stl_vector.h:940
#10 kv_cache_init (hparams=..., hparams=..., n_ctx=<optimized out>, wtype=GGML_TYPE_F16, cache=...) at llama.cpp:242
#11 llama_init_from_file (path_model=<optimized out>, params=...) at llama.cpp:1638
#12 0x000055555555b96f in main (argc=9, argv=0x7fffffffe588) at /usr/include/c++/10/bits/basic_string.h:186

It seems like it is trying to resize the vector to an absurd number of elements:

(gdb) frame 8
#8  std::vector<unsigned char, std::allocator<unsigned char> >::_M_default_append (this=0x555555660998, __n=18446744070490423296) at /usr/include/c++/10/bits/vector.tcc:634
634                     _M_check_len(__n, "vector::_M_default_append");
(gdb) info args
this = 0x555555660998
__n = 18446744070490423296
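
That number is exactly what you get if the buffer-size arithmetic overflows a signed 32-bit int partway through. A minimal sketch of that arithmetic (the 65B hyperparameters n_embd = 8192 and n_layer = 80, the 2-byte F16 element size, and the 1 MiB margin are assumptions for illustration; the exact expression in #473 may differ) reproduces the value:

#include <cstdint>
#include <cstdio>

int main() {
    // Assumed LLaMA 65B hyperparameters, running with -c 2048.
    const int n_ctx   = 2048;
    const int n_layer = 80;
    const int n_embd  = 8192;

    const int n_mem      = n_layer*n_ctx;  // 163840, fits in int
    const int n_elements = n_embd*n_mem;   // 1342177280, still fits in int

    // 2*n_elements is evaluated as a signed 32-bit int: 2684354560 does not
    // fit and wraps to -1610612736. (The unsigned cast makes the wrap
    // well-defined for this demo; in the real code it is signed overflow.)
    int doubled = (int)(2u*(unsigned)n_elements);

    // Converting the negative int to size_t sign-extends it, so the byte
    // count handed to std::vector::resize() becomes enormous.
    const size_t wtype_size = 2;           // size of a ggml F16 element
    const size_t MB = 1024u*1024u;
    size_t buf_size = doubled*wtype_size + 2u*MB;

    printf("%zu\n", buf_size);             // prints 18446744070490423296
    return 0;
}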

After this change, it no longer crashes:

$ ./main -m models/65B/ggml-model-q4_0.bin -t 1 -c 2048 -p Hello
[...]
llama_model_load: loading model part 8/8 from 'models/65B/ggml-model-q4_0.bin.7'
llama_model_load: .......................................................................................... done
llama_model_load: model size =  4869.09 MB / num tensors = 723
llama_init_from_file: kv self size  = 5120.00 MB

system_info: n_threads = 1 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 

main: prompt: ' Hello'
main: number of tokens in prompt = 2
     1 -> ''
 15043 -> ' Hello'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000


 Hello and welcome

With a breakpoint right after the successful resize() call, I can inspect the vector's size() and it looks reasonable (about 5 GB):

(gdb) p cache.buf.size()
$5 = 5370806272
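
For reference, the shape of the fix is to widen the intermediates so the size computation happens in 64 bits before resize() is called. A sketch under that assumption (variable names follow kv_cache_init in llama.cpp; see the actual diff for the exact change):

// Before: 32-bit intermediates, so 2*n_elements overflows for 65B at -c 2048.
//     const int n_mem      = n_layer*n_ctx;
//     const int n_elements = n_embd*n_mem;
// After: 64-bit intermediates keep the whole size expression in range.
const int64_t n_mem      = (int64_t)n_layer*n_ctx;
const int64_t n_elements = n_embd*n_mem;

// 2*1342177280*2 + 2 MiB = 5370806272 bytes, matching cache.buf.size() above.
cache.buf.resize(2u*n_elements*ggml_type_size(wtype) + 2u*MB);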

ggerganov merged commit 6f1ee4b into ggerganov:master on Mar 25, 2023