MPT: tokenization crashes #3604

Closed
@jploski

Description

Testing against 5974d61 and #3538

Running the following command:

bin/main --mlock -m /mnt/f2fs/mpt/ggml-model-mpt-7b-storywriter-f16-q5_1.gguf -t 1 -ngl 999 -p 'Once upon a time' --temp 0.8 --top_p 0.98 -c 2048 --keep -1 --repeat_penalty 1 -n 1024

consistently crashes after a few hundred tokens with this backtrace:

Thread 1 "main" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fffcefae535 in __GI_abort () at abort.c:79
#2  0x00007fffcf376983 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fffcf37c8c6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fffcf37c901 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fffcf37cb34 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fffcf37886b in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x0000555555612915 in std::__detail::_Map_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, unsigned char>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, unsigned char> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::at (this=0x555555b2fa20 <unicode_to_bytes_bpe(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::map>, __k=" ")
    at /usr/include/c++/8/bits/hashtable_policy.h:760
#8  0x000055555560b295 in std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned char, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, unsigned char> > >::at (this=0x555555b2fa20 <unicode_to_bytes_bpe(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::map>, __k=" ") at /usr/include/c++/8/bits/unordered_map.h:991
#9  0x00005555555cc65a in unicode_to_bytes_bpe (utf8=" ") at /mnt/seagate/dalai/llama.cpp.bak/unicode.h:460
#10 0x00005555555f8398 in llama_decode_text (text="  ") at /mnt/seagate/dalai/llama.cpp.bak/llama.cpp:9703
#11 0x00005555555f8663 in llama_token_to_piece (model=0x5555a4966c40, token=50276, buf=0x555555e3c6b0 "", length=8) at /mnt/seagate/dalai/llama.cpp.bak/llama.cpp:9746
#12 0x000055555557e0ce in llama_token_to_piece[abi:cxx11](llama_context const*, int) (ctx=0x5555b579e350, token=50276) at /mnt/seagate/dalai/llama.cpp.bak/common/common.cpp:894
#13 0x00005555555b028b in llama_sampling_sample (ctx=0x5555b579e350, ctx_guidance=0x0, ctx_sampling=..., last_tokens=std::vector of length 2048, capacity 2048 = {...}, 
    candidates=std::vector of length 50432, capacity 50432 = {...}, idx=0, seq=0) at /mnt/seagate/dalai/llama.cpp.bak/common/sampling.cpp:151
#14 0x000055555556a2aa in main (argc=22, argv=0x7fffffffdcd8) at /mnt/seagate/dalai/llama.cpp.bak/examples/main/main.cpp:648

(The GGUF/vocab was exported using convert-mpt-hf-to-gguf.py from the aforementioned commit.)
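
For readers unfamiliar with this failure mode: frames #4 to #9 show std::unordered_map::at throwing for the key " " inside unicode_to_bytes_bpe, and since nothing catches the exception, std::terminate aborts the process. The following is a minimal sketch of that mechanism only, not the actual table from unicode.h and not a proposed fix. It assumes the GPT-2 style byte-level BPE convention in which the space byte 0x20 is represented by "Ġ" (U+0120), so a raw space is not a key. The names toy_map, lookup_throwing, lookup_checked and throws_out_of_range are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical two-entry stand-in for the unicode -> byte table (assumption:
// GPT-2 style byte-level BPE, where "Ġ" (U+0120) encodes the space byte).
static const std::unordered_map<std::string, std::uint8_t> & toy_map() {
    static const std::unordered_map<std::string, std::uint8_t> map = {
        { "\xC4\xA0", 0x20 }, // UTF-8 bytes of U+0120 -> space
        { "a",        0x61 },
    };
    return map;
}

// Same lookup style as in the backtrace: at() throws std::out_of_range
// when the key is missing. Uncaught, that ends in std::terminate / SIGABRT.
static std::uint8_t lookup_throwing(const std::string & utf8) {
    return toy_map().at(utf8);
}

// Non-throwing alternative: report a missing key to the caller instead.
static bool lookup_checked(const std::string & utf8, std::uint8_t & out) {
    const auto it = toy_map().find(utf8);
    if (it == toy_map().end()) {
        return false;
    }
    out = it->second;
    return true;
}

// Helper that makes the exception observable without aborting.
static bool throws_out_of_range(const std::string & utf8) {
    try {
        lookup_throwing(utf8);
    } catch (const std::out_of_range &) {
        return true;
    }
    return false;
}

// Small self-check of the behaviour described above.
static void demo() {
    std::uint8_t byte = 0;
    assert(throws_out_of_range(" "));            // raw space is not a key
    assert(lookup_checked("\xC4\xA0", byte));    // encoded space is a key
    assert(byte == 0x20);
}
```

In this toy setup, calling lookup_throwing(" ") outside a try block would abort in the same way as the trace above. Whether the raw space in the piece for token 50276 comes from the exported vocab or from the decoding path is not something this sketch can tell.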
