Don't crash on ftype (formerly f16) == 4 #917

Merged (1 commit, Apr 12, 2023)

Conversation

@sw (Contributor) commented Apr 12, 2023

In #709 I did not add an enum value 4 for ftype (formerly f16), and I made the function llama_ftype_name strict, raising a LLAMA_ASSERT on unknown values. This made llama.cpp crash when loading such GPTQ model files.

Add LLAMA_FTYPE_MOSTLY_Q4_1_SOME_F16 = 4 and relax llama_ftype_name so that it returns a helpful string for unknown values; this will also help future experiments with file types.

Should fix #903

@sw (Contributor, Author) commented Apr 12, 2023

@softzer0 @thestamp @funnbot @TheBloke @wbpxre150 Inviting you to test this PR as a fix for #903

@TheBloke (Contributor) replied:

> Inviting you to test this PR as a fix for #903

Working, thanks for the quick resolution!

```
tomj@Eddie ~/src/llama.cpp (master●●)$ time ./main -t 18 -m ../huggingface/koala-13B-GPTQ-4bit-128g-GGML/koala-13B-4bit-128g.GGML.bin --color  -c 2048 --temp 0.3 --repeat_penalty 1.2 -n -1 -i -ins
main: seed = 1681311571
llama.cpp: loading model from ../huggingface/koala-13B-GPTQ-4bit-128g-GGML/koala-13B-4bit-128g.GGML.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 4 (mostly Q4_1, some F16)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  73.73 KB
llama_model_load_internal: mem required  = 11749.65 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size  = 1600.00 MB

system_info: n_threads = 18 / 36 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: temp = 0.300000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.200000
generate: n_ctx = 2048, n_batch = 8, n_predict = -1, n_keep = 2


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.


> What are llamas?
Llamas are a type of animal that is native to South America. They have long necks, short legs and curved ears, and they are often used as pack animals in the Andes Mountains. Llamas are known for their calm temperament and ability to carry heavy loads over difficult terrain.
>
```

Successfully merging this pull request may close these issues.

Latest release crashes on start