Description
Prerequisites
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Hi, thanks for the continued effort with llama.cpp. I cloned the repo, then built with make as usual.
Expected Behavior
Run ./server without error messages. This issue was not present in #2009; unfortunately, I'm receiving errors during inference with ./server at #2116. I'll test other builds.
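To narrow it down, I plan to bisect between a known-good checkout and the current one. A minimal sketch of what I'll run (the good commit hash is a placeholder I still need to look up):

```sh
# Mark the failing checkout and a known-good commit, then let git
# walk the range; rebuild and retest the server at each step.
git bisect start
git bisect bad HEAD
git bisect good <last-known-good-commit>   # placeholder

# At each commit git checks out:
make clean && make
./server -m ~/wizardlm-7b-v1.0-uncensored.ggmlv3.q4_0.bin -t 4 -b 10
# ...interact a few times, then record the result:
git bisect good   # or: git bisect bad
```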
Current Behavior
Errors during ./server inference:
llama_eval_internal: first token must be BOS
llama_eval: failed to eval
It's abrupt and cuts off the response in the middle of a sentence. Here's an example:
~/ollama (master)> ./server -m ~/wizardlm-7b-v1.0-uncensored.ggmlv3.q4_0.bin -t 4 -b 10
{"timestamp":1688607679,"level":"INFO","function":"main","line":1085,"message":"build info","build":796,"commit":"31cfbb1"}
{"timestamp":1688607679,"level":"INFO","function":"main","line":1090,"message":"
system info","n_threads":4,"total_threads":8,"system_info":"AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 | "}
llama.cpp: loading model from /data/data/com.termux/files/home/wizardlm-7b-v1.0-uncensored.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: mem required = 5407.72 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size = 256.00 MB
llama server listening at http://127.0.0.1:8080
{"timestamp":1688607679,"level":"INFO","function":"main","line":1305,"message":"HTTP server listening","hostname":"127.0.0.1","port":8080}
{"timestamp":1688607685,"level":"INFO","function":"log_server_request","line":1058,"message":"request","remote_addr":"127.0.0.1","remote_port":37210,"status":200,"method":"GET","path":"/","params":{}}
{"timestamp":1688607685,"level":"INFO","function":"log_server_request","line":1058,"message":"request","remote_addr":"127.0.0.1","remote_port":37210,"status":200,"method":"GET","path":"/completion.js","params":{}}
{"timestamp":1688607685,"level":"INFO","function":"log_server_request","line":1058,"message":"request","remote_addr":"127.0.0.1","remote_port":37212,"status":200,"method":"GET","path":"/index.js","params":{}}
{"timestamp":1688607685,"level":"INFO","function":"log_server_request","line":1058,"message":"request","remote_addr":"127.0.0.1","remote_port":37210,"status":404,"method":"GET","path":"/favicon.ico","params":{}}
llama_print_timings: load time = 2102.17 ms
llama_print_timings: sample time = 3291.18 ms / 355 runs ( 9.27 ms per token, 107.86 tokens per second)
llama_print_timings: prompt eval time = 10480.78 ms / 49 tokens ( 213.89 ms per token, 4.68 tokens per second)
llama_print_timings: eval time = 124335.87 ms / 354 runs ( 351.23 ms per token, 2.85 tokens per second)
llama_print_timings: total time = 138282.27 ms
{"timestamp":1688607964,"level":"INFO","function":"log_server_request","line":1058,"message":"request","remote_addr":"127.0.0.1","remote_port":37214,"status":200,"method":"POST","path":"/completion","params":{}}
llama_eval_internal: first token must be BOS
llama_eval: failed to eval
{"timestamp":1688608023,"level":"ERROR","function":"nextToken","line":360,"message":"failed to eval","n_eval":10,"n_past":0,"n_threads":4,"embd":"
rare ingredients for potions, and even delved into dangerous dungeons filled with
treacherous monsters. Along the way, she made friends with other creatures who shared her passion for knowledge and
adventure, including dragons, unicorns, and even mermaids.\nAs time passed, Luna grew stronger both physically and mentally,
becoming an extraordinary creature capable of performing incredible feats. And yet,
despite all her newfound powers, she never forgot where she came from or the humble roots that first led her down this path.
For Luna always remained true to her llama nature, using her abilities only for good and spreading joy wherever she went.\n
User: Thanks. Describe Lunas appearance please.\n
llama: As a young llama, Luna was adorable with soft brown fur, long eyelashes, and a friendly smile. But as she embarked on her
journey towards greatness, her physical features began to change in mysterious ways. Her eyes
became more intense, glowing like crystals themselves, while her body developed powerful
muscles and a shimmering golden coat. She now stood taller than any ordinary ll"}
llama_print_timings: load time = 2102.17 ms
llama_print_timings: sample time = 936.31 ms / 93 runs ( 10.07 ms per token, 99.33 tokens per second)
llama_print_timings: prompt eval time = 4246.50 ms / 16 tokens ( 265.41 ms per token, 3.77 tokens per second)
llama_print_timings: eval time = 29930.84 ms / 92 runs ( 325.34 ms per token, 3.07 tokens per second)
llama_print_timings: total time = 35164.16 ms
{"timestamp":1688608023,"level":"INFO","function":"log_server_request","line":1058,"message":"request","remote_addr":"127.0.0.1","remote_port":37216,"status":200,"method":"POST","path":"/completion","params":{}}
^C
Environment and Context
uname -a
Linux localhost 4.14.190-23725627-abG975WVLS8IWD1 #2 SMP PREEMPT Mon Apr 10 18:16:39 KST 2023 aarch64 Android
lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: Qualcomm
Model name: Kryo-4XX-Silver
Model: 14
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: 0xd
CPU(s) scaling MHz: 62%
CPU max MHz: 1785.6000
CPU min MHz: 300.0000
BogoMIPS: 38.40
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Model name: Kryo-4XX-Gold
Model: 14
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 2
Stepping: 0xd
CPU(s) scaling MHz: 74%
CPU max MHz: 2841.6001
CPU min MHz: 710.4000
BogoMIPS: 38.40
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Python 3.11.4
GNU Make 4.4.1
clang version 16.0.6
Target: aarch64-unknown-linux-android24
Thread model: posix
InstalledDir: /data/data/com.termux/files/usr/bin
Failure Information (for bugs)
llama_eval_internal: first token must be BOS
llama_eval: failed to eval
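For context, this message appears to come from a guard in llama_eval_internal in llama.cpp. Paraphrasing the check as I understand it (not verbatim from the source), it rejects any evaluation that starts at position 0 without a BOS token:

```cpp
// Paraphrased guard from llama_eval_internal (exact code may differ):
// an eval batch that starts at n_past == 0 must begin with BOS.
if (n_past == 0 && tokens[0] != llama_token_bos()) {
    fprintf(stderr, "%s: first token must be BOS\n", __func__);
    return false;  // the caller then reports "llama_eval: failed to eval"
}
```

The failing log entry above shows "n_eval":10,"n_past":0, which fits: after the context fills up, the server seems to re-evaluate from position 0 with a batch (size 10, matching -b 10) that doesn't begin with BOS.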
Steps to Reproduce
- git clone https://github.com/ggerganov/llama.cpp
- make
- ./server -m ~/wizardlm-7b-v1.0-uncensored.ggmlv3.q4_0.bin -t 4 -b 10
- Then interact with the model 2-3 times (via the web UI, or directly against the API as sketched below).
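Interacting directly against the API reproduces it as well; a minimal sketch using the server's /completion endpoint (the prompt text here is just a placeholder):

```sh
# POST a completion request to the running server; after 2-3 of
# these the "failed to eval" error shows up.
curl -s http://127.0.0.1:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "User: Tell me a story about a llama.\nllama:", "n_predict": 128}'
```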
git log | head -1
commit 31cfbb1013a482e89c72146e2063ac4362becae7
Thank you!