Description
If I run perplexity as follows:
./perplexity -m ~/weights/mistral-7b-instruct-v0.2.Q5_K_M.gguf -f ~/vendor/wiki.test.raw -s 31337
I get an assertion failure on the logits_valid array:
perplexity: tokenizing the input ..
perplexity: tokenization took 546.595 ms
perplexity: calculating perplexity over 642 chunks, n_ctx=512, batch_size=2048, n_seq=4
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving
ggml_gallocr_needs_realloc: node CUDA0#k-0#0 is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving
ggml_gallocr_needs_realloc: node CUDA0#k-0#0 is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving
ggml_gallocr_needs_realloc: node CUDA0#k-0#0 is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving
perplexity: 2.44 seconds per pass - ETA 6.50 minutes
perplexity: llama.cpp:14296: float* llama_get_logits_ith(llama_context*, int32_t): Assertion `ctx->logits_valid.at(i)' failed.
Aborted (core dumped)