What happened?
It seems that loading llava models crashes entirely. I can reproduce it 100% of the time with moondream models.
This issue has already been discussed in #9066 (comment) and in #9294 (comment); this ticket is just a tracker for discussing it.
Name and Version
Commit still working here: 815b1fb
Commits not working: e6b7801 (which includes #9082) and daa9623 (which is older)
What operating system are you seeing the problem on?
Linux
Relevant log output
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stderr /home/mudler/_git/LocalAI/backend/cpp/llama-avx2/llama.cpp/ggml/src/ggml.c:13835: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout [Thread debugging using libthread_db enabled]
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout Using host libthread_db library "/lib64/libthread_db.so.1".
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout 0x00007f989b8e94a3 in ?? () from /lib64/libgomp.so.1
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #0 0x00007f989b8e94a3 in ?? () from /lib64/libgomp.so.1
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #1 0x00000000008222e5 in ggml_graph_compute_thread.isra ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #2 0x00007f989b8dcd16 in GOMP_parallel () from /lib64/libgomp.so.1
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #3 0x0000000000825a2a in ggml_graph_compute ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #4 0x0000000000834010 in ggml_backend_cpu_graph_compute ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #5 0x000000000083784c in ggml_backend_graph_compute ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #6 0x0000000000652b63 in clip_image_batch_encode.constprop ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #7 0x0000000000653553 in clip_image_encode ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #8 0x0000000000657ac8 in llava_image_embed_make_with_clip_img ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #9 0x00000000004e2c09 in llama_server_context::update_slots() [clone .isra.0] ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #10 0x00000000004d7629 in llama_server_queue::start_loop() ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #11 0x000000000048b040 in main ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout [Inferior 1 (process 13029) detached]
Note
- Flagged as critical as it completely crashes llama.cpp
- llama.cpp is being used as a library (chore(deps): update llama.cpp mudler/LocalAI#3497)
- Applying the suggestion described in Bug: MiniCPM-V-2.6 commit d565bb2fd5a2a58b9924a7a34e77a87c78c52137 causing crash in moondream #9066 (comment) seems to work around the issue for me:
diff --git a/examples/llava/clip.cpp b/examples/llava/clip.cpp
index 342042ff..224db9b5 100644
--- a/examples/llava/clip.cpp
+++ b/examples/llava/clip.cpp
@@ -2419,7 +2419,7 @@ bool clip_image_batch_encode(clip_ctx * ctx, const int n_threads, const clip_ima
 struct ggml_tensor * patches = ggml_graph_get_tensor(gf, "patches");
 int* patches_data = (int*)malloc(ggml_nbytes(patches));
 for (int i = 0; i < num_patches; i++) {
-    patches_data[i] = i + 1;
+    patches_data[i] = i;
 }
 ggml_backend_tensor_set(patches, patches_data, 0, ggml_nbytes(patches));
 free(patches_data);
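
To illustrate why the one-character change above could matter, here is a minimal standalone sketch of the bounds condition from the failing assertion (`i01 >= 0 && i01 < ne01`). It does not use ggml; `make_patch_indices` and `indices_in_range` are hypothetical helpers written for this example only. It also assumes, without confirmation from the logs, that for the crashing model the table being indexed has exactly `num_patches` rows, so the largest index produced by `i + 1` falls one past the end.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: fills the index buffer the same way the loop in the
// diff does. offset = 1 mirrors the old code (i + 1), offset = 0 the patch (i).
std::vector<int> make_patch_indices(int num_patches, int offset) {
    assert(num_patches >= 0);
    std::vector<int> idx(num_patches);
    for (int i = 0; i < num_patches; i++) {
        idx[i] = i + offset;
    }
    return idx;
}

// Hypothetical stand-in for the condition in the failing GGML_ASSERT:
// every index i01 must satisfy 0 <= i01 < ne01, where ne01 is the number
// of rows in the table the indices point into.
bool indices_in_range(const std::vector<int> & idx, int ne01) {
    for (int i01 : idx) {
        if (!(i01 >= 0 && i01 < ne01)) {
            return false;
        }
    }
    return true;
}
```

Under that assumption, `indices_in_range(make_patch_indices(n, 1), n)` is false for any `n > 0`, while the patched offset of 0 stays in range. With one extra row in the table (`ne01 == n + 1`), the old offset of 1 would be valid, which may be why the original code works for some models and not others.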