
Sycl: No kernel named _ZTSZZL17rms_norm_f32_ was found Intel ARC A770 #3967


Description

@kkacsh321

LocalAI version:

v2.22.1

Environment, CPU architecture, OS, and Version:

Ubuntu 22.04
Linux gpubench 6.8.0-47-generic #47~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Oct 2 16:16:55 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Intel Arc A770 (requires newer drivers etc. to be correctly identified, both in the containers and locally)

I have tried the SYCL containers as well as building locally on the machine.

Describe the bug

I am trying to get a new Arc A770 working with SYCL through LocalAI; the card does require quite new drivers etc. to be recognized fully. I have tried both the containers and building locally on the machine.

I hit this error when trying to run any model:
[Screenshot, 2024-10-25 3:36 PM: the LocalAI error when loading the model]

However, running llama.cpp by itself on the same machine/setup/model works perfectly fine.
[Screenshot, 2024-10-25 3:36 PM: the same model running fine directly with llama.cpp]
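
For comparison, the direct llama.cpp run looked roughly like the sketch below; the binary name and exact flags are assumptions about a typical SYCL build, not the verbatim command used here.

```sh
# Hypothetical sketch of the direct llama.cpp run that works on the same card;
# the binary name (llama-cli vs. main) and flags depend on the local SYCL build.
source /opt/intel/oneapi/setvars.sh          # load the oneAPI environment
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli \
  -m /var/localai/models/Mistral-7B-Instruct-v0.3.Q3_K_M.gguf \
  -ngl 33 -p "Hello"                         # offload all 33 layers to the A770
```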

To Reproduce

I can reproduce this by running the SYCL-based containers and then running 'apt-get update && apt-get upgrade' to bring the drivers / Intel oneAPI packages up to date so the card is recognized correctly (otherwise it shows up with 256 MB of memory and won't run a model).
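
A hedged sketch of that container path follows; the image tag, port, and host model directory are assumptions, not verbatim from this setup:

```sh
# Hypothetical reproduction sketch -- image tag, port and volume path are assumptions.
docker run -it --device /dev/dri \
  -p 8080:8080 -v "$PWD/models:/models" \
  localai/localai:v2.22.1-sycl-f16-ffmpeg

# Inside the container: pull newer Intel GPU / oneAPI packages so the A770 is
# detected with its full VRAM instead of 256 MB.
apt-get update && apt-get upgrade -y
```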

Or locally, by having the same drivers installed, again so the card is recognized correctly.
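
A quick way to confirm whether the driver stack is new enough is to check that the SYCL runtime sees the card at all, for example with sycl-ls from the oneAPI toolkit:

```sh
# Check that the SYCL runtime actually sees the card (sycl-ls ships with oneAPI).
source /opt/intel/oneapi/setvars.sh
sycl-ls
# Expect a [level_zero:gpu] entry for "Intel(R) Arc(TM) A770 Graphics"; if the GPU
# is missing or reports tiny memory, the driver stack is still too old.
```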

Expected behavior

To be able to load and run models correctly on an Intel Arc A770, since they do run fine on llama.cpp by itself.
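
Any request that forces the model to load is enough to trigger the failure; for example (assuming the default port 8080 and the model name from the logs below):

```sh
# Minimal request that makes LocalAI load the model and hit the SYCL error.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Mistral-v0.3-7B-Q3_K_M", "messages": [{"role": "user", "content": "Hello"}]}'
```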

Logs

```
9:25PM INF Loading model 'Mistral-v0.3-7B-Q3_K_M' with backend llama-cpp-grpc
9:25PM DBG Loading model in memory from file: /var/localai/models/Mistral-7B-Instruct-v0.3.Q3_K_M.gguf
9:25PM DBG Loading Model Mistral-v0.3-7B-Q3_K_M with gRPC (file: /var/localai/models/Mistral-7B-Instruct-v0.3.Q3_K_M.gguf) (backend: llama-cpp-grpc): {backendString:llama-cpp-grpc model:Mistral-7B-Instruct-v0.3.Q3_K_M.gguf modelID:Mistral-v0.3-7B-Q3_K_M assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000c8c008 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
9:25PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-grpc
9:25PM DBG GRPC Service for Mistral-v0.3-7B-Q3_K_M will be running at: '127.0.0.1:33321'
9:25PM DBG GRPC Service state dir: /tmp/go-processmanager561768969
9:25PM DBG GRPC Service Started
9:25PM DBG Wait for the service to start up
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stdout Server listening on 127.0.0.1:33321
9:25PM DBG GRPC Service Ready
9:25PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:Mistral-7B-Instruct-v0.3.Q3_K_M.gguf ContextSize:12288 Seed:1763834203 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:33 MainGPU: TensorSplit: Threads:10 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/var/localai/models/Mistral-7B-Instruct-v0.3.Q3_K_M.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false}
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr ggml_sycl_init: GGML_SYCL_FORCE_MMQ: no
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr ggml_sycl_init: SYCL_USE_XMX: yes
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr ggml_sycl_init: found 1 SYCL devices:
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_load_model_from_file: using device SYCL0 (Intel(R) Arc(TM) A770 Graphics) - 15473 MiB free
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: loaded meta data with 29 key-value pairs and 291 tensors from /var/localai/models/Mistral-7B-Instruct-v0.3.Q3_K_M.gguf (version GGUF V3 (latest))
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 0: general.architecture str = llama
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 1: general.name str = models--mistralai--Mistral-7B-Instruc...
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 2: llama.block_count u32 = 32
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 3: llama.context_length u32 = 32768
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 8: llama.rope.freq_base f32 = 1000000.000000
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 10: general.file_type u32 = 12
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 11: llama.vocab_size u32 = 32768
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 13: tokenizer.ggml.model str = llama
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 14: tokenizer.ggml.pre str = default
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,32768] = ["<unk>", "<s>", "</s>", "[INST]", "[...
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,32768] = [0.000000, 0.000000, 0.000000, 0.0000...
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,32768] = [2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 1
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 2
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 20: tokenizer.ggml.unknown_token_id u32 = 0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = true
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 22: tokenizer.ggml.add_eos_token bool = false
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 23: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 24: general.quantization_version u32 = 2
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 25: quantize.imatrix.file str = ./imatrix.dat
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 26: quantize.imatrix.dataset str = group_40.txt
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 27: quantize.imatrix.entries_count i32 = 224
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - kv 28: quantize.imatrix.chunks_count i32 = 74
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - type f32: 65 tensors
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - type q3_K: 129 tensors
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - type q4_K: 92 tensors
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - type q5_K: 4 tensors
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_model_loader: - type q6_K: 1 tensors
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_vocab: special tokens cache size = 771
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_vocab: token to piece cache size = 0.1731 MB
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: format = GGUF V3 (latest)
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: arch = llama
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: vocab type = SPM
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_vocab = 32768
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_merges = 0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: vocab_only = 0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_ctx_train = 32768
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_embd = 4096
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_layer = 32
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_head = 32
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_head_kv = 8
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_rot = 128
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_swa = 0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_embd_head_k = 128
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_embd_head_v = 128
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_gqa = 4
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_embd_k_gqa = 1024
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_embd_v_gqa = 1024
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: f_norm_eps = 0.0e+00
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: f_norm_rms_eps = 1.0e-05
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: f_clamp_kqv = 0.0e+00
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: f_max_alibi_bias = 0.0e+00
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: f_logit_scale = 0.0e+00
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_ff = 14336
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_expert = 0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_expert_used = 0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: causal attn = 1
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: pooling type = 0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: rope type = 0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: rope scaling = linear
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: freq_base_train = 1000000.0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: freq_scale_train = 1
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: n_ctx_orig_yarn = 32768
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: rope_finetuned = unknown
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: ssm_d_conv = 0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: ssm_d_inner = 0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: ssm_d_state = 0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: ssm_dt_rank = 0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: ssm_dt_b_c_rms = 0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: model type = 7B
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: model ftype = Q3_K - Medium
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: model params = 7.25 B
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: model size = 3.28 GiB (3.89 BPW)
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: general.name = models--mistralai--Mistral-7B-Instruct-v0.3
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: BOS token = 1 '<s>'
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: EOS token = 2 '</s>'
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: UNK token = 0 '<unk>'
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: LF token = 781 '<0x0A>'
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: EOG token = 2 '</s>'
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_print_meta: max token length = 48
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_tensors: ggml ctx size = 0.27 MiB
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_tensors: offloading 32 repeating layers to GPU
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_tensors: offloading non-repeating layers to GPU
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_tensors: offloaded 33/33 layers to GPU
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_tensors: SYCL0 buffer size = 3304.02 MiB
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llm_load_tensors: CPU buffer size = 55.00 MiB
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr .................................................................................................
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_new_context_with_model: n_ctx = 12288
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_new_context_with_model: n_batch = 512
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_new_context_with_model: n_ubatch = 512
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_new_context_with_model: flash_attn = 0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_new_context_with_model: freq_base = 1000000.0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_new_context_with_model: freq_scale = 1
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr [SYCL] call ggml_check_sycl
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr ggml_check_sycl: GGML_SYCL_DEBUG: 0
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr ggml_check_sycl: GGML_SYCL_F16: no
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr found 1 SYCL devices:
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr | | | | |Max | |Max |Global | |
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr | | | | |compute|Max work|sub |mem | |
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr |ID| Device Type| Name|Version|units |group |group|size | Driver version|
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr |--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr | 0| [level_zero:gpu:0]| Intel Arc A770 Graphics| 12.55| 512| 1024| 32| 16225M| 1.3.29735+27|
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_kv_cache_init: SYCL0 KV buffer size = 1536.00 MiB
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_new_context_with_model: KV self size = 1536.00 MiB, K (f16): 768.00 MiB, V (f16): 768.00 MiB
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_new_context_with_model: SYCL_Host output buffer size = 0.12 MiB
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_new_context_with_model: SYCL0 compute buffer size = 824.00 MiB
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_new_context_with_model: SYCL_Host compute buffer size = 32.01 MiB
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_new_context_with_model: graph nodes = 1030
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr llama_new_context_with_model: graph splits = 2
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
9:25PM DBG GRPC(Mistral-v0.3-7B-Q3_K_M-127.0.0.1:33321): stderr No kernel named _ZTSZZL17rms_norm_f32_syclPKfPfiifPN4sycl3_V15queueEiENKUlRNS3_7handlerEE0_clES7_EUlNS3_7nd_itemILi3EEEE_ was found
Exception caught at file:/home/ubuntu/Github/LocalAI/backend/cpp/llama-grpc/llama.cpp/ggml/src/ggml-sycl.cpp, line:3546
```
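
For reference, the mangled symbol in the error can be demangled to see which kernel the SYCL runtime failed to find; it resolves to a lambda inside llama.cpp's rms_norm_f32_sycl kernel, matching the issue title (sketch assuming c++filt from binutils is available):

```sh
# Demangle the missing kernel name from the error message.
echo '_ZTSZZL17rms_norm_f32_syclPKfPfiifPN4sycl3_V15queueEiENKUlRNS3_7handlerEE0_clES7_EUlNS3_7nd_itemILi3EEEE_' | c++filt
# -> roughly: typeinfo name for a lambda inside rms_norm_f32_sycl(float const*, float*, ...),
#    i.e. the RMS-norm SYCL kernel was not found in the device image loaded for the A770.
```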

Additional context
