This example tries to mimic the simple-vision example, which uses the new Vision API. This example uses the Llama 3.2 Vision Instruct model, which differs from the simple-vision example: that example uses a "prompt-based" vision model, meaning that we first generate the patch embeddings for an image and then pass them into llama.cpp using the llama_decode function, so the image is passed like a prompt, hence the term "prompt-based". The Llama 3.2 Vision model instead uses cross-attention, where the image patch embeddings are consumed by the cross-attention layers.
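To make the distinction concrete, the following is a minimal sketch of the prompt-based path described above, using the public llama_batch API. The function and variable names (decode_image_embeddings, n_patches, patch_embd, n_past) are illustrative assumptions, not taken from the example's code:

#include <cstring>
#include "llama.h"

// Hedged sketch of the prompt-based path: the vision encoder produces
// n_patches embeddings of width n_embd, and these are decoded exactly like
// prompt tokens. All identifiers here are illustrative.
static bool decode_image_embeddings(llama_context * ctx, const float * patch_embd,
                                    int n_patches, int n_embd, llama_pos n_past) {
    llama_batch batch = llama_batch_init(n_patches, /*embd =*/ n_embd, /*n_seq_max =*/ 1);
    batch.n_tokens = n_patches;
    for (int i = 0; i < n_patches; i++) {
        // copy one patch embedding into the batch, just like a prompt token
        std::memcpy(batch.embd + (size_t) i * n_embd,
                    patch_embd + (size_t) i * n_embd, n_embd * sizeof(float));
        batch.pos[i]       = n_past + i; // the image occupies normal sequence positions
        batch.n_seq_id[i]  = 1;
        batch.seq_id[i][0] = 0;
        batch.logits[i]    = false;      // no logits needed for the image part
    }
    const bool ok = llama_decode(ctx, batch) == 0;
    llama_batch_free(batch);
    return ok;
}

In contrast, the Llama 3.2 Vision model never injects the patch embeddings into the token stream; they are consumed by dedicated cross-attention layers (indices [3, 8, 13, 18, 23, 28, 33, 38] in the metadata shown below).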
This is a work in progress. There are a number of shortcuts taken in the code, as the main goal was to get something working which can be iterated upon if this is worth pursuing.
To convert the Llama 3.2 Vision Instruct model to GGUF, we first need to install the required Python packages:
$ python3 -m venv venv
$ source venv/bin/activate
(venv) pip install -r requirements.txt
Convert the Llama 3.2 Vision Instruct model to GGUF:
(venv) python ./convert_hf_to_gguf.py --verbose /path/to/Llama-3.2-11B-Vision-Instruct --outfile models/llama-3-2-11b-f32.gguf --outtype f32
Quantize the model to a lower precision:
(venv) ./build/bin/llama-quantize models/llama-3-2-11b-f32.gguf models/llama-3-2-11b-Q4_K.gguf Q4_K
The example can be built with CMake using the following commands.
CUDA:
$ cmake -S . -B build -DGGML_CUDA=On
Metal:
$ cmake -S . -B build
Then build the example:
$ cmake --build build --target llama-simple-vision-mllama -- -j8
This example requires an image to be passed in. The following image is included in the repository, but any JPEG image should work:
$ ./build/bin/llama-simple-vision-mllama -m models/llama-3-2-11b-Q4_K.gguf -ngl 42 --image examples/simple-vision-mllama/ny.jpg
This image shows a cityscape of New York City. In the center of the image is the Empire State Building,
a skyscraper in Midtown Manhattan, New York City. It is known as "The Empire State" and stands at a
height of 1,454 feet (443 meters). It
main: decoded 60 tokens in 5.79 s, speed: 10.37 t/s
(Note that the example is set to generate only 60 tokens, hence the cut-off.)
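The cap corresponds to a bounded generation loop along the lines of the sketch below, which follows the upstream simple example's API usage; it is an assumption about this example's structure, not its exact code:

#include <cstdio>
#include "llama.h"

// Hedged sketch of a generation loop capped at 60 tokens; identifiers and
// calls follow the upstream simple example, not necessarily this example.
static void generate_n(llama_context * ctx, llama_model * model,
                       llama_sampler * smpl, int max_tokens /* 60 here */) {
    for (int n_decoded = 0; n_decoded < max_tokens; n_decoded++) {
        llama_token tok = llama_sampler_sample(smpl, ctx, -1); // sample from the last logits
        if (llama_token_is_eog(model, tok)) {
            break; // stop early on an end-of-generation token
        }
        char buf[128];
        const int n = llama_token_to_piece(model, tok, buf, sizeof(buf), 0, true);
        if (n > 0) {
            std::fwrite(buf, 1, (size_t) n, stdout);
        }
        llama_batch batch = llama_batch_get_one(&tok, 1);
        if (llama_decode(ctx, batch) != 0) {
            break; // decoding the sampled token failed
        }
    }
}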
Detailed output
register_backend: registered backend Metal (1 devices)
register_device: registered device Metal (Apple M3)
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M3)
build: 4428 (88713084) with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.6.0 (debug)
llama_load_model_from_file: using device Metal (Apple M3) - 16383 MiB free
llama_model_loader: loaded meta data with 59 key-value pairs and 908 tensors from models/llama-3-2-11b-Q4_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = mllama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Llama 3.2 11B Vision Instruct
llama_model_loader: - kv 3: general.finetune str = Vision-Instruct
llama_model_loader: - kv 4: general.basename str = Llama-3.2
llama_model_loader: - kv 5: general.size_label str = 11B
llama_model_loader: - kv 6: general.license str = llama3.2
llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv 9: mllama.image_token_index u32 = 128256
llama_model_loader: - kv 10: mllama.context_length u32 = 131072
llama_model_loader: - kv 11: mllama.block_count u32 = 40
llama_model_loader: - kv 12: mllama.embedding_length u32 = 4096
llama_model_loader: - kv 13: mllama.feed_forward_length u32 = 14336
llama_model_loader: - kv 14: mllama.attention.head_count u32 = 32
llama_model_loader: - kv 15: mllama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 16: mllama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 17: mllama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 18: general.file_type u32 = 15
llama_model_loader: - kv 19: mllama.cross_attention_layers arr[i32,8] = [3, 8, 13, 18, 23, 28, 33, 38]
llama_model_loader: - kv 20: mllama.vocab_size u32 = 128256
llama_model_loader: - kv 21: mllama.rope.dimension_count u32 = 128
llama_model_loader: - kv 22: vision.type str = cross-attn
llama_model_loader: - kv 23: vision.architecture str = mllama_vision_model
llama_model_loader: - kv 24: vision.image_size u32 = 560
llama_model_loader: - kv 25: vision.block_count u32 = 32
llama_model_loader: - kv 26: vision.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 27: vision.embedding_length u32 = 1280
llama_model_loader: - kv 28: vision.cross.mllama.activation_function str = gelu
llama_model_loader: - kv 29: vision.feed_forward_length u32 = 5120
llama_model_loader: - kv 30: vision.cross.mllama.global_block_count u32 = 8
llama_model_loader: - kv 31: vision.cross.mllama.max_num_tiles u32 = 4
llama_model_loader: - kv 32: vision.cross.mllama.channels_count u32 = 3
llama_model_loader: - kv 33: vision.patch_size u32 = 14
llama_model_loader: - kv 34: vision.cross.mllama.intermediate_layers_indices arr[i32,5] = [3, 7, 15, 23, 30]
llama_model_loader: - kv 35: vision.attention.head_count u32 = 16
llama_model_loader: - kv 36: vision.cross.mllama.output_dim u32 = 7680
llama_model_loader: - kv 37: vision.cross.mllama.model_type str = mllama_vision_model
llama_model_loader: - kv 38: vision.clip.max_position_embeddings u32 = 1601
llama_model_loader: - kv 39: vision.cross.mllama.supported_aspect_ratios arr[i32,16] = [1, 1, 1, 2, 1, 3, 1, 4, 2, 1, 2, 2, ...
llama_model_loader: - kv 40: vision.image_mean arr[f32,3] = [0.481455, 0.457828, 0.408211]
llama_model_loader: - kv 41: vision.image_std arr[f32,3] = [0.268630, 0.261303, 0.275777]
llama_model_loader: - kv 42: vision.clip.projection_dim u32 = 7680
llama_model_loader: - kv 43: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 44: tokenizer.ggml.pre str = mllama
llama_model_loader: - kv 45: tokenizer.ggml.tokens arr[str,128257] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 46: tokenizer.ggml.token_type arr[i32,128257] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 47: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 48: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 49: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 50: tokenizer.ggml.padding_token_id u32 = 128004
llama_model_loader: - kv 51: tokenizer.ggml.start_header_token_id u32 = 128006
llama_model_loader: - kv 52: tokenizer.ggml.end_header_token_id u32 = 128007
llama_model_loader: - kv 53: tokenizer.ggml.eom_token_id u32 = 128008
llama_model_loader: - kv 54: tokenizer.ggml.eot_token_id u32 = 128009
llama_model_loader: - kv 55: tokenizer.ggml.python_tag_token_id u32 = 128010
llama_model_loader: - kv 56: tokenizer.ggml.image_token_id u32 = 128256
llama_model_loader: - kv 57: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv 58: general.quantization_version u32 = 2
llama_model_loader: - type f32: 626 tensors
llama_model_loader: - type q4_K: 244 tensors
llama_model_loader: - type q6_K: 38 tensors
using mllama
llm_tokenizer_bpe: using default regex for BPE tokenization pre-processing
special token: 'tokenizer.ggml.bos_token_id' = 128000
special token: 'tokenizer.ggml.eos_token_id' = 128009
special token: 'tokenizer.ggml.eot_token_id' = 128009
special token: 'tokenizer.ggml.eom_token_id' = 128008
special token: 'tokenizer.ggml.padding_token_id' = 128004
special token: 'tokenizer.ggml.image_token_id' = 128256
llm_load_vocab: control token: 128255 '<|reserved_special_token_246|>' is not marked as EOG
llm_load_vocab: control token: 128253 '<|reserved_special_token_244|>' is not marked as EOG
llm_load_vocab: control token: 128252 '<|reserved_special_token_243|>' is not marked as EOG
llm_load_vocab: control token: 128251 '<|reserved_special_token_242|>' is not marked as EOG
llm_load_vocab: control token: 128249 '<|reserved_special_token_240|>' is not marked as EOG
llm_load_vocab: control token: 128248 '<|reserved_special_token_239|>' is not marked as EOG
llm_load_vocab: control token: 128246 '<|reserved_special_token_237|>' is not marked as EOG
llm_load_vocab: control token: 128245 '<|reserved_special_token_236|>' is not marked as EOG
llm_load_vocab: control token: 128244 '<|reserved_special_token_235|>' is not marked as EOG
llm_load_vocab: control token: 128241 '<|reserved_special_token_232|>' is not marked as EOG
llm_load_vocab: control token: 128239 '<|reserved_special_token_230|>' is not marked as EOG
llm_load_vocab: control token: 128236 '<|reserved_special_token_227|>' is not marked as EOG
llm_load_vocab: control token: 128235 '<|reserved_special_token_226|>' is not marked as EOG
llm_load_vocab: control token: 128230 '<|reserved_special_token_221|>' is not marked as EOG
llm_load_vocab: control token: 128228 '<|reserved_special_token_219|>' is not marked as EOG
llm_load_vocab: control token: 128227 '<|reserved_special_token_218|>' is not marked as EOG
llm_load_vocab: control token: 128225 '<|reserved_special_token_216|>' is not marked as EOG
llm_load_vocab: control token: 128224 '<|reserved_special_token_215|>' is not marked as EOG
llm_load_vocab: control token: 128222 '<|reserved_special_token_213|>' is not marked as EOG
llm_load_vocab: control token: 128220 '<|reserved_special_token_211|>' is not marked as EOG
llm_load_vocab: control token: 128219 '<|reserved_special_token_210|>' is not marked as EOG
llm_load_vocab: control token: 128218 '<|reserved_special_token_209|>' is not marked as EOG
llm_load_vocab: control token: 128217 '<|reserved_special_token_208|>' is not marked as EOG
llm_load_vocab: control token: 128216 '<|reserved_special_token_207|>' is not marked as EOG
llm_load_vocab: control token: 128214 '<|reserved_special_token_205|>' is not marked as EOG
llm_load_vocab: control token: 128212 '<|reserved_special_token_203|>' is not marked as EOG
llm_load_vocab: control token: 128211 '<|reserved_special_token_202|>' is not marked as EOG
llm_load_vocab: control token: 128210 '<|reserved_special_token_201|>' is not marked as EOG
llm_load_vocab: control token: 128209 '<|reserved_special_token_200|>' is not marked as EOG
llm_load_vocab: control token: 128208 '<|reserved_special_token_199|>' is not marked as EOG
llm_load_vocab: control token: 128205 '<|reserved_special_token_196|>' is not marked as EOG
llm_load_vocab: control token: 128203 '<|reserved_special_token_194|>' is not marked as EOG
llm_load_vocab: control token: 128198 '<|reserved_special_token_189|>' is not marked as EOG
llm_load_vocab: control token: 128196 '<|reserved_special_token_187|>' is not marked as EOG
llm_load_vocab: control token: 128195 '<|reserved_special_token_186|>' is not marked as EOG
llm_load_vocab: control token: 128192 '<|reserved_special_token_183|>' is not marked as EOG
llm_load_vocab: control token: 128191 '<|reserved_special_token_182|>' is not marked as EOG
llm_load_vocab: control token: 128189 '<|reserved_special_token_180|>' is not marked as EOG
llm_load_vocab: control token: 128188 '<|reserved_special_token_179|>' is not marked as EOG
llm_load_vocab: control token: 128186 '<|reserved_special_token_177|>' is not marked as EOG
llm_load_vocab: control token: 128185 '<|reserved_special_token_176|>' is not marked as EOG
llm_load_vocab: control token: 128184 '<|reserved_special_token_175|>' is not marked as EOG
llm_load_vocab: control token: 128179 '<|reserved_special_token_170|>' is not marked as EOG
llm_load_vocab: control token: 128178 '<|reserved_special_token_169|>' is not marked as EOG
llm_load_vocab: control token: 128177 '<|reserved_special_token_168|>' is not marked as EOG
llm_load_vocab: control token: 128176 '<|reserved_special_token_167|>' is not marked as EOG
llm_load_vocab: control token: 128175 '<|reserved_special_token_166|>' is not marked as EOG
llm_load_vocab: control token: 128174 '<|reserved_special_token_165|>' is not marked as EOG
llm_load_vocab: control token: 128173 '<|reserved_special_token_164|>' is not marked as EOG
llm_load_vocab: control token: 128170 '<|reserved_special_token_161|>' is not marked as EOG
llm_load_vocab: control token: 128168 '<|reserved_special_token_159|>' is not marked as EOG
llm_load_vocab: control token: 128167 '<|reserved_special_token_158|>' is not marked as EOG
llm_load_vocab: control token: 128161 '<|reserved_special_token_152|>' is not marked as EOG
llm_load_vocab: control token: 128160 '<|reserved_special_token_151|>' is not marked as EOG
llm_load_vocab: control token: 128158 '<|reserved_special_token_149|>' is not marked as EOG
llm_load_vocab: control token: 128157 '<|reserved_special_token_148|>' is not marked as EOG
llm_load_vocab: control token: 128155 '<|reserved_special_token_146|>' is not marked as EOG
llm_load_vocab: control token: 128153 '<|reserved_special_token_144|>' is not marked as EOG
llm_load_vocab: control token: 128152 '<|reserved_special_token_143|>' is not marked as EOG
llm_load_vocab: control token: 128151 '<|reserved_special_token_142|>' is not marked as EOG
llm_load_vocab: control token: 128148 '<|reserved_special_token_139|>' is not marked as EOG
llm_load_vocab: control token: 128145 '<|reserved_special_token_136|>' is not marked as EOG
llm_load_vocab: control token: 128143 '<|reserved_special_token_134|>' is not marked as EOG
llm_load_vocab: control token: 128142 '<|reserved_special_token_133|>' is not marked as EOG
llm_load_vocab: control token: 128141 '<|reserved_special_token_132|>' is not marked as EOG
llm_load_vocab: control token: 128134 '<|reserved_special_token_125|>' is not marked as EOG
llm_load_vocab: control token: 128131 '<|reserved_special_token_122|>' is not marked as EOG
llm_load_vocab: control token: 128129 '<|reserved_special_token_120|>' is not marked as EOG
llm_load_vocab: control token: 128128 '<|reserved_special_token_119|>' is not marked as EOG
llm_load_vocab: control token: 128127 '<|reserved_special_token_118|>' is not marked as EOG
llm_load_vocab: control token: 128126 '<|reserved_special_token_117|>' is not marked as EOG
llm_load_vocab: control token: 128125 '<|reserved_special_token_116|>' is not marked as EOG
llm_load_vocab: control token: 128124 '<|reserved_special_token_115|>' is not marked as EOG
llm_load_vocab: control token: 128123 '<|reserved_special_token_114|>' is not marked as EOG
llm_load_vocab: control token: 128122 '<|reserved_special_token_113|>' is not marked as EOG
llm_load_vocab: control token: 128121 '<|reserved_special_token_112|>' is not marked as EOG
llm_load_vocab: control token: 128120 '<|reserved_special_token_111|>' is not marked as EOG
llm_load_vocab: control token: 128117 '<|reserved_special_token_108|>' is not marked as EOG
llm_load_vocab: control token: 128116 '<|reserved_special_token_107|>' is not marked as EOG
llm_load_vocab: control token: 128115 '<|reserved_special_token_106|>' is not marked as EOG
llm_load_vocab: control token: 128114 '<|reserved_special_token_105|>' is not marked as EOG
llm_load_vocab: control token: 128112 '<|reserved_special_token_103|>' is not marked as EOG
llm_load_vocab: control token: 128111 '<|reserved_special_token_102|>' is not marked as EOG
llm_load_vocab: control token: 128108 '<|reserved_special_token_99|>' is not marked as EOG
llm_load_vocab: control token: 128107 '<|reserved_special_token_98|>' is not marked as EOG
llm_load_vocab: control token: 128106 '<|reserved_special_token_97|>' is not marked as EOG
llm_load_vocab: control token: 128105 '<|reserved_special_token_96|>' is not marked as EOG
llm_load_vocab: control token: 128104 '<|reserved_special_token_95|>' is not marked as EOG
llm_load_vocab: control token: 128101 '<|reserved_special_token_92|>' is not marked as EOG
llm_load_vocab: control token: 128098 '<|reserved_special_token_89|>' is not marked as EOG
llm_load_vocab: control token: 128097 '<|reserved_special_token_88|>' is not marked as EOG
llm_load_vocab: control token: 128095 '<|reserved_special_token_86|>' is not marked as EOG
llm_load_vocab: control token: 128094 '<|reserved_special_token_85|>' is not marked as EOG
llm_load_vocab: control token: 128091 '<|reserved_special_token_82|>' is not marked as EOG
llm_load_vocab: control token: 128090 '<|reserved_special_token_81|>' is not marked as EOG
llm_load_vocab: control token: 128088 '<|reserved_special_token_79|>' is not marked as EOG
llm_load_vocab: control token: 128086 '<|reserved_special_token_77|>' is not marked as EOG
llm_load_vocab: control token: 128081 '<|reserved_special_token_72|>' is not marked as EOG
llm_load_vocab: control token: 128078 '<|reserved_special_token_69|>' is not marked as EOG
llm_load_vocab: control token: 128077 '<|reserved_special_token_68|>' is not marked as EOG
llm_load_vocab: control token: 128074 '<|reserved_special_token_65|>' is not marked as EOG
llm_load_vocab: control token: 128071 '<|reserved_special_token_62|>' is not marked as EOG
llm_load_vocab: control token: 128070 '<|reserved_special_token_61|>' is not marked as EOG
llm_load_vocab: control token: 128068 '<|reserved_special_token_59|>' is not marked as EOG
llm_load_vocab: control token: 128065 '<|reserved_special_token_56|>' is not marked as EOG
llm_load_vocab: control token: 128063 '<|reserved_special_token_54|>' is not marked as EOG
llm_load_vocab: control token: 128062 '<|reserved_special_token_53|>' is not marked as EOG
llm_load_vocab: control token: 128061 '<|reserved_special_token_52|>' is not marked as EOG
llm_load_vocab: control token: 128055 '<|reserved_special_token_46|>' is not marked as EOG
llm_load_vocab: control token: 128046 '<|reserved_special_token_37|>' is not marked as EOG
llm_load_vocab: control token: 128045 '<|reserved_special_token_36|>' is not marked as EOG
llm_load_vocab: control token: 128044 '<|reserved_special_token_35|>' is not marked as EOG
llm_load_vocab: control token: 128043 '<|reserved_special_token_34|>' is not marked as EOG
llm_load_vocab: control token: 128039 '<|reserved_special_token_30|>' is not marked as EOG
llm_load_vocab: control token: 128038 '<|reserved_special_token_29|>' is not marked as EOG
llm_load_vocab: control token: 128036 '<|reserved_special_token_27|>' is not marked as EOG
llm_load_vocab: control token: 128035 '<|reserved_special_token_26|>' is not marked as EOG
llm_load_vocab: control token: 128034 '<|reserved_special_token_25|>' is not marked as EOG
llm_load_vocab: control token: 128033 '<|reserved_special_token_24|>' is not marked as EOG
llm_load_vocab: control token: 128031 '<|reserved_special_token_22|>' is not marked as EOG
llm_load_vocab: control token: 128030 '<|reserved_special_token_21|>' is not marked as EOG
llm_load_vocab: control token: 128029 '<|reserved_special_token_20|>' is not marked as EOG
llm_load_vocab: control token: 128027 '<|reserved_special_token_18|>' is not marked as EOG
llm_load_vocab: control token: 128026 '<|reserved_special_token_17|>' is not marked as EOG
llm_load_vocab: control token: 128025 '<|reserved_special_token_16|>' is not marked as EOG
llm_load_vocab: control token: 128023 '<|reserved_special_token_14|>' is not marked as EOG
llm_load_vocab: control token: 128021 '<|reserved_special_token_12|>' is not marked as EOG
llm_load_vocab: control token: 128018 '<|reserved_special_token_9|>' is not marked as EOG
llm_load_vocab: control token: 128017 '<|reserved_special_token_8|>' is not marked as EOG
llm_load_vocab: control token: 128016 '<|reserved_special_token_7|>' is not marked as EOG
llm_load_vocab: control token: 128015 '<|reserved_special_token_6|>' is not marked as EOG
llm_load_vocab: control token: 128014 '<|reserved_special_token_5|>' is not marked as EOG
llm_load_vocab: control token: 128012 '<|reserved_special_token_3|>' is not marked as EOG
llm_load_vocab: control token: 128010 '<|python_tag|>' is not marked as EOG
llm_load_vocab: control token: 128006 '<|start_header_id|>' is not marked as EOG
llm_load_vocab: control token: 128005 '<|step_id|>' is not marked as EOG
llm_load_vocab: control token: 128003 '<|reserved_special_token_1|>' is not marked as EOG
llm_load_vocab: control token: 128002 '<|reserved_special_token_0|>' is not marked as EOG
llm_load_vocab: control token: 128000 '<|begin_of_text|>' is not marked as EOG
llm_load_vocab: control token: 128042 '<|reserved_special_token_33|>' is not marked as EOG
llm_load_vocab: control token: 128064 '<|reserved_special_token_55|>' is not marked as EOG
llm_load_vocab: control token: 128047 '<|reserved_special_token_38|>' is not marked as EOG
llm_load_vocab: control token: 128007 '<|end_header_id|>' is not marked as EOG
llm_load_vocab: control token: 128066 '<|reserved_special_token_57|>' is not marked as EOG
llm_load_vocab: control token: 128172 '<|reserved_special_token_163|>' is not marked as EOG
llm_load_vocab: control token: 128163 '<|reserved_special_token_154|>' is not marked as EOG
llm_load_vocab: control token: 128166 '<|reserved_special_token_157|>' is not marked as EOG
llm_load_vocab: control token: 128058 '<|reserved_special_token_49|>' is not marked as EOG
llm_load_vocab: control token: 128051 '<|reserved_special_token_42|>' is not marked as EOG
llm_load_vocab: control token: 128057 '<|reserved_special_token_48|>' is not marked as EOG
llm_load_vocab: control token: 128231 '<|reserved_special_token_222|>' is not marked as EOG
llm_load_vocab: control token: 128099 '<|reserved_special_token_90|>' is not marked as EOG
llm_load_vocab: control token: 128154 '<|reserved_special_token_145|>' is not marked as EOG
llm_load_vocab: control token: 128085 '<|reserved_special_token_76|>' is not marked as EOG
llm_load_vocab: control token: 128083 '<|reserved_special_token_74|>' is not marked as EOG
llm_load_vocab: control token: 128103 '<|reserved_special_token_94|>' is not marked as EOG
llm_load_vocab: control token: 128254 '<|reserved_special_token_245|>' is not marked as EOG
llm_load_vocab: control token: 128180 '<|reserved_special_token_171|>' is not marked as EOG
llm_load_vocab: control token: 128072 '<|reserved_special_token_63|>' is not marked as EOG
llm_load_vocab: control token: 128136 '<|reserved_special_token_127|>' is not marked as EOG
llm_load_vocab: control token: 128162 '<|reserved_special_token_153|>' is not marked as EOG
llm_load_vocab: control token: 128165 '<|reserved_special_token_156|>' is not marked as EOG
llm_load_vocab: control token: 128135 '<|reserved_special_token_126|>' is not marked as EOG
llm_load_vocab: control token: 128256 '<|image|>' is not marked as EOG
llm_load_vocab: control token: 128250 '<|reserved_special_token_241|>' is not marked as EOG
llm_load_vocab: control token: 128004 '<|finetune_right_pad_id|>' is not marked as EOG
llm_load_vocab: control token: 128037 '<|reserved_special_token_28|>' is not marked as EOG
llm_load_vocab: control token: 128149 '<|reserved_special_token_140|>' is not marked as EOG
llm_load_vocab: control token: 128182 '<|reserved_special_token_173|>' is not marked as EOG
llm_load_vocab: control token: 128223 '<|reserved_special_token_214|>' is not marked as EOG
llm_load_vocab: control token: 128076 '<|reserved_special_token_67|>' is not marked as EOG
llm_load_vocab: control token: 128242 '<|reserved_special_token_233|>' is not marked as EOG
llm_load_vocab: control token: 128052 '<|reserved_special_token_43|>' is not marked as EOG
llm_load_vocab: control token: 128069 '<|reserved_special_token_60|>' is not marked as EOG
llm_load_vocab: control token: 128150 '<|reserved_special_token_141|>' is not marked as EOG
llm_load_vocab: control token: 128202 '<|reserved_special_token_193|>' is not marked as EOG
llm_load_vocab: control token: 128059 '<|reserved_special_token_50|>' is not marked as EOG
llm_load_vocab: control token: 128147 '<|reserved_special_token_138|>' is not marked as EOG
llm_load_vocab: control token: 128144 '<|reserved_special_token_135|>' is not marked as EOG
llm_load_vocab: control token: 128024 '<|reserved_special_token_15|>' is not marked as EOG
llm_load_vocab: control token: 128040 '<|reserved_special_token_31|>' is not marked as EOG
llm_load_vocab: control token: 128133 '<|reserved_special_token_124|>' is not marked as EOG
llm_load_vocab: control token: 128102 '<|reserved_special_token_93|>' is not marked as EOG
llm_load_vocab: control token: 128213 '<|reserved_special_token_204|>' is not marked as EOG
llm_load_vocab: control token: 128190 '<|reserved_special_token_181|>' is not marked as EOG
llm_load_vocab: control token: 128226 '<|reserved_special_token_217|>' is not marked as EOG
llm_load_vocab: control token: 128130 '<|reserved_special_token_121|>' is not marked as EOG
llm_load_vocab: control token: 128011 '<|reserved_special_token_2|>' is not marked as EOG
llm_load_vocab: control token: 128079 '<|reserved_special_token_70|>' is not marked as EOG
llm_load_vocab: control token: 128164 '<|reserved_special_token_155|>' is not marked as EOG
llm_load_vocab: control token: 128073 '<|reserved_special_token_64|>' is not marked as EOG
llm_load_vocab: control token: 128113 '<|reserved_special_token_104|>' is not marked as EOG
llm_load_vocab: control token: 128187 '<|reserved_special_token_178|>' is not marked as EOG
llm_load_vocab: control token: 128096 '<|reserved_special_token_87|>' is not marked as EOG
llm_load_vocab: control token: 128110 '<|reserved_special_token_101|>' is not marked as EOG
llm_load_vocab: control token: 128100 '<|reserved_special_token_91|>' is not marked as EOG
llm_load_vocab: control token: 128139 '<|reserved_special_token_130|>' is not marked as EOG
llm_load_vocab: control token: 128194 '<|reserved_special_token_185|>' is not marked as EOG
llm_load_vocab: control token: 128200 '<|reserved_special_token_191|>' is not marked as EOG
llm_load_vocab: control token: 128049 '<|reserved_special_token_40|>' is not marked as EOG
llm_load_vocab: control token: 128089 '<|reserved_special_token_80|>' is not marked as EOG
llm_load_vocab: control token: 128193 '<|reserved_special_token_184|>' is not marked as EOG
llm_load_vocab: control token: 128137 '<|reserved_special_token_128|>' is not marked as EOG
llm_load_vocab: control token: 128093 '<|reserved_special_token_84|>' is not marked as EOG
llm_load_vocab: control token: 128159 '<|reserved_special_token_150|>' is not marked as EOG
llm_load_vocab: control token: 128050 '<|reserved_special_token_41|>' is not marked as EOG
llm_load_vocab: control token: 128032 '<|reserved_special_token_23|>' is not marked as EOG
llm_load_vocab: control token: 128183 '<|reserved_special_token_174|>' is not marked as EOG
llm_load_vocab: control token: 128067 '<|reserved_special_token_58|>' is not marked as EOG
llm_load_vocab: control token: 128181 '<|reserved_special_token_172|>' is not marked as EOG
llm_load_vocab: control token: 128234 '<|reserved_special_token_225|>' is not marked as EOG
llm_load_vocab: control token: 128080 '<|reserved_special_token_71|>' is not marked as EOG
llm_load_vocab: control token: 128082 '<|reserved_special_token_73|>' is not marked as EOG
llm_load_vocab: control token: 128232 '<|reserved_special_token_223|>' is not marked as EOG
llm_load_vocab: control token: 128197 '<|reserved_special_token_188|>' is not marked as EOG
llm_load_vocab: control token: 128048 '<|reserved_special_token_39|>' is not marked as EOG
llm_load_vocab: control token: 128084 '<|reserved_special_token_75|>' is not marked as EOG
llm_load_vocab: control token: 128140 '<|reserved_special_token_131|>' is not marked as EOG
llm_load_vocab: control token: 128132 '<|reserved_special_token_123|>' is not marked as EOG
llm_load_vocab: control token: 128119 '<|reserved_special_token_110|>' is not marked as EOG
llm_load_vocab: control token: 128054 '<|reserved_special_token_45|>' is not marked as EOG
llm_load_vocab: control token: 128221 '<|reserved_special_token_212|>' is not marked as EOG
llm_load_vocab: control token: 128109 '<|reserved_special_token_100|>' is not marked as EOG
llm_load_vocab: control token: 128092 '<|reserved_special_token_83|>' is not marked as EOG
llm_load_vocab: control token: 128204 '<|reserved_special_token_195|>' is not marked as EOG
llm_load_vocab: control token: 128060 '<|reserved_special_token_51|>' is not marked as EOG
llm_load_vocab: control token: 128020 '<|reserved_special_token_11|>' is not marked as EOG
llm_load_vocab: control token: 128171 '<|reserved_special_token_162|>' is not marked as EOG
llm_load_vocab: control token: 128206 '<|reserved_special_token_197|>' is not marked as EOG
llm_load_vocab: control token: 128041 '<|reserved_special_token_32|>' is not marked as EOG
llm_load_vocab: control token: 128201 '<|reserved_special_token_192|>' is not marked as EOG
llm_load_vocab: control token: 128237 '<|reserved_special_token_228|>' is not marked as EOG
llm_load_vocab: control token: 128146 '<|reserved_special_token_137|>' is not marked as EOG
llm_load_vocab: control token: 128169 '<|reserved_special_token_160|>' is not marked as EOG
llm_load_vocab: control token: 128215 '<|reserved_special_token_206|>' is not marked as EOG
llm_load_vocab: control token: 128138 '<|reserved_special_token_129|>' is not marked as EOG
llm_load_vocab: control token: 128233 '<|reserved_special_token_224|>' is not marked as EOG
llm_load_vocab: control token: 128240 '<|reserved_special_token_231|>' is not marked as EOG
llm_load_vocab: control token: 128056 '<|reserved_special_token_47|>' is not marked as EOG
llm_load_vocab: control token: 128229 '<|reserved_special_token_220|>' is not marked as EOG
llm_load_vocab: control token: 128207 '<|reserved_special_token_198|>' is not marked as EOG
llm_load_vocab: control token: 128019 '<|reserved_special_token_10|>' is not marked as EOG
llm_load_vocab: control token: 128013 '<|reserved_special_token_4|>' is not marked as EOG
llm_load_vocab: control token: 128199 '<|reserved_special_token_190|>' is not marked as EOG
llm_load_vocab: control token: 128022 '<|reserved_special_token_13|>' is not marked as EOG
llm_load_vocab: control token: 128087 '<|reserved_special_token_78|>' is not marked as EOG
llm_load_vocab: control token: 128075 '<|reserved_special_token_66|>' is not marked as EOG
llm_load_vocab: control token: 128028 '<|reserved_special_token_19|>' is not marked as EOG
llm_load_vocab: control token: 128243 '<|reserved_special_token_234|>' is not marked as EOG
llm_load_vocab: control token: 128156 '<|reserved_special_token_147|>' is not marked as EOG
llm_load_vocab: control token: 128053 '<|reserved_special_token_44|>' is not marked as EOG
llm_load_vocab: control token: 128247 '<|reserved_special_token_238|>' is not marked as EOG
llm_load_vocab: control token: 128118 '<|reserved_special_token_109|>' is not marked as EOG
llm_load_vocab: control token: 128238 '<|reserved_special_token_229|>' is not marked as EOG
llm_load_vocab: special tokens cache size = 257
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = mllama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_layer = 40
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 131072
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 11B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 10.67 B
llm_load_print_meta: model size = 8.88 GiB (7.15 BPW)
llm_load_print_meta: general.name = Llama 3.2 11B Vision Instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token = 128008 '<|eom_id|>'
llm_load_print_meta: PAD token = 128004 '<|finetune_right_pad_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOG token = 128001 '<|end_of_text|>'
llm_load_print_meta: EOG token = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
ggml_backend_metal_log_allocated_size: allocated buffer, size = 9094.64 MiB, ( 9094.72 / 16384.02)
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
llm_load_tensors: Metal_Mapped model buffer size = 9094.63 MiB
llm_load_tensors: CPU_Mapped model buffer size = 3696.09 MiB
.............................................................................................
llama_new_context_with_model: n_seq_max = 1
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_ctx_per_seq = 4096
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3
ggml_metal_init: picking default device: Apple M3
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name: Apple M3
ggml_metal_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction = true
ggml_metal_init: simdgroup matrix mul. = true
ggml_metal_init: has bfloat = true
ggml_metal_init: use bfloat = false
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 17179.89 MB
ggml_metal_init: loaded kernel_add 0x152705fc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_add_row 0x152706a10 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sub 0x152706c40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sub_row 0x152707030 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul 0x152707420 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_row 0x152707810 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_div 0x152707c00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_div_row 0x152707ff0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_f32 0x1527083e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_f16 0x152708ab0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_i32 0x152708ea0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_i16 0x152709400 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_scale 0x152709b00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_scale_4 0x15270a210 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_clamp 0x15270a970 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_tanh 0x15270afe0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_relu 0x15270b650 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sigmoid 0x15270bcd0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu 0x15270c340 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_4 0x15270cc50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_quick 0x15270d2d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_quick_4 0x15270d950 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_silu 0x15270dfc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_silu_4 0x15270e7c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_elu 0x15270ee30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f16 0x15270f220 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f16_4 0x15270f660 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f32 0x15270faa0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f32_4 0x15270fee0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_diag_mask_inf 0x152710320 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_diag_mask_inf_8 0x152710890 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_f32 0x152710c80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_f16 0x15270c730 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_get_rows_bf16 (not supported)
ggml_metal_init: loaded kernel_get_rows_q4_0 0x152711280 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_1 0x152711670 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_0 0x152711a60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_1 0x152711e50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q8_0 0x152712240 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q2_K 0x152712630 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q3_K 0x152712a20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_K 0x152712e10 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_K 0x152713200 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q6_K 0x1527135f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_xxs 0x1527139e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_xs 0x152713e20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq3_xxs 0x152714260 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq3_s 0x1527146a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_s 0x152714df0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq1_s 0x152715230 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq1_m 0x152715670 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq4_nl 0x152715ab0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq4_xs 0x152715ef0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_i32 0x152716330 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rms_norm 0x152716770 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_group_norm 0x152716b60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_norm 0x152716f50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_ssm_conv_f32 0x152717340 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_ssm_scan_f32 0x152717780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f32_f32 0x152717bc0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_mul_mv_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16 (not supported)
ggml_metal_init: loaded kernel_mul_mv_f16_f32 0x152717fb0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_1row 0x1527183a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_l4 0x152718790 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f16 0x152718b80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_0_f32 0x152718f70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_1_f32 0x152719360 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_0_f32 0x152719750 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_1_f32 0x152719b40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q8_0_f32 0x152719f30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_2 0x15271a320 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_3 0x15271a770 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_4 0x15271abc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_f16_f32_r1_5 0x15271b010 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_2 0x15271b460 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_3 0x15271b8b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_4 0x15271bd00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_0_f32_r1_5 0x15271c150 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_2 0x15271c5a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_3 0x15271c9f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_4 0x15271ce40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_1_f32_r1_5 0x15271d290 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_2 0x15271d6e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_3 0x15271db30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_4 0x15271df80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_0_f32_r1_5 0x15271e3d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_2 0x15271e820 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_3 0x15271ec70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_4 0x15271f0c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_1_f32_r1_5 0x15271f510 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_2 0x15271f960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_3 0x15271fdb0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_4 0x152720200 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q8_0_f32_r1_5 0x152720650 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_2 0x152720aa0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_3 0x152720ef0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_4 0x152721340 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q4_K_f32_r1_5 0x152721790 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_2 0x152721be0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_3 0x152722030 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_4 0x152722260 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q5_K_f32_r1_5 0x152722710 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_2 0x152722b60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_3 0x152722fb0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_4 0x152723400 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_q6_K_f32_r1_5 0x152723850 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_2 0x152723ca0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_3 0x1527240f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_4 0x152724540 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_ext_iq4_nl_f32_r1_5 0x152724990 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q2_K_f32 0x152724de0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q3_K_f32 0x1527251d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_K_f32 0x1527255c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_K_f32 0x1527259b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q6_K_f32 0x152725da0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_xxs_f32 0x152726190 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_xs_f32 0x152726580 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq3_xxs_f32 0x152726970 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq3_s_f32 0x152726d60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_s_f32 0x152727150 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq1_s_f32 0x152727540 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq1_m_f32 0x152727930 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq4_nl_f32 0x152727d20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq4_xs_f32 0x152728160 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_f32_f32 0x152728550 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_f16_f32 0x152728940 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32 (not supported)
ggml_metal_init: loaded kernel_mul_mv_id_q4_0_f32 0x152728d30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q4_1_f32 0x152729120 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_0_f32 0x152729510 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_1_f32 0x152729900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q8_0_f32 0x152729cf0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q2_K_f32 0x15272a0e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q3_K_f32 0x15272a4d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q4_K_f32 0x15272a8c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_K_f32 0x15272acb0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q6_K_f32 0x15272b0a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_xxs_f32 0x15272b490 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_xs_f32 0x15272b880 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq3_xxs_f32 0x15272bc70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq3_s_f32 0x15272c060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_s_f32 0x15272c450 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq1_s_f32 0x15272c840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq1_m_f32 0x15272cc30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq4_nl_f32 0x15272d020 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq4_xs_f32 0x15272d410 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_f32_f32 0x15272d800 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_f16_f32 0x15272dbf0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_mul_mm_bf16_f32 (not supported)
ggml_metal_init: loaded kernel_mul_mm_q4_0_f32 0x15272dfe0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q4_1_f32 0x15272e3d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_0_f32 0x15272e7c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_1_f32 0x15272ebb0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q8_0_f32 0x15272efa0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q2_K_f32 0x15272f390 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q3_K_f32 0x15272f780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q4_K_f32 0x15272fb70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_K_f32 0x15272ff60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q6_K_f32 0x152730350 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_xxs_f32 0x152730740 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_xs_f32 0x152730b30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq3_xxs_f32 0x152730f20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq3_s_f32 0x152731310 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_s_f32 0x152731700 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq1_s_f32 0x152731af0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq1_m_f32 0x152731ee0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq4_nl_f32 0x1527322d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq4_xs_f32 0x1527326c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_f32_f32 0x152732ab0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_f16_f32 0x152732ea0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f32 (not supported)
ggml_metal_init: loaded kernel_mul_mm_id_q4_0_f32 0x152733290 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q4_1_f32 0x152733680 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_0_f32 0x152733a70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_1_f32 0x152733e60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q8_0_f32 0x152734250 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q2_K_f32 0x152734640 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q3_K_f32 0x152734a30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q4_K_f32 0x152734e20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_K_f32 0x152735210 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q6_K_f32 0x152735600 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_xxs_f32 0x1527359f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_xs_f32 0x152735de0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq3_xxs_f32 0x1527361d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq3_s_f32 0x1527365c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_s_f32 0x1527369b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq1_s_f32 0x152736da0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq1_m_f32 0x152737190 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq4_nl_f32 0x152737580 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq4_xs_f32 0x152737970 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_norm_f32 0x152737d60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_norm_f16 0x1527381b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_neox_f32 0x152738600 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_neox_f16 0x152738a50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_f16 0x152738ea0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_f32 0x1527392e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_ext_f16 0x152739720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_ext_f32 0x152739b60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_conv_transpose_1d_f32_f32 0x152739fa0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_conv_transpose_1d_f16_f32 0x15273a390 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_upscale_f32 0x15273a780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pad_f32 0x15273abc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pad_reflect_1d_f32 0x15273b000 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_timestep_embedding_f32 0x15273b440 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_arange_f32 0x15273b830 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argsort_f32_i32_asc 0x15273bc20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argsort_f32_i32_desc 0x15273c010 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_leaky_relu_f32 0x15273c720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h64 0x15273cb10 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h80 0x15273cf60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h96 0x15273d3b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h112 0x15273d800 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h128 0x15273dc50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h256 0x15273e0a0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h80 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h96 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h112 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h256 (not supported)
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h64 0x15273e4f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h80 0x15273e940 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h96 0x15273ed90 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h112 0x15273f1e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h128 0x15273f630 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h256 0x15273fa80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h64 0x15273fed0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h80 0x152740320 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h96 0x152740770 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h112 0x152740bc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h128 0x152741010 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h256 0x152741460 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h64 0x1527418b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h80 0x152741d00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h96 0x152742150 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h112 0x1527425a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h128 0x1527429f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h256 0x152742e40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h64 0x152743290 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h80 0x1527436e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h96 0x152743b30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h112 0x152743f80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h128 0x1527443d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h256 0x152744820 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h64 0x152744c70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h80 0x1527450c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h96 0x152745510 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h112 0x152745960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h128 0x152745db0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h256 0x152746200 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_f16_h128 0x152746650 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h128 (not supported)
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_0_h128 0x152746aa0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_1_h128 0x152746ef0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_0_h128 0x152747340 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_1_h128 0x152747790 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q8_0_h128 0x152747be0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_f16_h256 0x152748030 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h256 (not supported)
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_0_h256 0x152748480 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_1_h256 0x1527488d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_0_h256 0x152748d20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_1_h256 0x152749170 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q8_0_h256 0x1527495c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_set_f32 0x152749a10 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_set_i32 0x152749e00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_f32 0x15274a1f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_f16 0x15274a5e0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_cpy_f32_bf16 (not supported)
ggml_metal_init: loaded kernel_cpy_f16_f32 0x15274a9d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f16_f16 0x15274adc0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_cpy_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16 (not supported)
ggml_metal_init: loaded kernel_cpy_f32_q8_0 0x15274b1b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q4_0 0x15274b5a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q4_1 0x15274b990 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q5_0 0x15274bd80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q5_1 0x15274c170 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_iq4_nl 0x15274c560 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_concat 0x15274c950 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sqr 0x15274d020 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sqrt 0x15274d690 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sin 0x15274dd00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cos 0x15274e370 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sum_rows 0x15274e760 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argmax 0x15274eba0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pool_2d_avg_f32 0x15274ef90 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pool_2d_max_f32 0x15274f3d0 | th_max = 1024 | th_width = 32
llama_kv_cache_init: kv_size = 4096, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 40
llama_kv_cache_init: layer 0: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 1: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 2: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 3: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 4: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 5: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 6: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 7: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 8: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 9: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 10: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 11: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 12: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 13: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 14: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 15: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 16: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 17: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 18: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 19: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 20: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 21: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 22: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 23: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 24: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 25: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 26: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 27: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 28: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 29: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 30: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 31: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 32: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 33: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 34: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 35: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 36: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 37: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 38: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 39: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: Metal KV buffer size = 912.25 MiB
llama_new_context_with_model: KV self size = 912.25 MiB, K (f16): 456.12 MiB, V (f16): 456.12 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
ggml_gallocr_reserve_n: reallocating Metal buffer from size 0.00 MiB to 296.00 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 16.01 MiB
llama_new_context_with_model: Metal compute buffer size = 296.00 MiB
llama_new_context_with_model: CPU compute buffer size = 16.01 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 2
token = 128006
token = 882
token = 128007
token = 271
token = 128256
token = 3923
token = 374
token = 304
token = 420
token = 2217
token = 30
token = 128009
token = 128006
token = 78191
token = 128007
token = 271
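The tokens logged above are the Llama 3.2 chat template for the prompt: 128006/128007 wrap the header names (882 = "user", 78191 = "assistant"), 271 is the "\n\n" separator, 128009 is `<|eot_id|>`, and 128256 is the `<|image|>` marker registered as `mllama.image_token_index` in the GGUF metadata dumped earlier. Tokens 3923 374 304 420 2217 30 spell out "What is in this image?". A minimal sketch of how such a prompt string could be assembled before tokenization (the helper below is illustrative, not the example's actual code):

```c++
#include <string>

// Illustrative only: builds the chat-template string whose tokenization
// matches the token ids logged above.
static std::string build_vision_prompt(const std::string & user_text) {
    std::string p;
    p += "<|start_header_id|>user<|end_header_id|>\n\n";      // 128006, 882, 128007, 271
    p += "<|image|>";                                          // 128256 (mllama.image_token_index)
    p += user_text;                                            // "What is in this image?" -> 3923 374 304 420 2217 30
    p += "<|eot_id|>";                                         // 128009
    p += "<|start_header_id|>assistant<|end_header_id|>\n\n";  // 128006, 78191, 128007, 271
    return p;
}
```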
Calculating optimal canvas for image 1280x748 with max_tiles=4, tile_size=560
Possible ratios and their canvas sizes:
Ratio 1x1 -> Canvas 560x560 (scale_w=0.438 scale_h=0.749 selected=0.438)
Ratio 1x2 -> Canvas 560x1120 (scale_w=0.438 scale_h=1.497 selected=0.438)
Ratio 1x3 -> Canvas 560x1680 (scale_w=0.438 scale_h=2.246 selected=0.438)
Ratio 1x4 -> Canvas 560x2240 (scale_w=0.438 scale_h=2.995 selected=0.438)
Ratio 2x1 -> Canvas 1120x560 (scale_w=0.875 scale_h=0.749 selected=0.749)
Ratio 2x2 -> Canvas 1120x1120 (scale_w=0.875 scale_h=1.497 selected=0.875)
Ratio 3x1 -> Canvas 1680x560 (scale_w=1.312 scale_h=0.749 selected=0.749)
Ratio 4x1 -> Canvas 2240x560 (scale_w=1.750 scale_h=0.749 selected=0.749)
Selected scale: 0.875000 (upscale=0)
Candidate canvas 1120x1120 (area=1254400)
Final selected canvas 1120x1120
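The ratio table above enumerates every tile grid whose tile count stays within `max_tiles` and scores each candidate canvas by the scale at which the image fits entirely inside it, i.e. `min(canvas_w/img_w, canvas_h/img_h)`. The selection prefers the largest scale below 1 (downscaling rather than upscaling) and breaks ties on the smallest canvas area. A sketch of that selection logic, under the assumption that it mirrors the upstream mllama preprocessing:

```c++
#include <algorithm>
#include <cstdint>
#include <vector>

struct canvas { int w, h; };

// Scale at which the image fits entirely inside a canvas.
static double fit_scale(const canvas & c, int img_w, int img_h) {
    return std::min((double) c.w / img_w, (double) c.h / img_h);
}

static canvas select_canvas(int img_w, int img_h, int max_tiles, int tile_size) {
    // Enumerate every grid with at most max_tiles tiles, in the same
    // order as the ratio table above.
    std::vector<canvas> candidates;
    for (int tx = 1; tx <= max_tiles; ++tx) {
        for (int ty = 1; ty <= max_tiles; ++ty) {
            if (tx * ty <= max_tiles) {
                candidates.push_back({tx * tile_size, ty * tile_size});
            }
        }
    }
    // Prefer the largest downscale (< 1); only upscale if nothing fits.
    double best_scale = 0.0;
    for (const auto & c : candidates) {
        const double s = fit_scale(c, img_w, img_h);
        if (s < 1.0) best_scale = std::max(best_scale, s);
    }
    if (best_scale == 0.0) {
        best_scale = fit_scale(candidates[0], img_w, img_h);
        for (const auto & c : candidates) {
            best_scale = std::min(best_scale, fit_scale(c, img_w, img_h));
        }
    }
    // Break ties between canvases reaching that scale on the smallest area.
    canvas best = candidates[0];
    int64_t best_area = INT64_MAX;
    for (const auto & c : candidates) {
        const int64_t area = (int64_t) c.w * c.h;
        if (fit_scale(c, img_w, img_h) == best_scale && area < best_area) {
            best = c;
            best_area = area;
        }
    }
    return best; // 1280x748, max_tiles=4, tile_size=560 -> 1120x1120, as logged
}
```

For the 1280x748 input this reproduces the trace: 2x2 is the only grid with the winning scale 0.875, so its 1120x1120 canvas (area 1254400) is selected.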
Get image size fit to canvas: img=1280x748, canvas=1120x1120, tile=560
Now resize image to size: 1120x654
Padding image to size 560x560 with aspect ratio 2x2
Padded image to size 1120x1120
Splitting into 2x2 tiles
split_to_tiles: img_width=1120, img_height=1120, tile_width=560, tile_height=560, tiles_x=2, tiles_y=2
Processing tile [0,0], source region: x=0-559, y=0-559
Tile[0,0] at (0,0): src=(16,147,193) -> dst=(16,147,193)
Tile[0,0] at (1,0): src=(15,146,192) -> dst=(15,146,192)
Tile[0,0] at (2,0): src=(12,145,192) -> dst=(12,145,192)
Tile[0,0] at (0,1): src=(15,148,194) -> dst=(15,148,194)
Tile[0,0] at (1,1): src=(14,148,193) -> dst=(14,148,193)
Tile[0,0] at (2,1): src=(10,147,192) -> dst=(10,147,192)
Tile[0,0] at (0,2): src=(8,145,189) -> dst=(8,145,189)
Tile[0,0] at (1,2): src=(7,145,190) -> dst=(7,145,190)
Tile[0,0] at (2,2): src=(5,145,191) -> dst=(5,145,191)
Processing tile [1,0], source region: x=560-1119, y=0-559
Tile[1,0] at (0,0): src=(195,221,236) -> dst=(195,221,236)
Tile[1,0] at (1,0): src=(195,221,236) -> dst=(195,221,236)
Tile[1,0] at (2,0): src=(197,220,236) -> dst=(197,220,236)
Tile[1,0] at (0,1): src=(192,217,232) -> dst=(192,217,232)
Tile[1,0] at (1,1): src=(194,218,233) -> dst=(194,218,233)
Tile[1,0] at (2,1): src=(196,219,235) -> dst=(196,219,235)
Tile[1,0] at (0,2): src=(192,216,230) -> dst=(192,216,230)
Tile[1,0] at (1,2): src=(194,217,231) -> dst=(194,217,231)
Tile[1,0] at (2,2): src=(195,218,232) -> dst=(195,218,232)
Processing tile [0,1], source region: x=0-559, y=560-1119
Tile[0,1] at (0,0): src=(38,34,35) -> dst=(38,34,35)
Tile[0,1] at (1,0): src=(25,21,23) -> dst=(25,21,23)
Tile[0,1] at (2,0): src=(0,0,0) -> dst=(0,0,0)
Tile[0,1] at (0,1): src=(24,20,21) -> dst=(24,20,21)
Tile[0,1] at (1,1): src=(18,14,15) -> dst=(18,14,15)
Tile[0,1] at (2,1): src=(0,0,0) -> dst=(0,0,0)
Tile[0,1] at (0,2): src=(13,9,10) -> dst=(13,9,10)
Tile[0,1] at (1,2): src=(11,7,8) -> dst=(11,7,8)
Tile[0,1] at (2,2): src=(16,11,13) -> dst=(16,11,13)
Processing tile [1,1], source region: x=560-1119, y=560-1119
Tile[1,1] at (0,0): src=(126,124,129) -> dst=(126,124,129)
Tile[1,1] at (1,0): src=(216,214,220) -> dst=(216,214,220)
Tile[1,1] at (2,0): src=(177,176,181) -> dst=(177,176,181)
Tile[1,1] at (0,1): src=(109,107,112) -> dst=(109,107,112)
Tile[1,1] at (1,1): src=(223,221,227) -> dst=(223,221,227)
Tile[1,1] at (2,1): src=(182,181,186) -> dst=(182,181,186)
Tile[1,1] at (0,2): src=(109,108,113) -> dst=(109,108,113)
Tile[1,1] at (1,2): src=(225,224,230) -> dst=(225,224,230)
Tile[1,1] at (2,2): src=(185,184,189) -> dst=(185,184,189)
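After the canvas is chosen, the image is resized to fit it (1280x748 at scale 0.875 gives 1120x654), padded out to the full 1120x1120 canvas (the "560x560 with aspect ratio 2x2" message refers to the tile size and grid, not the padded size), and cut into four 560x560 tiles; the `src -> dst` dumps above spot-check that the pixel copies land where they should. A sketch of the tiling step, assuming tightly packed 3-byte RGB pixels (the buffer layout is an assumption):

```c++
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Cut the padded canvas into tile_w x tile_h tiles, row-major.
static std::vector<std::vector<uint8_t>> split_to_tiles(
        const uint8_t * img, int img_w, int img_h, int tile_w, int tile_h) {
    const int tiles_x = img_w / tile_w; // 1120 / 560 = 2
    const int tiles_y = img_h / tile_h; // 1120 / 560 = 2
    std::vector<std::vector<uint8_t>> tiles;
    for (int ty = 0; ty < tiles_y; ++ty) {
        for (int tx = 0; tx < tiles_x; ++tx) {
            std::vector<uint8_t> tile(3 * (size_t) tile_w * tile_h);
            for (int y = 0; y < tile_h; ++y) {
                // Row y of tile [tx,ty] starts at canvas pixel (tx*tile_w, ty*tile_h + y).
                const uint8_t * src = img + 3 * ((size_t) (ty * tile_h + y) * img_w + tx * tile_w);
                std::memcpy(tile.data() + 3 * (size_t) y * tile_w, src, 3 * (size_t) tile_w);
            }
            tiles.push_back(std::move(tile));
        }
    }
    return tiles; // 4 tiles for the 2x2 grid logged above
}
```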
Processing tile 0
Processing tile 1
Processing tile 2
Processing tile 3
aspect_ratio=6
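`aspect_ratio=6` is the 1-based index of the chosen 2x2 grid in the same enumeration order the ratio table above was printed in; this id selects the vision encoder's tile-position embeddings. A sketch, assuming that enumeration:

```c++
// The id is the 1-based index of the chosen grid among all grids with at
// most max_tiles tiles; (2,2) with max_tiles = 4 yields 6, as logged.
static int aspect_ratio_id(int tiles_w, int tiles_h, int max_tiles) {
    int id = 1;
    for (int w = 1; w <= max_tiles; ++w) {
        for (int h = 1; h <= max_tiles; ++h) {
            if (w * h > max_tiles) continue;
            if (w == tiles_w && h == tiles_h) return id;
            id++;
        }
    }
    return 0; // grid not supported
}
```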
Tile 0 first 10 values:
[0] = -1.558688
[1] = -1.573286
[2] = -1.617081
[3] = -1.675475
[4] = -1.719270
[5] = -1.733869
[6] = -1.748467
[7] = -1.763066
[8] = -1.792263
[9] = -1.792263
Tile 1 first 10 values:
[0] = 1.054431
[1] = 1.054431
[2] = 1.083627
[3] = 1.083627
[4] = 1.083627
[5] = 1.098226
[6] = 1.127423
[7] = 1.142021
[8] = 1.127423
[9] = 1.112824
Tile 2 first 10 values:
[0] = -1.237522
[1] = -1.427302
[2] = -1.792263
[3] = -0.288625
[4] = -0.098845
[5] = -1.047743
[6] = -0.040451
[7] = -1.164530
[8] = -1.660877
[9] = -1.558688
Tile 3 first 10 values:
[0] = 0.047139
[1] = 1.360998
[2] = 0.791659
[3] = 0.587281
[4] = 0.879250
[5] = 0.061738
[6] = -1.587885
[7] = -1.704672
[8] = -1.792263
[9] = -1.792263
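The tile values above are per-channel normalized pixels. The mean/std constants below are CLIP's and are an assumption here, but they reproduce the logged numbers: tile 0's first R value 16 gives -1.558688, a padded black pixel gives -1.792263, and tile 1's first R value 195 gives 1.054431.

```c++
#include <cstdint>
#include <cstdio>

// Per-channel normalization: v/255 shifted by the channel mean, divided by
// the channel std. Constants assumed from CLIP-style preprocessing.
static const float MEAN[3] = {0.48145466f, 0.4578275f, 0.40821073f};
static const float STD [3] = {0.26862954f, 0.26130258f, 0.27577711f};

static float normalize_pixel(uint8_t v, int c) {
    return ((float) v / 255.0f - MEAN[c]) / STD[c];
}

int main() {
    printf("%f\n", normalize_pixel(16,  0)); // tile 0, first R value -> -1.558688
    printf("%f\n", normalize_pixel(0,   0)); // padded black pixel    -> -1.792263
    printf("%f\n", normalize_pixel(195, 0)); // tile 1, first R value ->  1.054431
    return 0;
}
```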
n_positions bytes: 6404, n_positions: 1601
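The 1601 positions per tile follow from the vision encoder's geometry, assuming 560x560 tiles and a 14-pixel patch size (the patch size is taken from the upstream model config, not from this log): 40x40 = 1600 patch embeddings plus one class embedding, and 1601 int32 position ids occupy 1601 * 4 = 6404 bytes, matching the line above.

```c++
// Sketch: position count per tile, assuming tile_size = 560 and
// patch_size = 14 (patch size assumed from the upstream model config).
static int n_positions(int tile_size, int patch_size) {
    const int per_side = tile_size / patch_size; // 560 / 14 = 40
    return per_side * per_side + 1;              // 1600 patches + 1 class token = 1601
}
```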
vision encoder output[0] = 9.585714
vision encoder output[1] = 14.321547
vision encoder output[2] = -3.193105
vision encoder output[3] = 5.831894
vision encoder output[4] = 0.395433
vision encoder output[5] = -13.520039
vision encoder output[6] = -2.124158
vision encoder output[7] = 3.160614
vision encoder output[8] = -7.931821
vision encoder output[9] = -4.416915
n_img_tokens = 1
ca_patch_emd[0] = 9.585714
ca_patch_emd[1] = 14.321547
ca_patch_emd[2] = -3.193105
ca_patch_emd[3] = 5.831894
ca_patch_emd[4] = 0.395433
ca_patch_emd[5] = -13.520039
ca_patch_emd[6] = -2.124158
ca_patch_emd[7] = 3.160614
ca_patch_emd[8] = -7.931821
ca_patch_emd[9] = -4.416915
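The first values of `ca_patch_emd` match the vision encoder output above exactly: the embeddings are not decoded as prompt tokens the way the simple-vision example does it. Only the single `<|image|>` marker occupies the text stream (hence `n_img_tokens = 1`); the patch embeddings are stashed where the cross-attention layers can read them. The hook below is a hypothetical name standing in for whatever this work-in-progress example uses internally; it is not a llama.cpp API:

```c++
#include <cstddef>
#include "llama.h"

// Hypothetical hook, not part of the llama.cpp public API: makes the
// encoder output visible to the decoder's cross-attention layers.
extern void set_cross_attn_embeddings(llama_context * ctx, const float * emb, size_t n_floats);

static void prepare_cross_attn(llama_context * ctx, const float * ca_patch_emd, size_t n_floats) {
    // The text stream still only contains the <|image|> marker token;
    // the usual llama_decode() loop over the text tokens runs afterwards.
    set_cross_attn_embeddings(ctx, ca_patch_emd, n_floats);
}
```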
This image shows a cityscape of New York City. In the center of the image is the Empire State Building, a skyscraper in Midtown Manhattan, New York City. It is known as "The Empire State" and stands at a height of 1,454 feet (443 meters). It
main: decoded 60 tokens in 5.79 s, speed: 10.37 t/s
llama_perf_context_print: load time = 77683.33 ms
llama_perf_context_print: prompt eval time = 1311.75 ms / 17 tokens ( 77.16 ms per token, 12.96 tokens per second)
llama_perf_context_print: eval time = 5683.89 ms / 59 runs ( 96.34 ms per token, 10.38 tokens per second)
llama_perf_context_print: total time = 83469.91 ms / 76 tokens
ggml_metal_free: deallocating