Closed
Description
What happened?
Running finetune on a TinyLlama 1.1B Q4_0 GGUF base model aborts shortly after the first training iteration with:
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02
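For context, the failing check compares tensor dimensions: in ggml's local-variable naming, ne2 is dimension 2 of the op's destination tensor and ne02 is dimension 2 of its first source, so the abort means those two sizes disagree when the compute kernel runs. Below is a minimal sketch of what such a check expresses; the struct and function names are hypothetical, it is not the actual ggml.c code.

/* Illustration only: the shape invariant behind GGML_ASSERT(ne2 == ne02).
 * Names here are hypothetical, not taken from ggml.c. */
#include <assert.h>

struct toy_tensor { long long ne[4]; };  /* ggml tensors carry 4 dimensions */

static void check_dim2_matches(const struct toy_tensor *dst,
                               const struct toy_tensor *src0)
{
    /* ne2 corresponds to dst->ne[2] and ne02 to src0->ne[2]; the kernel
     * aborts when these two sizes disagree. */
    assert(dst->ne[2] == src0->ne[2]);
}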
Name and Version
version: 2965 (03d8900e)
built with MSVC 19.39.33523.0 for x64
What operating system are you seeing the problem on?
Windows
Relevant log output
E:\slm\llama\other>finetune --model-base ..\..\tinyllama-1.1b-chat-v0.6-q4_0_2.gguf --checkpoint-in chk-piss-LATEST.gguf --checkpoint-out chk-piss-ITERATION.gguf --lora-out piss-ITERATION.bin --train-data traindata.txt --save-every 10 --threads 4 --adam-iter 30 --batch 4 --ctx 64 --use-checkpointing
main: seed: 1717079846
main: model base = '..\..\tinyllama-1.1b-chat-v0.6-q4_0_2.gguf'
llama_model_loader: loaded meta data with 21 key-value pairs and 201 tensors from ..\..\tinyllama-1.1b-chat-v0.6-q4_0_2.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0: general.architecture str = llama
llama_model_loader: - kv   1: general.name str = models
llama_model_loader: - kv   2: llama.context_length u32 = 2048
llama_model_loader: - kv   3: llama.embedding_length u32 = 2048
llama_model_loader: - kv   4: llama.block_count u32 = 22
llama_model_loader: - kv   5: llama.feed_forward_length u32 = 5632
llama_model_loader: - kv   6: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv   7: llama.attention.head_count u32 = 32
llama_model_loader: - kv   8: llama.attention.head_count_kv u32 = 4
llama_model_loader: - kv   9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv  10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv  11: general.file_type u32 = 2
llama_model_loader: - kv  12: tokenizer.ggml.model str = llama
llama_model_loader: - kv  13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv  17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv  18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv  19: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv  20: general.quantization_version u32 = 2
llama_model_loader: - type f32: 45 tensors
llama_model_loader: - type q4_0: 155 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens cache size = 259.
llm_load_print_meta: format = GGUF V2
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 4
llm_load_print_meta: n_layer = 22
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_embd_head_k = 64
llm_load_print_meta: n_embd_head_v = 64
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 256
llm_load_print_meta: n_embd_v_gqa = 256
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 5632
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 1B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 1.10 B
llm_load_print_meta: model size = 606.53 MiB (4.63 BPW)
llm_load_print_meta: general.name = models
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 2 '</s>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.10 MiB
llm_load_tensors: CPU buffer size = 606.53 MiB
.....................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 11.00 MiB
llama_new_context_with_model: KV self size = 11.00 MiB, K (f16): 5.50 MiB, V (f16): 5.50 MiB
llama_new_context_with_model: CPU output buffer size = 0.12 MiB
llama_new_context_with_model: CPU compute buffer size = 66.50 MiB
llama_new_context_with_model: graph nodes = 710
llama_new_context_with_model: graph splits = 1
main: init model
print_params: n_vocab : 32000
print_params: n_ctx : 64
print_params: n_embd : 2048
print_params: n_ff : 5632
print_params: n_head : 32
print_params: n_head_kv : 4
print_params: n_layer : 22
print_params: norm_rms_eps : 0.000010
print_params: rope_freq_base : 10000.000000
print_params: rope_freq_scale : 1.000000
print_lora_params: n_rank_attention_norm : 1
print_lora_params: n_rank_wq : 4
print_lora_params: n_rank_wk : 4
print_lora_params: n_rank_wv : 4
print_lora_params: n_rank_wo : 4
print_lora_params: n_rank_ffn_norm : 1
print_lora_params: n_rank_ffn_gate : 4
print_lora_params: n_rank_ffn_down : 4
print_lora_params: n_rank_ffn_up : 4
print_lora_params: n_rank_tok_embeddings : 4
print_lora_params: n_rank_norm : 1
print_lora_params: n_rank_output : 4
main: total train_iterations 0
main: seen train_samples 0
main: seen train_tokens 0
main: completed train_epochs 0
main: lora_size = 28472224 bytes (27.2 MB)
main: opt_size = 42223360 bytes (40.3 MB)
main: opt iter 0
main: input_size = 32769056 bytes (31.3 MB)
main: compute_size = 1507336544 bytes (1437.5 MB)
main: evaluation order = RIGHT_TO_LEFT
main: tokenize training data from traindata.txt
main: sample-start:
main: include-sample-start: false
tokenize_file: total number of samples: 1
main: number of training tokens: 12
main: number of unique tokens: 12
main: train data seems to have changed. restarting shuffled epoch.
main: begin training
main: work_size = 512240 bytes (0.5 MB)
train_opt_callback: iter= 0 sample=1/1 sched=0.000000 loss=0.000000 |->
train_opt_callback: reshuffle samples. completed epochs: 1
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02