
Bug: can't finetune #7643

Closed

Description

@cabfile

What happened?

GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02
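
The assertion comes from ggml.c and enforces that two tensors agree in their third dimension (ne[2], i.e. "ne2 == ne02") before an operation runs. As a rough, standalone illustration only (this is not the actual ggml code path; the struct and function names below are invented for this sketch), the failing check has this shape:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: a tensor described, ggml-style, by the number of
 * elements in each of its four dimensions. */
struct tensor_dims {
    int64_t ne[4];
};

/* Mimics the kind of shape check behind "ne2 == ne02": the third dimension
 * of one operand must match that of the other, or the process aborts,
 * which is what the GGML_ASSERT lines in the log report. */
static void check_third_dim(const struct tensor_dims *a, const struct tensor_dims *b) {
    assert(a->ne[2] == b->ne[2]);
}

int main(void) {
    struct tensor_dims dst  = { { 64, 64, 32, 1 } }; /* 32 slices in dim 2 */
    struct tensor_dims src0 = { { 64, 64,  4, 1 } }; /* only 4 in dim 2    */
    printf("dst ne2 = %lld, src0 ne2 = %lld\n",
           (long long)dst.ne[2], (long long)src0.ne[2]);
    check_third_dim(&dst, &src0); /* aborts here, like the finetune run */
    return 0;
}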

Name and Version

version: 2965 (03d8900e)
built with MSVC 19.39.33523.0 for x64

What operating system are you seeing the problem on?

Windows

Relevant log output

E:\slm\llama\other>finetune --model-base ..\..\tinyllama-1.1b-chat-v0.6-q4_0_2.gguf --checkpoint-in chk-piss-LATEST.gguf --checkpoint-out chk-piss-ITERATION.gguf --lora-out piss-ITERATION.bin --train-data traindata.txt --save-every 10 --threads 4 --adam-iter 30 --batch 4 --ctx 64 --use-checkpointing
main: seed: 1717079846
main: model base = '..\..\tinyllama-1.1b-chat-v0.6-q4_0_2.gguf'
llama_model_loader: loaded meta data with 21 key-value pairs and 201 tensors from ..\..\tinyllama-1.1b-chat-v0.6-q4_0_2.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str          = llama
llama_model_loader: - kv   1:                               general.name str          = models
llama_model_loader: - kv   2:                       llama.context_length u32          = 2048
llama_model_loader: - kv   3:                     llama.embedding_length u32          = 2048
llama_model_loader: - kv   4:                          llama.block_count u32          = 22
llama_model_loader: - kv   5:                  llama.feed_forward_length u32          = 5632
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32          = 64
llama_model_loader: - kv   7:                 llama.attention.head_count u32          = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32          = 4
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32          = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32          = 10000.000000
llama_model_loader: - kv  11:                          general.file_type u32          = 2
llama_model_loader: - kv  12:                       tokenizer.ggml.model str          = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32          = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32          = 2
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32          = 0
llama_model_loader: - kv  19:            tokenizer.ggml.padding_token_id u32          = 2
llama_model_loader: - kv  20:               general.quantization_version u32          = 2
llama_model_loader: - type  f32:   45 tensors
llama_model_loader: - type q4_0:  155 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens cache size = 259.
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 2048
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 4
llm_load_print_meta: n_layer          = 22
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: n_embd_k_gqa     = 256
llm_load_print_meta: n_embd_v_gqa     = 256
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 5632
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 2048
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 1.10 B
llm_load_print_meta: model size       = 606.53 MiB (4.63 BPW)
llm_load_print_meta: general.name     = models
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 2 '</s>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.10 MiB
llm_load_tensors:        CPU buffer size =   606.53 MiB
.....................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =    11.00 MiB
llama_new_context_with_model: KV self size  =   11.00 MiB, K (f16):    5.50 MiB, V (f16):    5.50 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.12 MiB
llama_new_context_with_model:        CPU compute buffer size =    66.50 MiB
llama_new_context_with_model: graph nodes  = 710
llama_new_context_with_model: graph splits = 1
main: init model
print_params: n_vocab               : 32000
print_params: n_ctx                 : 64
print_params: n_embd                : 2048
print_params: n_ff                  : 5632
print_params: n_head                : 32
print_params: n_head_kv             : 4
print_params: n_layer               : 22
print_params: norm_rms_eps          : 0.000010
print_params: rope_freq_base        : 10000.000000
print_params: rope_freq_scale       : 1.000000
print_lora_params: n_rank_attention_norm : 1
print_lora_params: n_rank_wq             : 4
print_lora_params: n_rank_wk             : 4
print_lora_params: n_rank_wv             : 4
print_lora_params: n_rank_wo             : 4
print_lora_params: n_rank_ffn_norm       : 1
print_lora_params: n_rank_ffn_gate       : 4
print_lora_params: n_rank_ffn_down       : 4
print_lora_params: n_rank_ffn_up         : 4
print_lora_params: n_rank_tok_embeddings : 4
print_lora_params: n_rank_norm           : 1
print_lora_params: n_rank_output         : 4
main: total train_iterations 0
main: seen train_samples     0
main: seen train_tokens      0
main: completed train_epochs 0
main: lora_size = 28472224 bytes (27.2 MB)
main: opt_size  = 42223360 bytes (40.3 MB)
main: opt iter 0
main: input_size = 32769056 bytes (31.3 MB)
main: compute_size = 1507336544 bytes (1437.5 MB)
main: evaluation order = RIGHT_TO_LEFT
main: tokenize training data from traindata.txt
main: sample-start:
main: include-sample-start: false
tokenize_file: total number of samples: 1
main: number of training tokens: 12
main: number of unique tokens: 12
main: train data seems to have changed. restarting shuffled epoch.
main: begin training
main: work_size = 512240 bytes (0.5 MB)
train_opt_callback: iter=     0 sample=1/1 sched=0.000000 loss=0.000000 |->
train_opt_callback: reshuffle samples. completed epochs: 1
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02
GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml.c:12853: ne2 == ne02
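
One observation from the log above, offered as an assumption rather than a confirmed diagnosis: the model uses grouped-query attention (n_head = 32 vs n_head_kv = 4), so query-side tensors carry eight times as many heads as the key/value-side tensors, and the assertion is printed four times, which matches --threads 4. The sketch below only re-derives the head-related values printed by llm_load_print_meta to make that mismatch explicit; the variable names are mine, not llama.cpp's.

#include <stdio.h>

/* Values copied from the llm_load_print_meta output above. */
int main(void) {
    const int n_embd    = 2048;
    const int n_head    = 32;
    const int n_head_kv = 4;

    const int n_embd_head = n_embd / n_head;         /* 64  -> n_embd_head_k/v */
    const int n_gqa       = n_head / n_head_kv;      /* 8   -> n_gqa           */
    const int n_embd_kv   = n_embd_head * n_head_kv; /* 256 -> n_embd_k_gqa    */

    /* Query-side tensors have n_head slices in their third dimension while
     * K/V-side tensors have only n_head_kv; any op that requires those two
     * counts to be equal (ne2 == ne02) will assert on this model. */
    printf("n_embd_head = %d, n_gqa = %d, n_embd_k_gqa = %d\n",
           n_embd_head, n_gqa, n_embd_kv);
    return 0;
}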

Metadata

Labels

bug-unconfirmed, critical severity, stale
