Closed
🐛 Describe the bug
Using Docker with a fine-tuned, GPTQ-quantized Gemma model, launching with:

```
--model /data/merged_model_GPTQ --max-model-len 8192 --max-num-seqs 1024 --served-model-name model --quantization gptq_marlin
```
fails with:

```
RuntimeError: Some weights are not initialized from checkpoints: {'model.layers.3.mlp.gate_up_proj.g_idx_sort_indices', 'model.layers.8.self_attn.qkv_proj.g_idx_sort_indices', 'model.layers.9.mlp.gate_up_proj.g_idx_sort_indices', .....
```
The same model loads and serves successfully with `--quantization gptq`, so the failure is specific to the `gptq_marlin` path.
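For context, a full Docker invocation matching the flags above might look like the sketch below. The image tag, volume mount, and port are assumptions for illustration; only the serve flags come from this report.

```shell
# Hypothetical reproduction command: image tag, mount path, and port
# are assumptions -- only the trailing serve flags are from the report.
docker run --gpus all \
  -v /data:/data \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model /data/merged_model_GPTQ \
  --max-model-len 8192 \
  --max-num-seqs 1024 \
  --served-model-name model \
  --quantization gptq_marlin
```

Swapping `--quantization gptq_marlin` for `--quantization gptq` in the same invocation avoids the `g_idx_sort_indices` initialization error.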