More optimizations on metal #2959

Merged (8 commits), Sep 3, 2023
Conversation

ikawrakow (Contributor) commented Sep 1, 2023

On 30-core M2 Max:

| model | backend | test | t/s (Master) | t/s (PR) | Speedup |
| --- | --- | --- | --- | --- | --- |
| LLaMA 7B mostly Q4_0 | Metal | tg 32 | 62.16 ± 0.07 | 62.57 ± 0.08 | 1.007 |
| LLaMA 7B mostly Q4_0 | Metal | tg 64 | 61.65 ± 0.05 | 62.09 ± 0.08 | 1.007 |
| LLaMA 7B mostly Q4_0 | Metal | tg 128 | 61.22 ± 0.13 | 61.71 ± 0.05 | 1.008 |
| LLaMA 7B mostly Q4_0 | Metal | tg 256 | 58.44 ± 0.03 | 59.46 ± 0.15 | 1.017 |
| LLaMA 7B mostly Q4_0 | Metal | pp 32 | 382.65 ± 1.92 | 388.64 ± 1.47 | 1.016 |
| LLaMA 7B mostly Q4_0 | Metal | pp 64 | 450.66 ± 1.26 | 468.87 ± 1.91 | 1.040 |
| LLaMA 7B mostly Q4_0 | Metal | pp 128 | 444.64 ± 0.62 | 484.74 ± 0.49 | 1.090 |
| LLaMA 7B mostly Q4_0 | Metal | pp 256 | 406.50 ± 0.21 | 479.20 ± 0.55 | 1.179 |
| LLaMA 7B mostly Q4_0 | Metal | pp 512 | 327.79 ± 0.18 | 433.35 ± 0.17 | 1.322 |
| LLaMA 7B mostly Q4_0 | Metal | pp 1024 | 227.97 ± 0.10 | 352.67 ± 0.04 | 1.547 |

With these changes, combined with the already merged #2951, perplexity now runs in 13.6 minutes on my M2 Max laptop, down from ~24 minutes before.
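
As a rough cross-check of that number (a back-of-the-envelope sketch, not a measurement): perplexity over 655 chunks of 512 tokens is 335,360 tokens of prompt processing, so the pp 512 throughput from the table above largely determines the wall time.

```python
# Rough cross-check: estimate the perplexity wall time from the pp 512 throughput.
# All numbers come from the benchmark table and perplexity logs in this thread.
chunks = 655            # "calculating perplexity over 655 chunks"
tokens_per_chunk = 512  # batch_size=512
pp512_tps = 433.35      # Q4_0 pp 512 t/s with this PR (30-core M2 Max)

total_tokens = chunks * tokens_per_chunk     # 335,360 tokens
est_minutes = total_tokens / pp512_tps / 60  # ~12.9 minutes

print(f"{total_tokens} tokens at {pp512_tps} t/s -> ~{est_minutes:.1f} min")
```

That lands at ~12.9 minutes, close to the measured 13.6; the remainder is presumably per-chunk work outside the batched matrix multiplications.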

ggerganov (Member) commented Sep 2, 2023

M2 Ultra results:

| model | size | test | master t/s | IK t/s | speedup |
| --- | --- | --- | --- | --- | --- |
| LLaMA 7B Q4_0 | 3.56 GiB | pp 32 | 701.75 ± 4.36 | 712.61 ± 2.79 | 1.015 |
| LLaMA 7B Q4_0 | 3.56 GiB | pp 64 | 879.19 ± 3.16 | 919.36 ± 5.79 | 1.046 |
| LLaMA 7B Q4_0 | 3.56 GiB | pp 128 | 964.09 ± 1.39 | 1060.88 ± 1.97 | 1.100 |
| LLaMA 7B Q4_0 | 3.56 GiB | pp 256 | 888.98 ± 1.02 | 1073.01 ± 1.69 | 1.207 |
| LLaMA 7B Q4_0 | 3.56 GiB | pp 512 | 719.63 ± 0.23 | 1009.82 ± 0.96 | 1.403 |
| LLaMA 7B Q4_0 | 3.56 GiB | pp 1024 | 549.62 ± 0.86 | 844.49 ± 1.62 | 1.536 |
| LLaMA 7B Q4_0 | 3.56 GiB | tg 128 | 87.45 ± 0.10 | 87.65 ± 0.17 | 1.002 |

Perplexity Q4_0 takes ~7 min:

perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 0.65 seconds per pass - ETA 7.10 minutes
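
Judging purely from the printed numbers (an inference, not a reading of the perplexity code), the ETA looks like a straight extrapolation of the first pass over all 655 chunks:

```python
# The printed ETA is consistent with: first-pass time * number of chunks.
seconds_per_pass = 0.65
chunks = 655
print(seconds_per_pass * chunks / 60)  # ~7.1 minutes, matching "ETA 7.10 minutes"
```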

ikawrakow (Contributor, Author) commented:

With the latest commit 363f0bf, TG (token generation) for fp16 is basically 2X faster compared to master. PP (prompt processing) is also improved, with the margin increasing rapidly with context length.

On 30-core M2 Max:

| model | backend | test | t/s (Master) | t/s (PR) | Speedup |
| --- | --- | --- | --- | --- | --- |
| LLaMA 7B mostly F16 | Metal | tg 32 | 12.04 ± 0.01 | 24.34 ± 0.05 | 2.022 |
| LLaMA 7B mostly F16 | Metal | tg 64 | 12.01 ± 0.00 | 24.28 ± 0.03 | 2.022 |
| LLaMA 7B mostly F16 | Metal | tg 128 | 11.91 ± 0.09 | 24.05 ± 0.11 | 2.020 |
| LLaMA 7B mostly F16 | Metal | tg 256 | 10.88 ± 0.91 | 23.83 ± 0.11 | 2.190 |
| LLaMA 7B mostly F16 | Metal | pp 32 | 426.51 ± 2.98 | 433.79 ± 2.54 | 1.017 |
| LLaMA 7B mostly F16 | Metal | pp 64 | 509.79 ± 1.16 | 531.98 ± 1.61 | 1.044 |
| LLaMA 7B mostly F16 | Metal | pp 128 | 503.01 ± 0.99 | 548.36 ± 0.98 | 1.090 |
| LLaMA 7B mostly F16 | Metal | pp 256 | 449.66 ± 0.48 | 543.37 ± 0.49 | 1.208 |
| LLaMA 7B mostly F16 | Metal | pp 512 | 353.80 ± 0.10 | 494.99 ± 0.42 | 1.399 |
| LLaMA 7B mostly F16 | Metal | pp 1024 | 266.73 ± 0.09 | 424.11 ± 0.18 | 1.590 |

@ggerganov I'm curious to know how this compares to #2891 on your M2 Ultra.

ggerganov (Member) commented:

| model | size | test | #2891 t/s | PR t/s | speedup |
| --- | --- | --- | --- | --- | --- |
| LLaMA 7B F16 | 12.55 GiB | pp 32 | 717.82 ± 4.71 | 736.82 ± 2.22 | 1.026 |
| LLaMA 7B F16 | 12.55 GiB | pp 64 | 974.19 ± 3.94 | 1040.40 ± 5.28 | 1.068 |
| LLaMA 7B F16 | 12.55 GiB | pp 128 | 1049.52 ± 1.58 | 1193.71 ± 2.24 | 1.137 |
| LLaMA 7B F16 | 12.55 GiB | pp 256 | 960.61 ± 1.25 | 1214.15 ± 0.90 | 1.264 |
| LLaMA 7B F16 | 12.55 GiB | pp 512 | 763.64 ± 0.41 | 1135.89 ± 0.25 | 1.487 |
| LLaMA 7B F16 | 12.55 GiB | pp 1024 | 574.10 ± 1.05 | 965.57 ± 0.21 | 1.682 |
| LLaMA 7B F16 | 12.55 GiB | tg 32 | 41.57 ± 0.02 | 40.69 ± 0.04 | 0.979 |
| LLaMA 7B F16 | 12.55 GiB | tg 64 | 41.51 ± 0.03 | 40.58 ± 0.03 | 0.978 |
| LLaMA 7B F16 | 12.55 GiB | tg 128 | 41.25 ± 0.03 | 40.27 ± 0.04 | 0.976 |
| LLaMA 7B F16 | 12.55 GiB | tg 256 | 40.88 ± 0.01 | 39.89 ± 0.08 | 0.976 |

Q4_0 7B Perplexity time also dropped - is this expected?

perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 0.64 seconds per pass - ETA 6.97 minutes

Also, if F16 and Q4_0 pp speed is comparable (F16 even faster now), how come the F16 perplexity is so much slower?

perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 0.85 seconds per pass - ETA 9.25 minutes

Is this a wrong ETA calculation, or do we have some significant overhead somewhere?

ikawrakow (Contributor, Author) commented Sep 2, 2023

> Also, if F16 and Q4_0 pp speed is comparable (F16 even faster now), how come the F16 perplexity is so much slower?

My experience with the ETA is that it is not very accurate. For Q4_0 it predicted 14.5 minutes but then finished in 13.6 on my M2 Max. For fp16 it is predicting 15.5 minutes, and I was also wondering why it is slower than Q4_0, given that t/s for fp16 is now better than for Q4_0. I guess I'll let it run to completion.

OK, it finished in 12.5 minutes, so faster than Q4_0 as expected from PP 512.
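
To put numbers on how far off that first-pass extrapolation was for the F16 run (values taken from the log below, and assuming the ETA really is just first-pass time times chunk count):

```python
# ETA vs. measured wall time for the F16 perplexity run (numbers from the log below).
chunks = 655
first_pass_s = 1.42    # "1.42 seconds per pass"
total_ms = 749648.32   # llama_print_timings: total time

eta_min = first_pass_s * chunks / 60  # ~15.5 min, as predicted
actual_min = total_ms / 1000 / 60     # ~12.5 min measured

print(f"ETA {eta_min:.1f} min vs actual {actual_min:.1f} min "
      f"({eta_min / actual_min - 1:.0%} over-estimate)")
```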

main: build = 1157 (363f0bf) main: seed = 1693672429 llama_model_loader: loaded meta data with 14 key-value pairs and 291 tensors from ../models/L2_7B/ggml-model-f16.gguf (version GGUF V1 (support until nov 2023)) llama_model_loader: - tensor 0: token_embd.weight f16 [ 4096, 32000, 1, 1 ] llama_model_loader: - tensor 1: output_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 2: output.weight f16 [ 4096, 32000, 1, 1 ] llama_model_loader: - tensor 3: blk.0.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 4: blk.0.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 5: blk.0.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 6: blk.0.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 7: blk.0.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 8: blk.0.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 9: blk.0.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 10: blk.0.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 11: blk.0.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 12: blk.1.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 13: blk.1.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 14: blk.1.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 15: blk.1.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 16: blk.1.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 17: blk.1.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 18: blk.1.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 19: blk.1.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 20: blk.1.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 21: blk.2.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 22: blk.2.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 23: blk.2.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 24: blk.2.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 25: blk.2.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 26: blk.2.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 27: blk.2.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 28: blk.2.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 29: blk.2.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 30: blk.3.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 31: blk.3.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 32: blk.3.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 33: blk.3.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 34: blk.3.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 35: blk.3.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 36: blk.3.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 37: blk.3.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 38: blk.3.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 39: blk.4.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 40: blk.4.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 41: blk.4.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 42: 
blk.4.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 43: blk.4.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 44: blk.4.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 45: blk.4.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 46: blk.4.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 47: blk.4.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 48: blk.5.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 49: blk.5.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 50: blk.5.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 51: blk.5.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 52: blk.5.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 53: blk.5.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 54: blk.5.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 55: blk.5.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 56: blk.5.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 57: blk.6.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 58: blk.6.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 59: blk.6.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 60: blk.6.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 61: blk.6.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 62: blk.6.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 63: blk.6.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 64: blk.6.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 65: blk.6.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 66: blk.7.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 67: blk.7.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 68: blk.7.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 69: blk.7.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 70: blk.7.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 71: blk.7.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 72: blk.7.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 73: blk.7.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 74: blk.7.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 75: blk.8.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 76: blk.8.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 77: blk.8.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 78: blk.8.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 79: blk.8.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 80: blk.8.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 81: blk.8.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 82: blk.8.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 83: blk.8.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 84: blk.9.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 85: blk.9.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 86: blk.9.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - 
tensor 87: blk.9.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 88: blk.9.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 89: blk.9.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 90: blk.9.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 91: blk.9.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 92: blk.9.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 93: blk.10.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 94: blk.10.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 95: blk.10.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 96: blk.10.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 97: blk.10.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 98: blk.10.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 99: blk.10.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 100: blk.10.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 101: blk.10.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 102: blk.11.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 103: blk.11.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 104: blk.11.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 105: blk.11.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 106: blk.11.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 107: blk.11.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 108: blk.11.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 109: blk.11.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 110: blk.11.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 111: blk.12.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 112: blk.12.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 113: blk.12.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 114: blk.12.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 115: blk.12.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 116: blk.12.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 117: blk.12.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 118: blk.12.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 119: blk.12.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 120: blk.13.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 121: blk.13.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 122: blk.13.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 123: blk.13.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 124: blk.13.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 125: blk.13.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 126: blk.13.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 127: blk.13.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 128: blk.13.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 129: blk.14.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 130: blk.14.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: 
- tensor 131: blk.14.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 132: blk.14.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 133: blk.14.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 134: blk.14.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 135: blk.14.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 136: blk.14.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 137: blk.14.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 138: blk.15.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 139: blk.15.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 140: blk.15.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 141: blk.15.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 142: blk.15.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 143: blk.15.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 144: blk.15.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 145: blk.15.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 146: blk.15.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 147: blk.16.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 148: blk.16.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 149: blk.16.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 150: blk.16.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 151: blk.16.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 152: blk.16.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 153: blk.16.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 154: blk.16.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 155: blk.16.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 156: blk.17.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 157: blk.17.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 158: blk.17.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 159: blk.17.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 160: blk.17.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 161: blk.17.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 162: blk.17.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 163: blk.17.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 164: blk.17.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 165: blk.18.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 166: blk.18.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 167: blk.18.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 168: blk.18.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 169: blk.18.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 170: blk.18.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 171: blk.18.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 172: blk.18.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 173: blk.18.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 174: blk.19.attn_q.weight f16 [ 4096, 4096, 1, 1 ] 
llama_model_loader: - tensor 175: blk.19.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 176: blk.19.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 177: blk.19.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 178: blk.19.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 179: blk.19.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 180: blk.19.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 181: blk.19.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 182: blk.19.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 183: blk.20.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 184: blk.20.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 185: blk.20.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 186: blk.20.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 187: blk.20.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 188: blk.20.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 189: blk.20.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 190: blk.20.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 191: blk.20.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 192: blk.21.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 193: blk.21.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 194: blk.21.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 195: blk.21.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 196: blk.21.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 197: blk.21.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 198: blk.21.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 199: blk.21.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 200: blk.21.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 201: blk.22.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 202: blk.22.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 203: blk.22.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 204: blk.22.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 205: blk.22.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 206: blk.22.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 207: blk.22.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 208: blk.22.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 209: blk.22.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 210: blk.23.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 211: blk.23.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 212: blk.23.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 213: blk.23.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 214: blk.23.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 215: blk.23.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 216: blk.23.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 217: blk.23.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 218: blk.23.ffn_norm.weight 
f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 219: blk.24.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 220: blk.24.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 221: blk.24.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 222: blk.24.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 223: blk.24.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 224: blk.24.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 225: blk.24.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 226: blk.24.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 227: blk.24.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 228: blk.25.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 229: blk.25.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 230: blk.25.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 231: blk.25.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 232: blk.25.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 233: blk.25.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 234: blk.25.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 235: blk.25.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 236: blk.25.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 237: blk.26.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 238: blk.26.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 239: blk.26.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 240: blk.26.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 241: blk.26.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 242: blk.26.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 243: blk.26.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 244: blk.26.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 245: blk.26.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 246: blk.27.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 247: blk.27.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 248: blk.27.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 249: blk.27.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 250: blk.27.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 251: blk.27.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 252: blk.27.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 253: blk.27.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 254: blk.27.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 255: blk.28.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 256: blk.28.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 257: blk.28.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 258: blk.28.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 259: blk.28.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 260: blk.28.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 261: blk.28.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 262: 
blk.28.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 263: blk.28.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 264: blk.29.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 265: blk.29.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 266: blk.29.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 267: blk.29.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 268: blk.29.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 269: blk.29.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 270: blk.29.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 271: blk.29.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 272: blk.29.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 273: blk.30.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 274: blk.30.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 275: blk.30.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 276: blk.30.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 277: blk.30.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 278: blk.30.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 279: blk.30.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 280: blk.30.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 281: blk.30.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 282: blk.31.attn_q.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 283: blk.31.attn_k.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 284: blk.31.attn_v.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 285: blk.31.attn_output.weight f16 [ 4096, 4096, 1, 1 ] llama_model_loader: - tensor 286: blk.31.ffn_gate.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 287: blk.31.ffn_down.weight f16 [ 11008, 4096, 1, 1 ] llama_model_loader: - tensor 288: blk.31.ffn_up.weight f16 [ 4096, 11008, 1, 1 ] llama_model_loader: - tensor 289: blk.31.attn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - tensor 290: blk.31.ffn_norm.weight f32 [ 4096, 1, 1, 1 ] llama_model_loader: - kv 0: general.architecture str llama_model_loader: - kv 1: general.name str llama_model_loader: - kv 2: llama.context_length u32 llama_model_loader: - kv 3: llama.embedding_length u32 llama_model_loader: - kv 4: llama.block_count u32 llama_model_loader: - kv 5: llama.feed_forward_length u32 llama_model_loader: - kv 6: llama.rope.dimension_count u32 llama_model_loader: - kv 7: llama.attention.head_count u32 llama_model_loader: - kv 8: llama.attention.head_count_kv u32 llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 llama_model_loader: - kv 10: tokenizer.ggml.model str llama_model_loader: - kv 11: tokenizer.ggml.tokens arr llama_model_loader: - kv 12: tokenizer.ggml.scores arr llama_model_loader: - kv 13: tokenizer.ggml.token_type arr llama_model_loader: - type f32: 65 tensors llama_model_loader: - type f16: 226 tensors llm_load_print_meta: format = GGUF V1 (support until nov 2023) llm_load_print_meta: arch = llama llm_load_print_meta: vocab type = SPM llm_load_print_meta: n_vocab = 32000 llm_load_print_meta: n_merges = 0 llm_load_print_meta: n_ctx_train = 4096 llm_load_print_meta: n_ctx = 512 llm_load_print_meta: n_embd = 4096 llm_load_print_meta: n_head = 32 
llm_load_print_meta: n_head_kv = 32 llm_load_print_meta: n_layer = 32 llm_load_print_meta: n_rot = 128 llm_load_print_meta: n_gqa = 1 llm_load_print_meta: f_norm_eps = 1.0e-05 llm_load_print_meta: f_norm_rms_eps = 1.0e-05 llm_load_print_meta: n_ff = 11008 llm_load_print_meta: freq_base = 10000.0 llm_load_print_meta: freq_scale = 1 llm_load_print_meta: model type = 7B llm_load_print_meta: model ftype = mostly F16 (guessed) llm_load_print_meta: model size = 6.74 B llm_load_print_meta: general.name = LLaMA llm_load_print_meta: BOS token = 1 '' llm_load_print_meta: EOS token = 2 '' llm_load_print_meta: UNK token = 0 '' llm_load_print_meta: LF token = 13 '<0x0A>' llm_load_tensors: ggml ctx size = 0.09 MB llm_load_tensors: mem required = 12853.11 MB (+ 256.00 MB per state) ................................................................................................... llama_new_context_with_model: kv self size = 256.00 MB ggml_metal_init: allocating ggml_metal_init: loading '/Users/iwan/other/llama.cpp/build/bin/ggml-metal.metal' ggml_metal_init: loaded kernel_add 0x106e0a630 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_add_row 0x106e0acd0 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_mul 0x106e0b170 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_mul_row 0x106e0b720 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_scale 0x106e0bbc0 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_silu 0x106e0c060 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_relu 0x106e0c500 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_gelu 0x106e0c9a0 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_soft_max 0x106e0cfd0 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_diag_mask_inf 0x106e0d5b0 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_get_rows_f16 0x106e0dbe0 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_get_rows_q4_0 0x106e0e380 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_get_rows_q4_1 0x106e0e9b0 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_get_rows_q8_0 0x106e0efe0 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_get_rows_q2_K 0x106e0f610 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_get_rows_q3_K 0x106e0fc40 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_get_rows_q4_K 0x106e10270 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_get_rows_q5_K 0x106e108a0 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_get_rows_q6_K 0x106e10ed0 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_rms_norm 0x106e11680 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_norm 0x106e11cb0 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_mul_mat_f16_f32 0x106e12490 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_mul_mat_f16_f32_1row 0x106e12c70 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_mul_mat_q4_0_f32 0x106e134d0 | th_max = 896 | th_width = 32 ggml_metal_init: loaded kernel_mul_mat_q4_1_f32 0x106e13bb0 | th_max = 896 | th_width = 32 ggml_metal_init: loaded kernel_mul_mat_q8_0_f32 0x106e14290 | th_max = 768 | th_width = 32 ggml_metal_init: loaded kernel_mul_mat_q2_K_f32 0x106e14970 | th_max = 640 | th_width = 32 ggml_metal_init: loaded kernel_mul_mat_q3_K_f32 0x106e15250 | th_max = 704 | th_width = 32 ggml_metal_init: loaded kernel_mul_mat_q4_K_f32 0x106e15a80 | th_max = 576 | th_width = 
32 ggml_metal_init: loaded kernel_mul_mat_q5_K_f32 0x106e16160 | th_max = 576 | th_width = 32 ggml_metal_init: loaded kernel_mul_mat_q6_K_f32 0x106e16840 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_mul_mm_f16_f32 0x106e16f60 | th_max = 768 | th_width = 32 ggml_metal_init: loaded kernel_mul_mm_q4_0_f32 0x106e17400 | th_max = 768 | th_width = 32 ggml_metal_init: loaded kernel_mul_mm_q8_0_f32 0x106e17b20 | th_max = 768 | th_width = 32 ggml_metal_init: loaded kernel_mul_mm_q4_1_f32 0x106e18240 | th_max = 768 | th_width = 32 ggml_metal_init: loaded kernel_mul_mm_q2_K_f32 0x106e18960 | th_max = 768 | th_width = 32 ggml_metal_init: loaded kernel_mul_mm_q3_K_f32 0x106e19080 | th_max = 768 | th_width = 32 ggml_metal_init: loaded kernel_mul_mm_q4_K_f32 0x106e197a0 | th_max = 768 | th_width = 32 ggml_metal_init: loaded kernel_mul_mm_q5_K_f32 0x106e19ec0 | th_max = 704 | th_width = 32 ggml_metal_init: loaded kernel_mul_mm_q6_K_f32 0x106e1a5e0 | th_max = 704 | th_width = 32 ggml_metal_init: loaded kernel_rope 0x106e1aa80 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_alibi_f32 0x106e1b2c0 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_cpy_f32_f16 0x106e1bad0 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_cpy_f32_f32 0x106e1c2e0 | th_max = 1024 | th_width = 32 ggml_metal_init: loaded kernel_cpy_f16_f16 0x106e1caf0 | th_max = 1024 | th_width = 32 ggml_metal_init: recommendedMaxWorkingSetSize = 49152.00 MB ggml_metal_init: hasUnifiedMemory = true ggml_metal_init: maxTransferRate = built-in GPU llama_new_context_with_model: compute buffer total size = 73.47 MB llama_new_context_with_model: max tensor size = 250.00 MB ggml_metal_add_buffer: allocated 'data ' buffer, size = 12853.61 MB, (12854.05 / 49152.00) ggml_metal_add_buffer: allocated 'eval ' buffer, size = 1.48 MB, (12855.53 / 49152.00) ggml_metal_add_buffer: allocated 'kv ' buffer, size = 258.00 MB, (13113.53 / 49152.00) ggml_metal_add_buffer: allocated 'alloc ' buffer, size = 72.02 MB, (13185.55 / 49152.00)

system_info: n_threads = 8 / 12 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 |
perplexity: tokenizing the input ..
perplexity: tokenization took 558.492 ms
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 1.42 seconds per pass - ETA 15.47 minutes
[1]4.1673,[2]4.6879,[3]5.3354,[4]5.9055,[5]6.0324,[6]5.9499,[7]6.1214,[8]6.2104,[9]6.5347,[10]6.7147,[11]6.9313,[12]6.9794,[13]6.9035,[14]6.9785,[15]7.2015,[16]6.8633,[17]6.7470,[18]6.7376,[19]6.4191,[20]6.4129,[21]6.3387,[22]6.1702,[23]6.1408,[24]6.0507,[25]6.0387,[26]5.8833,[27]5.7017,[28]5.6024,[29]5.5209,[30]5.3714,[31]5.3338,[32]5.3533,[33]5.3097,[34]5.3386,[35]5.3543,[36]5.3790,[37]5.3737,[38]5.3724,[39]5.3859,[40]5.4353,[41]5.4569,[42]5.4941,[43]5.4564,[44]5.5108,[45]5.5202,[46]5.4975,[47]5.5212,[48]5.5007,[49]5.5010,[50]5.4683,[51]5.4688,[52]5.4591,[53]5.5063,[54]5.4939,[55]5.4793,[56]5.5086,[57]5.5277,[58]5.5556,[59]5.5769,[60]5.6259,[61]5.6227,[62]5.6839,[63]5.7190,[64]5.7258,[65]5.7679,[66]5.7788,[67]5.7979,[68]5.8177,[69]5.8524,[70]5.8900,[71]5.9167,[72]5.9514,[73]6.0033,[74]6.0135,[75]6.0245,[76]6.0418,[77]6.0567,[78]6.0462,[79]6.0727,[80]6.0717,[81]6.0868,[82]6.0924,[83]6.0440,[84]6.0338,[85]6.0300,[86]6.0131,[87]5.9564,[88]5.9326,[89]5.9181,[90]5.9107,[91]5.9308,[92]5.9299,[93]5.9317,[94]5.9317,[95]5.9610,[96]5.9597,[97]5.9521,[98]5.9464,[99]5.9351,[100]5.9374,[101]5.9597,[102]5.9554,[103]5.9723,[104]5.9814,[105]5.9830,[106]5.9997,[107]6.0022,[108]6.0169,[109]6.0158,[110]6.0125,[111]6.0319,[112]6.0514,[113]6.0538,[114]6.0543,[115]6.0631,[116]6.0515,[117]6.0558,[118]6.0807,[119]6.1012,[120]6.1345,[121]6.1508,[122]6.1728,[123]6.2132,[124]6.2304,[125]6.2218,[126]6.2574,[127]6.2921,[128]6.3170,[129]6.3011,[130]6.3103,[131]6.3044,[132]6.2954,[133]6.2812,[134]6.2887,[135]6.2863,[136]6.2749,[137]6.2688,[138]6.2518,[139]6.2448,[140]6.2416,[141]6.2165,[142]6.2120,[143]6.1844,[144]6.1652,[145]6.1562,[146]6.1463,[147]6.1531,[148]6.1543,[149]6.1474,[150]6.1475,[151]6.1537,[152]6.1471,[153]6.1339,[154]6.1264,[155]6.1336,[156]6.1312,[157]6.1464,[158]6.1491,[159]6.1510,[160]6.1534,[161]6.1661,[162]6.1388,[163]6.1274,[164]6.1027,[165]6.0722,[166]6.0447,[167]6.0074,[168]5.9768,[169]5.9628,[170]5.9516,[171]5.9268,[172]5.9130,[173]5.8978,[174]5.8697,[175]5.8488,[176]5.8354,[177]5.8157,[178]5.7942,[179]5.7784,[180]5.7695,[181]5.7511,[182]5.7334,[183]5.7194,[184]5.7177,[185]5.7098,[186]5.7116,[187]5.7162,[188]5.7150,[189]5.7305,[190]5.7307,[191]5.7486,[192]5.7665,[193]5.7840,[194]5.7974,[195]5.8198,[196]5.8339,[197]5.8529,[198]5.8676,[199]5.8701,[200]5.8738,[201]5.8666,[202]5.8824,[203]5.8910,[204]5.8886,[205]5.8998,[206]5.9038,[207]5.9021,[208]5.9110,[209]5.9134,[210]5.9194,[211]5.9290,[212]5.9348,[213]5.9446,[214]5.9461,[215]5.9491,[216]5.9630,[217]5.9792,[218]5.9932,[219]5.9930,[220]5.9903,[221]5.9829,[222]5.9806,[223]5.9721,[224]5.9627,[225]5.9577,[226]5.9775,[227]5.9820,[228]5.9876,[229]5.9929,[230]5.9885,[231]6.0028,[232]5.9913,[233]5.9759,[234]5.9598,[235]5.9361,[236]5.9298,[237]5.9180,[238]5.9200,[239]5.9065,[240]5.8958,[241]5.8983,[242]5.8998,[243]5.8959,[244]5.8847,[245]5.8810,[246]5.8699,[247]5.8590,[248]5.8517,[249]5.8480,[250]5.8511,[251]5.8428,[252]5.8384,[253]5.8290,[254]5.8231,[255]5.8129,[256]5.7961,[257]5.7848,[258]5.7765,[259]5.7768,[260]5.7687,[261]5.7630,[262]5.7573,[263]5.7514,[264]5.7271,[265]5.7266,[266]5.7233,[267]5.7169,[268]5.7250,[269]5.7239,[270]5.7251,[271]5.7314,[272]5.7345,[273]5.7349,[274]5.7363,[275]5.7429,[276]5.7492,[277]5.7646,[278]5.7736,[279]5.7836,[280]5.7864,[281]5.7971,[282]5.8020,[283]5.8169,[284]5.8263,[285]5.8357,[286]5.8489,[287]5.8483,[288]5.8545,[289]5.8469,[290]5.8311,[291]5.8167,[292]5.8008,[293]5.7870,[294]5.7868,[295]5.7862,[296]5.7905,[297]5.7884,[298]5.7899,[299]5.7860,[300]5.7748,[301]5.7740,[302]5.7663,[303]5.7588,[304]5.7497,[305]5.7457,[30
6]5.7334,[307]5.7351,[308]5.7360,[309]5.7201,[310]5.7160,[311]5.7103,[312]5.7107,[313]5.7047,[314]5.7015,[315]5.6859,[316]5.6808,[317]5.6661,[318]5.6467,[319]5.6598,[320]5.6728,[321]5.6778,[322]5.6738,[323]5.6682,[324]5.6674,[325]5.6787,[326]5.6791,[327]5.6811,[328]5.6847,[329]5.6907,[330]5.6933,[331]5.7052,[332]5.7013,[333]5.7088,[334]5.7023,[335]5.6962,[336]5.6981,[337]5.6966,[338]5.6967,[339]5.6919,[340]5.6880,[341]5.6952,[342]5.6970,[343]5.7014,[344]5.7009,[345]5.7012,[346]5.6984,[347]5.7009,[348]5.7044,[349]5.7073,[350]5.7054,[351]5.7057,[352]5.7057,[353]5.6998,[354]5.6986,[355]5.7025,[356]5.7053,[357]5.7029,[358]5.7112,[359]5.7138,[360]5.7116,[361]5.7113,[362]5.7184,[363]5.7297,[364]5.7358,[365]5.7409,[366]5.7433,[367]5.7514,[368]5.7485,[369]5.7499,[370]5.7514,[371]5.7467,[372]5.7521,[373]5.7562,[374]5.7549,[375]5.7537,[376]5.7608,[377]5.7574,[378]5.7605,[379]5.7651,[380]5.7576,[381]5.7548,[382]5.7498,[383]5.7478,[384]5.7469,[385]5.7458,[386]5.7465,[387]5.7465,[388]5.7417,[389]5.7374,[390]5.7314,[391]5.7246,[392]5.7199,[393]5.7200,[394]5.7225,[395]5.7205,[396]5.7129,[397]5.7200,[398]5.7242,[399]5.7312,[400]5.7304,[401]5.7324,[402]5.7333,[403]5.7353,[404]5.7414,[405]5.7326,[406]5.7289,[407]5.7283,[408]5.7305,[409]5.7415,[410]5.7522,[411]5.7613,[412]5.7761,[413]5.7873,[414]5.7932,[415]5.7990,[416]5.8061,[417]5.8177,[418]5.8217,[419]5.8268,[420]5.8350,[421]5.8454,[422]5.8493,[423]5.8548,[424]5.8645,[425]5.8725,[426]5.8792,[427]5.8826,[428]5.8898,[429]5.8950,[430]5.9017,[431]5.9157,[432]5.9194,[433]5.9179,[434]5.9141,[435]5.9155,[436]5.9183,[437]5.9276,[438]5.9349,[439]5.9313,[440]5.9292,[441]5.9246,[442]5.9237,[443]5.9252,[444]5.9260,[445]5.9242,[446]5.9262,[447]5.9284,[448]5.9320,[449]5.9297,[450]5.9300,[451]5.9265,[452]5.9121,[453]5.9026,[454]5.8964,[455]5.8967,[456]5.9016,[457]5.9032,[458]5.9014,[459]5.9013,[460]5.9092,[461]5.9055,[462]5.9039,[463]5.9071,[464]5.9062,[465]5.9040,[466]5.8971,[467]5.8985,[468]5.8986,[469]5.9004,[470]5.9012,[471]5.8975,[472]5.9012,[473]5.8955,[474]5.8966,[475]5.8904,[476]5.8924,[477]5.8853,[478]5.8844,[479]5.8895,[480]5.8943,[481]5.8966,[482]5.8926,[483]5.8888,[484]5.8904,[485]5.8890,[486]5.8846,[487]5.8839,[488]5.8819,[489]5.8777,[490]5.8761,[491]5.8735,[492]5.8682,[493]5.8650,[494]5.8629,[495]5.8609,[496]5.8574,[497]5.8527,[498]5.8510,[499]5.8467,[500]5.8381,[501]5.8327,[502]5.8328,[503]5.8311,[504]5.8228,[505]5.8247,[506]5.8252,[507]5.8199,[508]5.8155,[509]5.8156,[510]5.8178,[511]5.8224,[512]5.8258,[513]5.8274,[514]5.8331,[515]5.8283,[516]5.8279,[517]5.8281,[518]5.8276,[519]5.8299,[520]5.8322,[521]5.8334,[522]5.8359,[523]5.8363,[524]5.8414,[525]5.8446,[526]5.8459,[527]5.8473,[528]5.8425,[529]5.8429,[530]5.8384,[531]5.8365,[532]5.8415,[533]5.8436,[534]5.8425,[535]5.8457,[536]5.8404,[537]5.8392,[538]5.8437,[539]5.8450,[540]5.8469,[541]5.8477,[542]5.8483,[543]5.8506,[544]5.8517,[545]5.8504,[546]5.8516,[547]5.8478,[548]5.8429,[549]5.8423,[550]5.8398,[551]5.8373,[552]5.8359,[553]5.8323,[554]5.8300,[555]5.8271,[556]5.8266,[557]5.8289,[558]5.8254,[559]5.8258,[560]5.8259,[561]5.8261,[562]5.8235,[563]5.8233,[564]5.8280,[565]5.8290,[566]5.8296,[567]5.8269,[568]5.8271,[569]5.8258,[570]5.8286,[571]5.8294,[572]5.8299,[573]5.8297,[574]5.8260,[575]5.8245,[576]5.8244,[577]5.8228,[578]5.8213,[579]5.8214,[580]5.8160,[581]5.8133,[582]5.8125,[583]5.8137,[584]5.8139,[585]5.8067,[586]5.8003,[587]5.8000,[588]5.8039,[589]5.8090,[590]5.8119,[591]5.8137,[592]5.8124,[593]5.8084,[594]5.8089,[595]5.8070,[596]5.8108,[597]5.8084,[598]5.8052,[599]5.8072,[600]5.8062,[601]5.8047,[602]5
.8046,[603]5.8061,[604]5.8075,[605]5.8108,[606]5.8126,[607]5.8115,[608]5.8075,[609]5.8084,[610]5.8123,[611]5.8111,[612]5.8134,[613]5.8107,[614]5.8061,[615]5.7998,[616]5.8024,[617]5.7967,[618]5.7917,[619]5.7868,[620]5.7740,[621]5.7677,[622]5.7656,[623]5.7675,[624]5.7678,[625]5.7687,[626]5.7683,[627]5.7711,[628]5.7718,[629]5.7722,[630]5.7753,[631]5.7805,[632]5.7861,[633]5.7857,[634]5.7886,[635]5.7898,[636]5.7870,[637]5.7832,[638]5.7855,[639]5.7817,[640]5.7828,[641]5.7832,[642]5.7888,[643]5.7904,[644]5.7914,[645]5.7901,[646]5.7937,[647]5.7898,[648]5.7905,[649]5.7912,[650]5.7946,[651]5.7990,[652]5.7998,[653]5.8032,[654]5.7971,[655]5.7962,
Final estimate: PPL = 5.7962 +/- 0.03235

llama_print_timings: load time = 2157.42 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 742818.30 ms / 335360 tokens ( 2.21 ms per token, 451.47 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 749648.32 ms

ggerganov (Member) commented Sep 2, 2023

I'm also running it.

Q4_0 finished in just 5m and 28s:

system_info: n_threads = 16 / 24 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | 
perplexity: tokenizing the input ..
perplexity: tokenization took 555.092 ms
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 0.63 seconds per pass - ETA 6.90 minutes
[1]4.4083,[2]4.9137,[3]5.8087,[4]6.4334,[5]6.5183,[6]6.4580,[7]6.6552,[8]6.7720,[9]7.1091,[10]7.3490,[11]7.5538,[12]7.5747,[13]7.5065,[14]7.5835,[15]7.8312,[16]7.4352,[17]7.3219,[18]7.2706,[19]6.9050,[20]6.8973,[21]6.7935,[22]6.6206,[23]6.5822,[24]6.4818,[25]6.4785,[26]6.3148,[27]6.1317,[28]6.0291,[29]5.9307,[30]5.7704,[31]5.7397,[32]5.7649,[33]5.7076,[34]5.7423,[35]5.7684,[36]5.8032,[37]5.8046,[38]5.8130,[39]5.8435,[40]5.9025,[41]5.9135,[42]5.9526,[43]5.9132,[44]5.9670,[45]5.9734,[46]5.9428,[47]5.9681,[48]5.9435,[49]5.9497,[50]5.9054,[51]5.9008,[52]5.8909,[53]5.9358,[54]5.9171,[55]5.8918,[56]5.9238,[57]5.9458,[58]5.9670,[59]5.9849,[60]6.0304,[61]6.0207,[62]6.0783,[63]6.1122,[64]6.1242,[65]6.1716,[66]6.1813,[67]6.1964,[68]6.2132,[69]6.2359,[70]6.2670,[71]6.2845,[72]6.3174,[73]6.3773,[74]6.3830,[75]6.3972,[76]6.4099,[77]6.4203,[78]6.4055,[79]6.4338,[80]6.4266,[81]6.4372,[82]6.4417,[83]6.3908,[84]6.3768,[85]6.3654,[86]6.3429,[87]6.2834,[88]6.2565,[89]6.2375,[90]6.2222,[91]6.2463,[92]6.2405,[93]6.2370,[94]6.2343,[95]6.2630,[96]6.2609,[97]6.2558,[98]6.2495,[99]6.2344,[100]6.2330,[101]6.2570,[102]6.2513,[103]6.2718,[104]6.2783,[105]6.2786,[106]6.2934,[107]6.2914,[108]6.3017,[109]6.2958,[110]6.2935,[111]6.3143,[112]6.3351,[113]6.3345,[114]6.3311,[115]6.3370,[116]6.3280,[117]6.3325,[118]6.3605,[119]6.3806,[120]6.4151,[121]6.4306,[122]6.4553,[123]6.4922,[124]6.5114,[125]6.5023,[126]6.5425,[127]6.5792,[128]6.6103,[129]6.5937,[130]6.6009,[131]6.5956,[132]6.5870,[133]6.5722,[134]6.5813,[135]6.5770,[136]6.5644,[137]6.5573,[138]6.5403,[139]6.5300,[140]6.5262,[141]6.4979,[142]6.4947,[143]6.4670,[144]6.4459,[145]6.4373,[146]6.4251,[147]6.4296,[148]6.4298,[149]6.4246,[150]6.4211,[151]6.4223,[152]6.4120,[153]6.3948,[154]6.3864,[155]6.3928,[156]6.3884,[157]6.4047,[158]6.4087,[159]6.4146,[160]6.4172,[161]6.4286,[162]6.3996,[163]6.3869,[164]6.3628,[165]6.3322,[166]6.3052,[167]6.2681,[168]6.2369,[169]6.2231,[170]6.2126,[171]6.1857,[172]6.1688,[173]6.1513,[174]6.1222,[175]6.0995,[176]6.0869,[177]6.0665,[178]6.0437,[179]6.0267,[180]6.0175,[181]5.9964,[182]5.9792,[183]5.9655,[184]5.9647,[185]5.9573,[186]5.9577,[187]5.9633,[188]5.9594,[189]5.9768,[190]5.9788,[191]6.0009,[192]6.0171,[193]6.0334,[194]6.0449,[195]6.0670,[196]6.0826,[197]6.1057,[198]6.1203,[199]6.1233,[200]6.1287,[201]6.1237,[202]6.1438,[203]6.1513,[204]6.1506,[205]6.1611,[206]6.1686,[207]6.1646,[208]6.1732,[209]6.1778,[210]6.1837,[211]6.1940,[212]6.2008,[213]6.2113,[214]6.2143,[215]6.2176,[216]6.2309,[217]6.2486,[218]6.2621,[219]6.2624,[220]6.2584,[221]6.2533,[222]6.2501,[223]6.2409,[224]6.2337,[225]6.2299,[226]6.2499,[227]6.2586,[228]6.2635,[229]6.2683,[230]6.2653,[231]6.2817,[232]6.2692,[233]6.2521,[234]6.2380,[235]6.2225,[236]6.2162,[237]6.2059,[238]6.2084,[239]6.1932,[240]6.1836,[241]6.1861,[242]6.1902,[243]6.1890,[244]6.1771,[245]6.1734,[246]6.1622,[247]6.1499,[248]6.1419,[249]6.1391,[250]6.1433,[251]6.1355,[252]6.1311,[253]6.1216,[254]6.1171,[255]6.1056,[256]6.0882,[257]6.0756,[258]6.0671,[259]6.0648,[260]6.0565,[261]6.0521,[262]6.0471,[263]6.0413,[264]6.0232,[265]6.0226,[266]6.0199,[267]6.0134,[268]6.0228,[269]6.0207,[270]6.0217,[271]6.0295,[272]6.0338,[273]6.0345,[274]6.0362,[275]6.0448,[276]6.0507,[277]6.0659,[278]6.0760,[279]6.0856,[280]6.0887,[281]6.0982,[282]6.1048,[283]6.1190,[284]6.1267,[285]6.1351,[286]6.1488,[287]6.1477,[288]6.1537,[289]6.1450,[290]6.1292,[291]6.1140,[292]6.0991,[293]6.0858,[294]6.0878,[295]6.0870,[296]6.0914,[297]6.0905,[298]6.0937,[299]6.0909,[300]6.0803,[301]6.0806,[302]6.0729,[303]6.0645,[304]6.0564,[305]6.0531,[30
6]6.0411,[307]6.0434,[308]6.0463,[309]6.0303,[310]6.0248,[311]6.0186,[312]6.0212,[313]6.0154,[314]6.0133,[315]5.9974,[316]5.9929,[317]5.9771,[318]5.9569,[319]5.9690,[320]5.9811,[321]5.9855,[322]5.9809,[323]5.9744,[324]5.9714,[325]5.9816,[326]5.9814,[327]5.9836,[328]5.9875,[329]5.9927,[330]5.9954,[331]6.0080,[332]6.0053,[333]6.0121,[334]6.0065,[335]6.0005,[336]6.0037,[337]6.0011,[338]6.0009,[339]5.9953,[340]5.9917,[341]5.9987,[342]6.0016,[343]6.0073,[344]6.0074,[345]6.0076,[346]6.0049,[347]6.0097,[348]6.0127,[349]6.0150,[350]6.0125,[351]6.0135,[352]6.0136,[353]6.0082,[354]6.0083,[355]6.0133,[356]6.0164,[357]6.0127,[358]6.0216,[359]6.0249,[360]6.0211,[361]6.0208,[362]6.0277,[363]6.0391,[364]6.0451,[365]6.0501,[366]6.0515,[367]6.0600,[368]6.0574,[369]6.0582,[370]6.0598,[371]6.0550,[372]6.0598,[373]6.0643,[374]6.0629,[375]6.0630,[376]6.0696,[377]6.0660,[378]6.0688,[379]6.0749,[380]6.0670,[381]6.0635,[382]6.0585,[383]6.0573,[384]6.0569,[385]6.0559,[386]6.0550,[387]6.0549,[388]6.0515,[389]6.0462,[390]6.0390,[391]6.0316,[392]6.0280,[393]6.0260,[394]6.0291,[395]6.0276,[396]6.0204,[397]6.0276,[398]6.0308,[399]6.0385,[400]6.0377,[401]6.0392,[402]6.0401,[403]6.0419,[404]6.0484,[405]6.0384,[406]6.0344,[407]6.0338,[408]6.0353,[409]6.0474,[410]6.0583,[411]6.0693,[412]6.0854,[413]6.0963,[414]6.1037,[415]6.1091,[416]6.1169,[417]6.1286,[418]6.1324,[419]6.1390,[420]6.1473,[421]6.1584,[422]6.1625,[423]6.1694,[424]6.1802,[425]6.1886,[426]6.1952,[427]6.1996,[428]6.2083,[429]6.2127,[430]6.2215,[431]6.2351,[432]6.2395,[433]6.2389,[434]6.2339,[435]6.2350,[436]6.2374,[437]6.2471,[438]6.2545,[439]6.2518,[440]6.2511,[441]6.2462,[442]6.2448,[443]6.2457,[444]6.2459,[445]6.2435,[446]6.2459,[447]6.2483,[448]6.2525,[449]6.2499,[450]6.2509,[451]6.2470,[452]6.2355,[453]6.2272,[454]6.2216,[455]6.2227,[456]6.2272,[457]6.2293,[458]6.2269,[459]6.2271,[460]6.2354,[461]6.2328,[462]6.2316,[463]6.2356,[464]6.2345,[465]6.2316,[466]6.2241,[467]6.2242,[468]6.2243,[469]6.2262,[470]6.2267,[471]6.2220,[472]6.2261,[473]6.2206,[474]6.2220,[475]6.2164,[476]6.2181,[477]6.2109,[478]6.2098,[479]6.2166,[480]6.2212,[481]6.2230,[482]6.2186,[483]6.2145,[484]6.2167,[485]6.2148,[486]6.2096,[487]6.2095,[488]6.2071,[489]6.2024,[490]6.2002,[491]6.1974,[492]6.1914,[493]6.1886,[494]6.1870,[495]6.1872,[496]6.1836,[497]6.1781,[498]6.1764,[499]6.1721,[500]6.1628,[501]6.1562,[502]6.1568,[503]6.1558,[504]6.1468,[505]6.1494,[506]6.1505,[507]6.1450,[508]6.1406,[509]6.1398,[510]6.1431,[511]6.1475,[512]6.1511,[513]6.1532,[514]6.1596,[515]6.1543,[516]6.1533,[517]6.1547,[518]6.1547,[519]6.1581,[520]6.1601,[521]6.1618,[522]6.1647,[523]6.1654,[524]6.1711,[525]6.1745,[526]6.1757,[527]6.1775,[528]6.1726,[529]6.1731,[530]6.1680,[531]6.1666,[532]6.1714,[533]6.1738,[534]6.1724,[535]6.1747,[536]6.1693,[537]6.1669,[538]6.1716,[539]6.1726,[540]6.1762,[541]6.1767,[542]6.1780,[543]6.1794,[544]6.1806,[545]6.1783,[546]6.1789,[547]6.1747,[548]6.1695,[549]6.1696,[550]6.1669,[551]6.1633,[552]6.1612,[553]6.1572,[554]6.1552,[555]6.1524,[556]6.1521,[557]6.1544,[558]6.1505,[559]6.1496,[560]6.1494,[561]6.1495,[562]6.1469,[563]6.1470,[564]6.1508,[565]6.1524,[566]6.1521,[567]6.1502,[568]6.1507,[569]6.1493,[570]6.1520,[571]6.1523,[572]6.1532,[573]6.1531,[574]6.1492,[575]6.1488,[576]6.1490,[577]6.1474,[578]6.1457,[579]6.1461,[580]6.1393,[581]6.1356,[582]6.1343,[583]6.1351,[584]6.1356,[585]6.1282,[586]6.1215,[587]6.1222,[588]6.1271,[589]6.1325,[590]6.1352,[591]6.1373,[592]6.1359,[593]6.1329,[594]6.1339,[595]6.1318,[596]6.1356,[597]6.1334,[598]6.1308,[599]6.1328,[600]6.1326,[601]6.1313,[602]6
.1326,[603]6.1355,[604]6.1364,[605]6.1398,[606]6.1415,[607]6.1402,[608]6.1367,[609]6.1371,[610]6.1405,[611]6.1384,[612]6.1412,[613]6.1374,[614]6.1325,[615]6.1254,[616]6.1280,[617]6.1218,[618]6.1166,[619]6.1112,[620]6.0975,[621]6.0908,[622]6.0892,[623]6.0910,[624]6.0918,[625]6.0917,[626]6.0907,[627]6.0927,[628]6.0931,[629]6.0926,[630]6.0956,[631]6.1013,[632]6.1065,[633]6.1050,[634]6.1085,[635]6.1095,[636]6.1067,[637]6.1034,[638]6.1059,[639]6.1029,[640]6.1041,[641]6.1045,[642]6.1111,[643]6.1131,[644]6.1142,[645]6.1125,[646]6.1168,[647]6.1135,[648]6.1143,[649]6.1143,[650]6.1182,[651]6.1234,[652]6.1245,[653]6.1284,[654]6.1219,[655]6.1213,
Final estimate: PPL = 6.1213 +/- 0.03511

llama_print_timings:        load time =  1930.79 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time = 323972.82 ms / 335360 tokens (    0.97 ms per token,  1035.15 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time = 327603.66 ms
ggml_metal_free: deallocating

real	5m27.892s
user	0m29.329s
sys	0m2.167s
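
A quick consistency check on the Q4_0 timings above (all values taken from the log, nothing new measured): the wall time follows directly from the prompt-eval throughput.

```python
# Consistency check for the Q4_0 perplexity run above (numbers from the log).
tokens = 335360              # 655 chunks * 512 tokens per chunk
prompt_eval_ms = 323972.82   # llama_print_timings: prompt eval time
total_ms = 327603.66         # llama_print_timings: total time

print(tokens / (prompt_eval_ms / 1000))  # ~1035 t/s, matching the printed 1035.15
print(total_ms / 1000 / 60)              # ~5.46 min, i.e. the 5m28s "real" time
```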

F16 perplexity finished even faster, in just 5m and 4s!

system_info: n_threads = 16 / 24 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | 
perplexity: tokenizing the input ..
perplexity: tokenization took 585.54 ms
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 0.85 seconds per pass - ETA 9.22 minutes
[1]4.2295,[2]4.7072,[3]5.5755,[4]6.1797,[5]6.2988,[6]6.2670,[7]6.4606,[8]6.5534,[9]6.8729,[10]7.1200,[11]7.3171,[12]7.3391,[13]7.2497,[14]7.2966,[15]7.5340,[16]7.1647,[17]7.0577,[18]7.0056,[19]6.6589,[20]6.6460,[21]6.5491,[22]6.3765,[23]6.3376,[24]6.2424,[25]6.2400,[26]6.0786,[27]5.9062,[28]5.8051,[29]5.7163,[30]5.5628,[31]5.5325,[32]5.5546,[33]5.5031,[34]5.5341,[35]5.5574,[36]5.5899,[37]5.5881,[38]5.5983,[39]5.6285,[40]5.6778,[41]5.6866,[42]5.7227,[43]5.6856,[44]5.7407,[45]5.7451,[46]5.7163,[47]5.7391,[48]5.7160,[49]5.7166,[50]5.6771,[51]5.6735,[52]5.6644,[53]5.7098,[54]5.6945,[55]5.6741,[56]5.7027,[57]5.7224,[58]5.7409,[59]5.7584,[60]5.7979,[61]5.7898,[62]5.8467,[63]5.8761,[64]5.8884,[65]5.9300,[66]5.9390,[67]5.9556,[68]5.9701,[69]5.9914,[70]6.0197,[71]6.0398,[72]6.0714,[73]6.1270,[74]6.1314,[75]6.1452,[76]6.1578,[77]6.1680,[78]6.1535,[79]6.1803,[80]6.1745,[81]6.1874,[82]6.1921,[83]6.1434,[84]6.1280,[85]6.1170,[86]6.0963,[87]6.0358,[88]6.0121,[89]5.9930,[90]5.9787,[91]6.0015,[92]5.9954,[93]5.9942,[94]5.9914,[95]6.0186,[96]6.0186,[97]6.0129,[98]6.0081,[99]5.9952,[100]5.9937,[101]6.0169,[102]6.0127,[103]6.0326,[104]6.0389,[105]6.0389,[106]6.0547,[107]6.0538,[108]6.0653,[109]6.0613,[110]6.0585,[111]6.0797,[112]6.1002,[113]6.1011,[114]6.0970,[115]6.1030,[116]6.0941,[117]6.0988,[118]6.1261,[119]6.1472,[120]6.1807,[121]6.1953,[122]6.2198,[123]6.2567,[124]6.2742,[125]6.2662,[126]6.3037,[127]6.3393,[128]6.3686,[129]6.3539,[130]6.3600,[131]6.3559,[132]6.3491,[133]6.3350,[134]6.3442,[135]6.3396,[136]6.3291,[137]6.3214,[138]6.3034,[139]6.2939,[140]6.2905,[141]6.2625,[142]6.2596,[143]6.2301,[144]6.2096,[145]6.2007,[146]6.1901,[147]6.1937,[148]6.1942,[149]6.1892,[150]6.1853,[151]6.1870,[152]6.1776,[153]6.1620,[154]6.1540,[155]6.1593,[156]6.1550,[157]6.1713,[158]6.1757,[159]6.1795,[160]6.1822,[161]6.1948,[162]6.1679,[163]6.1567,[164]6.1342,[165]6.1050,[166]6.0789,[167]6.0437,[168]6.0146,[169]6.0012,[170]5.9914,[171]5.9661,[172]5.9498,[173]5.9340,[174]5.9054,[175]5.8841,[176]5.8726,[177]5.8539,[178]5.8319,[179]5.8160,[180]5.8071,[181]5.7867,[182]5.7697,[183]5.7569,[184]5.7558,[185]5.7489,[186]5.7497,[187]5.7558,[188]5.7523,[189]5.7688,[190]5.7700,[191]5.7907,[192]5.8059,[193]5.8213,[194]5.8321,[195]5.8526,[196]5.8675,[197]5.8882,[198]5.9023,[199]5.9049,[200]5.9100,[201]5.9049,[202]5.9235,[203]5.9304,[204]5.9292,[205]5.9392,[206]5.9459,[207]5.9416,[208]5.9500,[209]5.9540,[210]5.9591,[211]5.9701,[212]5.9772,[213]5.9872,[214]5.9896,[215]5.9932,[216]6.0061,[217]6.0236,[218]6.0364,[219]6.0359,[220]6.0325,[221]6.0268,[222]6.0247,[223]6.0161,[224]6.0096,[225]6.0062,[226]6.0261,[227]6.0336,[228]6.0388,[229]6.0447,[230]6.0418,[231]6.0573,[232]6.0462,[233]6.0298,[234]6.0157,[235]5.9969,[236]5.9910,[237]5.9818,[238]5.9840,[239]5.9700,[240]5.9603,[241]5.9618,[242]5.9651,[243]5.9637,[244]5.9530,[245]5.9495,[246]5.9392,[247]5.9277,[248]5.9205,[249]5.9176,[250]5.9220,[251]5.9153,[252]5.9120,[253]5.9026,[254]5.8973,[255]5.8865,[256]5.8697,[257]5.8576,[258]5.8494,[259]5.8467,[260]5.8385,[261]5.8340,[262]5.8289,[263]5.8232,[264]5.8023,[265]5.8017,[266]5.7997,[267]5.7937,[268]5.8023,[269]5.8004,[270]5.8017,[271]5.8093,[272]5.8126,[273]5.8133,[274]5.8152,[275]5.8232,[276]5.8293,[277]5.8444,[278]5.8540,[279]5.8632,[280]5.8662,[281]5.8758,[282]5.8819,[283]5.8961,[284]5.9038,[285]5.9121,[286]5.9249,[287]5.9242,[288]5.9296,[289]5.9219,[290]5.9068,[291]5.8926,[292]5.8783,[293]5.8657,[294]5.8677,[295]5.8666,[296]5.8713,[297]5.8703,[298]5.8731,[299]5.8708,[300]5.8606,[301]5.8608,[302]5.8534,[303]5.8449,[304]5.8369,[305]5.8333,[30
6]5.8213,[307]5.8236,[308]5.8261,[309]5.8111,[310]5.8060,[311]5.8000,[312]5.8022,[313]5.7967,[314]5.7951,[315]5.7800,[316]5.7752,[317]5.7597,[318]5.7405,[319]5.7519,[320]5.7636,[321]5.7682,[322]5.7642,[323]5.7578,[324]5.7551,[325]5.7651,[326]5.7654,[327]5.7672,[328]5.7707,[329]5.7757,[330]5.7783,[331]5.7903,[332]5.7878,[333]5.7943,[334]5.7891,[335]5.7835,[336]5.7870,[337]5.7851,[338]5.7846,[339]5.7795,[340]5.7758,[341]5.7829,[342]5.7857,[343]5.7905,[344]5.7909,[345]5.7915,[346]5.7891,[347]5.7933,[348]5.7967,[349]5.7992,[350]5.7965,[351]5.7975,[352]5.7972,[353]5.7918,[354]5.7919,[355]5.7969,[356]5.8002,[357]5.7967,[358]5.8056,[359]5.8082,[360]5.8052,[361]5.8047,[362]5.8116,[363]5.8225,[364]5.8282,[365]5.8328,[366]5.8339,[367]5.8420,[368]5.8396,[369]5.8407,[370]5.8423,[371]5.8378,[372]5.8427,[373]5.8471,[374]5.8457,[375]5.8459,[376]5.8525,[377]5.8491,[378]5.8519,[379]5.8575,[380]5.8500,[381]5.8469,[382]5.8420,[383]5.8410,[384]5.8405,[385]5.8395,[386]5.8390,[387]5.8389,[388]5.8356,[389]5.8307,[390]5.8240,[391]5.8167,[392]5.8129,[393]5.8113,[394]5.8140,[395]5.8129,[396]5.8060,[397]5.8127,[398]5.8163,[399]5.8238,[400]5.8237,[401]5.8251,[402]5.8262,[403]5.8283,[404]5.8345,[405]5.8251,[406]5.8218,[407]5.8213,[408]5.8229,[409]5.8342,[410]5.8449,[411]5.8559,[412]5.8715,[413]5.8821,[414]5.8897,[415]5.8951,[416]5.9027,[417]5.9143,[418]5.9180,[419]5.9242,[420]5.9326,[421]5.9438,[422]5.9477,[423]5.9543,[424]5.9647,[425]5.9730,[426]5.9793,[427]5.9836,[428]5.9920,[429]5.9966,[430]6.0047,[431]6.0180,[432]6.0226,[433]6.0219,[434]6.0176,[435]6.0187,[436]6.0212,[437]6.0304,[438]6.0379,[439]6.0352,[440]6.0344,[441]6.0297,[442]6.0283,[443]6.0296,[444]6.0301,[445]6.0281,[446]6.0305,[447]6.0331,[448]6.0371,[449]6.0348,[450]6.0357,[451]6.0320,[452]6.0189,[453]6.0106,[454]6.0053,[455]6.0063,[456]6.0108,[457]6.0128,[458]6.0107,[459]6.0110,[460]6.0194,[461]6.0169,[462]6.0155,[463]6.0190,[464]6.0179,[465]6.0153,[466]6.0078,[467]6.0080,[468]6.0078,[469]6.0098,[470]6.0102,[471]6.0056,[472]6.0098,[473]6.0048,[474]6.0061,[475]6.0002,[476]6.0016,[477]5.9947,[478]5.9936,[479]5.9992,[480]6.0035,[481]6.0052,[482]6.0009,[483]5.9971,[484]5.9990,[485]5.9969,[486]5.9915,[487]5.9912,[488]5.9888,[489]5.9842,[490]5.9820,[491]5.9793,[492]5.9740,[493]5.9716,[494]5.9701,[495]5.9697,[496]5.9660,[497]5.9607,[498]5.9590,[499]5.9550,[500]5.9461,[501]5.9398,[502]5.9401,[503]5.9394,[504]5.9309,[505]5.9330,[506]5.9339,[507]5.9279,[508]5.9237,[509]5.9232,[510]5.9261,[511]5.9307,[512]5.9340,[513]5.9362,[514]5.9423,[515]5.9371,[516]5.9360,[517]5.9372,[518]5.9368,[519]5.9399,[520]5.9421,[521]5.9433,[522]5.9459,[523]5.9465,[524]5.9522,[525]5.9552,[526]5.9559,[527]5.9577,[528]5.9527,[529]5.9531,[530]5.9482,[531]5.9472,[532]5.9518,[533]5.9542,[534]5.9526,[535]5.9547,[536]5.9496,[537]5.9475,[538]5.9524,[539]5.9535,[540]5.9572,[541]5.9572,[542]5.9585,[543]5.9601,[544]5.9610,[545]5.9591,[546]5.9599,[547]5.9560,[548]5.9516,[549]5.9518,[550]5.9491,[551]5.9459,[552]5.9438,[553]5.9403,[554]5.9384,[555]5.9355,[556]5.9349,[557]5.9373,[558]5.9337,[559]5.9332,[560]5.9330,[561]5.9333,[562]5.9311,[563]5.9308,[564]5.9348,[565]5.9367,[566]5.9365,[567]5.9345,[568]5.9353,[569]5.9340,[570]5.9368,[571]5.9373,[572]5.9384,[573]5.9382,[574]5.9347,[575]5.9338,[576]5.9336,[577]5.9320,[578]5.9303,[579]5.9306,[580]5.9241,[581]5.9206,[582]5.9196,[583]5.9205,[584]5.9208,[585]5.9135,[586]5.9069,[587]5.9075,[588]5.9124,[589]5.9174,[590]5.9203,[591]5.9223,[592]5.9211,[593]5.9181,[594]5.9192,[595]5.9170,[596]5.9203,[597]5.9184,[598]5.9157,[599]5.9178,[600]5.9174,[601]5.9160,[602]5
.9170,[603]5.9197,[604]5.9205,[605]5.9239,[606]5.9258,[607]5.9243,[608]5.9210,[609]5.9218,[610]5.9251,[611]5.9233,[612]5.9260,[613]5.9225,[614]5.9179,[615]5.9110,[616]5.9134,[617]5.9074,[618]5.9026,[619]5.8974,[620]5.8844,[621]5.8781,[622]5.8766,[623]5.8782,[624]5.8788,[625]5.8790,[626]5.8780,[627]5.8803,[628]5.8804,[629]5.8800,[630]5.8831,[631]5.8883,[632]5.8937,[633]5.8923,[634]5.8958,[635]5.8966,[636]5.8934,[637]5.8900,[638]5.8923,[639]5.8892,[640]5.8902,[641]5.8903,[642]5.8968,[643]5.8989,[644]5.9001,[645]5.8984,[646]5.9024,[647]5.8985,[648]5.8995,[649]5.8997,[650]5.9033,[651]5.9083,[652]5.9094,[653]5.9132,[654]5.9071,[655]5.9066,
Final estimate: PPL = 5.9066 +/- 0.03308

llama_print_timings:        load time =  3548.93 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time = 298226.30 ms / 335360 tokens (    0.89 ms per token,  1124.52 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time = 303265.95 ms
ggml_metal_free: deallocating

real	5m3.493s
user	0m29.352s
sys	0m3.894s

So the ETA calculation is definitely wrong. We should fix this.
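One common cause of a skewed ETA is extrapolating from the first pass alone, which can include one-time setup cost; whether that is what happens here is only an assumption. A toy C++ sketch (not the actual perplexity.cpp logic, all numbers made up) of an ETA driven by a running average over completed passes:

#include <cstdio>

int main() {
    const int n_chunks = 655;            // chunk count from the run above
    double    t_total  = 0.0;            // seconds spent on completed passes
    int       n_done   = 0;

    // stand-in for the per-chunk evaluation loop
    for (int i = 0; i < 5; ++i) {
        const double t_pass = (i == 0) ? 1.2 : 0.45;   // pretend the first pass pays a warm-up cost
        t_total += t_pass;
        n_done  += 1;

        const double sec_per_pass = t_total / n_done;  // running average over completed passes
        const double eta_min      = sec_per_pass * (n_chunks - n_done) / 60.0;
        std::printf("pass %3d: %.2f s/pass, ETA %.2f min\n", n_done, sec_per_pass, eta_min);
    }
    return 0;
}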

@ikawrakow
Copy link
Contributor Author

@ggerganov

I see TG is slightly faster with #2891 on your M2 Ultra. On my M2 Max this PR is very slightly faster (~1%) than #2891 for LLaMA-7B and for Falcon-7B.

@ggerganov
Copy link
Member

I see TG is slightly faster with #2891 on your M2 Ultra. On my M2 Max this PR is very slightly faster (~1%) than #2891 for LLaMA-7B and for Falcon-7B.

Have you fetched the latest branch of #2891? I think I updated it yesterday by merging master into it, so if you haven't fetched since then you might have a stale version. Or it's just a small difference between the Max and the Ultra.

@ikawrakow
Copy link
Contributor Author

I see TG is slightly faster with #2891 on your M2 Ultra. On my M2 Max this PR is very slightly faster (~1%) than #2891 for LLaMA-7B and for Falcon-7B.

Have you fetched the latest branch of #2891? I think I updated it yesterday by merging master into it, so if you haven't fetched since then you might have a stale version. Or it's just a small difference between the Max and the Ultra.

Yes, I just updated this afternoon. I'm on b46ae7b for #2891

uint ith = tpitg.x;   // thread position within the threadgroup (x component)
uint nth = tptg.x;    // threads per threadgroup (x component)
float sumf = 0;
if (ne00 < 128) {
Copy link
Member


Should we have 2 separate kernels to avoid this branch?

Copy link
Contributor Author


I was considering it, but decided against it given the preference for shorter, simpler code. This is something that needs more careful study anyway: the best way to perform this computation is not just a function of ne00.
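If the split were ever done, one low-cost option would be a Metal function constant, so the host builds two specialized pipelines from a single kernel source and the ne00 < 128 decision is resolved at pipeline-creation time instead of per dispatch. A hypothetical sketch, not the actual ggml-metal.metal code; all names are illustrative and it assumes one SIMD group per output value:

#include <metal_stdlib>
using namespace metal;

// set from the host via MTLFunctionConstantValues when the pipeline is built
constant bool ne00_is_small [[function_constant(0)]];

kernel void dot_row_example(
        device const float * x    [[buffer(0)]],
        device const float * y    [[buffer(1)]],
        device       float * dst  [[buffer(2)]],
        constant     int   & ne00 [[buffer(3)]],
        uint tiisg [[thread_index_in_simdgroup]],
        uint ntsg  [[threads_per_simdgroup]]) {
    float sumf = 0.0f;
    if (ne00_is_small) {
        // short rows: plain scalar accumulation, one element per lane per step
        for (int i = tiisg; i < ne00; i += ntsg) {
            sumf += x[i] * y[i];
        }
    } else {
        // long rows: 4 elements per lane per step
        for (int i = 4*tiisg; i + 3 < ne00; i += 4*ntsg) {
            sumf += x[i+0]*y[i+0] + x[i+1]*y[i+1] + x[i+2]*y[i+2] + x[i+3]*y[i+3];
        }
    }
    float all_sum = simd_sum(sumf);
    if (tiisg == 0) {
        if (!ne00_is_small) {
            // fold in the tail when ne00 is not a multiple of 4
            for (int i = 4*(ne00/4); i < ne00; ++i) all_sum += x[i] * y[i];
        }
        dst[0] = all_sum;
    }
}

Since the branch condition is a compile-time constant, the compiler drops the untaken path in each pipeline, which gives the effect of two kernels without duplicating the source.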

@ggerganov ggerganov merged commit ca82cf7 into master Sep 3, 2023
@mechanicmuthu
Copy link

mechanicmuthu commented Sep 3, 2023

This causes incoherent paragraphs very quickly with Metal inference on M2. -ngl 0 works fine, though.

Sample (with -ngl 99):

[INST]
<<SYS>>
You are a helpful assistant.
<</SYS>>
Tenali Rama story.
[/INST]
Certainly, I'd be happy to tell you a story about Tenali Rama!

Tenali Rama was a great poet and storyteller from India who lived in the 16th century. He was known for his wit and humor, and was particularly famous for his ability to create humorous stories and poems that poked fun at the foolishness of King and aristocrats of his time.
One day,One day, Rama was at the king's court of the king Krishnadev, who was known for a s good and foolish king. He loved to beed to and commerce and of his subjects, and did and making foolish all the time. Rama was not only clever than king's foolish his, but also very poor. and he had. He was very and 64 and king's and 4 queens, and and that's why Rama 6 for . sake. The king he is want Rama, so He was . Not ) i, he befoolish And he the king Rama) for him a very kind of do. him.

him the king, and to them of place, they still On themeant their poor have were They the to eat there and drinking, smile and the and and the them the for).

The king Rama Rama 6 him a the to the his and the king with a The the his and

Rama Rama six years king Krish his his in the was the Rama is was a the not his to poor Rama. He had his the a king of and he the king He was no foolish king and four king and the or for and the king of Ram at all the king Wash in king Rama for king Hindustan so and king all Rama and all a king all for a a king Rama a king for day.

kingish Rama 6 Rama Rama Of Raman Rams years por the king years. Rams poor king Rams and he Rams years yearsRams, of poor Rams six years were years, the years king king Rams.

Sample with -ngl 0:

[INST]
<<SYS>>
You are a helpful assistant.
<</SYS>>
Tenali Rama story.
[/INST]
Ah, the delightful Tenali Rama! He was a renowned poet and humorist of the 18th century in India. Here's a brief summary of his life and works:
Tenali Rama (1725-1805) was born in a brahmin family in the state of Andhra Pradesh, India. He was largely self-taught and developed a passion for literature at an early age. He was known for his wit, humor, and creativity, which he showcased through his writings, including poetry, drama, and fiction.
Rama's works are characterized by their playfulness, clever wordplay, and satire. He often used humor to comment on contemporary issues, politics, and society. His most notable work is the "Tenali Ramakrishnudu" (1775), a collection of humorous anecdotes and stories that are still popular today.
Some of Rama's most famous works include:

  1. "Tenali Ramakrishnudu" (1775) - This is Rama's magnum opus, a collection of humorous anecdotes and stories that are still popular today. The work is divided into four volumes, each containing a series of tales featuring Tenali Ramakrishnudu, a clever and witty Brahmin who often outsmarts his neighbors and friends with his quick wit and clever ideas.

..

Please revert.

@ggerganov
Copy link
Member

ggerganov commented Sep 3, 2023

Yes, I just observed that F16 inference is broken with the PR - starts ok and then degrades into incoherent text.
Reverting the commit fixes the issue.

I'm looking into it

Edit: 363f0bf is the problematic commit

ggerganov added a commit that referenced this pull request Sep 3, 2023
This restores the generated text to be the same as before #2959

float all_sum = simd_sum(sumf);
if (tiisg == 0) {
    for (int i = 4*(ne00/4); i < ne00; ++i) sumf += (float) x[i] * y[i];
Copy link
Member

@ggerganov ggerganov Sep 3, 2023


Changing this to all_sum += (float) x[i] * y[i]; in both kernels seems to resolve the issue
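For context, a sketch of the suggested change, continuing the snippet above rather than quoting the exact upstream kernel: when ne00 is not a multiple of 4, the leftover elements have to be folded into the already-reduced SIMD-group sum, because the per-lane partial sumf is never read again after simd_sum().

float all_sum = simd_sum(sumf);              // per-lane partial sums reduced here
if (tiisg == 0) {
    // fold in the tail elements (ne00 % 4 of them); adding them to sumf at this
    // point would be lost, since sumf is not used after the reduction above
    for (int i = 4*(ne00/4); i < ne00; ++i) {
        all_sum += (float) x[i] * y[i];
    }
    // lane 0 then writes all_sum to the output, as in the original kernel
}

That is consistent with the observation that this one-line change resolves the incoherent F16 output.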

ikawrakow pushed a commit that referenced this pull request Sep 3, 2023
mmnga pushed a commit to mmnga/llama.cpp that referenced this pull request Sep 6, 2023
@ikawrakow ikawrakow deleted the ik/more_metal_optimizations branch September 24, 2023 16:10