
Conversation

@Nexesenex
Owner

No description provided.

cmp-nct and others added 5 commits January 23, 2024 05:40
@Nexesenex Nexesenex merged commit 8f7b17b into Nexesenex:_master_up Jan 26, 2024
Nexesenex pushed a commit that referenced this pull request Oct 19, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 20, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 20, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 20, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 21, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 21, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 22, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 22, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 25, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 25, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 25, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 25, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 25, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 25, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 26, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 27, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 27, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 27, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Dec 22, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Feb 25, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex added a commit that referenced this pull request Mar 18, 2025
CUDA: faster float -> iq4_nl conversion (#73)

* iqk_mul_mat: better iq4_nl implementation on Zen4/AVX2

PP-512 performance for LLaMA-3.1-8B goes to 162.6 t/s, up from 133.2 t/s.

* Speed up float -> iq4_nl conversion on CUDA
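
As a rough illustration of what the conversion does (a minimal sketch, not the optimized Zen4/AVX2 or CUDA kernel from this PR; the 16-entry codebook is the IQ4_NL values table from ggml, while the nearest-value search and scale choice are simplified):

```cpp
#include <cstdint>
#include <cmath>

// The 16-entry non-linear codebook used by IQ4_NL (values from ggml-common.h).
static const int8_t kvalues_iq4nl[16] = {
    -127, -104, -83, -65, -49, -35, -22, -10, 1, 13, 25, 38, 53, 69, 89, 113};

// Index of the codebook entry closest to v (plain linear scan for clarity).
static int nearest_iq4nl(float v) {
    int best = 0;
    float best_err = std::fabs(v - kvalues_iq4nl[0]);
    for (int k = 1; k < 16; ++k) {
        const float err = std::fabs(v - kvalues_iq4nl[k]);
        if (err < best_err) { best_err = err; best = k; }
    }
    return best;
}

// Quantize one block of 32 floats to a scale `d` plus 16 bytes of packed 4-bit
// codebook indices (low nibble = element j, high nibble = element j+16).
// Simplified reference: the actual quantizer also refines the scale.
void quantize_block_iq4_nl_ref(const float * x, float & d, uint8_t qs[16]) {
    float amax = 0.0f, max = 0.0f;
    for (int i = 0; i < 32; ++i) {
        const float ax = std::fabs(x[i]);
        if (ax > amax) { amax = ax; max = x[i]; }
    }
    d = max / kvalues_iq4nl[0];            // map the largest-magnitude value near -127
    const float id = d != 0.0f ? 1.0f / d : 0.0f;
    for (int j = 0; j < 16; ++j) {
        qs[j] = (uint8_t)(nearest_iq4nl(x[j] * id) | (nearest_iq4nl(x[j + 16] * id) << 4));
    }
}
```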

---------

iq4_nl: faster quantization (#76)

Enable IQ4_NL for V-cache in token generation

Add IQ4_NL + IQ4_NL to FA

This is a better alternative to Q4_0 + Q4_0 for the VRAM-poor.
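
For context, selecting the quantized KV-cache types programmatically looks roughly like this. A sketch against the llama.cpp C API of that era; it assumes this fork accepts GGML_TYPE_IQ4_NL for the cache types, which is exactly what this change enables:

```cpp
#include "llama.h"

// Sketch: request an IQ4_NL-quantized K and V cache with flash attention.
// Assumes the build accepts GGML_TYPE_IQ4_NL for type_k/type_v.
llama_context * make_ctx_iq4nl_kv(llama_model * model) {
    llama_context_params cparams = llama_context_default_params();
    cparams.type_k     = GGML_TYPE_IQ4_NL;  // K-cache quantization type
    cparams.type_v     = GGML_TYPE_IQ4_NL;  // V-cache quantization type
    cparams.flash_attn = true;              // quantized V-cache requires flash attention
    return llama_new_context_with_model(model, cparams);
}
```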

IQ4_NL KVQ for KCPP/Croco

missing template instances for KVQ IQ4_NL
Update fattn.cu for KVQ IQ4_NL
Update fattn-vec-f16.cuh for KVQ IQ4_NL
Update fattn-vec-f32.cuh for KVQ IQ4_NL
CML and Makefile for IQ4_NL

KV_IQ4_NL uncommenting VEC16 cases
KV_IQ4_NL uncommenting VEC32 cases
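
The "missing template instances" and "uncommenting VEC16/VEC32 cases" items come down to the fact that the CUDA FA vector kernels are compiled per (head size, K-type, V-type) combination, so each new cache type needs explicit instantiations. Illustrative sketch only, with hypothetical names rather than the fork's actual macros:

```cpp
// Self-contained illustration with hypothetical names: the CUDA FA vector kernels
// are stamped out per (head size, K cache type, V cache type), so every new cache
// type needs explicit instances in the .cu files.
enum kv_type { KV_F16, KV_Q8_0, KV_Q4_0, KV_IQ4_NL };

template <int D, kv_type TK, kv_type TV>
void flash_attn_vec_case() {
    // dequantize K/V tiles according to TK/TV and run the D-wide kernel (omitted)
}

// Without an explicit instance like this one, the IQ4_NL + IQ4_NL combination is
// simply absent at link time.
template void flash_attn_vec_case<128, KV_IQ4_NL, KV_IQ4_NL>();
```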

Adding Q6_0 (#77)

* Adding q6_0 - basics + AVX2/Zen4 working (assumed block layout sketched after this list)

* Adding q6_0: CUDA dequantize works, but not mmvq

* Adding q6_0: CUDA mmvq works

* Adding q6_0: CUDA cpy, so Q6_0 can be used for KV-cache

* Add q6_0 to CPU flash attention

Disappointing result: for LLaMA-3.2-1B, q6_0 K- and V-cache give about
the same PPL as q8_0 K-cache plus q4_0 V-cache, while needing exactly
the same RAM (q6_0's 6.5 bpw is the average of q8_0's 8.5 bpw and q4_0's 4.5 bpw).
I.e., what was the point?

* q6_0: slightly better kv-cache result

Better than q8_0+q4_0, but not as good as q8_0+iq4_nl

* q6_0: works on ARM_NEON

* q6_0: dequantize works on Metal, but not vector dot product

* q6_0: it now works on Metal

Outperforms q5_0 by a significant margin. E.g.
| model                          |       size |     params | backend    | ngl | threads |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ------------: | ---------------: |
| llama 8B Q6_0                  |   6.08 GiB |     8.03 B | Metal      | 100 |       4 |         tg128 |     44.02 ± 0.08 |
| llama 8B Q5_0                  |   5.21 GiB |     8.03 B | Metal      | 100 |       4 |         tg128 |     40.13 ± 0.12 |
| llama 8B Q6_0                  |   6.08 GiB |     8.03 B | Metal      | 100 |       4 |         pp512 |    500.55 ± 0.32 |
| llama 8B Q5_0                  |   5.21 GiB |     8.03 B | Metal      | 100 |       4 |         pp512 |    448.02 ± 0.27 |

* q6_0: can now be used for kv-cache on Metal
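
A minimal sketch of what a Q6_0 block plausibly looks like and how it dequantizes, assuming a layout analogous to Q5_0 (per-block fp16 scale, 4 low bits in `qs`, 2 high bits in `qh`); the exact bit packing in the fork may differ:

```cpp
#include <cstdint>

#define QK6_0 32
typedef uint16_t ggml_half;             // fp16 storage, as in ggml
float ggml_fp16_to_fp32(ggml_half h);   // provided by ggml; prototype shown for the sketch

// Assumed Q6_0 block layout, analogous to Q5_0:
// 2 (scale) + 8 (high bits) + 16 (low nibbles) = 26 bytes per 32 weights = 6.5 bpw.
struct block_q6_0 {
    ggml_half d;                        // per-block scale
    uint8_t   qh[QK6_0/4];              // upper 2 bits of each 6-bit value
    uint8_t   qs[QK6_0/2];              // lower 4 bits, two values per byte
};

// Reference dequantization under one *hypothetical* packing: element j keeps its
// low nibble in qs (low nibble = j, high nibble = j+16, ggml's usual split) and
// its two high bits packed four-per-byte in qh. The 6-bit values are centered at 32.
void dequantize_block_q6_0_ref(const block_q6_0 * b, float * y) {
    const float d = ggml_fp16_to_fp32(b->d);
    for (int j = 0; j < QK6_0/2; ++j) {
        const int lo0 =  b->qs[j] & 0x0F;
        const int lo1 =  b->qs[j] >> 4;
        const int hi0 = (b->qh[j/2] >> ((j % 2) * 2)) & 3;      // hypothetical placement
        const int hi1 = (b->qh[j/2] >> ((j % 2) * 2 + 4)) & 3;  // hypothetical placement
        y[j]           = d * ((lo0 | (hi0 << 4)) - 32);
        y[j + QK6_0/2] = d * ((lo1 | (hi1 << 4)) - 32);
    }
}
```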

---------

Enable q6_0 for flash attention

As with IQ4_NL, only for a head size of 128 for now. Without GGML_CUDA_FA_ALL_QUANTS set, only the Q6_0 + Q5_0 and Q8_0 + Q6_0 combinations are included. With this, the VRAM-poor have better options for selecting the best quantized KV-cache allowed by their VRAM, model size, and context length.
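
Conceptually, the reduced set of precompiled K/V pairs is a build-time choice. A sketch of the gating: the helper function below is hypothetical, GGML_CUDA_FA_ALL_QUANTS is the real build option, and GGML_TYPE_Q6_0 is assumed to be the type enum this fork adds alongside the existing ones:

```cpp
#include "ggml.h"   // for ggml_type; GGML_TYPE_Q6_0 assumed to exist in this fork

// Hypothetical dispatch check: without GGML_CUDA_FA_ALL_QUANTS only a curated
// subset of K/V cache type pairs is compiled for head size 128.
static bool fa_vec_pair_available(ggml_type type_K, ggml_type type_V) {
#ifdef GGML_CUDA_FA_ALL_QUANTS
    return true;  // all supported combinations are built
#else
    // The two newly added pairs; the pre-existing default combinations are omitted here.
    return (type_K == GGML_TYPE_Q6_0 && type_V == GGML_TYPE_Q5_0) ||
           (type_K == GGML_TYPE_Q8_0 && type_V == GGML_TYPE_Q6_0);
#endif
}
```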

PR by Ikawrakow on ik_llama.cpp
Nexesenex added a commit that referenced this pull request Mar 18, 2025
CUDA: faster float -> iq4_nl conversion (#73)

Nexesenex added a commit that referenced this pull request Mar 19, 2025
CUDA: faster float -> iq4_nl conversion (#73)

Nexesenex added a commit that referenced this pull request Mar 20, 2025
CUDA: faster float -> iq4_nl conversion (#73)

Nexesenex pushed a commit that referenced this pull request Oct 3, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 4, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 5, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 7, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 7, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 9, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 9, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 11, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 11, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 11, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 12, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 13, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 13, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 16, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 16, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 18, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 19, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 20, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 21, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 21, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 21, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 22, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 22, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 23, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 24, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 25, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 25, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 25, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 27, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Nexesenex pushed a commit that referenced this pull request Oct 28, 2025
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>