ggml : new Q4 and Q5 quantization formats + backward ops #154

Merged: 5 commits, May 14, 2023

@ggerganov ggerganov commented May 14, 2023

ref #150

sync llama.cpp

  • bump GGML_QNT_VERSION -> 1
  • increase ggml object overhead size from 256 to 512 in examples
  • drop Q4_2 support
  • add ggml_tensor.backend member (CPU / CUDA)
  • fix data race in multi-threaded ggml_diag_mask_inf() operator a483bb2
  • fix ggml_rope() when not inplace 788381e
  • fix ggml_rope() GPT-NeoX (hopefully) 788381e
  • some of the old ops are no longer implicitly inplace! Update your code where necessary to call the inplace variants explicitly; otherwise some tensors will be copied unnecessarily:
    • ggml_scale() -> ggml_scale_inplace()
    • ggml_diag_mask_inf() -> ggml_diag_mask_inf_inplace()
    • ggml_soft_max() -> ggml_soft_max_inplace()
    • ggml_rope() -> ggml_rope_inplace()
    • see 5839d9e
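
To illustrate why the inplace change matters, here is a minimal self-contained sketch of the two behaviours. It does not use ggml: toy_tensor, toy_scale() and toy_scale_inplace() are hypothetical stand-ins, and real ggml allocates results from its context rather than with malloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tensor; NOT the real ggml_tensor. */
typedef struct {
    float *data;
    size_t n;
} toy_tensor;

/* Out-of-place scale: allocates a fresh buffer and leaves `a` untouched.
 * The caller owns (and must free) the returned data. */
static toy_tensor toy_scale(const toy_tensor *a, float s) {
    assert(a != NULL);
    toy_tensor out;
    out.n = a->n;
    out.data = malloc(a->n * sizeof(float));
    if (out.data == NULL) {
        out.n = 0; /* allocation failed: return an empty tensor */
        return out;
    }
    for (size_t i = 0; i < a->n; i++) {
        out.data[i] = a->data[i] * s;
    }
    return out;
}

/* In-place scale: overwrites the buffer of `a` and returns a view
 * that shares the same storage, so no copy is made. */
static toy_tensor toy_scale_inplace(toy_tensor *a, float s) {
    assert(a != NULL);
    for (size_t i = 0; i < a->n; i++) {
        a->data[i] *= s;
    }
    return *a;
}
```

With the toy functions above, toy_scale() returns a tensor whose data pointer differs from the input, while toy_scale_inplace() returns one that aliases it. Code that relied on the old implicit inplace behaviour would still run after this change, just with the extra allocation and copy.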

sync llama.cpp

- bump GGML_QNT_VERSION -> 1
- increase ggml object overhead size from 256 to 512 in examples
- drop Q4_2 support
- tensor backend support CUDA
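
A quantization version bump implies that loaders need a way to tell old model files from new ones. One possible scheme is sketched below, assuming the version is packed into the file-type field using a decimal factor. The constant names, the factor of 1000 and all function names are hypothetical and chosen for illustration only; they are not taken from this PR.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; assumed for this sketch, not copied from ggml.h. */
#define TOY_QNT_VERSION        1
#define TOY_QNT_VERSION_FACTOR 1000

/* Pack the current quantization version into the stored file-type value. */
static int32_t toy_pack_ftype(int32_t ftype) {
    assert(ftype >= 0 && ftype < TOY_QNT_VERSION_FACTOR);
    return TOY_QNT_VERSION * TOY_QNT_VERSION_FACTOR + ftype;
}

/* Recover the quantization version from a stored file-type value. */
static int32_t toy_unpack_qnt_version(int32_t packed) {
    return packed / TOY_QNT_VERSION_FACTOR;
}

/* Recover the plain file type from a stored file-type value. */
static int32_t toy_unpack_ftype(int32_t packed) {
    return packed % TOY_QNT_VERSION_FACTOR;
}

/* A file is loadable only if it was written with the current version. */
static int toy_is_supported(int32_t packed) {
    return toy_unpack_qnt_version(packed) == TOY_QNT_VERSION;
}
```

Under this scheme a file written before the bump stores a bare file type, which unpacks to version 0 and is rejected instead of being decoded with the wrong block layout.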
@ggerganov ggerganov merged commit 3ce3145 into master May 14, 2023
@ggerganov ggerganov deleted the new-qnt branch May 14, 2023 12:18
CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this pull request Dec 18, 2023
…gerganov#154) (ggerganov#294)

* Use F16 for memory_k and memory_v

* add command line switch to use f16 instead of f32 for memory k+v

---------

Co-authored-by: Ty Everett <ty@tyweb.us>