Issues: ggerganov/llama.cpp
llama/ggml: add LLM training support
  Labels: examples; ggml (changes relating to the ggml tensor library for machine learning); Nvidia GPU (issues specific to Nvidia GPUs); Review Complexity: High (generally requires in-depth knowledge of LLMs or GPUs); testing (everything test related)
  #10544, opened Nov 27, 2024 by JohannesGaessler
ggml: skip excess iteration for pair whose vars same element when i2 == i1
  Labels: ggml; Review Complexity: High
  #9177, opened Aug 25, 2024 by GermanAizek (2 of 4 tasks)
ggml : make GeLU faster and more accurate on CPU
  Labels: ggml; Review Complexity: High
  #8878, opened Aug 5, 2024 by jart
Add support for loongarch backend in sgemm.cpp
  Labels: Review Complexity: High
  #8726, opened Jul 27, 2024 by Tianzhengshuyuan
llama : support Jamba hybrid Transformer-Mamba models
  Labels: android (issues specific to Android); embeddings (embedding related topics); enhancement (new feature or request); examples; ggml; model (model specific); need feedback (testing and feedback with results are needed); python (python script changes); refactoring; Review Complexity: High; server
Introduce Q8_0 and Q4_0 with Bf16 delta values
  Labels: examples; ggml; python; Review Complexity: High; Tensor Encoding Scheme (https://github.com/ggerganov/llama.cpp/wiki/Tensor-Encoding-Schemes)
  #7497, opened May 23, 2024 by Srihari-mcw
Add token healing to main and server
  Labels: enhancement; examples; help wanted (extra attention is needed); need feedback; Review Complexity: High; server
  #7187, opened May 9, 2024 by mare5x
Fix flash attention for ROCm (Draft)
  Labels: enhancement; Review Complexity: High
  #7011, opened Apr 30, 2024 by jdecourval
support MiniCPM-V-2
  Labels: demo (demonstrates some concept or idea, not intended to be merged); enhancement; examples; python; Review Complexity: High
  #6919, opened Apr 26, 2024 by Achazwl
cuda : use amd wave sharing intrinsics for warp_reduce functions
  Labels: performance (speed related topics); Review Complexity: High
  #6522, opened Apr 7, 2024 by Engininja2
Adding Support for Custom Qwen2moe Architectures with mergekit-qwen2
  Labels: model; Review Complexity: High
Smooth Sampling / Quadratic Sampling support
  Labels: generation quality (quality of model output); performance; Review Complexity: High
  #6445, opened Apr 2, 2024 by kalomaze
Xeon Phi (Knights Corner) Support
  Labels: enhancement; ggml; performance; Review Complexity: High
  #6440, opened Apr 2, 2024 by julialongtin
Fix IQ1_S quantization
  Labels: bugfix (fixes an issue or bug); Review Complexity: High
llama : compute BERT graph with F16 K, V
  Labels: demo; Review Complexity: High
  #5891, opened Mar 5, 2024 by ggerganov
llama : switch to floating-point token positions
  Labels: demo; refactoring; Review Complexity: High
P-Step Truncation Sampling
  Labels: generation quality; need feedback; refactoring; Review Complexity: High
  #5675, opened Feb 23, 2024 by p-e-w
[RFC] common, server : add top-a sampler
  Labels: enhancement; generation quality; Review Complexity: High
  #5612, opened Feb 20, 2024 by Artefact2
cuda: 1.2x faster dequantization kernel
  Labels: performance; Review Complexity: High
  #2809, opened Aug 26, 2023 by li-plus
Q4_0 scale selection using RMSE
  Labels: enhancement; Less than 4 bits (efforts related to viable quantized models using <4 bits); research 🔬; Review Complexity: High