Releases: JamePeng/llama-cpp-python
v0.3.9-cu126-AVX2-win-20250620
Sync llama.cpp API 20250620
v0.3.9-cu126-AVX2-linux-20250620
Sync llama.cpp API 20250620
v0.3.9-cu124-AVX2-win-20250620
Sync llama.cpp API 20250620
v0.3.9-cu124-AVX2-linux-20250620
Sync llama.cpp API 20250620
v0.3.9-cu126-AVX2-win-20250525
Implement mtmd_cpp.py, based on tools/mtmd/mtmd.h (MTMD_API)
Note: llava_cpp.py will be removed once llama_chat_format.py has been adjusted; it can no longer link against llava.dll (the library is now mtmd.dll)
Sync kv-cache : add SWA support
Update llama.cpp API code 20250513
Sync context : remove logits_all flag and update API
Update LLAVA_API code in llava_cpp.py
Sync llava_cpp code: Update clip.h function API
Sync quantize: Handle user-defined quantization levels for additional tensors (#12511)
Sync llama : Support llama 4 text-only
Update llama : add option to override model tensor buffers
Sync llama-vocab : add SuperBPE pre-tokenizer
class LlamaSampler: add add_xtc(), add_top_n_sigma(), and add_dry()
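The llava.dll → mtmd.dll rename noted above means the ctypes bindings must resolve the new library name per platform. A minimal sketch of that selection logic (the helper name and filenames here are illustrative assumptions, not this repo's actual API):

```python
import os

def multimodal_lib_name(platform: str = os.name) -> str:
    """Pick the shared-library filename for the new mtmd backend.

    Hypothetical helper: the old llava library is gone, so bindings
    have to load mtmd instead. Filenames are assumed per platform,
    not taken verbatim from this repository.
    """
    if platform == "nt":        # Windows wheels ship mtmd.dll
        return "mtmd.dll"
    return "libmtmd.so"         # Linux wheels ship libmtmd.so
```

`ctypes.CDLL(multimodal_lib_name())` would then replace the old llava.dll load in the bindings.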
v0.3.9-cu126-AVX2-linux-20250525
Implement mtmd_cpp.py, based on tools/mtmd/mtmd.h (MTMD_API)
Note: llava_cpp.py will be removed once llama_chat_format.py has been adjusted; it can no longer link against llava.dll (the library is now mtmd.dll)
build-wheels-linux: add 70-real;75-real to -DCMAKE_CUDA_ARCHITECTURES
Sync kv-cache : add SWA support
Update llama.cpp API code 20250513
Sync context : remove logits_all flag and update API
Update LLAVA_API code in llava_cpp.py
Sync llava_cpp code: Update clip.h function API
Sync quantize: Handle user-defined quantization levels for additional tensors (#12511)
Sync llama : Support llama 4 text-only
Update llama : add option to override model tensor buffers
Sync llama-vocab : add SuperBPE pre-tokenizer
class LlamaSampler: add add_xtc(), add_top_n_sigma(), and add_dry()
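The 70-real;75-real note above adds Volta (sm_70) and Turing (sm_75) targets to the Linux CUDA wheels. For a local source build, the equivalent flags might be passed like this (a sketch assuming the standard CMake-driven pip build; the exact architecture list and CUDA flag are your call, not prescribed by this release):

```shell
# Build from source with CUDA, targeting Volta (sm_70) and Turing (sm_75).
# Quote the value so the shell does not split on the semicolons.
CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=70-real;75-real" \
    pip install llama-cpp-python --no-cache-dir
```

The `-real` suffix tells CMake to emit SASS for those GPUs only, without embedding PTX for forward compatibility, which keeps the wheels smaller.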
v0.3.9-cu124-AVX2-win-20250525
Implement mtmd_cpp.py, based on tools/mtmd/mtmd.h (MTMD_API)
Note: llava_cpp.py will be removed once llama_chat_format.py has been adjusted; it can no longer link against llava.dll (the library is now mtmd.dll)
Sync kv-cache : add SWA support
Update llama.cpp API code 20250513
Sync context : remove logits_all flag and update API
Update LLAVA_API code in llava_cpp.py
Sync llava_cpp code: Update clip.h function API
Sync quantize: Handle user-defined quantization levels for additional tensors (#12511)
Sync llama : Support llama 4 text-only
Update llama : add option to override model tensor buffers
Sync llama-vocab : add SuperBPE pre-tokenizer
class LlamaSampler: add add_xtc(), add_top_n_sigma(), and add_dry()
v0.3.9-cu124-AVX2-linux-20250525
Implement mtmd_cpp.py, based on tools/mtmd/mtmd.h (MTMD_API)
Note: llava_cpp.py will be removed once llama_chat_format.py has been adjusted; it can no longer link against llava.dll (the library is now mtmd.dll)
build-wheels-linux: add 70-real;75-real to -DCMAKE_CUDA_ARCHITECTURES
Sync kv-cache : add SWA support
Update llama.cpp API code 20250513
Sync context : remove logits_all flag and update API
Update LLAVA_API code in llava_cpp.py
Sync llava_cpp code: Update clip.h function API
Sync quantize: Handle user-defined quantization levels for additional tensors (#12511)
Sync llama : Support llama 4 text-only
Update llama : add option to override model tensor buffers
Sync llama-vocab : add SuperBPE pre-tokenizer
class LlamaSampler: add add_xtc(), add_top_n_sigma(), and add_dry()
v0.3.8-cu126-AVX2-win-20250506
Final release of 0.3.8
v0.3.8-cu126-AVX2-linux-20250506
Final release of 0.3.8