
release-v1.1.0

@yhcvb released this 11 Oct 08:53
  • Added support for grouped quantization (group sizes of 32/64/128 for w4a16, and 128/256/512 for w8a8).
  • Added the GDQ algorithm to improve 4-bit quantization accuracy.
  • Added a hybrid quantization algorithm that combines grouped and non-grouped quantization according to a specified ratio.
  • Added support for the Llama3, Gemma2, and MiniCPM3 models.
  • Added support for GGUF model conversion (currently q4_0 and fp16 only).
  • Added support for LoRA models.
  • Added storage and loading of the prompt cache.
  • Added PC-side emulation accuracy testing and an inference interface for rkllm-toolkit.
  • Fixed the catastrophic forgetting issue that occurred when the token count exceeded max_context.
  • Optimized prefill speed.
  • Optimized generation speed.
  • Optimized model initialization time.
  • Added support for four input interfaces: prompt, embedding, token, and multimodal.
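To illustrate the grouped quantization feature above, here is a minimal, self-contained sketch of symmetric 4-bit per-group weight quantization (the "w4" side of w4a16) with a configurable group size. This is an assumption-laden illustration of the general technique, not RKLLM's actual implementation; the function names and the symmetric rounding scheme are hypothetical.

```python
# Illustrative sketch of grouped 4-bit weight quantization (NOT RKLLM's code):
# each group of `group_size` weights shares one scale, so smaller groups
# track the local weight range more closely at the cost of more scales.
def quantize_grouped_w4(weights, group_size=32):
    """Quantize a flat list of floats to 4-bit ints, one scale per group."""
    assert len(weights) % group_size == 0, "pad weights to a multiple of group_size"
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # 4-bit signed range is [-8, 7]; map the group's max |w| onto 7.
        scale = max(abs(w) for w in group) / 7 or 1.0  # avoid div-by-zero for all-zero groups
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales

def dequantize_grouped(q, scales, group_size=32):
    """Reconstruct approximate float weights from ints and per-group scales."""
    return [q[i] * scales[i // group_size] for i in range(len(q))]
```

With this scheme the reconstruction error per weight is bounded by half the group's scale, which is why the smaller group sizes listed for w4a16 (32/64) generally yield better accuracy than coarser grouping.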