
Releases: Genta-Technology/Kolosal

v0.1.7

15 Mar 18:38
978f73c

What's Changed

  • Fixed the installer to perform a clean install of the application, avoiding odd bugs caused by stale files from earlier installs
  • Fixed application and server crashes on large prompts
  • Added control over the maximum number of tokens processed per iteration frame (see the sketch after this list)
  • Fixed chat names not being allowed to contain certain symbols
  • Allowed renaming chats to duplicate names
  • Fixed a crash when pasting long text into the system prompt
  • Added an acrylic background
  • Refactored the AI model configuration
  • Added a downloaded-models section
  • Sorted the model list alphabetically
  • Added search to the model manager
  • Gemma 3 support!
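
The per-frame token cap above is essentially a budget on how much decoding work is done before the UI loop yields back to rendering. Below is a minimal C++ sketch of that idea; `FrameBudget`, `run_decode_slice`, and the default cap of 32 are illustrative assumptions, not Kolosal's actual API.

```cpp
#include <functional>

// Minimal sketch: each UI frame, decode at most `max_tokens_per_frame`
// tokens so the render loop stays responsive. Names and the default cap
// are assumptions, not Kolosal's real API.
struct FrameBudget {
    int max_tokens_per_frame = 32;  // user-configurable cap (assumed default)
};

// `decode_one` decodes a single token and returns false once generation
// has finished. Call this once per frame from the UI loop.
bool run_decode_slice(const FrameBudget& budget,
                      const std::function<bool()>& decode_one) {
    for (int i = 0; i < budget.max_tokens_per_frame; ++i) {
        if (!decode_one())
            return false;           // generation finished
    }
    return true;                    // budget spent; resume next frame
}

int main() {
    FrameBudget budget;
    budget.max_tokens_per_frame = 8;
    int remaining = 20;             // pretend 20 tokens are left to generate
    while (run_decode_slice(budget, [&] { return --remaining > 0; })) {
        // ... render one UI frame here ...
    }
}
```

A lower cap keeps the interface smooth on slower hardware at the cost of slightly lower throughput; a higher cap does the opposite.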

Full Changelog: v0.1.6...v0.1.7

v0.1.6

08 Mar 18:08
678b4ca
  • Introduced Kolosal AI Server, an easily managed server within the Kolosal AI application
  • Added Phi-4 and Phi-4-mini models
  • Added a continuous batching mechanism for decoding (see the sketch after this list)
  • Added a KV cache management mechanism for batch decoding
  • Added model loading settings within the server tab
  • Added a tab management system
  • Added automatic title generation for each chat history
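
For context on the continuous batching and KV cache management items, here is a minimal C++ sketch of the general technique: finished sequences release their KV cache slot on every decode step, and waiting requests are admitted immediately instead of waiting for the whole batch to drain. The types and names are illustrative, not Kolosal's actual implementation.

```cpp
#include <deque>
#include <vector>

// Hypothetical continuous-batching scheduler (not Kolosal's real code).
struct Sequence {
    int kv_slot = -1;     // index into the shared KV cache
    int remaining = 0;    // tokens still to generate
};

struct Scheduler {
    std::vector<Sequence> active;
    std::deque<Sequence> waiting;
    std::deque<int> free_slots;   // available KV cache slots

    explicit Scheduler(int max_slots) {
        for (int i = 0; i < max_slots; ++i) free_slots.push_back(i);
    }

    // One iteration of the decode loop.
    void step() {
        // Admit waiting requests while KV cache slots are free.
        while (!waiting.empty() && !free_slots.empty()) {
            Sequence s = waiting.front(); waiting.pop_front();
            s.kv_slot = free_slots.front(); free_slots.pop_front();
            active.push_back(s);
        }
        // Batched forward pass: decode one token for every active sequence.
        for (Sequence& s : active) --s.remaining;   // stand-in for real decoding
        // Retire finished sequences and recycle their KV slots.
        for (size_t i = 0; i < active.size();) {
            if (active[i].remaining <= 0) {
                free_slots.push_back(active[i].kv_slot);
                active[i] = active.back();
                active.pop_back();
            } else {
                ++i;
            }
        }
    }
};
```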

v0.1.5

13 Feb 16:35
c330cec

What's Changed

  • Context shifting with StreamingLLM (https://arxiv.org/abs/2309.17453) to enable unlimited generation (see the sketch after this list)
  • Limited the maximum context to 4096 tokens for better memory efficiency and speed
  • Added the ability to stop generation
  • Added a regenerate button
  • Redesigned the progress bar
  • Model loading is now handled asynchronously
  • Added an unload-model button
  • Huge refactor
  • Fixed code block rendering glitches
  • Setting max new tokens to 0 now results in unlimited generation with context shifting
  • Fixed an application crash when deleting a chat
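
The StreamingLLM-style context shifting referenced above keeps a few initial "attention sink" tokens plus a sliding window of the most recent tokens, evicting everything in between once the cache is full. The C++ sketch below only shows the bookkeeping of which cache positions survive; the actual cache shift is engine-specific and is not Kolosal's real code.

```cpp
#include <vector>

// StreamingLLM-style eviction sketch: keep `n_sink` sink tokens plus the
// most recent tokens that fit in a context of `n_ctx`. Not Kolosal's code.
std::vector<int> positions_to_keep(int n_cached, int n_ctx, int n_sink) {
    std::vector<int> keep;
    if (n_cached <= n_ctx) {                 // cache not full yet: keep all
        for (int i = 0; i < n_cached; ++i) keep.push_back(i);
        return keep;
    }
    for (int i = 0; i < n_sink; ++i)         // attention sinks stay forever
        keep.push_back(i);
    int recent = n_ctx - n_sink;             // budget for the sliding window
    for (int i = n_cached - recent; i < n_cached; ++i)
        keep.push_back(i);                   // most recent tokens
    return keep;
}
```

With this release's 4096-token context and, say, the paper's four sink tokens, the cache would retain positions 0–3 plus the most recent 4092 positions, so generation can continue indefinitely.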

Full Changelog: v0.1.4.1...v0.1.5

v0.1.4

01 Feb 14:00
d7f162f
  • Added DeepSeek-R1 support
  • Added Markdown rendering
  • Added a tokens-per-second (TPS) stat
  • Added a cancel-download button
  • Added a delete-model button
  • Fixed a model duplication issue
  • Fixed an engine memory leak
  • Added a thinking UI
  • Added automatic detection of the number of threads to use (see the sketch after this list)
  • Fixed an issue with the last selected model
  • Added a fallback when model loading fails
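
One common way to pick a default thread count automatically is to query the hardware concurrency reported by the OS and leave a little headroom for the UI; the exact heuristic below is an assumption, not necessarily what Kolosal does.

```cpp
#include <algorithm>
#include <thread>

// Pick a default worker-thread count (illustrative heuristic only).
int default_thread_count() {
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) return 4;                 // detection failed; safe fallback
    return std::max(1u, hw - 1u);          // keep one core free for the UI
}
```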

v0.1.3

22 Jan 04:26
ecaa341

New feature

  • Added a persistent KV cache: Kolosal saves the model's KV cache state for each chat history, making the processing of previous chats instant (see the sketch below).
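
Conceptually, a persistent KV cache amounts to storing the serialized cache state per chat and restoring it when that chat is reopened, so earlier messages never have to be re-processed. The C++ sketch below shows that bookkeeping only; the (de)serialization hooks into the inference engine are placeholders, not Kolosal's real API.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-chat KV cache store (illustrative, not Kolosal's code).
class KvCacheStore {
public:
    // Save the current engine state blob under this chat's id.
    void save(const std::string& chat_id, std::vector<uint8_t> state) {
        states_[chat_id] = std::move(state);
    }

    // Restore a chat's state if we have one; returns false on a cache miss,
    // in which case the chat history must be re-processed from scratch.
    bool restore(const std::string& chat_id, std::vector<uint8_t>& out) const {
        auto it = states_.find(chat_id);
        if (it == states_.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::unordered_map<std::string, std::vector<uint8_t>> states_;
};
```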

Bug fixes

  • Fixed model parameters not being passed to the model correctly
  • Fixed an application crash when deleting a chat
  • Fixed switching models causing generation to appear in a different chat
  • Fixed AMD GPUs not being detected
  • Fixed EOS not being detected on fine-tuned models using the ChatML format
  • Fixed a force-close issue in the chat feature
  • Fixed a performance issue on GPUs

New models

  • Qwen 2.5 Coder 0.5B–14B
  • Qwen 2.5 14B

v0.1.2

12 Jan 09:33
41aac36

What's Changed

  • Fixed GPU support; NVIDIA/AMD GPUs in your device are now detected and selected automatically (see the sketch after this list)
  • Added clear-chat and delete-chat buttons
  • Fixed the application shortcut (removed the Fn + Left Arrow key shortcut for opening Kolosal)
  • Added Qwen2.5 models, 0.5B–7B
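
On Windows, one common way to detect NVIDIA/AMD GPUs is to enumerate DXGI adapters and check the PCI vendor id; the sketch below illustrates that approach, which is not necessarily the one Kolosal uses.

```cpp
#include <dxgi.h>
#include <string>
#include <vector>
#pragma comment(lib, "dxgi.lib")   // MSVC: link against dxgi

// List NVIDIA (0x10DE) and AMD (0x1002) adapters via DXGI (illustrative).
std::vector<std::wstring> discrete_gpus() {
    std::vector<std::wstring> found;
    IDXGIFactory* factory = nullptr;
    if (FAILED(CreateDXGIFactory(__uuidof(IDXGIFactory), (void**)&factory)))
        return found;
    IDXGIAdapter* adapter = nullptr;
    for (UINT i = 0; factory->EnumAdapters(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        DXGI_ADAPTER_DESC desc;
        if (SUCCEEDED(adapter->GetDesc(&desc))) {
            if (desc.VendorId == 0x10DE || desc.VendorId == 0x1002)
                found.emplace_back(desc.Description);
        }
        adapter->Release();
    }
    factory->Release();
    return found;
}
```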

Full Changelog: v0.1.0...v0.1.2

v0.1.1

09 Jan 06:05
b505e86

What's Changed

  • Added Windows Installer
  • Added Sahabat AI Llama 3 8B
  • Added Sahabat AI Gemma 2 9B
  • Added Gemma 2 2B
  • Added Gemma 2 9B
  • Added Llama 3.1 8B
  • Added 8-bit quantization support (see the sketch after this list)
  • Updated the quantization selection UI to use radio buttons
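
For reference, the general idea behind 8-bit quantization is block-wise symmetric rounding: each block of weights shares one scale, and every weight is stored as a signed 8-bit integer. The block layout below is an assumption for illustration and does not mirror Kolosal's on-disk format.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative block-wise symmetric int8 quantization (not Kolosal's format).
struct QuantBlock {
    float scale;                 // per-block scale
    std::vector<int8_t> q;       // quantized weights
};

QuantBlock quantize_block(const std::vector<float>& w) {
    float amax = 0.0f;
    for (float x : w) amax = std::max(amax, std::fabs(x));
    QuantBlock b;
    b.scale = amax > 0.0f ? amax / 127.0f : 1.0f;
    b.q.reserve(w.size());
    for (float x : w)
        b.q.push_back(static_cast<int8_t>(std::lround(x / b.scale)));
    return b;
}

// Dequantize back to float: x ≈ scale * q.
std::vector<float> dequantize_block(const QuantBlock& b) {
    std::vector<float> out;
    out.reserve(b.q.size());
    for (int8_t q : b.q) out.push_back(b.scale * q);
    return out;
}
```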

Full Changelog: v0.1...v0.1.1

v0.1.0

08 Jan 13:24
b505e86

Kolosal AI 0.1 marks the very first release of our groundbreaking solution for on-device Large Language Model (LLM) inference. Engineered to run smoothly on both CPUs and a variety of GPUs, Kolosal AI brings powerful AI capabilities to Windows 64-bit systems without relying on external servers or cloud dependencies.

Key Features and Highlights:

  • On-Device Inference: Harness the power of advanced LLMs locally on Windows 64-bit machines, preserving data privacy and reducing latency.
  • Broad Hardware Support: Optimize performance on most common CPUs and GPUs, making Kolosal AI accessible and efficient for a wide range of hardware configurations.
  • Easy Integration: Seamlessly incorporate Kolosal AI’s inference engine into existing applications or workflows with straightforward setup and minimal dependencies.
  • Low Latency & High Throughput: Experience near real-time responses through efficient model optimization and hardware utilization.

With Kolosal AI 0.1, developers and enthusiasts alike can begin exploring the possibilities of large-scale AI right from their desktop—no cloud required. This release is just the start of our mission to empower everyone with the latest advancements in artificial intelligence, all within a user-friendly and hardware-agnostic platform.