Releases: Genta-Technology/Kolosal
v0.1.7
What's Changed
- Fixed the installer to perform a clean install, avoiding stale-file bugs
- Fixed application and server crashes on large prompts
- Added control over the maximum number of tokens processed per iteration frame
- Fixed chat names being unable to contain certain symbols
- Allowed renaming chats to duplicate names
- Fixed a crash when pasting long text into the system prompt
- Added an acrylic background
- Refactored the AI model configuration
- Added a downloaded-models section
- Sorted the model list alphabetically
- Added search to the model manager
- Gemma 3 support!
Full Changelog: v0.1.6...v0.1.7
v0.1.6
- Introduced Kolosal AI Server, an easily managed server inside the Kolosal AI application
- Added Phi 4 and Phi 4 Mini models
- Added a continuous batching mechanism for decoding
- Added a KV cache management mechanism for batch decoding
- Added model loading settings within the server tabs
- Added a tab management system
- Added automatic title generation for each chat history
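The continuous batching mechanism above can be pictured as follows: rather than waiting for an entire batch of requests to finish decoding, completed sequences are retired between decode steps and their batch slots are immediately refilled from the queue. This is a minimal illustrative sketch, not Kolosal's actual implementation; all names here are hypothetical.

```python
# Illustrative sketch of continuous batching (hypothetical names, not
# Kolosal's code): finished sequences are swapped out for queued ones
# between decode steps instead of waiting for the whole batch.
from collections import deque

def decode_continuous(requests, max_batch=2, max_new=3):
    queue, active, done = deque(requests), [], []
    while queue or active:
        # Top up the batch from the queue between steps.
        while queue and len(active) < max_batch:
            active.append({"id": queue.popleft(), "generated": 0})
        for seq in active:  # one decode step per active sequence
            seq["generated"] += 1
        # Retire finished sequences immediately, freeing batch slots.
        done += [s["id"] for s in active if s["generated"] >= max_new]
        active = [s for s in active if s["generated"] < max_new]
    return done

# Three requests flow through a batch of two without any idle slots.
assert decode_continuous(["a", "b", "c"]) == ["a", "b", "c"]
```

The KV cache management bullet complements this: each active slot needs its own cache region, which is reclaimed when its sequence retires.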
v0.1.5
What's Changed
- Context shifting with StreamingLLM (https://arxiv.org/abs/2309.17453) for unlimited generation
- Limited the maximum context to 4096 tokens for better memory efficiency and speed
- Added a stop-generation button
- Added a regenerate button
- Redesigned the progress bar
- Model loading is now handled asynchronously
- Added an unload-model button
- Major refactor
- Fixed code block rendering glitches
- Setting max new tokens to 0 now results in unlimited generation with context shifting
- Fixed an application crash when deleting a chat
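The StreamingLLM-style context shifting above can be sketched as: when the context exceeds its limit, keep the first few "attention sink" tokens plus the most recent tokens, and evict the middle. This is an illustrative simplification of the paper's idea, not Kolosal's actual code.

```python
# Illustrative sketch of StreamingLLM-style context shifting (not
# Kolosal's implementation): preserve a few leading "attention sink"
# entries plus a sliding window of recent entries, evicting the middle.

def shift_context(cache, max_ctx=4096, n_sink=4):
    """Trim a KV-cache-like list down to max_ctx entries."""
    if len(cache) <= max_ctx:
        return cache
    n_recent = max_ctx - n_sink
    return cache[:n_sink] + cache[-n_recent:]

# Example: a 5000-entry cache is shifted down to the 4096-entry limit.
cache = list(range(5000))
shifted = shift_context(cache)
assert len(shifted) == 4096
assert shifted[:4] == [0, 1, 2, 3]  # attention sinks preserved
assert shifted[-1] == 4999          # most recent entry preserved
```

Because eviction happens in the middle rather than at the front, generation can continue indefinitely without the quality collapse that naive sliding windows cause.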
Full Changelog: v0.1.4.1...v0.1.5
v0.1.4
- Added DeepSeek R1 support
- Added markdown rendering
- Added a tokens-per-second (TPS) stat
- Added a cancel-download button
- Added a delete-model button
- Fixed a model duplication issue
- Fixed an engine memory leak
- Added a thinking UI
- Added automatic detection of the number of threads to use
- Fixed the last-selected-model issue
- Added a fallback when model loading fails
v0.1.3
New feature
- Added a persistent KV cache: the model's KV cache state is saved for each chat history, making the processing of previous chats instant.
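The idea behind the persistent KV cache can be sketched as a per-chat store: the decoded cache state is saved under the chat's ID, so reopening that chat restores the state instead of re-processing the prompt prefix. The class and file layout below are hypothetical, chosen only to illustrate the save/restore flow.

```python
# Hypothetical sketch of a per-chat persistent KV cache (names and file
# layout are illustrative, not Kolosal's API): cache state is saved per
# chat so reopening a chat skips re-processing its prefix.
import os
import pickle
import tempfile

class ChatKVStore:
    def __init__(self, root):
        self.root = root

    def _path(self, chat_id):
        return os.path.join(self.root, f"{chat_id}.kv")

    def save(self, chat_id, kv_state):
        with open(self._path(chat_id), "wb") as f:
            pickle.dump(kv_state, f)

    def load(self, chat_id):
        # A cache miss returns None, so the caller re-runs the prompt.
        try:
            with open(self._path(chat_id), "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            return None

store = ChatKVStore(tempfile.mkdtemp())
store.save("chat-1", {"tokens": [1, 2, 3]})
assert store.load("chat-1") == {"tokens": [1, 2, 3]}  # hit: instant resume
assert store.load("chat-2") is None                   # miss: reprocess
```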
Bug fixes
- Fixed model parameters not being passed to the model correctly
- Fixed a crash when deleting a chat
- Fixed switching models causing the model to generate in a different chat
- Fixed AMD GPUs not being detected
- Fixed EOS not being detected on fine-tuned models using the ChatML format
- Fixed force close in the chat feature
- Fixed a performance issue on GPU
New models
- Qwen 2.5 Coder 0.5B-14B
- Qwen 2.5 14B
v0.1.2
What's Changed
- Fixed GPU support: NVIDIA/AMD GPUs in your device are now detected and selected automatically
- Added clear-chat and delete-chat buttons
- Fixed application shortcuts (removed the Fn + Left Arrow shortcut for opening Kolosal)
- Added Qwen 2.5 models, 0.5B-7B
Full Changelog: v0.1.0...v0.1.2
v0.1.1
What's Changed
- Added Windows Installer
- Added Sahabat AI Llama 3 8B
- Added Sahabat AI Gemma 2 9B
- Added Gemma 2 2B
- Added Gemma 2 9B
- Added Llama 3.1 8B
- Added 8-bit quantization support
- Updated the quantization selection UI to use radio buttons
Full Changelog: v0.1...v0.1.1
v0.1.0
Kolosal AI 0.1 marks the very first release of our groundbreaking solution for on-device Large Language Model (LLM) inference. Engineered to run smoothly on both CPUs and a variety of GPUs, Kolosal AI brings powerful AI capabilities to Windows 64-bit systems without relying on external servers or cloud dependencies.
Key Features and Highlights:
- On-Device Inference: Harness the power of advanced LLMs locally on Windows 64-bit machines, preserving data privacy and reducing latency.
- Broad Hardware Support: Optimize performance on most common CPUs and GPUs, making Kolosal AI accessible and efficient for a wide range of hardware configurations.
- Easy Integration: Seamlessly incorporate Kolosal AI’s inference engine into existing applications or workflows with straightforward setup and minimal dependencies.
- Low Latency & High Throughput: Experience near real-time responses through efficient model optimization and hardware utilization.
With Kolosal AI 0.1, developers and enthusiasts alike can begin exploring the possibilities of large-scale AI right from their desktop—no cloud required. This release is just the start of our mission to empower everyone with the latest advancements in artificial intelligence, all within a user-friendly and hardware-agnostic platform.