Boost your LLM performance by up to 300% on everyday GPU hardware, in just 5 minutes of setup and with no additional hardware costs.
- Radical Simplicity (utilize powerful LLMs with as few lines of code as possible)
- Ultra-Optimized Performance (high-performance code that extracts all the power from these LLMs)
- Fluidity & Shapelessness (plug and play, then re-architect as you please)
$ pip3 install exxa
- World-Class Quantization: Get the most out of your models with top-tier performance and preserved accuracy! 🏋️‍♂️
- Automated PEFT: Simplify your workflow! Let our toolkit handle the optimizations. 🛠️
- LoRA Configuration: Dive into the potential of flexible LoRA configurations, a game-changer for performance! 🌌
- Seamless Integration: Designed to work seamlessly with popular models like LLaMA, Falcon, and more! 🤖
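To make the quantization feature concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain NumPy. This is an illustration of the general technique only; the function names and the API shown are not exxa's actual interface.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Rounding error is bounded by half a quantization step (scale / 2)
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

Storing `q` (one byte per weight) plus a single `scale` is what shrinks memory roughly 4x versus float32; real toolkits refine this with per-channel scales and outlier handling.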
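The PEFT and LoRA bullets above can be illustrated with a generic low-rank adapter sketch. The hyperparameters (`rank`, `alpha`) and the function below are illustrative assumptions about how LoRA works in general, not exxa's configuration surface.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank, alpha = 16, 16, 4, 8   # illustrative LoRA hyperparameters

W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection, zero-init

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = W x + (alpha / rank) * B A x; only A and B receive gradients."""
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.normal(size=d_in)

# With B zero-initialized, the adapter starts as an exact no-op,
# so fine-tuning begins from the pretrained model's behavior.
assert np.allclose(lora_forward(x), W @ x)
```

Because only `A` and `B` (rank x d_in + d_out x rank parameters) are trained, LoRA fine-tunes large models at a small fraction of full fine-tuning's memory cost.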
We're excited about the journey ahead and would love to have you with us! For feedback, suggestions, or contributions, feel free to open an issue or a pull request. Let's shape the future of fine-tuning together! 🌱
Check out our project board for our current backlog and the features we're implementing.
MIT License
- Set up utility logger classes for metric logging with useful metadata such as tokens inferred per second, latency, and memory consumption
- Add CUDA C++ extensions for radically optimized classes for high-performance quantization and inference on the edge
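As a rough sketch of the first roadmap item, a metrics logger for inference could wrap a generation call and report latency and tokens per second via the standard `logging` module. The class name and design below are hypothetical, not the planned implementation.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)

class InferenceMetricsLogger:
    """Hypothetical sketch: log latency and tokens/sec around a generation call."""

    def __init__(self, name: str = "exxa.metrics"):
        self.logger = logging.getLogger(name)

    def log(self, generate_fn, *args, **kwargs):
        start = time.perf_counter()
        tokens = generate_fn(*args, **kwargs)  # assumed to return generated tokens
        latency = time.perf_counter() - start
        tps = len(tokens) / latency if latency > 0 else float("inf")
        self.logger.info("latency=%.4fs tokens/sec=%.1f", latency, tps)
        return tokens, latency, tps

def fake_generate():
    time.sleep(0.01)  # stand-in for real model inference
    return list(range(20))

tokens, latency, tps = InferenceMetricsLogger().log(fake_generate)
assert latency >= 0.01 and len(tokens) == 20
```

Memory consumption could be added the same way, e.g. by sampling `tracemalloc` or GPU memory stats before and after the call.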