A LLaMA2-7b chatbot with memory running on CPU, optimized using smooth quantization, 4-bit quantization, or Intel® Extension for PyTorch with bfloat16.
Updated Feb 27, 2024 - Python
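As background for the repositories listed under this topic: bfloat16 keeps float32's 8-bit exponent but only 7 mantissa bits, so a float32 value can be converted by rounding away its low 16 bits. A minimal sketch, independent of any listed repo (function names are illustrative assumptions):

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Convert a float32 value to its 16-bit bfloat16 representation
    using round-to-nearest-even on the discarded low 16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1            # ties round toward an even result
    rounded = bits + 0x7FFF + lsb
    return (rounded >> 16) & 0xFFFF

def bfloat16_bits_to_float32(b: int) -> float:
    """Expand a bfloat16 bit pattern back to float32 (exact, no rounding)."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]
```

For example, pi survives only as 3.140625 after the round trip, which illustrates why bfloat16 preserves float32's dynamic range while giving up precision.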
A JAX implementation of stochastic addition.
A PyTorch implementation of stochastic addition.
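Neither stochastic-addition repository is inspected here; as a generic sketch of the idea, stochastic rounding snaps a sum to one of the two nearest representable values with probability proportional to proximity, making the rounded result unbiased in expectation (the function names and grid-step parameterization are illustrative assumptions, not taken from either repo):

```python
import random

def round_stochastic(x: float, step: float = 1.0) -> float:
    """Round x to a multiple of `step`, choosing the lower or upper
    neighbor with probability proportional to proximity, so that the
    expected value of the result equals x."""
    lo = (x // step) * step
    frac = (x - lo) / step            # distance to the lower neighbor, in [0, 1)
    return lo + step if random.random() < frac else lo

def add_stochastic(a: float, b: float, step: float = 1.0) -> float:
    """Add exactly, then stochastically round the sum to the grid."""
    return round_stochastic(a + b, step)
```

Averaged over many calls, `add_stochastic(0.3, 0.4)` converges to 0.7 even though every individual result is 0.0 or 1.0; this unbiasedness is why stochastic rounding is attractive for low-precision accumulation.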
End-to-end Vietnamese ASR pipeline using NVIDIA NeMo. Features production-grade CI/CD (GitHub Actions), hardware-aware optimization (40% latency reduction), and robust linguistic data testing.