A lightweight implementation of LoRA (Low-Rank Adaptation) and RAG (Retrieval-Augmented Generation) for efficient language-model fine-tuning. The project provides tools for model adaptation, semantic search, and evaluation.
## Features

- LoRA fine-tuning with 8-bit quantization
- FAISS-based semantic search and RAG implementations
- MLflow integration for experiment tracking
- Support for the TinyLlama, Mistral, Llama, and Phi-2 model families
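To make the first feature concrete, here is a minimal sketch of the LoRA update itself in pure NumPy (independent of this repo's actual PEFT-based code; the dimensions and rank below are hypothetical). The frozen pretrained weight `W` is augmented with a trainable low-rank product `B @ A` scaled by `alpha / r`, so only `r * (d_in + d_out)` parameters are trained instead of `d_in * d_out`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: d_out x d_in weight, LoRA rank r, scaling alpha
d_out, d_in, r, alpha = 8, 8, 2, 16

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-initialized

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x); only A and B would receive gradients
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# Because B starts at zero, the adapted model initially matches the base model
assert np.allclose(lora_forward(x), W @ x)
```

Zero-initializing `B` is the standard LoRA trick: training starts from the unmodified base model, and the adapter only drifts away from it as `B` is updated.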
## Tech Stack

- PyTorch, Hugging Face Transformers
- FAISS, LangChain
- MLflow, Python, CUDA
## Components

- LoRA
- LangChain RAG
- Custom RAG
- MLflow
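The retrieval step behind the RAG components can be sketched without FAISS itself: a brute-force cosine-similarity search over document embeddings, which is exactly what a flat inner-product index computes. The vectors and `top_k` helper below are illustrative, not from this repo:

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    # Normalize both sides so the inner product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    idx = np.argsort(-scores)[:k]   # indices of the k most similar documents
    return idx, scores[idx]

# Toy 2-D "embeddings" standing in for real encoder output
docs = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.7, 0.7]])
idx, scores = top_k(np.array([1.0, 0.1]), docs, k=2)
# → idx is [0, 2]: the query is closest to doc 0, then doc 2
```

In the actual pipeline the retrieved documents would be prepended to the prompt before generation; FAISS replaces the `d @ q` scan with an indexed search over many more, much higher-dimensional vectors.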