"""
This system implements a multilingual Retrieval-Augmented Generation (RAG) approach for answering questions about different things which includes a variety of content types, such as general text, technical documentation, and code snippets extracted from a standard. It uses advanced text processing, embeddings, and language models to provide accurate responses to any technical queries.
- Enhanced text splitting with context preservation
- Efficient document retrieval using FAISS
- Optimized model inference with 4-bit quantization
- Batched processing for improved performance
application/
: Web application using Streamlitoutputs/
: Saves the log files and Leaderboard submission filesrc/
: Source code modulesmain.py
: Entry pointrequirements.txt
: Dependencies
- EnhancedTextSplitter Class
- Handles specialized text splitting for Matter protocol documentation
- Features:
- Chunk size of 512 tokens with 50 token overlap
- Preserves important technical phrases
- Maintains question-answer context
- Custom separators for technical documentation
- Key Functions:
preprocess_text()
: Preserves important technical phrasespostprocess_chunk()
: Restores original text formatprocess_qa_pair()
: Processes Q&A pairs while maintaining context
- EmbeddingManager Class
- Manages document embeddings and similarity search
- Features:
- Uses SentenceTransformer model "multi-qa-mpnet-base-dot-v1"
- GPU-accelerated embedding generation
- FAISS index for efficient similarity search
- Key Functions:
encode_documents()
: Batch processing of document embeddingssearch()
: Retrieves similar documents using FAISS
- ModelManager Class
- Handles language model operations
- Features:
- Uses Qwen2.5-14B-Instruct model
- 4-bit quantization for efficiency
- Optimized prompt formatting
- Key Functions:
generate_response()
: Generates model responsesformat_prompt()
: Structures prompts with system instructions
- RAGSystem Class
- Main system implementation
- Features:
- Combines embedding and model components
- Batch processing of queries
- Performance statistics tracking
- Key Functions:
process_training_data()
: Builds knowledge baseprocess_test_data_batched()
: Handles test queries in batches
- Command Line Interface
- Configurable parameters:
- Training data path
- Test data path
- Model selection
- Output directory
- Batch size
- Features:
- Comprehensive logging
- Error handling
- Performance statistics
- Configurable parameters:
- Python 3.11.10
- CUDA-capable GPU (recommended)
- 16GB+ RAM (My personal machine has GPU with 24 GB RTX 4090 - Hence all dependencies are with this machine only)
- Clone the repository:
git clone [repository-url]
cd [repository-name]
- Create and activate a virtual environment:
python -m venv env
source env/bin/activate # Linux/Mac
.\env\Scripts\activate # Windows
- Install dependencies:
- Before we jump on to requirements.txt - We need to run separately
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
- Now we can go to this requirements.txt file
pip install -r requirements.txt
python main.py --train path/to/train.csv --test path/to/test.csv
python main.py \
--train path/to/train.csv \
--test path/to/test.csv \
--model "Qwen/Qwen2.5-14B-Instruct" \
--output "outputs" \
--batch-size 8 \
--seed 42
--train
: Path to training data CSV (required)--test
: Path to test data CSV (required)--model
: Model name/path (default: "Qwen/Qwen2.5-14B-Instruct")--output
: Output directory (default: "outputs")--batch-size
: Processing batch size (default: 8)--seed
: Random seed (default: 42)
- Results saved as CSV files in the output directory
- Log files with detailed execution information
- Performance statistics including:
- Processing success rate
- Execution time
- Query statistics
- 4-bit quantization for reduced memory usage
- Batch processing for efficient throughput
- GPU acceleration for embeddings
- FAISS indexing for fast similarity search
- Comprehensive error logging
- Graceful handling of runtime errors
- Input validation
- Resource cleanup
- Minimum:
- 16GB RAM
- CUDA-capable GPU (My personal machine has GPU with 24 GB RTX 4090 - Hence all dependencies are with this machine only)
- 50GB disk space
- Recommended:
- 32GB RAM
- GPU with 16GB+ VRAM (My personal machine has GPU with 24 GB RTX 4090 - Hence all dependencies are with this machine only)
- 100GB SSD storage
- Requires CUDA-capable GPU for optimal performance
- Model size requires significant memory
- Processing speed depends on hardware capabilities
- Requires proper error handling for production use
- Multi-GPU support
- Dynamic batch size adjustment
- Additional model support
- Enhanced error recovery
- Performance optimization
Contributions are welcome! Please feel free to submit pull requests.
Author
: Shravan KonintiEmail
: shravankumar224@gmail.comLinkedIn
: LinkedIn
Copyright Notice Copyright (c) 2024 Shravan Koninti
License Terms Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
- The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
Disclaimer THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
"""