A Retrieval-Augmented Generation (RAG) system built with LangChain and Anthropic's Claude.
rag-project/
├── src/
│   └── rag_system/
│       ├── __init__.py
│       ├── llm.py
│       ├── cli.py
│       └── document_loader.py
├── tests/
│   ├── __init__.py
│   └── test_llm.py
├── setup.py
├── requirements.txt
├── .env.example
├── .cursorignore
└── README.md
- Python 3.9+
- Anthropic API key
- Clone the repository:
git clone <repository-url>
cd rag-project
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
- On macOS/Linux:
source venv/bin/activate
- On Windows:
venv\Scripts\activate
- Install the package in development mode:
pip install -e .
- Copy .env.example to .env and add your Anthropic API key:
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY
- Ingest documents:
python -m rag_system.cli ingest documents/your_file.pdf
- Query the system:
python -m rag_system.cli query "Your question here"
- Interactive mode:
python -m rag_system.cli interactive
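Under the hood, ingestion typically splits each document into overlapping chunks before embedding them. A minimal sketch of that step (the function name and default sizes here are illustrative, not the project's actual settings):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows for embedding.
    Overlap preserves context that would otherwise be cut at chunk edges."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Real splitters (e.g. LangChain's text splitters) also try to break on sentence or paragraph boundaries rather than raw character offsets.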
- Run tests:
python -m pytest
- Run specific test file:
python -m pytest tests/test_llm.py
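A test in tests/test_llm.py might look like the sketch below. `build_prompt` is a hypothetical helper used for illustration, not necessarily the project's actual API:

```python
# tests/test_llm.py (sketch) -- build_prompt is a hypothetical helper.

def build_prompt(question, context_chunks):
    """Join retrieved chunks into a single prompt for the LLM."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

def test_build_prompt_includes_context_and_question():
    prompt = build_prompt("What is RAG?", ["chunk one", "chunk two"])
    assert "chunk one" in prompt
    assert "chunk two" in prompt
    assert "What is RAG?" in prompt
```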
- Document ingestion from various formats (PDF, TXT, HTML)
- Web content ingestion
- Semantic search using embeddings
- RAG implementation with Anthropic's Claude
- Interactive query interface
- Vector database storage with ChromaDB
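ChromaDB and sentence-transformers handle the embedding storage and search in this project; conceptually, the ranking reduces to cosine similarity between the query embedding and each stored chunk embedding. A sketch with plain Python lists standing in for real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, corpus, k=2):
    """Return the k texts whose embeddings are most similar to the query.
    corpus is a list of (text, embedding) pairs."""
    ranked = sorted(
        corpus,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

A vector database does the same ranking, but with approximate-nearest-neighbor indexes so it scales past brute-force comparison.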
- langchain
- langchain-community
- langchain-anthropic
- chromadb
- python-dotenv
- sentence-transformers
- beautifulsoup4
- requests
- pypdf
- pytest
MIT License