A powerful Retrieval-Augmented Generation (RAG) system built with CrewAI agents, featuring a REST API and a beautiful Gradio web interface. This project demonstrates how to create intelligent AI agents that can research queries, synthesize information, and provide comprehensive responses.
- Multi-Agent Architecture: Researcher and Writer agents working together
- Web Search Integration: Powered by SerperDev tools for real-time information
- REST API: Clean API endpoints using LitServe
- Beautiful Web UI: Interactive Gradio interface
- Command Line Client: Simple CLI for testing and automation
- Production Ready: Proper error handling and logging
This project leverages Qwen 3 4B via Ollama as a local language model, providing:
- Complete Data Privacy - All inference happens on your machine
- Zero API Costs - No charges for LLM calls
- Fast Responses - No network latency
- Offline Capability - Works without internet (except web search)
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Web Client    │    │     REST API     │    │  CrewAI Agents  │
│   (Gradio UI)   │◄──►│    (LitServe)    │◄──►│   (Researcher   │
│                 │    │                  │    │    + Writer)    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │    SerperDev     │
                       │    Web Search    │
                       └──────────────────┘
This project uses a multi-agent architecture powered by CrewAI, where specialized AI agents collaborate to provide intelligent responses:
- Researcher Agent
  - Role: Information gathering and analysis
  - Tools: SerperDev web search integration
  - Goal: Research the user's query and generate comprehensive insights
  - Process:
    - Receives the user's query
    - Uses web search to find relevant, up-to-date information
    - Analyzes and synthesizes findings
    - Produces research insights and context
- Writer Agent
  - Role: Response synthesis and communication
  - Goal: Transform research insights into clear, informative responses
  - Process:
    - Takes the researcher's insights as input
    - Crafts a concise, well-structured response
    - Ensures the response is accurate and easy to understand
    - Formats the final answer for the user
User Query   →  Researcher Agent  →  Web Search  →  Analysis     →  Writer Agent  →  Final Response
    ↓                 ↓                  ↓             ↓                ↓                 ↓
"What is        Researches &        Searches      Synthesizes     Writes           "Machine
machine         gathers info        current       research        clear            learning
learning?"      from web            data          results         response         is..."
- Sequential Processing: Agents work in sequence, each building on the previous agent's work
- Tool Integration: Real-time web search for current information
- Context Preservation: Information flows from researcher to writer seamlessly
- Quality Assurance: Each agent has specialized expertise for their role
- Error Handling: Robust error handling ensures reliable responses
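The two-agent workflow described above could be wired up in CrewAI roughly as follows. This is a minimal sketch, not the project's actual `server.py`: the function name `build_crew`, the backstories, the task wording, the Ollama model tag `qwen3:4b`, and the local Ollama address `http://localhost:11434` are all assumptions made for illustration. It also assumes `crewai` and `crewai-tools` are installed and `SERPER_API_KEY` is set.

```python
def build_crew():
    """Build a sequential Researcher -> Writer crew (illustrative sketch)."""
    # Deferred imports: defining this function does not require CrewAI.
    from crewai import LLM, Agent, Crew, Process, Task
    from crewai_tools import SerperDevTool

    # Assumed model tag and assumed default address of a local Ollama server.
    llm = LLM(model="ollama/qwen3:4b", base_url="http://localhost:11434")

    researcher = Agent(
        role="Researcher",
        goal="Research the user's query and generate comprehensive insights",
        backstory="An analyst who gathers up-to-date information from the web.",
        tools=[SerperDevTool()],  # web search; needs SERPER_API_KEY
        llm=llm,
    )
    writer = Agent(
        role="Writer",
        goal="Transform research insights into clear, informative responses",
        backstory="A writer who turns research notes into concise answers.",
        llm=llm,
    )

    research_task = Task(
        description="Research this query: {query}",
        expected_output="Key findings with supporting context",
        agent=researcher,
    )
    writing_task = Task(
        description="Write a concise answer to: {query}",
        expected_output="A clear, well-structured response",
        agent=writer,
        context=[research_task],  # hand the researcher's output to the writer
    )

    # Process.sequential runs the tasks in list order.
    return Crew(
        agents=[researcher, writer],
        tasks=[research_task, writing_task],
        process=Process.sequential,
    )
```

Under these assumptions, `build_crew().kickoff(inputs={"query": "What is machine learning?"})` would run the research task first and the writing task second, and the returned object's `raw` attribute would hold the writer's final text.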
- Framework: CrewAI for agent orchestration
- Language Model: Ollama Qwen 3 4B for intelligent responses
- Tools: SerperDev for web search capabilities
- API: LitServe for production-ready serving
- UI: Gradio for beautiful web interface
- Python 3.11 or higher
- uv package manager (recommended)
- Internet connection for web search functionality
- Clone the repository
    git clone <your-repo-url>
    cd deploy-agentic-rag
- Install dependencies
    uv sync
- Set up environment variables
    # Copy the example environment file
    cp .env.example .env
    # Edit .env and add your actual API key
    # Get your SerperDev API key from: https://serper.dev
    SERPER_API_KEY="your-actual-serper-api-key-here"
- Start the API server
    uv run python server.py
  The API will be available at http://localhost:8000
- Start the web interface (in a new terminal)
    uv run python gradio_ui.py
  The web UI will be available at http://localhost:7860
- Test with the CLI client
    uv run python client.py --query "What is machine learning?"
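Before starting the server, it can help to confirm that the environment file really contains a SerperDev key and not the placeholder. The snippet below is a small standard-library sketch of such a check; the helper names `parse_env`, `serper_key_is_set`, and `check_env_file` are hypothetical and not part of this project, and the parser only handles plain `KEY=VALUE` lines.

```python
from pathlib import Path

# Placeholder value shown in the setup step; a real key must replace it.
PLACEHOLDER = "your-actual-serper-api-key-here"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        env[key.strip()] = value.strip().strip("\"'")
    return env


def serper_key_is_set(env: dict[str, str]) -> bool:
    """True when SERPER_API_KEY is present and is not the placeholder."""
    key = env.get("SERPER_API_KEY", "")
    return bool(key) and key != PLACEHOLDER


def check_env_file(path: str = ".env") -> bool:
    """True when the file exists and holds a usable SERPER_API_KEY."""
    file = Path(path)
    return file.is_file() and serper_key_is_set(parse_env(file.read_text()))
```

Calling `check_env_file()` from the project root returns `False` if `.env` is missing or still holds the placeholder value.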
[Screenshot: Agentic RAG interface showing AI responses to complex queries about AI breakthroughs]
- Open http://localhost:7860 in your browser
- Enter your query in the text box
- Click "Submit Query" or press Enter
- View the intelligent response generated by the AI agents
Endpoint: POST /predict

Request:
{
  "query": "Explain quantum computing in simple terms"
}

Response:
{
  "output": {
    "raw": "Quantum computing is a type of computing that uses quantum mechanics...",
    "tasks_output": [...],
    "token_usage": {...}
  }
}

# Simple query
uv run python client.py --query "What is the capital of France?"
# The client displays responses with a typewriter effect

- Server URL: http://localhost:8000
- Timeout: 60 seconds
- Model: Ollama Qwen 3 4B (configurable in server.py)
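The endpoint and configuration above can also be exercised from Python using only the standard library. This is a minimal sketch, assuming the server accepts a JSON body at `POST /predict` and returns the response shape shown in this README; the names `build_request`, `extract_answer`, and `ask` are hypothetical and are not taken from `client.py`.

```python
import json
import urllib.request

SERVER_URL = "http://localhost:8000"  # default server URL from this README
TIMEOUT_SECONDS = 60                  # default timeout from this README


def build_request(query: str, base_url: str = SERVER_URL) -> urllib.request.Request:
    """Create the POST /predict request with a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_answer(payload: dict) -> str:
    """Pull the final text out of the documented response shape."""
    return payload["output"]["raw"]


def ask(query: str, base_url: str = SERVER_URL) -> str:
    """Send the query to the running server and return the answer text."""
    request = build_request(query, base_url)
    with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
        return extract_answer(json.load(response))
```

With the API server running, `print(ask("Explain quantum computing in simple terms"))` would print the `raw` field of the response.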
Researcher Agent:
- Role: Research and gather information
- Tools: SerperDev web search
- Goal: Generate comprehensive insights
Writer Agent:
- Role: Synthesize information into clear responses
- Goal: Provide concise, informative answers
deploy-agentic-rag/
├── server.py        # Main API server using LitServe
├── client.py        # Command-line client
├── gradio_ui.py     # Web interface
├── pyproject.toml   # Project configuration
└── README.md        # This file
- Custom Agents: Modify server.py to add new agent types
- Additional Tools: Extend the Researcher agent with more tools
- UI Enhancements: Customize gradio_ui.py for new features
# Run tests
uv run pytest
# Code formatting
uv run black .
uv run flake8 .

- GET /health - Check server status
- POST /predict - Submit queries for processing
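The health endpoint is useful for waiting until the server is ready, for example before launching the web UI or running tests. The sketch below assumes only that `GET /health` answers with HTTP 200 once the server is up; the function names `is_healthy` and `wait_until_healthy`, the 5-second timeout, and the retry defaults are illustrative choices, not project settings.

```python
import time
import urllib.request


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Return True when GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib.error.URLError and HTTPError both derive from OSError).
        return False


def wait_until_healthy(check=is_healthy, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the health check until it passes or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # pause between attempts, not after the last one
    return False
```

Passing the check as a parameter keeps the polling loop testable without a running server.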
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- CrewAI for the amazing agent framework
- LitServe for the lightweight API serving
- Gradio for the beautiful web interface
- SerperDev for web search capabilities
Made with ❤️ using CrewAI Agents