An AI-powered tool library for scientific research using Retrieval Augmented Generation (RAG) with Large Language Models (LLMs). LLMaven provides open, transparent, and useful AI-based software for scientific discovery by drawing on diverse public datasets and disparate academic knowledge bases.
LLMaven's scientific goal is to create accessible AI tools for researchers who need to work with private/IP-sensitive data in a cost-effective manner. The project uses RAG-based LLMs to extend language models with domain-specific knowledge without requiring expensive model training or specialized hardware for individual researchers.
- FastAPI Backend: RESTful API with retrieval and generation endpoints
- Streamlit Frontend: Interactive chat interface for document Q&A
- RAG Architecture: Combines retrieval from vector databases with language model generation
- Vector Database: Qdrant-based document storage with semantic search (MMR - Maximal Marginal Relevance)
- Agentic RAG System (NEW): Next-generation hybrid search with multi-vector embeddings (Dense + Sparse + ColBERT) and intelligent agent-based Q&A
- Flexible Models:
  - Embedding models via HuggingFace (default: `sentence-transformers/all-MiniLM-L12-v2`)
  - Generation models via HuggingFace Transformers (default: `allenai/OLMo-2-1124-7B-Instruct`)
  - Quantization support (4-bit/8-bit) for efficient inference
- Infrastructure Deployment: Production-ready Azure infrastructure deployment with Pulumi
  - PostgreSQL Flexible Server for persistent storage
  - Azure Key Vault for secret management
  - Container Apps for MLflow and LiteLLM
  - Comprehensive validation and cost estimation
LLMaven consists of several components that work together:
```mermaid
graph TB
    subgraph Main[LLMaven Architecture]
        UI["Streamlit UI<br/>Port 8501"] --> Backend["FastAPI Backend<br/>Port 8000"]
        Backend --> EndpointsGroup
        subgraph EndpointsGroup[API Endpoints]
            Retrieve["Retrieve<br/>/v1/retrieve"]
            Generate["Generate<br/>/v1/generate"]
        end
        Retrieve --> RetService["Retrieval Service"]
        Generate --> GenService["Generation Service"]
        RetService --> Qdrant["Qdrant Vector Store<br/>Embeddings"]
        GenService --> HF["HuggingFace Transformers<br/>LLM"]
    end
    subgraph Infrastructure[Azure Infrastructure]
        KeyVault["Key Vault<br/>Secrets Management"]
        PostgreSQL["PostgreSQL<br/>Database"]
        Storage["Storage Account<br/>Blob Storage"]
        ContainerApps["Container Apps<br/>MLflow & LiteLLM"]
    end
    style Main fill:#f9f9f9,stroke:#333,stroke-width:2px
    style UI fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style Backend fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style EndpointsGroup fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style RetService fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style GenService fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style Qdrant fill:#fff9c4,stroke:#f9a825,stroke-width:2px
    style HF fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    style Infrastructure fill:#e1f5ff,stroke:#01579b,stroke-width:2px
```
- **FastAPI Backend** (`/src/llmaven/main.py`)
  - RESTful API with automatic OpenAPI documentation
  - CORS middleware for frontend integration
  - Error handling and validation
  - Health check endpoints
- **Streamlit Frontend** (`/src/llmaven/frontend/app.py`)
  - Interactive chat interface
  - Document upload (PDF support)
  - Real-time retrieval and generation
  - Chat history display
- **Retrieval Service** (`/src/llmaven/services/retrieval_service.py`)
  - Document embedding and vector storage
  - Semantic search using Qdrant
  - MMR (Maximal Marginal Relevance) retrieval (see the sketch after this list)
  - Support for temporary and persistent collections
- **Generation Service** (`/src/llmaven/services/generation_service.py`)
  - Language model inference with caching
  - Quantized model loading (4-bit/8-bit)
  - HuggingFace Pipeline integration
  - Configurable generation parameters
- **Infrastructure Management** (`/src/llmaven/infrastructure/`)
  - YAML-based configuration schema
  - Pulumi-based Azure resource provisioning
  - Secret management via Azure Key Vault
  - Comprehensive validation and deployment workflow
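MMR picks results that are relevant to the query yet not redundant with each other. A minimal, self-contained sketch of MMR retrieval assuming the `langchain-qdrant` and `langchain-huggingface` integrations (the in-memory collection and toy documents are illustrative; LLMaven's retrieval service wires this up internally):

```python
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_qdrant import QdrantVectorStore

docs = [
    Document(page_content="The Rubin Observatory will survey the southern sky."),
    Document(page_content="MMR trades off relevance and diversity in retrieval."),
]
store = QdrantVectorStore.from_documents(
    docs,
    embedding=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L12-v2"),
    location=":memory:",  # swap for url="http://localhost:6333" against a real Qdrant
    collection_name="demo",
)
# MMR: fetch fetch_k candidates, keep k that are relevant yet mutually diverse
retriever = store.as_retriever(search_type="mmr", search_kwargs={"k": 2, "fetch_k": 10})
print(retriever.invoke("What is the Rubin telescope?"))
```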
- Python 3.12+ (for LLMaven API)
- Python 3.11+ (for other components)
- Pixi package manager
- Azure CLI (for infrastructure deployment)
- Install Pixi:

  ```bash
  curl -fsSL https://pixi.sh/install.sh | bash
  ```

- Clone the repository:

  ```bash
  git clone https://github.com/uw-ssec/llmaven.git
  cd llmaven
  ```

- Install dependencies:

  ```bash
  pixi install
  ```
LLMaven uses multiple Pixi environments for different components:
- `llmaven`: Main API environment (Python 3.12)
- `frontend`: Streamlit UI environment
- `proxy`: OpenAI proxy service environment (archived)
- `infra`: Infrastructure management environment
The agentic RAG system provides superior retrieval accuracy with hybrid search and intelligent agent-based Q&A.
- Start Qdrant (required for vector storage):

  ```bash
  docker run -p 6333:6333 qdrant/qdrant
  ```

- Ingest Documents:

  ```bash
  # Enter the llmaven environment
  pixi shell -e llmaven

  # Ingest documents from a directory
  llmaven agentic ingest ./docs

  # Or ingest from multiple directories
  llmaven agentic ingest ./docs ./papers --collection research-docs
  ```

- Search the Knowledge Base:

  ```bash
  # Hybrid search with reranking
  llmaven agentic search "What is machine learning?"

  # Search without reranking (faster)
  llmaven agentic search "vector embeddings" --no-rerank

  # Custom top-k results
  llmaven agentic search "transformer architecture" --top-k 10
  ```

- Interactive Chat:

  ```bash
  # Start interactive RAG chat
  llmaven agentic chat

  # Use a custom collection or LLM provider
  llmaven agentic chat --collection my-docs --provider ollama --model llama2
  ```
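Under the hood, hybrid search prefetches candidates with dense and sparse vectors, then reranks the union with ColBERT-style multivector (max-sim) scoring via Qdrant's Query API. A self-contained toy sketch of that pattern with `qdrant-client` (requires a recent client version; the collection name, 4-dim hand-written vectors, and payload are illustrative only — LLMaven computes real embeddings during ingestion):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(location=":memory:")
client.create_collection(
    collection_name="demo",
    vectors_config={
        "dense": models.VectorParams(size=4, distance=models.Distance.COSINE),
        "colbert": models.VectorParams(
            size=4,
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
        ),
    },
    sparse_vectors_config={"sparse": models.SparseVectorParams()},
)
client.upsert(
    collection_name="demo",
    points=[
        models.PointStruct(
            id=1,
            vector={
                "dense": [0.1, 0.2, 0.3, 0.4],
                "sparse": models.SparseVector(indices=[7, 42], values=[0.5, 0.9]),
                "colbert": [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]],
            },
            payload={"text": "example chunk"},
        ),
    ],
)
# Prefetch with dense + sparse, then rerank the candidates with ColBERT max-sim
hits = client.query_points(
    collection_name="demo",
    prefetch=[
        models.Prefetch(query=[0.1, 0.2, 0.3, 0.4], using="dense", limit=20),
        models.Prefetch(
            query=models.SparseVector(indices=[7], values=[1.0]),
            using="sparse",
            limit=20,
        ),
    ],
    query=[[0.1, 0.2, 0.3, 0.4]],  # multivector query for late-interaction reranking
    using="colbert",
    limit=5,
)
print(hits.points[0].payload["text"])
```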
Set environment variables with the `AGENTIC_` prefix:

```bash
# Qdrant configuration
export AGENTIC_QDRANT_URL=http://localhost:6333
export AGENTIC_COLLECTION_NAME=agentic-rag

# LLM configuration
export AGENTIC_LLM_PROVIDER=openai
export AGENTIC_LLM_MODEL=gpt-4o-mini

# Search configuration
export AGENTIC_ENABLE_RERANK=true
export AGENTIC_PREFETCH_TOP_K=20
export AGENTIC_FINAL_TOP_K=5
```

See AGENTS.md for complete configuration options and architecture details.
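These variables follow the usual pydantic-settings prefix pattern. As an illustration only (LLMaven's actual `agentic/settings.py` fields may differ), a hypothetical settings model showing how `AGENTIC_*` variables map onto fields:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class AgenticSettings(BaseSettings):
    """Hypothetical sketch: reads AGENTIC_* environment variables."""
    model_config = SettingsConfigDict(env_prefix="AGENTIC_")

    qdrant_url: str = "http://localhost:6333"
    collection_name: str = "agentic-rag"
    llm_provider: str = "openai"
    llm_model: str = "gpt-4o-mini"
    enable_rerank: bool = True
    prefetch_top_k: int = 20
    final_top_k: int = 5

settings = AgenticSettings()  # AGENTIC_QDRANT_URL etc. override the defaults
```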
The primary way to use LLMaven is through its FastAPI backend and Streamlit frontend.
```bash
# Using the pixi environment (recommended)
pixi shell -e llmaven

# Start in development mode with auto-reload
llmaven server serve --env development --reload

# Or start in production mode with multiple workers
llmaven server serve --env production --workers 4
```

The API will be available at:
- API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Alternative Docs: http://localhost:8000/redoc
In a separate terminal:
```bash
# Using the pixi environment (recommended)
pixi shell -e llmaven

# Launch the UI
llmaven server ui

# Or customize host and port
llmaven server ui --host 0.0.0.0 --port 8080 --no-browser
```

The UI will open automatically in your browser at http://localhost:8501.
- Upload Documents: Use the file uploader to add PDF documents
- Ask Questions: Type your question in the chat input
- View Results: See retrieved document chunks and AI-generated answers
- Chat History: All interactions are stored in the session
LLMaven provides a comprehensive infrastructure deployment system for production workloads.
```bash
# Enter the llmaven environment
pixi shell -e llmaven

# Initialize deployment configuration
llmaven infra init --environment dev

# This creates llmaven-config.yaml with sensible defaults
```

Edit the generated `llmaven-config.yaml` file:
```yaml
project:
  name: llmaven
  environment: dev
  location: eastus

azure:
  subscription_id: "your-subscription-id"
  # tenant_id is auto-detected

database:
  admin_login: llmaven_admin
  sku_name: "B_Standard_B1ms"
  databases: [llmaven, mlflow_db, litellm_db]

mlflow:
  enabled: true
  image: "ghcr.io/mlflow/mlflow:latest"

litellm:
  enabled: true
  image: "ghcr.io/berriai/litellm:latest"
```

Secrets are provided via environment variables:
```bash
# Generate a secure master key
export LLMAVEN_SECRETS_LITELLM_MASTER_KEY="$(openssl rand -base64 32)"

# Add your API keys
export LLMAVEN_SECRETS_AZURE_OPENAI_API_KEY="your-azure-openai-key"
export LLMAVEN_SECRETS_ANTHROPIC_API_KEY="your-anthropic-key"

# Or create a .env file
cat > .env.secrets <<EOF
LLMAVEN_SECRETS_LITELLM_MASTER_KEY=your-master-key
LLMAVEN_SECRETS_AZURE_OPENAI_API_KEY=your-azure-openai-key
LLMAVEN_SECRETS_ANTHROPIC_API_KEY=your-anthropic-key
EOF
```

```bash
# Validate with strict mode (recommended for production)
llmaven infra validate --config llmaven-config.yaml --strict

# Or validate with secrets from a .env file
llmaven infra validate --env-file .env.secrets --strict
```

This validates:
- Configuration syntax and schema
- Azure subscription and permissions
- Resource quotas and limits
- Secret presence
- Cost estimation
```bash
# Preview what will be deployed
llmaven infra deploy --preview

# Deploy infrastructure
llmaven infra deploy --yes

# Or deploy with a .env file
llmaven infra deploy --env-file .env.secrets --yes
```

```bash
# View deployment status and resource URLs
llmaven infra status

# Outputs include:
# - MLflow URL: https://llmaven-dev-mlflow.{region}.azurecontainerapps.io
# - LiteLLM URL: https://llmaven-dev-litellm.{region}.azurecontainerapps.io
# - Resource names (Key Vault, Storage Account, PostgreSQL, etc.)
```

```bash
# Destroy all resources
llmaven infra destroy --yes
```

When you deploy infrastructure, LLMaven creates:
- Resource Group: Container for all resources
- Virtual Network: With subnets for Container Apps and PostgreSQL
- Key Vault: Centralized secret management with RBAC
- PostgreSQL Flexible Server: Managed database with:
  - Databases: `llmaven`, `mlflow_db`, `litellm_db`
  - Auto-generated admin password stored in Key Vault
- Storage Account: With ADLS Gen2 support
  - Containers: `mlflow`, `llmaven`
- Container Apps Environment: For container orchestration
- Managed Identities: For secure Key Vault access
- Container Apps (optional):
  - MLflow: Experiment tracking and model registry
  - LiteLLM: OpenAI-compatible API gateway (see the sketch after this list)
- Log Analytics Workspace (optional): For monitoring
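Because LiteLLM exposes an OpenAI-compatible API, any OpenAI SDK client can talk to the deployed gateway. A hedged sketch — the base URL below is a placeholder (take the real one from `llmaven infra status`), and the model name must be one configured in your LiteLLM deployment:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://llmaven-dev-litellm.example.azurecontainerapps.io",  # placeholder URL
    api_key=os.environ["LLMAVEN_SECRETS_LITELLM_MASTER_KEY"],
)
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # must match a model in your LiteLLM config
    messages=[{"role": "user", "content": "Hello from LLMaven!"}],
)
print(resp.choices[0].message.content)
```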
Development Environment (~$20-30/month):
- PostgreSQL: B_Standard_B1ms
- Storage: Standard LRS
- Container Apps: Consumption plan
Production Environment (~$400-600/month):
- PostgreSQL: GP_Standard_D2s_v3
- Storage: Standard GRS
- Container Apps: Dedicated plan
- High Availability enabled
`POST /v1/agentic/retrieve`

Hybrid search with multi-vector retrieval (Dense, Sparse, ColBERT).

```bash
curl -X POST http://localhost:8000/v1/agentic/retrieve \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is machine learning?",
    "collection": "agentic-rag",
    "top_k": 5,
    "enable_rerank": true
  }'
```

`POST /v1/agentic/chat`
RAG chat with structured responses and citations.

```bash
curl -X POST http://localhost:8000/v1/agentic/chat \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Explain transformers",
    "collection": "agentic-rag",
    "message_history": []
  }'
```

`POST /v1/retrieve`
Retrieve relevant documents based on a query.

```bash
curl -X POST http://localhost:8000/v1/retrieve \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the Rubin telescope?",
    "embedding_model": "sentence-transformers/all-MiniLM-L12-v2",
    "existing_collection": "rubin_telescope",
    "existing_qdrant_path": "data/vector_stores/rubin_qdrant"
  }'
```

Request Schema:
```json
{
  "documents": [],
  "query": "string",
  "existing_collection": "string",
  "existing_qdrant_path": "string",
  "embedding_model": "string"
}
```

Response:
```json
{
  "docs": [
    {
      "page_content": "Document text...",
      "metadata": {}
    }
  ],
  "status_code": 200
}
```

`POST /v1/generate`
Generate text based on a prompt.

```bash
curl -X POST http://localhost:8000/v1/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Based on the context: ... Answer the question: ...",
    "generation_model": "allenai/OLMo-2-1124-7B-Instruct"
  }'
```

Request Schema:
```json
{
  "prompt": "string",
  "generation_model": "string"
}
```

Response:
```json
{
  "answer": "Generated text...",
  "status_code": 200
}
```
Create a `.env` file in the root directory:
```bash
# API Configuration
API_TITLE="LLMaven API"
API_VERSION="0.1.0"
API_CORS_ORIGINS=["*"]

# Frontend Configuration
FRONTEND_API_BASE_URL="http://localhost:8000/v1"
FRONTEND_EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L12-v2"
FRONTEND_GENERATION_MODEL="allenai/OLMo-2-1124-7B-Instruct"
FRONTEND_EXISTING_COLLECTION="rubin_telescope"
FRONTEND_EXISTING_QDRANT_PATH="data/vector_stores/rubin_qdrant"
FRONTEND_RETRIEVAL_K=2

# Model Configuration
EMBEDDING_MODEL_NAME="intfloat/multilingual-e5-large-instruct"
```

LLMaven supports any HuggingFace sentence-transformers model:
- `sentence-transformers/all-MiniLM-L12-v2` (default, lightweight)
- `intfloat/multilingual-e5-large-instruct` (multilingual)
- `sentence-transformers/all-mpnet-base-v2` (higher quality)
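A quick way to sanity-check a candidate embedding model before pointing LLMaven at it, using the `sentence-transformers` package directly (a minimal sketch, separate from LLMaven's own embedding wrapper):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")
embeddings = model.encode(["What is the Rubin telescope?", "MMR retrieval"])
print(embeddings.shape)  # (2, 384) for this model
```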
Supports HuggingFace Transformers models:
- `allenai/OLMo-2-1124-7B-Instruct` (default, 7B parameters)
- Any causal LM from HuggingFace
Quantization Options:
- 8-bit: Good balance of quality and memory
- 4-bit: Lower memory, slightly reduced quality
Models are cached locally in the `src/llmaven/models/` directory.
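For reference, this is how 4-bit loading looks with plain Transformers. A minimal sketch only — LLMaven's generation service may pass different loader arguments, and it assumes `bitsandbytes` is installed with a CUDA GPU available:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "allenai/OLMo-2-1124-7B-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights; use load_in_8bit=True for 8-bit
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed/quality
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                      # place layers across available devices
)
```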
```
llmaven/
├── src/llmaven/                  # Main application package
│   ├── __init__.py
│   ├── cli.py                    # Command-line interface
│   ├── config.py                 # API configuration
│   ├── main.py                   # FastAPI application
│   ├── core/                     # Core RAG components (legacy)
│   │   ├── embeddings/           # Embedding model wrapper
│   │   ├── generator/            # Language model wrapper
│   │   └── retriever/            # Vector retrieval logic
│   ├── agentic/                  # Agentic RAG system (NEW)
│   │   ├── agent/                # RAG agent with pydantic-ai
│   │   ├── ingestion/            # Document ingestion pipeline
│   │   ├── search/               # Hybrid search implementation
│   │   ├── vector_store/         # Qdrant manager with Named Vectors
│   │   └── settings.py           # Configuration management
│   ├── frontend/                 # Streamlit UI
│   │   ├── app.py                # Main UI application
│   │   └── config.py             # Frontend configuration
│   ├── schemas/                  # Pydantic models
│   │   ├── retrieve.py           # Retrieval request/response
│   │   └── generate.py           # Generation request/response
│   ├── services/                 # Business logic
│   │   ├── retrieval_service.py
│   │   └── generation_service.py
│   ├── v1/                       # API v1 endpoints
│   │   ├── router.py             # Main router
│   │   └── endpoints/            # Endpoint implementations
│   │       ├── retrieve.py
│   │       └── generate.py
│   ├── deployment/               # Deployment utilities
│   │   ├── init.py               # Configuration initialization
│   │   ├── validate.py           # Configuration validation
│   │   └── deploy.py             # Deployment orchestration
│   └── infrastructure/           # Infrastructure as Code
│       ├── main.py               # Pulumi program entry point
│       ├── config/               # Configuration schema and loaders
│       ├── resources/            # Azure resource modules
│       └── utils/                # Infrastructure utilities
├── archive/                      # Archived code (unused)
│   ├── proxy/                    # OpenAI API proxy service (archived)
│   ├── infra/                    # Infrastructure as code (archived)
│   └── legacy/                   # Legacy Panel application (archived)
├── tests/                        # Test suite
│   ├── test_retriever.py
│   └── test_generator.py
├── pixi.toml                     # Pixi configuration
├── pyproject.toml                # Python package configuration
└── README.md                     # This file
```
```bash
# Run all tests
pixi shell -e llmaven
pytest

# Run specific test file
pytest tests/test_retriever.py

# Run with verbose output
pytest -v

# Run with coverage
pytest --cov=llmaven
```

The project uses pre-commit hooks for code quality:
```bash
# Install pre-commit hooks
pixi shell -e llmaven
pre-commit install

# Run manually
pre-commit run --all-files
```

Configured linters:
- flake8 (Python linting)
- prettier (YAML/Markdown formatting)
- codespell (spell checking)
Run the API server with auto-reload for development:
```bash
llmaven server serve --env development --reload
```

Changes to Python files will automatically restart the server.
LLMaven provides a comprehensive command-line interface:
```bash
# Ingest documents
llmaven agentic ingest [DIRECTORIES]... [OPTIONS]

Options:
  --collection, -c TEXT    Collection name (defaults to config)
  --force, -f              Overwrite existing collection
  --batch-size, -b INT     Documents per batch (default: 100)

# Search knowledge base
llmaven agentic search <QUERY> [OPTIONS]

Options:
  --collection, -c TEXT    Collection name (defaults to config)
  --top-k, -k INT          Number of results (default: 5)
  --prefetch-k, -p INT     Prefetch candidates per method (default: 20)
  --rerank/--no-rerank     Enable/disable ColBERT reranking

# Interactive chat
llmaven agentic chat [OPTIONS]

Options:
  --collection, -c TEXT    Collection name (defaults to config)
  --provider TEXT          LLM provider (openai, ollama, huggingface)
  --model, -m TEXT         LLM model identifier
```
```bash
# Start API server
llmaven server serve [OPTIONS]

Options:
  --host TEXT              Host to bind (default: 0.0.0.0)
  --port INTEGER           Port to bind (default: 8000)
  --env TEXT               Environment: development|production
  --workers INTEGER        Number of workers (production only)
  --reload                 Enable auto-reload (development only)

# Start Streamlit UI
llmaven server ui [OPTIONS]

Options:
  --host TEXT              Host to bind (default: localhost)
  --port INTEGER           Port to bind (default: 8501)
  --browser/--no-browser   Open browser automatically
```
```bash
# Initialize configuration
llmaven infra init [OPTIONS]

Options:
  --environment TEXT       Environment (dev, staging, prod)
  --output TEXT            Output path for configuration file
  --interactive            Interactive mode with prompts

# Validate configuration
llmaven infra validate [OPTIONS]

Options:
  --config TEXT            Path to configuration file
  --strict                 Fail on warnings
  --skip-secrets           Skip secrets validation
  --env-file TEXT          Path to .env file with secrets

# Deploy infrastructure
llmaven infra deploy [OPTIONS]

Options:
  --config TEXT            Path to configuration file
  --preview                Preview changes without deploying
  --yes                    Automatically approve deployment
  --env-file TEXT          Path to .env file with secrets

# Show deployment status
llmaven infra status [OPTIONS]

# Destroy infrastructure
llmaven infra destroy [OPTIONS]

# Refresh Pulumi state
llmaven infra refresh [OPTIONS]

# Cancel in-progress operation
llmaven infra cancel [OPTIONS]

# Show version
llmaven version
```
- **Port already in use**

  ```bash
  # Change the port
  llmaven server serve --port 8001
  ```

- **Model download fails**
  - Check internet connection
  - Verify HuggingFace model name
  - Check disk space (models can be several GB)

- **Out of memory during generation**
  - Use 8-bit or 4-bit quantization
  - Reduce batch size
  - Use a smaller model

- **Vector store not found**
  - Verify the path in configuration
  - Create the vector store first using the notebook
  - Check file permissions

- **Infrastructure deployment fails**
  - Ensure Azure CLI is installed and logged in: `az login`
  - Verify subscription ID is correct
  - Check that all required secrets are set
  - Review quota limits in your Azure subscription
Enable debug logging:
```bash
# Set log level
export LOG_LEVEL=DEBUG

# Run with verbose output
llmaven server serve --env development
```

View API logs at the console or check application logs.
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and linters
- Submit a pull request
See CODE_OF_CONDUCT.md for community guidelines.
This project is licensed under the BSD License - see the LICENSE file for details.
- University of Washington Scientific Software Engineering Center (SSEC)
- Built with: FastAPI, Streamlit, LangChain, Qdrant, HuggingFace Transformers, Pulumi
- Inspired by the need for accessible AI tools in scientific research
- AGENTS Guide: AGENTS.md - Comprehensive technical reference for developers and AI assistants
- Tutorials: SSEC Tutorials
- Issues: GitHub Issues
- Deployment Guide: See AGENTS.md for detailed infrastructure deployment instructions
For questions and support:
- Open an issue on GitHub
- Contact: UW SSEC Team
Note: This project is under active development. Features and APIs may change.