LLMaven

An AI-powered tool library for scientific research using Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs). LLMaven provides open, transparent, and useful AI-based software for scientific discovery by leveraging publicly available diverse datasets and disparate academic knowledge bases.

Overview

LLMaven's scientific goal is to create accessible AI tools for researchers who need to work with private/IP-sensitive data in a cost-effective manner. The project uses RAG-based LLMs to extend language models with domain-specific knowledge without requiring expensive model training or specialized hardware for individual researchers.

Key Features

  • FastAPI Backend: RESTful API with retrieval and generation endpoints
  • Streamlit Frontend: Interactive chat interface for document Q&A
  • RAG Architecture: Combines retrieval from vector databases with language model generation
  • Vector Database: Qdrant-based document storage with semantic search (MMR - Maximal Marginal Relevance)
  • Agentic RAG System (NEW): Next-generation hybrid search with multi-vector embeddings (Dense + Sparse + ColBERT) and intelligent agent-based Q&A
  • Flexible Models:
    • Embedding models via HuggingFace (default: sentence-transformers/all-MiniLM-L12-v2)
    • Generation models via HuggingFace Transformers (default: allenai/OLMo-2-1124-7B-Instruct)
    • Quantization support (4-bit/8-bit) for efficient inference
  • Infrastructure Deployment: Production-ready Azure infrastructure deployment with Pulumi
    • PostgreSQL Flexible Server for persistent storage
    • Azure Key Vault for secret management
    • Container Apps for MLflow and LiteLLM
    • Comprehensive validation and cost estimation

Architecture

LLMaven consists of several components that work together:

graph TB
    subgraph Main[LLMaven Architecture]
        UI["Streamlit UI<br/>Port 8501"] --> Backend["FastAPI Backend<br/>Port 8000"]

        Backend --> EndpointsGroup

        subgraph EndpointsGroup[API Endpoints]
            Retrieve["Retrieve<br/>/v1/retrieve"]
            Generate["Generate<br/>/v1/generate"]
        end

        Retrieve --> RetService["Retrieval Service"]
        Generate --> GenService["Generation Service"]

        RetService --> Qdrant["Qdrant Vector Store<br/>Embeddings"]
        GenService --> HF["HuggingFace Transformers<br/>LLM"]
    end

    subgraph Infrastructure[Azure Infrastructure]
        KeyVault["Key Vault<br/>Secrets Management"]
        PostgreSQL["PostgreSQL<br/>Database"]
        Storage["Storage Account<br/>Blob Storage"]
        ContainerApps["Container Apps<br/>MLflow & LiteLLM"]
    end

    style Main fill:#f9f9f9,stroke:#333,stroke-width:2px
    style UI fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style Backend fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style EndpointsGroup fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style RetService fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style GenService fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style Qdrant fill:#fff9c4,stroke:#f9a825,stroke-width:2px
    style HF fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    style Infrastructure fill:#e1f5ff,stroke:#01579b,stroke-width:2px

Core Components

  1. FastAPI Backend (/src/llmaven/main.py)

    • RESTful API with automatic OpenAPI documentation
    • CORS middleware for frontend integration
    • Error handling and validation
    • Health check endpoints
  2. Streamlit Frontend (/src/llmaven/frontend/app.py)

    • Interactive chat interface
    • Document upload (PDF support)
    • Real-time retrieval and generation
    • Chat history display
  3. Retrieval Service (/src/llmaven/services/retrieval_service.py)

    • Document embedding and vector storage
    • Semantic search using Qdrant
    • MMR (Maximal Marginal Relevance) retrieval (see the sketch after this list)
    • Support for temporary and persistent collections
  4. Generation Service (/src/llmaven/services/generation_service.py)

    • Language model inference with caching
    • Quantized model loading (4-bit/8-bit)
    • HuggingFace Pipeline integration
    • Configurable generation parameters
  5. Infrastructure Management (/src/llmaven/infrastructure/)

    • YAML-based configuration schema
    • Pulumi-based Azure resource provisioning
    • Secret management via Azure Key Vault
    • Comprehensive validation and deployment workflow
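
The MMR retrieval used by the Retrieval Service balances relevance to the query against redundancy among already-selected chunks. Below is a minimal, self-contained sketch of the MMR scoring rule (illustrative only, not the project's implementation), assuming pre-computed, unit-normalized embeddings:

import numpy as np

def mmr_select(query_vec, doc_vecs, k=4, lambda_mult=0.5):
    """Pick k documents trading off query relevance against diversity.

    lambda_mult = 1.0 is pure relevance; 0.0 is pure diversity.
    Vectors are assumed unit-normalized, so dot product == cosine similarity.
    """
    relevance = doc_vecs @ query_vec  # similarity of each doc to the query
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        if not selected:
            best = max(candidates, key=lambda i: relevance[i])
        else:
            def score(i):
                # penalize similarity to the closest already-selected document
                redundancy = max(float(doc_vecs[i] @ doc_vecs[j]) for j in selected)
                return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected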

Installation

Prerequisites

  • Python 3.12+ (for LLMaven API)
  • Python 3.11+ (for other components)
  • Pixi package manager
  • Azure CLI (for infrastructure deployment)

Quick Start

  1. Install Pixi:

    curl -fsSL https://pixi.sh/install.sh | bash
  2. Clone the repository:

    git clone https://github.com/uw-ssec/llmaven.git
    cd llmaven
  3. Install dependencies:

    pixi install

Environment Setup

LLMaven uses multiple Pixi environments for different components:

  • llmaven: Main API environment (Python 3.12)
  • frontend: Streamlit UI environment
  • proxy: OpenAI proxy service environment (archived)
  • infra: Infrastructure management environment

Usage

Option 1: Agentic RAG (Recommended for New Projects)

The agentic RAG system pairs hybrid multi-vector search with intelligent agent-based Q&A, typically yielding more accurate retrieval than the legacy single-vector pipeline.

Quick Start

  1. Start Qdrant (required for vector storage):
docker run -p 6333:6333 qdrant/qdrant
  2. Ingest Documents:
# Enter the llmaven environment
pixi shell -e llmaven

# Ingest documents from a directory
llmaven agentic ingest ./docs

# Or ingest from multiple directories
llmaven agentic ingest ./docs ./papers --collection research-docs
  3. Search the Knowledge Base:
# Hybrid search with reranking
llmaven agentic search "What is machine learning?"

# Search without reranking (faster)
llmaven agentic search "vector embeddings" --no-rerank

# Custom top-k results
llmaven agentic search "transformer architecture" --top-k 10
  4. Interactive Chat:
# Start interactive RAG chat
llmaven agentic chat

# Use custom collection or LLM provider
llmaven agentic chat --collection my-docs --provider ollama --model llama2

Configuration

Set environment variables with AGENTIC_ prefix:

# Qdrant configuration
export AGENTIC_QDRANT_URL=http://localhost:6333
export AGENTIC_COLLECTION_NAME=agentic-rag

# LLM configuration
export AGENTIC_LLM_PROVIDER=openai
export AGENTIC_LLM_MODEL=gpt-4o-mini

# Search configuration
export AGENTIC_ENABLE_RERANK=true
export AGENTIC_PREFETCH_TOP_K=20
export AGENTIC_FINAL_TOP_K=5
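
A prefix-based loader such as pydantic-settings can map these variables onto typed fields. The sketch below is illustrative (assuming pydantic-settings 2.x); the project's actual class lives in src/llmaven/agentic/settings.py and may differ:

from pydantic_settings import BaseSettings, SettingsConfigDict

class AgenticSettings(BaseSettings):
    """Reads AGENTIC_* environment variables into typed fields."""
    model_config = SettingsConfigDict(env_prefix="AGENTIC_")

    qdrant_url: str = "http://localhost:6333"
    collection_name: str = "agentic-rag"
    llm_provider: str = "openai"
    llm_model: str = "gpt-4o-mini"
    enable_rerank: bool = True
    prefetch_top_k: int = 20
    final_top_k: int = 5

settings = AgenticSettings()  # e.g. AGENTIC_FINAL_TOP_K=10 overrides final_top_k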

See AGENTS.md for complete configuration options and architecture details.

Option 2: Local Development (API and UI)

The classic LLMaven workflow runs the FastAPI backend together with the Streamlit frontend.

1. Start the API Server

# Using the pixi environment (recommended)
pixi shell -e llmaven

# Start in development mode with auto-reload
llmaven server serve --env development --reload

# Or start in production mode with multiple workers
llmaven server serve --env production --workers 4

The API will be available at http://localhost:8000, with interactive OpenAPI docs at http://localhost:8000/docs.

2. Start the Streamlit UI

In a separate terminal:

# Using the pixi environment (recommended)
pixi shell -e llmaven

# Launch the UI
llmaven server ui

# Or customize host and port
llmaven server ui --host 0.0.0.0 --port 8080 --no-browser

The UI will open automatically in your browser at http://localhost:8501

3. Using the UI

  1. Upload Documents: Use the file uploader to add PDF documents
  2. Ask Questions: Type your question in the chat input
  3. View Results: See retrieved document chunks and AI-generated answers
  4. Chat History: All interactions are stored in the session

Option 3: Azure Infrastructure Deployment

LLMaven provides a comprehensive infrastructure deployment system for production workloads.

1. Initialize Configuration

# Enter the llmaven environment
pixi shell -e llmaven

# Initialize deployment configuration
llmaven infra init --environment dev

# This creates llmaven-config.yaml with sensible defaults

2. Configure Infrastructure

Edit the generated llmaven-config.yaml file:

project:
  name: llmaven
  environment: dev
  location: eastus

azure:
  subscription_id: "your-subscription-id"
  # tenant_id is auto-detected

database:
  admin_login: llmaven_admin
  sku_name: "B_Standard_B1ms"
  databases: [llmaven, mlflow_db, litellm_db]

mlflow:
  enabled: true
  image: "ghcr.io/mlflow/mlflow:latest"

litellm:
  enabled: true
  image: "ghcr.io/berriai/litellm:latest"

3. Set Secrets

Secrets are provided via environment variables:

# Generate a secure master key
export LLMAVEN_SECRETS_LITELLM_MASTER_KEY="$(openssl rand -base64 32)"

# Add your API keys
export LLMAVEN_SECRETS_AZURE_OPENAI_API_KEY="your-azure-openai-key"
export LLMAVEN_SECRETS_ANTHROPIC_API_KEY="your-anthropic-key"

# Or create a .env file
cat > .env.secrets <<EOF
LLMAVEN_SECRETS_LITELLM_MASTER_KEY=your-master-key
LLMAVEN_SECRETS_AZURE_OPENAI_API_KEY=your-azure-openai-key
LLMAVEN_SECRETS_ANTHROPIC_API_KEY=your-anthropic-key
EOF

4. Validate Configuration

# Validate with strict mode (recommended for production)
llmaven infra validate --config llmaven-config.yaml --strict

# Or validate with secrets from .env file
llmaven infra validate --env-file .env.secrets --strict

This validates:

  • Configuration syntax and schema
  • Azure subscription and permissions
  • Resource quotas and limits
  • Secret presence
  • Cost estimation

5. Deploy Infrastructure

# Preview what will be deployed
llmaven infra deploy --preview

# Deploy infrastructure
llmaven infra deploy --yes

# Or deploy with .env file
llmaven infra deploy --env-file .env.secrets --yes

6. Check Deployment Status

# View deployment status and resource URLs
llmaven infra status

# Outputs include:
# - MLflow URL: https://llmaven-dev-mlflow.{region}.azurecontainerapps.io
# - LiteLLM URL: https://llmaven-dev-litellm.{region}.azurecontainerapps.io
# - Resource names (Key Vault, Storage Account, PostgreSQL, etc.)

7. Destroy Infrastructure (when done)

# Destroy all resources
llmaven infra destroy --yes

Deployed Azure Resources

When you deploy infrastructure, LLMaven creates:

  1. Resource Group: Container for all resources
  2. Virtual Network: With subnets for Container Apps and PostgreSQL
  3. Key Vault: Centralized secret management with RBAC
  4. PostgreSQL Flexible Server: Managed database with:
    • Databases: llmaven, mlflow_db, litellm_db
    • Auto-generated admin password stored in Key Vault
  5. Storage Account: With ADLS Gen2 support
    • Containers: mlflow, llmaven
  6. Container Apps Environment: For container orchestration
  7. Managed Identities: For secure Key Vault access
  8. Container Apps (optional):
    • MLflow: Experiment tracking and model registry
    • LiteLLM: OpenAI-compatible API gateway
  9. Log Analytics Workspace: (optional) For monitoring
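
For a flavor of how Pulumi provisions resources like these, here is a minimal illustrative sketch using pulumi-azure-native. It is not the project's actual infrastructure module (see src/llmaven/infrastructure/ for that); it only shows a resource group plus a Standard LRS storage account, as in the dev tier:

import pulumi
from pulumi_azure_native import resources, storage

# Resource group that holds everything for one environment
rg = resources.ResourceGroup("llmaven-dev", location="eastus")

# Storage account with LRS redundancy, matching the dev-tier configuration
account = storage.StorageAccount(
    "llmavendevsa",
    resource_group_name=rg.name,
    sku=storage.SkuArgs(name=storage.SkuName.STANDARD_LRS),
    kind=storage.Kind.STORAGE_V2,
)

pulumi.export("storage_account_name", account.name)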

Cost Estimation

Development Environment (~$20-30/month):

  • PostgreSQL: B_Standard_B1ms
  • Storage: Standard LRS
  • Container Apps: Consumption plan

Production Environment (~$400-600/month):

  • PostgreSQL: GP_Standard_D2s_v3
  • Storage: Standard GRS
  • Container Apps: Dedicated plan
  • High Availability enabled

API Endpoints

Agentic RAG Endpoints (NEW)

POST /v1/agentic/retrieve

Hybrid search with multi-vector retrieval (Dense, Sparse, ColBERT).

curl -X POST http://localhost:8000/v1/agentic/retrieve \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is machine learning?",
    "collection": "agentic-rag",
    "top_k": 5,
    "enable_rerank": true
  }'

POST /v1/agentic/chat

RAG chat with structured responses and citations.

curl -X POST http://localhost:8000/v1/agentic/chat \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Explain transformers",
    "collection": "agentic-rag",
    "message_history": []
  }'
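
The same endpoint can be driven from Python with a plain HTTP client. The field names below follow the curl example above; since the response schema is not reproduced here, this sketch simply prints the raw JSON (see AGENTS.md for the full schema):

import requests

API = "http://localhost:8000/v1/agentic/chat"

history = []  # prior turns, in whatever shape the API accepts
resp = requests.post(API, json={
    "query": "Explain transformers",
    "collection": "agentic-rag",
    "message_history": history,
})
resp.raise_for_status()
print(resp.json())  # structured answer with citations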

Legacy Endpoints

POST /v1/retrieve

Retrieve relevant documents based on a query.

curl -X POST http://localhost:8000/v1/retrieve \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the Rubin telescope?",
    "embedding_model": "sentence-transformers/all-MiniLM-L12-v2",
    "existing_collection": "rubin_telescope",
    "existing_qdrant_path": "data/vector_stores/rubin_qdrant"
  }'

Request Schema:

{
  "documents": [],
  "query": "string",
  "existing_collection": "string",
  "existing_qdrant_path": "string",
  "embedding_model": "string"
}

Response:

{
  "docs": [
    {
      "page_content": "Document text...",
      "metadata": {}
    }
  ],
  "status_code": 200
}

Generation Endpoint

POST /v1/generate

Generate text based on a prompt.

curl -X POST http://localhost:8000/v1/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Based on the context: ... Answer the question: ...",
    "generation_model": "allenai/OLMo-2-1124-7B-Instruct"
  }'

Request Schema:

{
  "prompt": "string",
  "generation_model": "string"
}

Response:

{
  "answer": "Generated text...",
  "status_code": 200
}
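
The two legacy endpoints compose into a simple RAG loop: retrieve context, fold it into a prompt, and generate. A minimal client sketch using requests follows; the payload and response fields mirror the schemas documented above, and the prompt template is illustrative:

import requests

BASE = "http://localhost:8000/v1"
question = "What is the Rubin telescope?"

# 1. Retrieve relevant chunks from an existing collection
docs = requests.post(f"{BASE}/retrieve", json={
    "query": question,
    "embedding_model": "sentence-transformers/all-MiniLM-L12-v2",
    "existing_collection": "rubin_telescope",
    "existing_qdrant_path": "data/vector_stores/rubin_qdrant",
}).json()["docs"]

# 2. Build a context-grounded prompt and generate an answer
context = "\n\n".join(d["page_content"] for d in docs)
answer = requests.post(f"{BASE}/generate", json={
    "prompt": f"Based on the context: {context}\nAnswer the question: {question}",
    "generation_model": "allenai/OLMo-2-1124-7B-Instruct",
}).json()["answer"]

print(answer)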

Configuration

Environment Variables

Create a .env file in the root directory:

# API Configuration
API_TITLE="LLMaven API"
API_VERSION="0.1.0"
API_CORS_ORIGINS=["*"]

# Frontend Configuration
FRONTEND_API_BASE_URL="http://localhost:8000/v1"
FRONTEND_EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L12-v2"
FRONTEND_GENERATION_MODEL="allenai/OLMo-2-1124-7B-Instruct"
FRONTEND_EXISTING_COLLECTION="rubin_telescope"
FRONTEND_EXISTING_QDRANT_PATH="data/vector_stores/rubin_qdrant"
FRONTEND_RETRIEVAL_K=2

# Model Configuration
EMBEDDING_MODEL_NAME="intfloat/multilingual-e5-large-instruct"

Model Configuration

Embedding Models

LLMaven supports any HuggingFace sentence-transformers model:

  • sentence-transformers/all-MiniLM-L12-v2 (default, lightweight)
  • intfloat/multilingual-e5-large-instruct (multilingual)
  • sentence-transformers/all-mpnet-base-v2 (higher quality)
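
Any of these models can be exercised directly with the sentence-transformers library to gauge quality and speed before wiring it into LLMaven. A quick standalone check:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")
embeddings = model.encode(
    ["What is the Rubin telescope?", "The telescope surveys the southern sky."],
    normalize_embeddings=True,  # unit vectors: dot product == cosine similarity
)
print(embeddings.shape)                      # (2, 384) for this model
print(float(embeddings[0] @ embeddings[1]))  # similarity of the two sentences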

Generation Models

Supports HuggingFace Transformers models:

  • allenai/OLMo-2-1124-7B-Instruct (default, 7B parameters)
  • Any causal LM from HuggingFace

Quantization Options:

  • 8-bit: Good balance of quality and memory
  • 4-bit: Lower memory, slightly reduced quality

Models are cached locally in the src/llmaven/models/ directory.
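
Quantized loading in HuggingFace Transformers is typically driven by a BitsAndBytesConfig. The sketch below shows 4-bit loading; it assumes a CUDA GPU and the bitsandbytes package, and is not necessarily the exact configuration the Generation Service uses:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "allenai/OLMo-2-1124-7B-Instruct"

quant = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights: lowest memory footprint
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 to preserve quality
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant,
    device_map="auto",  # place layers across available GPUs automatically
)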

Project Structure

llmaven/
├── src/llmaven/              # Main application package
│   ├── __init__.py
│   ├── cli.py                # Command-line interface
│   ├── config.py             # API configuration
│   ├── main.py               # FastAPI application
│   ├── core/                 # Core RAG components (legacy)
│   │   ├── embeddings/       # Embedding model wrapper
│   │   ├── generator/        # Language model wrapper
│   │   └── retriever/        # Vector retrieval logic
│   ├── agentic/              # Agentic RAG system (NEW)
│   │   ├── agent/            # RAG agent with pydantic-ai
│   │   ├── ingestion/        # Document ingestion pipeline
│   │   ├── search/           # Hybrid search implementation
│   │   ├── vector_store/     # Qdrant manager with Named Vectors
│   │   └── settings.py       # Configuration management
│   ├── frontend/             # Streamlit UI
│   │   ├── app.py            # Main UI application
│   │   └── config.py         # Frontend configuration
│   ├── schemas/              # Pydantic models
│   │   ├── retrieve.py       # Retrieval request/response
│   │   └── generate.py       # Generation request/response
│   ├── services/             # Business logic
│   │   ├── retrieval_service.py
│   │   └── generation_service.py
│   ├── v1/                   # API v1 endpoints
│   │   ├── router.py         # Main router
│   │   └── endpoints/        # Endpoint implementations
│   │       ├── retrieve.py
│   │       └── generate.py
│   ├── deployment/           # Deployment utilities
│   │   ├── init.py           # Configuration initialization
│   │   ├── validate.py       # Configuration validation
│   │   └── deploy.py         # Deployment orchestration
│   └── infrastructure/       # Infrastructure as Code
│       ├── main.py           # Pulumi program entry point
│       ├── config/           # Configuration schema and loaders
│       ├── resources/        # Azure resource modules
│       └── utils/            # Infrastructure utilities
├── archive/                  # Archived code (unused)
│   ├── proxy/                # OpenAI API proxy service (archived)
│   ├── infra/                # Infrastructure as code (archived)
│   └── legacy/               # Legacy Panel application (archived)
├── tests/                    # Test suite
│   ├── test_retriever.py
│   └── test_generator.py
├── pixi.toml                 # Pixi configuration
├── pyproject.toml            # Python package configuration
└── README.md                 # This file

Development

Running Tests

# Run all tests
pixi shell -e llmaven
pytest

# Run specific test file
pytest tests/test_retriever.py

# Run with verbose output
pytest -v

# Run with coverage
pytest --cov=llmaven
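
New endpoint tests can follow the same pattern as the existing suite. A sketch using FastAPI's TestClient is below; it assumes the application object is named app and importable from llmaven.main, per the project structure above, and it exercises the full generation path (the model is downloaded on first run):

from fastapi.testclient import TestClient

from llmaven.main import app  # FastAPI application (see src/llmaven/main.py)

client = TestClient(app)

def test_generate_returns_answer():
    resp = client.post("/v1/generate", json={
        "prompt": "Say hello.",
        "generation_model": "allenai/OLMo-2-1124-7B-Instruct",
    })
    assert resp.status_code == 200
    assert "answer" in resp.json()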

Code Quality

The project uses pre-commit hooks for code quality:

# Install pre-commit hooks
pixi shell -e llmaven
pre-commit install

# Run manually
pre-commit run --all-files

Configured linters:

  • flake8 (Python linting)
  • prettier (YAML/Markdown formatting)
  • codespell (spell checking)

Development Mode

Run the API server with auto-reload for development:

llmaven server serve --env development --reload

Changes to Python files will automatically restart the server.

CLI Reference

LLMaven provides a comprehensive command-line interface:

Agentic RAG Commands

# Ingest documents
llmaven agentic ingest [DIRECTORIES]... [OPTIONS]

Options:
  --collection, -c TEXT    Collection name (defaults to config)
  --force, -f              Overwrite existing collection
  --batch-size, -b INT     Documents per batch (default: 100)

# Search knowledge base
llmaven agentic search <QUERY> [OPTIONS]

Options:
  --collection, -c TEXT    Collection name (defaults to config)
  --top-k, -k INT          Number of results (default: 5)
  --prefetch-k, -p INT     Prefetch candidates per method (default: 20)
  --rerank/--no-rerank     Enable/disable ColBERT reranking

# Interactive chat
llmaven agentic chat [OPTIONS]

Options:
  --collection, -c TEXT    Collection name (defaults to config)
  --provider TEXT          LLM provider (openai, ollama, huggingface)
  --model, -m TEXT         LLM model identifier

Server Commands

# Start API server
llmaven server serve [OPTIONS]

Options:
  --host TEXT         Host to bind (default: 0.0.0.0)
  --port INTEGER      Port to bind (default: 8000)
  --env TEXT          Environment: development|production
  --workers INTEGER   Number of workers (production only)
  --reload            Enable auto-reload (development only)

# Start Streamlit UI
llmaven server ui [OPTIONS]

Options:
  --host TEXT         Host to bind (default: localhost)
  --port INTEGER      Port to bind (default: 8501)
  --browser/--no-browser  Open browser automatically

Infrastructure Commands

# Initialize configuration
llmaven infra init [OPTIONS]

Options:
  --environment TEXT  Environment (dev, staging, prod)
  --output TEXT       Output path for configuration file
  --interactive       Interactive mode with prompts

# Validate configuration
llmaven infra validate [OPTIONS]

Options:
  --config TEXT       Path to configuration file
  --strict            Fail on warnings
  --skip-secrets      Skip secrets validation
  --env-file TEXT     Path to .env file with secrets

# Deploy infrastructure
llmaven infra deploy [OPTIONS]

Options:
  --config TEXT       Path to configuration file
  --preview           Preview changes without deploying
  --yes               Automatically approve deployment
  --env-file TEXT     Path to .env file with secrets

# Show deployment status
llmaven infra status [OPTIONS]

# Destroy infrastructure
llmaven infra destroy [OPTIONS]

# Refresh Pulumi state
llmaven infra refresh [OPTIONS]

# Cancel in-progress operation
llmaven infra cancel [OPTIONS]

# Show version
llmaven version

Troubleshooting

Common Issues

  1. Port already in use

    # Change the port
    llmaven server serve --port 8001
  2. Model download fails

    • Check internet connection
    • Verify HuggingFace model name
    • Check disk space (models can be several GB)
  3. Out of memory during generation

    • Use 8-bit or 4-bit quantization
    • Reduce batch size
    • Use a smaller model
  4. Vector store not found

    • Verify the path in configuration
    • Create vector store first using the notebook
    • Check file permissions
  5. Infrastructure deployment fails

    • Ensure Azure CLI is installed and logged in: az login
    • Verify subscription ID is correct
    • Check that all required secrets are set
    • Review quota limits in your Azure subscription

Debugging

Enable debug logging:

# Set log level
export LOG_LEVEL=DEBUG

# Run with verbose output
llmaven server serve --env development

View API logs at the console or check application logs.
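
Your own scripts can honor the same LOG_LEVEL convention with the standard logging module; a small sketch (LLMaven's internal logger configuration may differ):

import logging
import os

logging.basicConfig(
    level=os.getenv("LOG_LEVEL", "INFO"),  # e.g. DEBUG, INFO, WARNING
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.getLogger("llmaven").debug("debug logging enabled")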

Contributing

Contributions are welcome! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests and linters
  5. Submit a pull request

See CODE_OF_CONDUCT.md for community guidelines.

License

This project is licensed under the BSD License - see the LICENSE file for details.

Acknowledgments

  • University of Washington Scientific Software Engineering Center (SSEC)
  • Built with: FastAPI, Streamlit, LangChain, Qdrant, HuggingFace Transformers, Pulumi
  • Inspired by the need for accessible AI tools in scientific research

Additional Resources

  • AGENTS Guide: AGENTS.md - Comprehensive technical reference for developers and AI assistants
  • Tutorials: SSEC Tutorials
  • Issues: GitHub Issues
  • Deployment Guide: See AGENTS.md for detailed infrastructure deployment instructions

Contact

For questions and support:

  • Open an issue on GitHub
  • Contact: UW SSEC Team

Note: This project is under active development. Features and APIs may change.
