ayoisio/variant-agents

Multi-Agent Variant Analysis

Genomic Analysis Platform

Enterprise-grade genomic variant analysis powered by Google Cloud and Gemini

Features • Architecture • Quick Start • Demo • Documentation


🚀 Overview

A production-ready platform that transforms whole-genome variant analysis from an hours-long manual process into an intelligent, conversational experience. Built with Google's Agent Development Kit (ADK) and deployed on Google Kubernetes Engine (GKE), this system processes millions of variants through a multi-agent pipeline.

Key Capabilities

  • 🔬 Comprehensive Analysis: Process 7.8M+ variants from whole-genome VCF files
  • 🤖 AI-Powered Insights: Natural language interface for complex genomic queries
  • ⚡ Optimized Performance: VEP annotation in ~60 minutes (vs 6+ hours standard)
  • 🌍 Population Context: Integrated gnomAD frequencies across multiple ancestries
  • 📊 Clinical Assessment: Automated pathogenicity evaluation and gene-disease associations
  • 💬 Conversational Interface: Ask follow-up questions about specific genes instantly

✨ Features

For Clinicians & Researchers

  • Natural Language Processing: Chat with your genomic data like you would with a colleague
  • Background Processing: Submit jobs and return later - analysis continues automatically
  • Instant Queries: Once processed, get answers about specific genes in seconds
  • Population Insights: Compare variants against global population frequencies
  • Clinical Prioritization: Automatic identification of pathogenic variants

For Developers & IT Teams

  • Scalable Architecture: Kubernetes-native design with auto-scaling
  • Multi-Agent System: Modular pipeline with specialized agents for each task
  • Production Ready: HTTPS support, authentication, and monitoring built-in
  • Cost Optimized: Efficient resource usage with on-demand scaling
  • Open Source: Fully customizable and extensible

πŸ—οΈ Architecture

graph TB
    subgraph "Frontend - Next.js"
        UI[React UI]
        Auth[Firebase Auth]
        SSE[SSE Client]
    end
    
    subgraph "Backend - GKE"
        API[FastAPI Server]
        ADK[ADK Agents]
        VEP[VEP Worker]
    end
    
    subgraph "Data & Storage"
        GCS[Cloud Storage]
        BQ[BigQuery/gnomAD]
        FS[Firestore]
    end
    
    UI --> API
    API --> ADK
    ADK --> VEP
    ADK --> BQ
    VEP --> GCS
    API --> FS

Technology Stack

Frontend (/frontend)

  • Framework: Next.js 14 with App Router
  • UI: React + TypeScript + Tailwind CSS
  • Components: Shadcn/ui component library
  • Auth: Firebase Authentication
  • Real-time: Server-Sent Events (SSE)
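
The real-time channel listed above uses Server-Sent Events, whose wire format is plain text: each event is a set of `field: value` lines terminated by a blank line. The following is a minimal sketch of how a backend might frame a job-progress update. The helper name `format_sse`, the `progress` event name, and the payload keys are hypothetical and are not taken from this repository.

```python
import json
from typing import Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event frame (hypothetical helper)."""
    lines = []
    if event is not None:
        # Optional event name; clients can listen for it specifically.
        lines.append(f"event: {event}")
    # json.dumps produces a single line by default, so one data field is enough.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line marks the end of the event.
    return "\n".join(lines) + "\n\n"


# Illustrative payload only; the real progress schema is not documented here.
frame = format_sse({"stage": "vep_annotation", "percent": 40}, event="progress")
print(frame, end="")
```

A browser `EventSource` client would receive this frame as a `progress` event whose `data` string can be parsed as JSON.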

Backend (/backend)

  • Framework: FastAPI + Python 3.10
  • AI/ML: Google ADK + Gemini API
  • Genomics: VEP 113 + ClinVar + gnomAD
  • Infrastructure: GKE + Cloud Tasks + Firestore
  • Storage: Google Cloud Storage + BigQuery

🚦 Quick Start

Prerequisites

  • Google Cloud Project with billing enabled
  • gcloud CLI installed and configured
  • Docker installed
  • Node.js 18+ and Python 3.10+

Local Development

  1. Clone the repository

    git clone https://github.com/ayoisio/variant-agents.git
    cd variant-agents
  2. Set up the frontend

    cd frontend
    npm install
    cp .env.example .env.local
    # Configure your Firebase and API settings
    npm run dev
  3. Set up the backend

    # Run from the repository root (use a second terminal if the frontend dev server is still running)
    cd backend
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
    cp .env.example .env
    # Configure your API keys and GCP settings
    python main.py
  4. Access the application

Production Deployment

See backend/README.md for detailed GKE deployment instructions.

🎯 Usage Workflow

1. Start Analysis

// Simply provide a VCF file path in natural language
"Please analyze gs://genomics-data/patient123.vcf"
"Check gs://bucket/sample.vcf for cardiac variants"

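Requests like the ones above embed a Cloud Storage URI in free text. In this project the agents interpret the message, so the following is only a rough sketch of the underlying idea: pulling the bucket and object path out of a sentence with a regular expression. The pattern and the function name `extract_vcf_uri` are assumptions for illustration, and the bucket-name rule is simplified.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a gs:// URI whose object name ends in .vcf or .vcf.gz
# and is followed by whitespace or the end of the message.
GCS_VCF_PATTERN = re.compile(
    r"gs://([a-z0-9][a-z0-9._-]*)/(\S+?\.vcf(?:\.gz)?)(?=\s|$)"
)


def extract_vcf_uri(message: str) -> Optional[Tuple[str, str]]:
    """Return (bucket, object_path) for the first VCF URI in the message, if any."""
    match = GCS_VCF_PATTERN.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(extract_vcf_uri("Please analyze gs://genomics-data/patient123.vcf"))
print(extract_vcf_uri("Check gs://bucket/sample.vcf for cardiac variants"))
```

Note that this sketch does not handle a URI followed directly by punctuation, such as a sentence-ending period; it returns `None` in that case.
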
2. Background Processing (~60-70 min)

  • VCF parsing and validation
  • VEP annotation with consequence prediction
  • gnomAD population frequency queries
  • ClinVar pathogenicity assessment
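
The four steps above run in a fixed order, each building on the previous one's output. A minimal sketch of that ordering as a chain of stage functions is shown below. Every function here is a hypothetical placeholder that only records its own name; the actual agents, their interfaces, and their state are not described in this README.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]


def parse_vcf(state: Dict) -> Dict:
    # Placeholder: a real stage would read and validate the VCF file.
    return {**state, "parsed": True}


def annotate_with_vep(state: Dict) -> Dict:
    # Placeholder: a real stage would run VEP consequence prediction.
    return {**state, "annotated": True}


def query_gnomad(state: Dict) -> Dict:
    # Placeholder: a real stage would look up population frequencies.
    return {**state, "frequencies_loaded": True}


def assess_clinvar(state: Dict) -> Dict:
    # Placeholder: a real stage would attach ClinVar pathogenicity labels.
    return {**state, "assessed": True}


# Order mirrors the list of background-processing steps.
PIPELINE: List[Stage] = [parse_vcf, annotate_with_vep, query_gnomad, assess_clinvar]


def run_pipeline(vcf_uri: str) -> Dict:
    """Pass a shared state dict through each stage in order."""
    state: Dict = {"vcf_uri": vcf_uri, "completed": []}
    for stage in PIPELINE:
        state = stage(state)
        state["completed"].append(stage.__name__)
    return state


result = run_pipeline("gs://bucket/sample.vcf")
print(result["completed"])
```

Keeping the stages in a list makes the order explicit and lets a caller report progress after each one.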

3. Get Results

// Ask for your report when ready
"Is my analysis complete? Please provide the report."

4. Interactive Queries

// Ask specific questions instantly
"Were any pathogenic variants found in the BRCA1 gene?"
"Show me all variants with AF < 0.01"
"List cardiac-related findings"

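Once annotation has finished, questions like these reduce to filters over the annotated variants, which is why they can be answered quickly. The sketch below illustrates two such filters. The `AnnotatedVariant` record, its field names, and the sample rows are all made up for illustration; the platform's actual data model is not shown in this README.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnnotatedVariant:
    """Hypothetical record; field names are illustrative only."""
    gene: str
    allele_frequency: Optional[float]  # None when no population frequency is known
    clinical_significance: str


def pathogenic_in_gene(variants: List[AnnotatedVariant], gene: str) -> List[AnnotatedVariant]:
    """Variants in `gene` labeled exactly 'pathogenic' (case-insensitive)."""
    return [
        v for v in variants
        if v.gene.upper() == gene.upper()
        and v.clinical_significance.lower() == "pathogenic"
    ]


def rarer_than(variants: List[AnnotatedVariant], max_af: float = 0.01) -> List[AnnotatedVariant]:
    """Variants with a known allele frequency strictly below `max_af`."""
    return [
        v for v in variants
        if v.allele_frequency is not None and v.allele_frequency < max_af
    ]


# Made-up example records, not real findings.
sample = [
    AnnotatedVariant("BRCA1", 0.0002, "Pathogenic"),
    AnnotatedVariant("BRCA1", 0.15, "Benign"),
    AnnotatedVariant("MYH7", None, "Uncertain significance"),
]

print(len(pathogenic_in_gene(sample, "brca1")))
print([v.gene for v in rarer_than(sample)])
```

This sketch excludes variants with no known frequency from the rarity filter. That is a design choice: a real pipeline might instead treat variants absent from the population database as rare.
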
📊 Performance Metrics

| Operation | Time | Throughput |
|---|---|---|
| VCF Parsing | ~30 sec | 7.8M variants |
| VEP Annotation | ~60 min | 130K variants/min |
| gnomAD Query | ~30 sec | 10K variants |
| Clinical Assessment | ~2 min | 2K pathogenic variants |
| Gene Query | <5 sec | Instant |
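
The VEP throughput figure follows directly from the other numbers in this README: 7.8M variants annotated in roughly 60 minutes. The short calculation below reproduces it, along with the implied speedup over the 6-hour baseline quoted in the overview. The inputs are the README's approximate figures, so the results are estimates rather than measurements.

```python
# Approximate figures quoted in this README.
total_variants = 7_800_000
vep_minutes = 60
baseline_minutes = 6 * 60  # "6+ hours standard", so this is a lower bound

variants_per_minute = total_variants / vep_minutes
min_speedup = baseline_minutes / vep_minutes

print(f"{variants_per_minute:,.0f} variants/min")
print(f"at least {min_speedup:.0f}x faster than the baseline")
```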

🔒 Security & Compliance

  • Authentication: Firebase Authentication with JWT tokens
  • Authorization: Role-based access control (RBAC)
  • Data Encryption: TLS 1.3 in transit, AES-256 at rest
  • Audit Logging: Comprehensive activity tracking
  • HIPAA Ready: Architecture supports HIPAA compliance requirements

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

πŸ™ Acknowledgments

📧 Contact

For questions, issues, or collaboration opportunities:


Built with ❤️ for the genomics community
