Enterprise-grade genomic variant analysis powered by Google Cloud and Gemini
A production-ready platform that transforms whole-genome variant analysis from an hours-long manual process into an intelligent, conversational experience. Built with Google's Agent Development Kit (ADK) and deployed on Google Kubernetes Engine (GKE), this system processes millions of variants through a multi-agent pipeline.
- Comprehensive Analysis: Process 7.8M+ variants from whole-genome VCF files
- AI-Powered Insights: Natural language interface for complex genomic queries
- Optimized Performance: VEP annotation in ~60 minutes (vs 6+ hours standard)
- Population Context: Integrated gnomAD frequencies across multiple ancestries
- Clinical Assessment: Automated pathogenicity evaluation and gene-disease associations
- Conversational Interface: Ask follow-up questions about specific genes instantly
- Natural Language Processing: Chat with your genomic data like you would with a colleague
- Background Processing: Submit jobs and return later - analysis continues automatically
- Instant Queries: Once processed, get answers about specific genes in seconds
- Population Insights: Compare variants against global population frequencies
- Clinical Prioritization: Automatic identification of pathogenic variants
- Scalable Architecture: Kubernetes-native design with auto-scaling
- Multi-Agent System: Modular pipeline with specialized agents for each task
- Production Ready: HTTPS support, authentication, and monitoring built-in
- Cost Optimized: Efficient resource usage with on-demand scaling
- Open Source: Fully customizable and extensible
graph TB
subgraph "Frontend - Next.js"
UI[React UI]
Auth[Firebase Auth]
SSE[SSE Client]
end
subgraph "Backend - GKE"
API[FastAPI Server]
ADK[ADK Agents]
VEP[VEP Worker]
end
subgraph "Data & Storage"
GCS[Cloud Storage]
BQ[BigQuery/gnomAD]
FS[Firestore]
end
UI --> API
API --> ADK
ADK --> VEP
ADK --> BQ
VEP --> GCS
API --> FS
- Framework: Next.js 14 with App Router
- UI: React + TypeScript + Tailwind CSS
- Components: Shadcn/ui component library
- Auth: Firebase Authentication
- Real-time: Server-Sent Events (SSE)
- Framework: FastAPI + Python 3.10
- AI/ML: Google ADK + Gemini API
- Genomics: VEP 113 + ClinVar + gnomAD
- Infrastructure: GKE + Cloud Tasks + Firestore
- Storage: Google Cloud Storage + BigQuery
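
Since the frontend receives progress updates over Server-Sent Events, it may help to see what that wire format looks like. The following is a minimal sketch of an SSE parser written against the standard `event:` / `data:` line format, not the project's actual client code; the `progress` event name and the JSON payload are hypothetical.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per event from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the format strips one leading space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream: one named event, then a two-line default event.
sample_stream = [
    ": keep-alive\n",
    "event: progress\n",
    'data: {"stage": "vep"}\n',
    "\n",
    "data: line one\n",
    "data: line two\n",
    "\n",
]

for evt in parse_sse(sample_stream):
    print(evt["event"], repr(evt["data"]))
```

Events without an `event:` field fall back to the default type `message`, and consecutive `data:` lines are joined with newlines.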
- Google Cloud Project with billing enabled
- gcloud CLI installed and configured
- Docker installed
- Node.js 18+ and Python 3.10+
- Clone the repository

      git clone https://github.com/ayoisio/variant-agents.git
      cd variant-agents

- Set up the frontend

      cd frontend
      npm install
      cp .env.example .env.local
      # Configure your Firebase and API settings
      npm run dev

- Set up the backend

      cd backend
      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate
      pip install -r requirements.txt
      cp .env.example .env
      # Configure your API keys and GCP settings
      python main.py

- Access the application
  - Frontend: http://localhost:3000
  - Backend API: http://localhost:8080
See backend/README.md for detailed GKE deployment instructions.
// Simply provide a VCF file path in natural language
"Please analyze gs://genomics-data/patient123.vcf"
"Check gs://bucket/sample.vcf for cardiac variants"

- VCF parsing and validation
- VEP annotation with consequence prediction
- gnomAD population frequency queries
- ClinVar pathogenicity assessment
// Ask for your report when ready
"Is my analysis complete? Please provide the report."

// Ask specific questions instantly
"Were any pathogenic variants found in the BRCA1 gene?"
"Show me all variants with AF < 0.01"
"List cardiac-related findings"

| Operation | Time | Throughput |
|---|---|---|
| VCF Parsing | ~30 sec | 7.8M variants |
| VEP Annotation | ~60 min | 130K variants/min |
| gnomAD Query | ~30 sec | 10K variants |
| Clinical Assessment | ~2 min | 2K pathogenic variants |
| Gene Query | <5 sec | Instant |
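
The VEP figures in the table are mutually consistent, which a short calculation can confirm. The sketch below uses only numbers quoted in this README; treating "6+ hours" as exactly 6 hours is an assumption, so the resulting speedup is a lower bound.

```python
total_variants = 7_800_000    # whole-genome variant count quoted in this README
vep_rate_per_min = 130_000    # VEP throughput from the table

# Time to annotate every variant at the quoted rate.
vep_minutes = total_variants / vep_rate_per_min

# Assumption: the "6+ hours standard" baseline is taken at its lower bound.
baseline_minutes = 6 * 60
speedup = baseline_minutes / vep_minutes

print(f"VEP annotation: {vep_minutes:.0f} min")
print(f"Speedup vs baseline: at least {speedup:.0f}x")
```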
- Authentication: Firebase Authentication with JWT tokens
- Authorization: Role-based access control (RBAC)
- Data Encryption: TLS 1.3 in transit, AES-256 at rest
- Audit Logging: Comprehensive activity tracking
- HIPAA Ready: Architecture supports HIPAA compliance requirements
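
For debugging authentication issues, it can be useful to inspect the claims carried by a JWT. The sketch below decodes the payload segment using only the standard library and deliberately does not verify the signature, so it must never be used for access decisions; in a Firebase setup, verification would normally go through the Firebase Admin SDK (`auth.verify_id_token`). The `decode_jwt_payload` helper and the `role` claim are hypothetical.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims from a JWT's payload segment WITHOUT verifying it."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT has three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def _segment(obj: dict) -> str:
    """Encode a dict the way JWT segments are encoded (unpadded base64url)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical unsigned token built locally, for illustration only.
demo_token = ".".join([
    _segment({"alg": "none"}),
    _segment({"sub": "user-1", "role": "analyst"}),
    "signature-placeholder",
])
print(decode_jwt_payload(demo_token))
```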
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Google Agent Development Kit for the multi-agent framework
- Ensembl VEP for variant annotation
- gnomAD for population frequencies
- ClinVar for clinical significance
For questions, issues, or collaboration opportunities:
- Open an Issue
- Email: ayoad@google.com
Built with ❤️ for the genomics community