Eldergenix/DeDNA-

Genome Copilot 🧬

AI-Interactive Genomic Visualization + Variant Risk Screening (Research-Grade / Informational)

Genome Copilot is a privacy-first genomic application that lets users upload their DNA/genetic data, explore an interactive genome viewer, and chat with an AI copilot that explains variants and jumps to the exact genomic region being discussed.

Important Disclaimer: Genome Copilot is not a medical device and does not provide medical diagnosis, treatment recommendations, or clinical decision support. Outputs are informational / research-grade and should be reviewed with qualified clinicians and confirmatory testing when relevant.


Key Capabilities

✅ Upload & Secure Storage

  • Upload VCF (baseline required for robust analysis)
  • Upload 23andMe / Ancestry raw genotype TXT/CSV (auto-converted to internal variant format)
  • File format detection, integrity hashing, schema validation
  • Encrypted-at-rest storage, versioning per upload
  • Data deletion workflow (hard delete + retention policy)
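
As a rough illustration of the file format detection step, the sketch below guesses the upload type from the first lines of a file. The format labels and the function name `detectUploadFormat` are hypothetical. The column-count heuristic assumes 23andMe-style exports carry four columns (rsid, chromosome, position, genotype) and Ancestry-style exports carry five (rsid, chromosome, position, allele1, allele2). Integrity hashing and full schema validation are not shown.

```typescript
// Hypothetical format labels; not taken from the project source.
type UploadFormat = "vcf" | "genotype_23andme" | "genotype_ancestry" | "unknown";

/**
 * Guess the upload format from the first lines of a file.
 * Assumptions: a VCF begins with "##fileformat=VCF"; genotype exports are
 * tab- or comma-separated with the position in the third column.
 */
function detectUploadFormat(head: string): UploadFormat {
  const lines = head
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  const first = lines[0];
  if (first === undefined) return "unknown";
  if (first.startsWith("##fileformat=VCF")) return "vcf";

  // Skip comment lines and a possible column-header row.
  const firstData = lines.find(
    (line) => !line.startsWith("#") && !line.toLowerCase().startsWith("rsid"),
  );
  if (firstData === undefined) return "unknown";

  const cols = firstData.split(/\t|,/);
  const posCol = cols[2];
  if (posCol === undefined || !/^\d+$/.test(posCol)) return "unknown";

  if (cols.length === 4) return "genotype_23andme";
  if (cols.length === 5) return "genotype_ancestry";
  return "unknown";
}
```

A heuristic like this is only a first pass; a real pipeline would presumably still validate the full header and reject files that fail schema checks.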

✅ Variant Analysis & Screening (V1)

  • Normalize variant records: chrom, pos, ref, alt
  • Annotate variants with:
    • gene(s)
    • transcript impact
    • predicted consequence (missense, nonsense, frameshift, splice, etc.)
    • population frequency (when available)
    • known clinical assertions (when available)
  • Classification buckets aligned with common 5-tier terminology:
    • Pathogenic / Likely Pathogenic / VUS / Likely Benign / Benign
  • Focused screening tracks:
    • Rare disease: high-impact, Mendelian-associated variants
    • Hereditary cancer risk indicators (configurable gene panels)
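
The normalization step above could look something like the following minimal sketch. It assumes a convention of stripping the `chr` prefix and upper-casing alleles, and it trims bases shared by `ref` and `alt`. It does not left-align indels, since that requires the reference sequence. The names `VariantRecord` and `normalizeVariant` are illustrative, not taken from the project.

```typescript
// Illustrative record shape using the four fields named above.
interface VariantRecord {
  chrom: string;
  pos: number; // 1-based position of the first base of `ref`
  ref: string;
  alt: string;
}

/**
 * Canonicalize chromosome naming and trim bases shared by ref and alt.
 * Assumption: chromosome names are stored without a "chr" prefix.
 */
function normalizeVariant(v: VariantRecord): VariantRecord {
  const chrom = v.chrom.trim().replace(/^chr/i, "").toUpperCase();
  let ref = v.ref.toUpperCase();
  let alt = v.alt.toUpperCase();
  let pos = v.pos;

  // Trim shared trailing bases, keeping at least one base per allele.
  while (ref.length > 1 && alt.length > 1 && ref[ref.length - 1] === alt[alt.length - 1]) {
    ref = ref.slice(0, -1);
    alt = alt.slice(0, -1);
  }

  // Trim shared leading bases, advancing the position for each base removed.
  while (ref.length > 1 && alt.length > 1 && ref[0] === alt[0]) {
    ref = ref.slice(1);
    alt = alt.slice(1);
    pos += 1;
  }

  return { chrom, pos, ref, alt };
}
```

For example, `{ chrom: "chr17", pos: 100, ref: "GCA", alt: "GTA" }` reduces to a single-base substitution `C>T` at position 101 on chromosome `17`.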

✅ AI Copilot (Grounded, Safety-Aligned)

Ask questions like:

  • “Do I have any cancer-related variants?”
  • “What does this BRCA1 variant mean?”
  • “Why was this flagged?”

AI responses are structured:

  • Summary
  • What we found (variants from your dataset)
  • Evidence (why flagged)
  • Confidence + uncertainty
  • Suggested next steps (educational + “consider confirmatory testing”)

✅ AI → Genome Navigation

When the AI references a variant, it renders:

  • “View in Genome” (button or chip)
  • Clicking navigates the viewer to:
    • chromosome + coordinate
    • optional region window (±50–200 bp)
    • highlighted variant track

✅ Embedded Genome Visualization

  • Interactive genome viewer (recommended: igv.js)
  • Features:
    • coordinate browsing
    • gene tracks + variant overlay
    • click-to-inspect variants
    • jump-to coordinate search
    • highlight + focus

✅ Reporting & Export (V1 → V1.5)

  • Rare Disease Screening Summary
  • Hereditary Cancer Indicator Summary
  • Filterable “All Variants” view
  • Export:
    • PDF summaries (planned)
    • JSON results (advanced users)

Screens & UX

Workspace Layout (Core)

  • Left: AI Copilot chat
  • Center: Genome viewer (IGV-style)
  • Right: Findings panel (filters + search)

Variant Detail Drawer

  • gene
  • coordinate
  • consequence
  • classification tier (P/LP/VUS/LB/B)
  • evidence blocks
  • actions:
    • “View in Genome”
    • “Ask AI about this”

Critical Interaction Model

  1. Chat references variant(s)
  2. Variants appear as clickable chips
  3. Clicking a chip:
    • navigates IGV
    • opens detail drawer
    • optionally pins evidence context

Architecture (Recommended)

Genome Copilot can be implemented as a modern web app with asynchronous genomics processing.

Frontend

  • Next.js (TypeScript)
  • igv.js embedded viewer
  • Chat UI with tool-calling for navigateTo(chrom, pos, windowBp)
  • Virtualized lists for large variant tables

Backend

  • API service for uploads, jobs, results, chat orchestration
  • Async job queue for analysis (VCF/WES/WGS can take minutes)
  • Evidence + annotation resolvers

Data Stores

  • Object storage for uploads (encrypted)
  • Relational DB for entities + indexed variant querying
  • Audit logs for sensitive access

Data Model (High-Level)

Core entities:

  • User
  • GenomeUpload
  • Variant
  • Annotation
  • Finding
  • ChatSession
  • ChatMessage
  • AuditLog

Example variant fields:

  • chrom, pos, ref, alt
  • genes[]
  • consequence, transcript
  • zygosity (if inferable)
  • classification_tier (P/LP/VUS/LB/B)
  • flags (rare_disease, hereditary_cancer, high_impact)
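
One way to express these fields as a TypeScript type is sketched below. The field names follow the list above. The zygosity values, the nullable `transcript`, and the `variantKey` helper with its `chrom:pos:ref:alt` key format are assumptions added for illustration. The example record uses placeholder values, not real data.

```typescript
type ClassificationTier = "P" | "LP" | "VUS" | "LB" | "B";
type VariantFlag = "rare_disease" | "hereditary_cancer" | "high_impact";
// Assumed value set; the source only says "zygosity (if inferable)".
type Zygosity = "heterozygous" | "homozygous" | "unknown";

interface Variant {
  chrom: string;
  pos: number;
  ref: string;
  alt: string;
  genes: string[];
  consequence: string;
  transcript: string | null;
  zygosity: Zygosity;
  classification_tier: ClassificationTier;
  flags: VariantFlag[];
}

/** Hypothetical stable identifier for a variant within one upload. */
function variantKey(v: Pick<Variant, "chrom" | "pos" | "ref" | "alt">): string {
  return `${v.chrom}:${v.pos}:${v.ref}:${v.alt}`;
}

// Placeholder record for illustration only.
const exampleVariant: Variant = {
  chrom: "17",
  pos: 1000,
  ref: "G",
  alt: "A",
  genes: ["GENE1"],
  consequence: "missense",
  transcript: null,
  zygosity: "unknown",
  classification_tier: "VUS",
  flags: ["hereditary_cancer"],
};
```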

API Surface (Planned)

Typical endpoints for V1:

Upload

  • POST /api/uploads
  • GET /api/uploads/:id/status

Analysis Jobs

  • POST /api/analysis/:uploadId/run
  • GET /api/analysis/:uploadId/status

Variants & Findings

  • GET /api/variants?uploadId=...&gene=...&tier=...
  • GET /api/findings?uploadId=...&category=rare_disease|hereditary_cancer
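
A client-side helper for the variants endpoint might look like the sketch below. Since these endpoints are planned rather than final, the parameter names simply mirror the query string above, and the assumption that `tier` takes the abbreviated tier codes is mine. `buildVariantsUrl` is a hypothetical name.

```typescript
interface VariantQuery {
  uploadId: string;
  gene?: string;
  tier?: "P" | "LP" | "VUS" | "LB" | "B"; // assumed encoding of the tier filter
}

/** Build the request path for GET /api/variants, omitting unset filters. */
function buildVariantsUrl(q: VariantQuery): string {
  const pairs: Array<[string, string | undefined]> = [
    ["uploadId", q.uploadId],
    ["gene", q.gene],
    ["tier", q.tier],
  ];
  const query = pairs
    .filter((p): p is [string, string] => p[1] !== undefined && p[1] !== "")
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `/api/variants?${query}`;
}
```

Calling `buildVariantsUrl({ uploadId: "u1", gene: "BRCA1", tier: "VUS" })` yields `/api/variants?uploadId=u1&gene=BRCA1&tier=VUS`.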

Chat

  • POST /api/chat
    • grounded on: user dataset + panels + conversation context
    • returns structured answer + variant chips + navigation actions
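
The response described above could be typed roughly as follows. This is a sketch under stated assumptions: the section names come from the structured-answer list in this README, while every type name, field name, and the `makeChip` helper are hypothetical and not the project's actual schema.

```typescript
// All names below are illustrative assumptions.
interface NavigationAction {
  type: "navigateTo";
  chrom: string;
  pos: number;
  windowBp: number;
}

interface VariantChip {
  label: string; // text shown on the chip
  action: NavigationAction; // what the frontend does when the chip is clicked
}

interface ChatAnswer {
  summary: string;
  found: string[]; // "What we found" (variants from the user's dataset)
  evidence: string[]; // why each variant was flagged
  confidence: string; // confidence + uncertainty statement
  nextSteps: string[]; // educational suggestions
  chips: VariantChip[];
}

/** Build a chip whose click action jumps the viewer to the variant. */
function makeChip(gene: string, chrom: string, pos: number, windowBp = 100): VariantChip {
  return {
    label: `${gene} ${chrom}:${pos}`,
    action: { type: "navigateTo", chrom, pos, windowBp },
  };
}

// Placeholder answer for illustration only.
const exampleAnswer: ChatAnswer = {
  summary: "One variant of uncertain significance was found.",
  found: ["GENE1 17:1000 G>A"],
  evidence: ["Listed in the configured hereditary cancer panel"],
  confidence: "Uncertain (VUS)",
  nextSteps: ["Consider confirmatory testing"],
  chips: [makeChip("GENE1", "17", 1000)],
};
```

Keeping the navigation action as plain data means the chat backend never needs to know how the viewer is implemented.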

Navigation Contract (Frontend)

Expose a single navigation API to unify chat + UI interactions:

/**
 * Navigate genome viewer to a location and highlight the target locus.
 */
function navigateTo(chrom: string, pos: number, windowBp = 100): void

Chat messages should render variant chips that call navigateTo(...).
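
A possible implementation of this contract is sketched below. It depends only on a viewer object with a `search(locus)` method, which igv.js browser objects provide. The `LocusViewer` interface, `toLocus`, and `createNavigateTo` are hypothetical names introduced here. Highlighting the target locus is left out because it depends on how the variant track is configured.

```typescript
// Minimal viewer surface this sketch relies on (an assumption, not the full igv.js API).
interface LocusViewer {
  search(locus: string): unknown;
}

/** Build a "chrom:start-end" locus string centered on pos, clamped to position 1. */
function toLocus(chrom: string, pos: number, windowBp = 100): string {
  if (!Number.isInteger(pos) || pos < 1) {
    throw new RangeError(`pos must be a positive integer, got ${pos}`);
  }
  if (!Number.isInteger(windowBp) || windowBp < 0) {
    throw new RangeError(`windowBp must be a non-negative integer, got ${windowBp}`);
  }
  const start = Math.max(1, pos - windowBp);
  const end = pos + windowBp;
  return `${chrom}:${start}-${end}`;
}

/** Bind navigateTo to a concrete viewer so chat chips and UI buttons share one entry point. */
function createNavigateTo(viewer: LocusViewer) {
  return function navigateTo(chrom: string, pos: number, windowBp = 100): void {
    viewer.search(toLocus(chrom, pos, windowBp));
  };
}
```

Separating the pure `toLocus` calculation from the viewer call keeps the coordinate logic testable without a browser.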



Security & Privacy

Genome Copilot is designed to be privacy-first:

  • Encryption in transit (TLS) and at rest
  • Access control per user workspace
  • Audit logs for data access
  • Clear deletion workflow (“Delete my data”)
  • Minimal retention + automated purging policy



Performance & Reliability Targets

  • Upload validation: immediate feedback
  • Processing time expectations:
    • Genotyping TXT: seconds to minutes
    • VCF (WES/WGS): minutes (async)
  • Responsive viewer + paginated/virtualized findings table
  • Retries for annotation failures
  • Health checks + metrics + job observability



AI Safety & Hallucination Controls

The copilot must:

  • Never invent variants not present in the uploaded dataset
  • If asked about a variant that is not found, respond: “Not found in uploaded data”
  • Avoid medical diagnosis / certainty beyond evidence
  • Treat VUS as uncertain with explicit uncertainty language
  • Recommend confirmatory clinical testing for high-risk findings
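
The first two rules lend themselves to a mechanical check before a response is shown. The sketch below is one possible guard, assuming each variant is identified by a string key such as `chrom:pos:ref:alt`. The key format and the names `checkGrounding` and `GroundingResult` are hypothetical.

```typescript
const NOT_FOUND_MESSAGE = "Not found in uploaded data";

interface GroundingResult {
  grounded: string[]; // referenced keys that exist in the uploaded dataset
  notFound: string[]; // referenced keys that do not, and must not be presented as findings
  message: string | null; // fixed reply to use when something was not found
}

/**
 * Split the variant keys an AI answer refers to into those present in the
 * user's dataset and those that are not.
 */
function checkGrounding(
  referencedKeys: string[],
  datasetKeys: ReadonlySet<string>,
): GroundingResult {
  const grounded = referencedKeys.filter((key) => datasetKeys.has(key));
  const notFound = referencedKeys.filter((key) => !datasetKeys.has(key));
  return {
    grounded,
    notFound,
    message: notFound.length > 0 ? NOT_FOUND_MESSAGE : null,
  };
}
```

A check like this only enforces presence in the dataset; wording rules such as uncertainty language for VUS would need separate handling.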



QA & Acceptance Criteria

Upload & Processing

  • Upload accepts VCF under size limits and validates header
  • Genotyping TXT converts successfully (supported vendors)
  • Analysis completes and produces findings list

Viewer Navigation

  • “View in Genome” navigates to correct chrom/pos
  • Variant highlight visible and consistent

AI Safety

  • No diagnosis or overstated certainty
  • No invented findings
  • VUS responses include uncertainty language

Reporting

  • Summaries include disclaimers + top findings
  • PDF export includes selected findings (Phase 2)



Roadmap

Phase 1 (MVP)

  • Upload VCF + genotype TXT
  • Basic annotation + findings list
  • Grounded chat copilot
  • igv.js integration + jump-to navigation

Phase 2

  • Improved evidence blocks + more sources
  • Filtering (panels, severity, gene search)
  • PDF export
  • Additional tracks (gene models, reference annotations)

Phase 3

  • Optional BAM/CRAM support
  • Advanced disease panels
  • Family history guided intake (non-diagnostic)
  • Optional true 3D genome visualization module



Risks & Mitigations

Risk: Over-medicalization / user panic
Mitigation: Strong disclaimers, careful language, non-alarming UX, confirmatory testing prompts

Risk: Misclassification or inconsistent interpretations
Mitigation: Informational labeling, transparent evidence, reanalysis + reclassification support

Risk: Data privacy concerns
Mitigation: Encryption, deletion tools, user control, minimal retention



Contributing

Contributions are welcome, especially in:

  • VCF normalization & parsing improvements
  • Annotation pipeline integrations
  • Evidence transparency UX
  • IGV track configuration + performance
  • Safe structured output enforcement for AI copilot



License

MIT License
