A user-friendly, interactive web application for Copy Number Variation (CNV) analysis from single-cell RNA sequencing data using CopyKAT.
The easiest way to run the application is using Docker:
# Build and start all services
docker compose up --build
# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
See the Docker Deployment section below for detailed instructions.
Getting Started? Start here: docs/project-docs/instructions.md
Frontend Development? Read: frontend/frontend.md
Backend Development? Read: backend/backend.md
This project provides an intuitive web application for analyzing copy number variations (CNVs) from single-cell RNA-seq data. It helps researchers distinguish malignant cells from non-malignant cells and visualize genomic instability in cancer datasets.
- Frontend: React + TypeScript + Tailwind CSS (modern web UI)
- Backend API: FastAPI (Python REST API)
- Analysis Engine: R with CopyKAT package
- Integration: Python subprocess bridge connecting API to R scripts
- Architecture: Decoupled 3-tier architecture (Frontend ↔ API ↔ R Engine)
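The Python-to-R bridge listed above can be pictured as a thin wrapper around `subprocess`. The following is a minimal sketch under stated assumptions, not the project's actual `r_executor.py`: the function names `build_rscript_command` and `run_analysis` are hypothetical, and the flags simply mirror the `example_complete_workflow.R` invocation shown elsewhere in this README.

```python
import subprocess
from typing import List, Optional


def build_rscript_command(
    script: str,
    input_path: str,
    sample_name: str,
    output_dir: str,
    genome: str = "hg20",
    cores: int = 4,
    r_executable: str = "Rscript",
) -> List[str]:
    """Assemble the argument list for one analysis run (hypothetical helper).

    Passing a list instead of a shell string avoids quoting problems
    with paths that contain spaces.
    """
    return [
        r_executable,
        str(script),
        "--input", str(input_path),
        "--name", sample_name,
        "--output", str(output_dir),
        "--genome", genome,
        "--cores", str(cores),
    ]


def run_analysis(
    command: List[str], timeout_seconds: Optional[float] = None
) -> subprocess.CompletedProcess:
    """Run the R script and capture stdout/stderr as text.

    check=False leaves it to the caller to inspect returncode and
    decide how to report a failed analysis.
    """
    return subprocess.run(
        command,
        capture_output=True,
        text=True,
        check=False,
        timeout=timeout_seconds,
    )
```

An API layer built this way would inspect `returncode` and `stderr` on the returned object before parsing any result files.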
Fa25-Project4-CNV-Cancer-RNAseq-analysis/
├── docs/
│   └── project-docs/
│       ├── instructions.md # Main onboarding guide
│       ├── prd.md # Product requirements
│       └── API_DOCS.md # API documentation
│
├── frontend/ # React + TypeScript UI
│   ├── src/
│   │   ├── api/ # API client utilities
│   │   ├── components/ # React components
│   │   ├── hooks/ # Custom React hooks
│   │   ├── pages/ # Page components
│   │   ├── utils/ # Utility functions
│   │   └── types/ # TypeScript types
│   ├── package.json # Node.js dependencies
│   └── vite.config.ts # Vite configuration
│
├── frontend_legacy/ # Legacy Streamlit app (backup)
│
├── backend/ # R analysis engine
│   ├── backend.md # Backend development guide
│   ├── data/ # Data files
│   ├── results/ # Analysis outputs
│   ├── r_scripts/ # R analysis scripts
│   │   ├── example_complete_workflow.R # WORKING EXAMPLE
│   │   ├── copykat_analysis.R # Main script (TODO)
│   │   ├── copykat_utils.R # Utilities (TODO)
│   │   ├── data_preprocessing.R # Preprocessing (TODO)
│   │   └── copykat_report.Rmd # Report template (TODO)
│   ├── api/ # Python-R bridge & FastAPI
│   │   ├── routes.py # FastAPI endpoints
│   │   ├── models.py # Pydantic models
│   │   ├── services.py # Business logic layer
│   │   ├── r_executor.py # Execute R scripts
│   │   ├── result_parser.py # Parse outputs
│   │   └── status_monitor.py # Monitor progress
│   ├── main.py # FastAPI application entry
│   └── requirements.txt # Python dependencies
│
├── shared/ # Shared utilities
│   ├── shared.md # Shared documentation
│   ├── config.py # Configuration management
│   ├── constants.py # Project constants
│   ├── utils.py # Common utilities
│   └── schemas/ # Data contracts
│       ├── input_schema.json # Analysis input format
│       └── output_schema.json # Analysis output format
│
├── tests/ # Test suite
│   ├── test_frontend.py # Frontend tests
│   ├── test_backend.py # Backend tests
│   └── test_integration.py # Integration tests
│
├── docs/ # User documentation
│   ├── 01_COPYKAT_OVERVIEW.md through 11_AUTOMATED_PIPELINE_GUIDE.md
│   └── GLOSSARY.md
│
├── config/ # Configuration files
│   └── analysis_config.yaml
│
└── Product Requirements Document (PRD)_*.md
- Node.js 18+ and npm (for React frontend)
- Python 3.8+ (for FastAPI backend)
- R 4.0+ (for CopyKAT analysis)
- Conda environment (recommended for R)
- Clone the repository
  git clone <repository-url>
  cd Fa25-Project4-CNV-Cancer-RNAseq-analysis
- Set up the Conda environment (if not already done)
  conda create -n Project4-CNV-Cancer-RNAseq r-base -y
  conda activate Project4-CNV-Cancer-RNAseq
- Install R packages (if not already done)
  R -e 'install.packages("remotes", repos="http://cran.rstudio.com/")'
  R -e 'remotes::install_github("navinlabcode/copykat")'
  R -e 'install.packages(c("yaml", "logger", "rmarkdown", "ggplot2"), repos="http://cran.rstudio.com/")'
- Install Python packages (backend API)
  pip install -r backend/requirements.txt
- Install Node.js packages (frontend)
  cd frontend
  npm install
- Verify installation
  # Test R
  R -e 'library(copykat); library(yaml); library(logger)'
  # Test Python/FastAPI
  python -c "import fastapi; print('FastAPI installed')"
  # Test Node.js
  npm --version
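Before running the individual verification commands, it can help to confirm that the required executables are on the `PATH` at all. The sketch below is one possible pre-flight check using only the standard library; the helper name `missing_tools` and the exact tool list are assumptions for illustration, not part of the project.

```python
import shutil
from typing import Iterable, List

# Executables implied by the prerequisites above (assumed names;
# on some systems Python is installed as "python3" instead).
REQUIRED_TOOLS = ("node", "npm", "python", "R", "Rscript", "conda")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required executables were found.")
```

This only checks that the executables exist; it does not verify versions or that the R packages load, so the commands above are still needed.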
Terminal 1 - Backend API:
# From project root
python -m uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload
Terminal 2 - Frontend:
# From project root
cd frontend
npm run dev
The application will be available at:
- Frontend: http://localhost:5173 (React app)
- Backend API: http://localhost:8000 (FastAPI)
- API Docs: http://localhost:8000/docs (Swagger UI)
# Test the complete working example
Rscript backend/r_scripts/example_complete_workflow.R \
--input backend/data/raw/glioblastomas_compressed/GSE57872_GBM_data_matrix.txt.gz \
--name test_sample \
--output backend/results \
--genome hg20 \
--cores 4
Results will be in backend/results/test_sample_*/
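Because the output directory name ends in a wildcard suffix, repeated runs of the same sample can leave several matching directories. A small sketch for locating the most recent one follows; the helper name `latest_result_dir` is hypothetical, and it assumes nothing about the suffix format, relying on modification time instead.

```python
from pathlib import Path
from typing import Optional


def latest_result_dir(results_root: str, sample_name: str) -> Optional[Path]:
    """Return the most recently modified '<sample_name>_*' directory.

    Returns None when no matching directory exists. Modification time
    is used because the suffix format is not assumed to be sortable.
    """
    candidates = [
        path
        for path in Path(results_root).glob(f"{sample_name}_*")
        if path.is_dir()
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    print(latest_result_dir("backend/results", "test_sample"))
```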
- Always start from main branch
  git checkout main
  git pull origin main
- Create feature branch
  git checkout -b feature/your-feature-name
- Make changes and commit
  git add .
  git commit -m "Add feature description"
- Push and create Pull Request
  git push origin feature/your-feature-name
- Code review and merge
  - Get review from team member
  - Merge to main after approval
- docs/project-docs/instructions.md - Main onboarding guide
- docs/project-docs/prd.md - Product requirements
- frontend_legacy/frontend.md - Legacy Streamlit guide
- docs/project-docs/API_DOCS.md - FastAPI endpoint documentation
- docs/06_PYTHON_R_INTEGRATION.md
- backend/backend.md - Backend development guide
- backend/data/DATA_GUIDE.md
- docs/01_COPYKAT_OVERVIEW.md through docs/11_AUTOMATED_PIPELINE_GUIDE.md
- Glioblastoma (GSE57872)
  - Location: backend/data/raw/glioblastomas_compressed/
  - Cells: ~400
  - Genes: ~20,000
  - Source: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE57872
- Melanoma (GSE72056)
  - Location: backend/data/raw/melanoma_compressed/
  - Cells: ~4,000
  - Genes: ~23,000
  - Source: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE72056
- Provide modern, responsive web interface for CNV analysis
- RESTful API architecture for easy integration
- Automate CopyKAT analysis pipeline
- Enable reproducible research
- Support multiple cancer datasets
- File Organization: Frontend code in frontend/, backend code in backend/, shared code in shared/
- Communication: All frontend-backend calls go through backend/api/ (never direct R calls)
- Data Schemas: Use shared/schemas/ for input/output contracts
- Configuration: Use shared/config.py and config/analysis_config.yaml
- Documentation: Update relevant .md files when adding features
- Check docs/project-docs/instructions.md first
- Read development guides (frontend.md or backend.md)
- Review docs/07_TROUBLESHOOTING.md
- Create GitHub issue for bugs
# Run all tests
python -m unittest discover tests
# Run specific test file
python -m unittest tests/test_frontend.py
- Docker Desktop (Windows/Mac) or Docker Engine (Linux)
- 8GB+ RAM (16GB+ recommended for large datasets)
- 10GB+ free disk space
Windows/Mac: Download from docker.com/products/docker-desktop
Linux (Ubuntu/Debian):
sudo apt-get update
sudo apt-get install -y docker.io docker-compose-plugin
sudo usermod -aG docker $USER # Add user to docker group
- Clone or download the repository
  git clone <repository-url>
  cd Fa25-Project4-CNV-Cancer-RNAseq-analysis
- Start the application
  docker compose up --build
  The first run takes 15-30 minutes to build. Subsequent starts are faster.
- Access the application
  - Frontend: http://localhost:3000
  - Backend API: http://localhost:8000
  - API Docs: http://localhost:8000/docs
- Stop the application
  # Press Ctrl+C, or run:
  docker compose down
┌────────────────────────────────────────────────────────────┐
│                       Docker Network                       │
│  ┌──────────────────┐      ┌──────────────────────────────┐│
│  │     Frontend     │      │           Backend            ││
│  │     (Nginx)      │─────▶│    (Python + R + CopyKAT)    ││
│  │    Port 3000     │ /api │          Port 8000           ││
│  └──────────────────┘      └──────────────────────────────┘│
└────────────────────────────────────────────────────────────┘
# Start in background (detached mode)
docker compose up -d --build
# View logs
docker compose logs -f
# Check status
docker compose ps
# Stop services
docker compose down
Upload via Web Interface: Use the Upload page to add your data files
Direct File Copy: Copy data to backend/data/uploads/
Export Results:
# Copy results from container to local machine
docker cp copykat-backend:/app/results ./my_results
| Dataset Size | RAM Required | Expected Runtime |
|---|---|---|
| ~500 cells | 4 GB | 2-5 minutes |
| ~2,000 cells | 8 GB | 10-20 minutes |
| ~5,000 cells | 12 GB | 30-60 minutes |
| ~10,000+ cells | 16 GB+ | 1-3 hours |
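The table can also be expressed as a simple lookup, which may be convenient for warning users before a large job starts. This is only a sketch: the function name `estimate_resources` is hypothetical, and treating each row's cell count as the upper bound of a tier is an assumption, since the table gives approximate sizes rather than exact ranges. Datasets between about 5,000 and 10,000 cells fall into the largest tier here, which errs on the conservative side.

```python
from typing import List, NamedTuple, Tuple


class ResourceEstimate(NamedTuple):
    ram_gb: int    # minimum RAM suggested by the table
    runtime: str   # expected runtime range, as written in the table


# (upper bound on cell count, estimate); bounds are an assumed reading
# of the approximate dataset sizes in the table.
_TIERS: List[Tuple[int, ResourceEstimate]] = [
    (500, ResourceEstimate(4, "2-5 minutes")),
    (2000, ResourceEstimate(8, "10-20 minutes")),
    (5000, ResourceEstimate(12, "30-60 minutes")),
]
_LARGEST = ResourceEstimate(16, "1-3 hours")  # 16 GB is a lower bound


def estimate_resources(n_cells: int) -> ResourceEstimate:
    """Map a cell count to the matching row of the resource table."""
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    for max_cells, estimate in _TIERS:
        if n_cells <= max_cells:
            return estimate
    return _LARGEST


if __name__ == "__main__":
    for cells in (400, 4000, 20000):
        print(cells, estimate_resources(cells))
```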
Increase Docker's memory allocation in Docker Desktop:
- Click Docker icon → Settings → Resources
- Increase Memory to 8-16 GB
- Click "Apply & Restart"
"Port already in use": Change the host port in docker-compose.yml:
frontend:
  ports:
    - "8080:80" # Change from 3000 to 8080
"Out of memory": Increase Docker memory allocation (see above)
Build errors: Clean and rebuild:
docker compose down
docker system prune -f
docker compose up --build
View logs:
# All logs
docker compose logs
# Backend only
docker compose logs backend
# Follow in real-time
docker compose logs -f
Create a .env file in the project root to customize:
# .env
DOCKER_MODE=true
R_EXECUTABLE=Rscript
VITE_API_BASE_URL=/api
BACKEND_PORT=8000
FRONTEND_PORT=3000
# Start services
docker compose up -d
# Access from other computers
# http://SERVER_IP:3000
For production deployments, configure a reverse proxy (nginx/traefik) with HTTPS.
# Stop and remove containers
docker compose down
# Full cleanup (removes all data!)
docker compose down -v
docker system prune -af
- CI/CD pipeline
- Additional CNV analysis tools (inferCNV)
- Extended test coverage
- Performance optimizations
See docs/project-docs/instructions.md for detailed contribution guidelines.
This project is for educational purposes as part of Fa25-Project4.
- CopyKAT: Gao et al. (2021) Nature Biotechnology
- CopyKAT GitHub: https://github.com/navinlabcode/copykat
- Streamlit Documentation: https://docs.streamlit.io/
Built with React, FastAPI, and CopyKAT