GitHub - gzhoubioinf/ColonyExplorer: A genotype–phenotype browser for ESKAPE pathogens — linking colony plate images with morphology metrics and genomic AMR and virulence factor data.

A Klebsiella pneumoniae discovery platform linking plate-to-fitness phenotypic data with established genomic typing (Kleborate) and predicted genetic associations (GWAS).

Overview

ColonyExplorer is an interactive Streamlit application for exploring high-throughput chemical genomics screens of Klebsiella pneumoniae clinical isolates. It integrates five data streams into a single browser:

Plate images — high-density agar plate images (1536-well format) captured with IRIS
Phenotypic data — colony morphology metrics (size, circularity, opacity, biofilm) extracted per-colony by IRIS
Absolute fitness — pre-computed fitness scores per isolate per condition
Genomic data — AMR genes, virulence loci, and sequence types from Kleborate
GWAS results — gene- and SNP-level associations from pan-genome and SNP GWAS analyses

Dataset: 1,462 isolates · 225 conditions · 214 conditions with fitness data · 2 GWAS modes

Workflow and Methodology

Features

Feature	Description
Colony Inspection	Browse plate images for any isolate and condition; view four replicate crops with quantitative IRIS morphology metrics
Fitness Distribution	Visualise absolute fitness density across all isolates for any condition; see where a strain sits relative to the population with percentile annotation
Genomic Profiling	Full Kleborate strain overview: ST, K-type, AMR genes, virulence score, resistance score — linked to each colony
Gene-level GWAS	Pan-genome associations between gene presence/absence and fitness phenotypes; interactive table with fitness boxplots
SNP-level GWAS	SNP associations across conditions; allele distributions, annotations, and isolate presence per significant SNP
Seamless Navigation	Jump from any GWAS hit to Colony Viewer for any isolate — and back — with condition and gene context preserved
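
The percentile annotation in the Fitness Distribution view can be illustrated with a minimal sketch. This is not the app's own code: the function name `fitness_percentile` and the definition used here (share of isolates with fitness at or below the strain's) are assumptions made for illustration.

```python
from bisect import bisect_right


def fitness_percentile(population: list[float], strain_fitness: float) -> float:
    """Return the percentage of isolates whose fitness is <= strain_fitness."""
    if not population:
        raise ValueError("population must contain at least one fitness score")
    ordered = sorted(population)
    # bisect_right counts the scores that are less than or equal to the strain's.
    at_or_below = bisect_right(ordered, strain_fitness)
    return 100.0 * at_or_below / len(ordered)


# Hypothetical fitness scores for one condition.
scores = [0.2, 0.5, 0.8, 1.0, 1.3]
print(fitness_percentile(scores, 0.8))  # 60.0
```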

Project Structure

ColonyExplorer/
├── app/
│   ├── main.py                  # Entry point; navigation and home page
│   ├── colony_picker.py         # Colony Viewer page
│   ├── strain_overview.py       # Genomic metadata panels
│   └── utils/
│       ├── data_loading.py      # CSV / Excel / IRIS file parsers
│       └── image_handling.py    # Plate image loading and colony extraction
├── config/
│   └── config.yaml              # File and directory paths
├── data/
│   ├── plate_images/            # *.JPG.grid.jpg plate images
│   ├── iris_measurements/       # *.iris measurement files
│   ├── GWAS_files/              # Pan-genome and SNP presence/absence files
│   ├── GWAS_results/            # GWAS summary result files
│   ├── strain_names.csv         # Strain ID → Row / Column / Plate mapping
│   ├── kleborate_all.tsv        # Kleborate output (genomic metadata)
│   └── condition_clean_tags_mapping.csv  # Human-readable condition labels
├── requirements.txt
└── README.md

Data Format

Plate images

Filename convention: {Condition}-{Plate}-{Batch}_A.JPG.grid.jpg

Example: Ceftazidime-1ugml-1-1_A.JPG.grid.jpg
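
A plate image name following this convention can be split into its parts with a short helper. The sketch below is illustrative only; `parse_plate_image_name` is a hypothetical name, and it assumes that the plate and batch fields never contain a hyphen while the condition may (as in `Ceftazidime-1ugml`), so the name is split from the right.

```python
from typing import NamedTuple

PLATE_IMAGE_SUFFIX = "_A.JPG.grid.jpg"


class PlateImageName(NamedTuple):
    condition: str
    plate: str
    batch: str


def parse_plate_image_name(filename: str) -> PlateImageName:
    """Split '{Condition}-{Plate}-{Batch}_A.JPG.grid.jpg' into its three fields."""
    if not filename.endswith(PLATE_IMAGE_SUFFIX):
        raise ValueError(f"unexpected plate image name: {filename!r}")
    stem = filename[: -len(PLATE_IMAGE_SUFFIX)]
    # Split from the right so hyphens inside the condition name are preserved.
    parts = stem.rsplit("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"cannot find condition, plate and batch in {filename!r}")
    return PlateImageName(*parts)


print(parse_plate_image_name("Ceftazidime-1ugml-1-1_A.JPG.grid.jpg"))
```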

IRIS files

Filename convention: {Condition}-{Plate}-{Batch}_A.JPG.iris

Each file contains per-colony measurements and grid coordinates.
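
Because the image and IRIS conventions differ only in their suffix, the IRIS file for a given plate image can be located by swapping one suffix for the other. The helper below is a sketch under that assumption; `iris_path_for_image` is a hypothetical name, and the directory argument mirrors the `iris_directory` setting in the configuration.

```python
from pathlib import Path

IMAGE_SUFFIX = ".JPG.grid.jpg"
IRIS_SUFFIX = ".JPG.iris"


def iris_path_for_image(image_path: str, iris_directory: str) -> Path:
    """Return the expected IRIS measurement path for a plate image."""
    name = Path(image_path).name
    if not name.endswith(IMAGE_SUFFIX):
        raise ValueError(f"not a plate image name: {name!r}")
    # Keep '{Condition}-{Plate}-{Batch}_A' and replace only the suffix.
    return Path(iris_directory) / (name[: -len(IMAGE_SUFFIX)] + IRIS_SUFFIX)


print(iris_path_for_image(
    "data/plate_images/Ceftazidime-1ugml-1-1_A.JPG.grid.jpg",
    "data/iris_measurements/",
))
```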

Strain map (`data/strain_names.csv`)

Must contain at minimum: ID, Row, Column, Plate, GenBank_acc, ENA_acc.
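
Before launching the app with your own strain map, it can help to check that the required columns are present. The following is a minimal sketch using only the standard library; `missing_strain_columns` is a hypothetical helper, not part of ColonyExplorer.

```python
import csv
import io

REQUIRED_COLUMNS = ("ID", "Row", "Column", "Plate", "GenBank_acc", "ENA_acc")


def missing_strain_columns(csv_text: str) -> list[str]:
    """Return the required columns that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.strip() for name in header}
    return [column for column in REQUIRED_COLUMNS if column not in present]


# Hypothetical strain map that lacks the ENA_acc column.
sample = "ID,Row,Column,Plate,GenBank_acc\nS1,1,1,1,GCA_000001\n"
print(missing_strain_columns(sample))  # ['ENA_acc']
```

To check a real file, pass its contents, for example `missing_strain_columns(open("data/strain_names.csv", encoding="utf-8").read())`.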

Kleborate file (`data/kleborate_all.tsv`)

Standard Kleborate output TSV; matched to strains via strain, GenBank_acc, or ENA_acc.
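
One way to read this matching rule is sketched below. It assumes that the Kleborate `strain` column may hold either the strain ID or one of its accessions, and tries each in turn; this interpretation, the function name `match_kleborate_row`, and the example values are assumptions rather than the app's documented behaviour.

```python
from typing import Optional


def match_kleborate_row(
    strain: dict[str, str],
    kleborate_by_name: dict[str, dict[str, str]],
) -> Optional[dict[str, str]]:
    """Find the Kleborate row for a strain-map entry, or None if unmatched.

    kleborate_by_name maps each value of the Kleborate 'strain' column to its row.
    """
    for key in ("ID", "GenBank_acc", "ENA_acc"):
        value = (strain.get(key) or "").strip()
        if value and value in kleborate_by_name:
            return kleborate_by_name[value]
    return None


# Hypothetical data: the Kleborate row is keyed by a GenBank accession.
kleborate = {"GCA_000001": {"strain": "GCA_000001", "ST": "ST258"}}
entry = {"ID": "S1", "GenBank_acc": "GCA_000001", "ENA_acc": ""}
print(match_kleborate_row(entry, kleborate))
```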

GWAS files (`data/GWAS_files/`)

gene_presence_absence_roary.csv — Roary pan-genome presence/absence matrix
pan_genome_reference.fa — pan-genome reference sequences
significant_snps_presence_absence.csv — SNP presence/absence matrix
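
As an illustration of how a presence/absence matrix can be queried, the sketch below lists the isolates that carry a given gene. It assumes the standard Roary CSV layout, in which the first 14 columns are gene metadata and each remaining column is one isolate with a non-empty cell when the gene is present; the function name and the `n_metadata` parameter are hypothetical, and the layout of your own file should be checked before relying on the default.

```python
import csv
import io

ROARY_METADATA_COLUMNS = 14  # assumption: standard Roary gene_presence_absence layout


def isolates_with_gene(
    csv_text: str, gene: str, n_metadata: int = ROARY_METADATA_COLUMNS
) -> list[str]:
    """Return the isolate column names whose cell for `gene` is non-empty."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    isolates = header[n_metadata:]
    for row in reader:
        if row and row[0] == gene:
            cells = row[n_metadata:]
            return [iso for iso, cell in zip(isolates, cells) if cell.strip()]
    raise KeyError(gene)


# Tiny hypothetical matrix with a single metadata column for readability.
sample = "Gene,iso1,iso2,iso3\ngeneA,x_001,,x_003\n"
print(isolates_with_gene(sample, "geneA", n_metadata=1))  # ['iso1', 'iso3']
```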

Configuration

Edit config/config.yaml to point to your data:

files:
  strain_file: "data/strain_names.csv"
  kleborate_file: "data/kleborate_all.tsv"

directories:
  image_directory: "data/plate_images/"
  iris_directory:  "data/iris_measurements/"
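
After editing the configuration, a quick check that every configured path exists can save a failed start. The sketch below is not part of ColonyExplorer; `load_config` and `missing_paths` are hypothetical helpers, and loading the file assumes PyYAML is installed.

```python
from pathlib import Path


def load_config(path: str = "config/config.yaml") -> dict:
    """Read the YAML configuration (requires PyYAML)."""
    import yaml  # imported here so missing_paths can be used without PyYAML

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def missing_paths(config: dict) -> list[str]:
    """List 'section.key: path' entries whose path does not exist on disk."""
    missing = []
    for section in ("files", "directories"):
        for key, value in (config.get(section) or {}).items():
            if not Path(value).exists():
                missing.append(f"{section}.{key}: {value}")
    return missing


if __name__ == "__main__":
    for problem in missing_paths(load_config()):
        print("missing:", problem)
```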

Live Demo

https://colonyexplorer.kaust.edu.sa

Installation

Option A — Docker (recommended)

docker pull gzhoubioinf09/colonyexplorer:latest
docker run -p 8501:8501 gzhoubioinf09/colonyexplorer:latest

Open http://localhost:8501 in your browser.

Option B — Local install

Requirements: Python 3.11+

git clone https://github.com/gzhoubioinf/ColonyExplorer.git
cd ColonyExplorer

# Create virtual environment
python3.11 -m venv venv
source venv/bin/activate          # macOS / Linux
# venv\Scripts\activate           # Windows

# Install dependencies
pip install -r requirements.txt

Running the App

Locally (after Option B install):

streamlit run app/main.py

Opens at http://localhost:8501.

On Render (or any server):

streamlit run app/main.py --server.port $PORT --server.address 0.0.0.0

Usage

Colony Viewer

1. Navigate to Colony Viewer in the left panel.
2. Select a Condition; the plate/batch list updates automatically.
3. Select a Plate / Batch.
4. Choose a lookup method:
   - Search by accession number: select a strain from the dropdown, shown as ID (Accession). Reference strains without an accession show ID (NA); if the same ID appears at multiple wells, the well position is appended as ID (NA, R#, C#).
   - Enter grid position: enter Row (1–32) and Column (1–48) directly.
5. Optionally adjust which Metrics to display.
6. Click Analyse to load the plate image and display results.
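
The dropdown labels and the grid limits described above can be sketched as follows. This is an illustration, not the app's code: the function names and the `accession` key are hypothetical, and appending the well position for every repeated ID (whether or not it has an accession) is an assumption.

```python
from collections import Counter

MAX_ROW, MAX_COLUMN = 32, 48  # 32 x 48 = 1536 positions per plate


def validate_grid_position(row: int, column: int) -> None:
    """Raise ValueError if the position lies outside the 32 x 48 grid."""
    if not (1 <= row <= MAX_ROW and 1 <= column <= MAX_COLUMN):
        raise ValueError(f"position R{row}, C{column} is outside the plate grid")


def strain_labels(wells: list[dict]) -> list[str]:
    """Build 'ID (Accession)' labels, adding the well position for repeated IDs."""
    counts = Counter(well["ID"] for well in wells)
    labels = []
    for well in wells:
        accession = well.get("accession") or "NA"
        if counts[well["ID"]] > 1:
            labels.append(
                f'{well["ID"]} ({accession}, R{well["Row"]}, C{well["Column"]})'
            )
        else:
            labels.append(f'{well["ID"]} ({accession})')
    return labels


# Hypothetical wells: one isolate with an accession, one reference strain plated twice.
wells = [
    {"ID": "KP1", "accession": "GCA_1", "Row": 1, "Column": 1},
    {"ID": "REF", "accession": "", "Row": 2, "Column": 3},
    {"ID": "REF", "accession": "", "Row": 30, "Column": 48},
]
print(strain_labels(wells))
```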

Gen-GWAS Explorer

1. Navigate to Gen-GWAS Explorer.
2. Select a condition and sample from the dropdowns.
3. Browse the GWAS results table; filter by significance threshold.
4. Expand any gene to view the isolate presence/absence table and pan-genome sequence.
5. Click any isolate row to jump directly to Colony Viewer.
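
Filtering a results table by a significance threshold amounts to keeping the rows whose p-value falls at or below the cutoff. The sketch below shows the idea on plain dictionaries; the column name `p_value`, the gene names, and the default threshold of 0.05 are assumptions, not values taken from the app.

```python
def significant_hits(
    rows: list[dict], p_column: str = "p_value", threshold: float = 0.05
) -> list[dict]:
    """Keep the rows whose p-value is at or below the threshold."""
    return [row for row in rows if float(row[p_column]) <= threshold]


# Hypothetical gene-level results for one condition.
results = [
    {"gene": "geneA", "p_value": 0.001},
    {"gene": "geneB", "p_value": 0.2},
    {"gene": "geneC", "p_value": 0.04},
]
print([hit["gene"] for hit in significant_hits(results)])  # ['geneA', 'geneC']
```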

SNP-GWAS Explorer

1. Navigate to SNP-GWAS Explorer.
2. Select a condition and SNP from the results table.
3. View isolates carrying the SNP, sorted by p-value or effect size.
4. Click any isolate row to jump to Colony Viewer.
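
The two sort orders can be sketched as below. The field names `p_value` and `effect_size` and the SNP identifiers are hypothetical, and ranking effect sizes by absolute magnitude (largest first) is an assumption about what "sorted by effect size" means here.

```python
def sort_hits(rows: list[dict], by: str = "p_value") -> list[dict]:
    """Order hits by ascending p-value or by descending absolute effect size."""
    if by == "p_value":
        return sorted(rows, key=lambda row: row["p_value"])
    if by == "effect_size":
        return sorted(rows, key=lambda row: abs(row["effect_size"]), reverse=True)
    raise ValueError(f"unsupported sort key: {by!r}")


# Hypothetical SNP-level hits.
hits = [
    {"snp": "snp1", "p_value": 0.03, "effect_size": -1.5},
    {"snp": "snp2", "p_value": 0.001, "effect_size": 0.4},
]
print([hit["snp"] for hit in sort_hits(hits, by="effect_size")])  # ['snp1', 'snp2']
```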

Available Metrics

Metric	Description
Colony size	Total colony area in pixels
Circularity	Roundness (1 = perfect circle)
Opacity	Optical density proxy for colony density
Colony color intensity	Mean pixel intensity of the colony
Biofilm area size	Area covered by biofilm
Biofilm color intensity	Mean intensity within the biofilm region
Biofilm area ratio	Fraction of colony area covered by biofilm
Size normalized color intensity	Color intensity corrected for colony size
Mean sampled color intensity	Sampled mean intensity
Average pixel saturation	Mean HSV saturation across colony
Max 10% opacity	90th-percentile opacity value
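
Two of these metrics are simple ratios, which the sketch below makes concrete. The circularity formula shown is the common isoperimetric definition, 4πA/P², which equals 1 for a perfect circle; it is an assumption that this matches the value IRIS reports, and both function names are hypothetical.

```python
import math


def circularity(area: float, perimeter: float) -> float:
    """Isoperimetric circularity 4*pi*A / P**2 (1.0 for a perfect circle)."""
    if area <= 0 or perimeter <= 0:
        raise ValueError("area and perimeter must be positive")
    return 4.0 * math.pi * area / perimeter**2


def biofilm_area_ratio(biofilm_area: float, colony_area: float) -> float:
    """Fraction of the colony area covered by biofilm."""
    if colony_area <= 0:
        raise ValueError("colony_area must be positive")
    return biofilm_area / colony_area


# A 10 x 10 pixel square colony is less circular than a disc.
print(round(circularity(area=100.0, perimeter=40.0), 3))  # 0.785
print(biofilm_area_ratio(biofilm_area=25.0, colony_area=100.0))  # 0.25
```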

References

If you use ColonyExplorer in your research, please cite our work:

(citation forthcoming)

This application relies on the following foundational workflows and tools. Please also consider citing them:

High-Throughput Phenotypic Screening Pipeline: Williams G., Ahmad H., Sutherland S., et al. (2025). High-throughput chemical genomic screening: a step-by-step workflow from plate to phenotype. mSystems, 10(12), e00885-25. DOI: 10.1128/msystems.00885-25
Kleborate (Genomic Profiling): Lam, M. M. C., et al. (2021). A genomic surveillance framework and genotyping tool for Klebsiella pneumoniae and its related species complex. Nature Communications, 12(1), 4188. DOI: 10.1038/s41467-021-24448-3
IRIS (Phenotypic Image Analysis): Kritikos, G., Banzhaf, M., Herrera-Dominguez, L., et al. (2017). A tool named Iris for versatile high-throughput phenotyping in microorganisms. Nature Microbiology, 2(5), 17014. DOI: 10.1038/nmicrobiol.2017.14
Scoary (Bacterial GWAS): Brynildsrud, O., Bohlin, J., Scheffer, L., & Eldholm, V. (2016). Rapid scoring of genes in microbial pan-genome-wide association studies with Scoary. Genome Biology, 17(1), 238. DOI: 10.1186/s13059-016-1108-8
Panaroo (Pan-genome Pipeline): Tonkin-Hill, G., MacAlasdair, N., Ruis, C., et al. (2020). Producing polished prokaryotic pangenomes with the Panaroo pipeline. Genome Biology, 21(1), 180. DOI: 10.1186/s13059-020-02090-4

Contact and Collaboration

ColonyExplorer is a joint project developed by the Infectious Disease Epidemiology Lab (KAUST) and the Banzhaf Lab (Newcastle University).

Support and Inquiries:

Name	Email
Ge Zhou (PhD student)	ge.zhou@kaust.edu.sa
Danesh Moradigaravand (PI)	danesh.moradigaravand@kaust.edu.sa
Manuel Banzhaf (PI)	manuel.banzhaf@newcastle.ac.uk

License

See LICENSE.
