Map microbial community activity onto the physical and chemical landscape of a mine site.
Acid mine drainage (AMD) is one of the most persistent environmental challenges in mining. When sulfide minerals are exposed to water and air, sulfur-oxidizing bacteria such as Acidithiobacillus thiooxidans catalyze the production of sulfuric acid. The acid dissolves heavy metals (iron, arsenic, copper, zinc) into drainage water, contaminating downstream ecosystems for decades. The key insight is that this process is not spatially random: physical terrain drives where water flows, water flow drives where chemistry concentrates, and chemistry drives where these microbial communities assemble. Understanding this spatial cascade is the difference between reactive cleanup and predictive management.
MineScope integrates three data streams (metagenomic sequencing, LiDAR terrain mapping, and soil chemistry) into a unified spatial model. Every grid cell on the mine site gets a complete profile: what the terrain looks like, what the chemistry is, and what the microbial community is doing. The result is a platform that shows not just where AMD is happening, but why it is happening there and where it is likely to happen next.
→ Live Dashboard · → Pipeline Diagram
┌─────────────────────────────────────────────────────────────────────────────┐
│                        Nextflow 26 + Wave Containers                        │
│                                                                             │
│  FASTQ ──→ MEGAHIT Assembly ──→ MetaPathways v3.5 ──→ BLAST output          │
│            (224K contigs)     (SwissProt annotation)  (220K hits)           │
└─────────────────────────────────────────────────────────────────────────────┘
       │
       ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ Bronze       │    │ Silver       │    │ Gold         │
│ Raw data     │───▶│ Validated    │───▶│ Integrated   │──▶ Dashboard
│ + Pydantic   │    │ + Parquet    │    │ Spatial      │
│ validation   │    │ + CLR norm   │    │ Merge        │
└──────────────┘    └──────────────┘    └──────────────┘
       ▲                                       │
       │                                       ▼
LiDAR Grid ─────────────────────────────▶ 2,505 cells × 18 columns
Soil Chemistry ─────────────────────────▶ (x,y) spatial join
Three parallel data streams converge through a medallion architecture into a single gold layer. Each cell in the gold layer contains terrain topology, soil chemistry, and microbial functional annotation — merged on spatial coordinates.
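As an illustration of the merge on spatial coordinates, here is a minimal sketch of attaching chemistry samples to terrain cells. It is not the project's implementation: the field names (`x`, `y`, `ph`, `fe_mg_l`), the snap-to-cell rule, and the choice to leave unsampled cells empty are all assumptions made for the example, and it uses plain dictionaries instead of Polars so it runs without dependencies.

```python
from typing import Any, Optional


def snap(value: float, resolution: float = 1.0) -> int:
    """Return the index of the grid cell that contains a coordinate."""
    return int(value // resolution)


def spatial_merge(
    terrain_cells: list[dict[str, Any]],
    chemistry_samples: list[dict[str, Any]],
    resolution: float = 1.0,
) -> list[dict[str, Any]]:
    """Left-join chemistry samples onto terrain cells by (x, y) cell index.

    Hypothetical sketch: field names and the snapping rule are assumptions.
    """
    # Index samples by the cell they fall into (last sample wins on ties).
    by_cell: dict[tuple[int, int], dict[str, Any]] = {}
    for sample in chemistry_samples:
        key = (snap(sample["x"], resolution), snap(sample["y"], resolution))
        by_cell[key] = sample

    merged = []
    for cell in terrain_cells:
        key = (snap(cell["x"], resolution), snap(cell["y"], resolution))
        sample: Optional[dict[str, Any]] = by_cell.get(key)
        row = dict(cell)  # keep every terrain column
        # Cells without a sample keep explicit gaps rather than guessed values.
        row["ph"] = sample["ph"] if sample else None
        row["fe_mg_l"] = sample["fe_mg_l"] if sample else None
        merged.append(row)
    return merged


terrain = [
    {"x": 0.0, "y": 0.0, "elevation": 101.2},
    {"x": 1.0, "y": 0.0, "elevation": 100.7},
]
samples = [{"x": 0.4, "y": 0.7, "ph": 1.8, "fe_mg_l": 5200.0}]
gold = spatial_merge(terrain, samples)
```

A left join keeps every terrain cell in the output, which matches the idea of one row per grid cell. How cells without a nearby sample should be filled (interpolation, nearest neighbour, or left empty) is a separate design decision that this sketch does not make.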
| Decision | Why |
|---|---|
| Hexagonal ports & adapters | Each data source gets its own adapter. If the LiDAR format changes from CSV to GeoTIFF, only the adapter changes — domain logic stays untouched. |
| Medallion layers (Bronze → Silver → Gold) | Clear data lineage. Every value is traceable back to its raw source. Validation at every boundary catches corruption early. |
| MetaPathways v3.5 containerized via Wave | Koonkie's own pathway reconstruction tool, running in Docker with automatic multi-arch provisioning. Same pipeline runs on Apple Silicon, x86 servers, and cloud HPC without configuration changes. |
| D8 flow direction encoding | Industry-standard hydrology encoding (powers of 2 for 8 compass directions). Any GIS tool recognizes it. Derived from terrain aspect — tells us where water flows at every cell. |
| Pydantic at every boundary | Schema-as-code. Constraints encode domain knowledge: pH can't exceed 14, concentrations can't be negative, slope can't be negative. Bad data fails at ingestion, not silently downstream. |
| Polars over Pandas | Type-strict, native Parquet I/O, 5-10x faster. Aligns with the validation-first philosophy. |
| Seqera Platform (Tower) | Pipeline runs are tracked in Seqera Cloud. Full provenance, monitoring, and scheduling for production deployment. |
| CLR normalization in Silver | Centered log-ratio transformation prepares functional counts for multi-sample comparison. Ready for when per-location metagenomes arrive. |
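To make the D8 row concrete, the sketch below maps a terrain aspect angle to a D8 code. It assumes the widely used ESRI numbering (E=1, SE=2, S=4, SW=8, W=16, NW=32, N=64, NE=128) and an aspect measured in degrees clockwise from north, pointing downslope. The function name `aspect_to_d8` is hypothetical, and MineScope's own mapping may differ.

```python
# D8 codes listed clockwise starting from north (ESRI-style numbering, assumed).
D8_CODES = [64, 128, 1, 2, 4, 8, 16, 32]  # N, NE, E, SE, S, SW, W, NW


def aspect_to_d8(aspect_deg: float) -> int:
    """Map a downslope aspect (degrees clockwise from north) to a D8 code.

    Each compass direction owns a 45-degree sector centred on it, so the
    angle is shifted by half a sector (22.5 degrees) before bucketing.
    """
    sector = int(((aspect_deg % 360) + 22.5) // 45) % 8
    return D8_CODES[sector]


east = aspect_to_d8(90.0)        # due east
north_wrap = aspect_to_d8(350.0)  # wraps around to the north sector
```

With this numbering, `aspect_to_d8(90.0)` returns 1 and `aspect_to_d8(180.0)` returns 4. Flat cells, where aspect is undefined, are not handled here and would need their own sentinel value.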
| Stream | Source | Details |
|---|---|---|
| Metagenomics | SRR6189722 (NCBI SRA) | Gold mine tailings metagenome, Kuzbass, Russia. 454 GS FLX Titanium, single-end, 438K reads. Annotated with MetaPathways v3.5 (Hallam Lab, UBC). |
| LiDAR | Synthetic (realistic) | 50 × 50 m grid at 1 m resolution. Drainage channels, ridge lines, tailings mound, excavation pit. Based on published mine site topography. |
| Soil Chemistry | Synthetic (realistic) | 150 GPS-tagged samples. pH 1–5, Fe up to 6000 mg/L, As up to 800 mg/L. Values correlated with terrain drainage — based on published chemistry from the SRR6189722 paper. |
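The constraints named under "Pydantic at every boundary" (pH cannot exceed 14, concentrations cannot be negative) can be sketched for a single chemistry sample as follows. The project uses Pydantic; this example swaps in a standard-library dataclass so it runs without dependencies, and the class name, field names, and the lower pH bound of 0 are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChemistrySample:
    """Hypothetical schema for one GPS-tagged soil chemistry sample."""

    x: float
    y: float
    ph: float
    fe_mg_l: float  # dissolved iron, mg/L
    as_mg_l: float  # dissolved arsenic, mg/L

    def __post_init__(self) -> None:
        # Domain constraints: reject bad values at ingestion.
        # The chained comparison is also False for NaN, so NaN is rejected.
        if not 0.0 <= self.ph <= 14.0:
            raise ValueError(f"pH out of range: {self.ph}")
        for name in ("fe_mg_l", "as_mg_l"):
            value = getattr(self, name)
            if not value >= 0.0:
                raise ValueError(f"{name} must be non-negative, got {value}")


ok = ChemistrySample(x=12.0, y=30.5, ph=2.1, fe_mg_l=4800.0, as_mg_l=120.0)

try:
    ChemistrySample(x=12.0, y=30.5, ph=15.2, fe_mg_l=4800.0, as_mg_l=120.0)
    rejected = False
except ValueError:
    rejected = True  # the out-of-range pH never reaches the Silver layer
```

Raising at construction time means a malformed row stops the pipeline at the Bronze boundary instead of propagating into later layers.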
Prerequisites: Python 3.12+, Node.js 18+, Docker, Nextflow, uv
# Clone
git clone https://github.com/hrrysprk/MineScope.git
cd MineScope
# Install Python dependencies
uv sync
# Generate synthetic data
uv run python scripts/data_generation/generate_lidar.py
uv run python scripts/data_generation/generate_chemistry.py
# Run the data pipeline (bronze → silver → gold → dashboard JSON)
uv run python scripts/run_pipeline.py
# Start the dashboard
cd dashboard
npm install
npm run dev
# Open http://localhost:5173

Full pipeline (requires Docker + reference databases):
nextflow run main.nf -profile docker \
  --input_fastq data/bronze/metagenomics/SRR6189722.fastq

See SETUP.md for database configuration and full pipeline setup.
| Process | Tool | Input | Output |
|---|---|---|---|
| `ASSEMBLE_READS` | MEGAHIT | Single-end FASTQ | Assembled contigs (FASTA) |
| `PATHWAY_PROFILING` | MetaPathways v3.5 | Contigs + SwissProt DB | Functional annotations (BLAST output) |
Conditional bypass: if pre-assembled contigs exist (--input_fasta), assembly is skipped. Supports both Docker and Conda execution profiles.
| Layer | Transform | Output |
|---|---|---|
| Bronze → Silver (LiDAR) | CSV → Parquet | Validated terrain grid |
| Bronze → Silver (Chemistry) | CSV → Parquet | Validated sample points |
| Bronze → Silver (Metagenomics) | BLAST → best hit per ORF → functional counts + CLR | Normalized pathway abundance |
| Silver → Gold | Spatial join on (x, y) coordinates | Unified 2,505 × 18 Parquet |
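The metagenomics row (BLAST → best hit per ORF → functional counts + CLR) can be sketched as below. This is an illustrative reading of that transform, not the project's code: the `(orf_id, function, bitscore)` tuple layout, selecting the best hit by bit score, and the pseudocount of 0.5 for zero counts are all assumptions.

```python
import math
from collections import Counter
from typing import Iterable, Mapping


def best_hit_per_orf(hits: Iterable[tuple[str, str, float]]) -> dict[str, str]:
    """Keep the highest-scoring function annotation for each ORF."""
    best: dict[str, tuple[str, float]] = {}
    for orf_id, function, bitscore in hits:
        if orf_id not in best or bitscore > best[orf_id][1]:
            best[orf_id] = (function, bitscore)
    return {orf_id: function for orf_id, (function, _) in best.items()}


def clr(counts: Mapping[str, float], pseudocount: float = 0.5) -> dict[str, float]:
    """Centered log-ratio: log of each count minus the mean of all logs.

    The pseudocount keeps log() defined for zero counts (an assumption here).
    """
    logs = {name: math.log(value + pseudocount) for name, value in counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {name: value - mean_log for name, value in logs.items()}


# Hypothetical hits: orf1 has two candidate annotations.
hits = [
    ("orf1", "metal_efflux", 50.0),
    ("orf1", "biofilm_signaling", 80.0),
    ("orf2", "metal_efflux", 60.0),
    ("orf3", "metal_efflux", 40.0),
]
assignments = best_hit_per_orf(hits)
function_counts = Counter(assignments.values())
normalized = clr(function_counts)
```

CLR values sum to zero within a sample, so they express each function relative to the sample's geometric mean. That is what makes profiles comparable across samples with different sequencing depth.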
The dashboard reveals the AMD spatial cascade:
- Elevation layer: The drainage channel is visible as a blue-green trough running diagonally across the site. Water accumulates here.
- pH layer: Acidity concentrates along the drainage path (pH 1.2–2.5 in the channel vs 4+ on ridges). This is where sulfuric acid pools.
- Iron layer: Dissolved iron peaks where pH is lowest (up to 6000 mg/L in the channel). Classic AMD geochemistry signature.
- Pathway enrichment: The top annotated functions from MetaPathways are biofilm signaling, heavy metal efflux, and cation resistance. These are survival strategies for organisms living in extreme metal/acid conditions.
The pattern confirms: terrain → chemistry → biology. The microbial community isn't randomly distributed — it's spatially organized by the physical and chemical landscape.
| Layer | Technology |
|---|---|
| Pipeline orchestration | Nextflow 26 + Wave containers |
| Pipeline monitoring | Seqera Platform (Tower) |
| Assembly | MEGAHIT (containerized) |
| Pathway annotation | MetaPathways v3.5 (containerized) |
| Data processing | Python 3.12, Polars, Pydantic |
| Package management | uv (lockfile-based reproducibility) |
| 3D Visualization | React, Three.js, React Three Fiber |
| Charts | D3.js |
| Deployment | GitHub Pages (static) |
MineScope/
├── main.nf                       # Nextflow pipeline (assembly → annotation)
├── nextflow.config               # Wave containers, Docker/Conda profiles
├── modules/
│   ├── assemble_reads.nf         # MEGAHIT process
│   └── pathway_profiling.nf      # MetaPathways process
├── src/
│   ├── adapters/                 # Hexagonal ports (one per data source)
│   │   ├── chemistry/reader.py
│   │   ├── lidar/reader.py
│   │   └── metagenomics/reader.py
│   ├── domain/models/            # Pydantic schemas
│   │   ├── chemistry.py
│   │   ├── lidar.py
│   │   └── pathway.py
│   └── layers/                   # Medallion transforms
│       ├── silver/               # Validation + normalization
│       └── gold/                 # Spatial merge
├── scripts/
│   ├── data_generation/          # Synthetic LiDAR + chemistry
│   └── run_pipeline.py           # Bronze → Silver → Gold orchestration
├── dashboard/                    # React + Three.js + D3
│   └── src/
│       ├── components/           # TerrainViewer, RadarChart, HeatmapGrid...
│       └── pages/                # Terrain, Analytics, Data, Pipeline
├── data/
│   ├── bronze/                   # Raw inputs
│   ├── silver/                   # Validated Parquet
│   └── gold/                     # Integrated spatial dataset
└── databases/                    # MetaPathways reference DBs (not in repo)
- Automated protein-to-function mapping for pathway enrichment (replace manual accession lookup)
- Per-location metagenomes for true spatial resolution of functional profiles
- MetaCyc integration when institutional license access is resolved
- Multi-format amplicon pipeline: FASTQ, QIIME2 artifacts, OTU tables, BIOM format
- Time-series monitoring via Seqera Platform (scheduled pipeline runs + threshold alerts)
- AMD risk score — composite spatial index for predictive site management
MIT

