PyMOSAIC (Python Multi-Object Sensor Alignment for Integrated Convection) provides a framework for fusing convective objects detected across weather radar, satellite, and lightning detection networks.

femiomitusa/PyMOSAIC

PyMOSAIC V2 Coregistration Pipeline

Multi-sensor storm object tracking and coregistration for NEXRAD, GOES, and NALMA data.

Quick Start

1. Configure

Edit config.yaml with your data paths and parameters:

# Dates to process
start_date: "2025-08-15"
end_date: "2025-08-31"

# Data paths
base_data_dir: "/mnt/data0/data/maas"
grid_file: "/mnt/data0/data/maas/domain/huntsville_grid_new.nc"
output_dir: "/home/femi/PyMOSAIC/Data/Results"

# Pipeline control
run_segmentation: false  # Use existing TOBAC outputs
run_analysis: true       # Generate plots and statistics
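
The start_date and end_date values bound the processing period. As a rough sketch of how that period could be expanded into the YYYYMMDD directory names used elsewhere in this README, assuming both endpoints are inclusive (the iter_dates helper is hypothetical and not part of PyMOSAIC):

```python
from datetime import date, timedelta


def iter_dates(start_date: str, end_date: str):
    """Yield YYYYMMDD strings for each day from start_date to end_date, inclusive."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date must not precede start_date")
    current = start
    while current <= end:
        yield current.strftime("%Y%m%d")
        current += timedelta(days=1)


# Values taken from the config example above
days = list(iter_dates("2025-08-15", "2025-08-31"))
print(len(days), days[0], days[-1])  # 17 20250815 20250831
```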

2. Run Pipeline

Option 1: Full pipeline (recommended)

bash run_pipeline.sh

Option 2: Python script directly

python run_coregistration.py

Option 3: Individual stages

# Run only segmentation
python object_segmentation/tobac_cappi_tracking.py
python object_segmentation/tobac_maas_goes.py

# Run only coregistration
python run_coregistration.py

# Run only analysis
python analysis/analyze_results.py

Pipeline Stages

Stage 0: Environment Setup & Validation

  • Checks Python version (3.8+)
  • Validates config.yaml exists
  • Verifies required packages are installed
  • Automatic, always runs
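
The three checks above can be approximated in a few lines of standard-library Python. This is a minimal sketch, not the code in run_pipeline.sh: the check_environment function is hypothetical, and the package list is taken from the pip command in the Troubleshooting section (pyyaml is imported as yaml).

```python
import sys
from importlib.util import find_spec
from pathlib import Path

# Import names for the packages listed under Troubleshooting
REQUIRED_PACKAGES = ["numpy", "pandas", "xarray", "shapely", "yaml", "tobac", "trackpy"]


def check_environment(config_path="config.yaml", packages=REQUIRED_PACKAGES, min_python=(3, 8)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    if not Path(config_path).is_file():
        problems.append(f"{config_path} not found")
    # find_spec returns None for a top-level module that cannot be imported
    missing = [name for name in packages if find_spec(name) is None]
    if missing:
        problems.append("Missing Python packages: " + ", ".join(missing))
    return problems


for problem in check_environment():
    print(problem)
```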

Stage 1: Object Segmentation (Optional)

Controls: Multiple flags in config.yaml for granular control

run_segmentation: true            # Master toggle - enable segmentation stage
run_radar_segmentation: true      # Run NEXRAD CAPPI tracking
run_satellite_segmentation: true  # Run GOES tracking
run_lightning_segmentation: false # Run NALMA tracking
run_coregistration: true          # Run spatial matching
run_analysis: true                # Run post-processing

Runs TOBAC tracking on enabled sensors:

  • NEXRAD CAPPI - Radar reflectivity storm cores
  • GOES-16/19 - Infrared brightness temperature
  • NALMA - Lightning flash rate

How it works:

  1. If run_segmentation: false → Skip all segmentation, use existing outputs
  2. If run_segmentation: true → Run only the sensors enabled by individual flags

Outputs: Data/tobac/*.nc - Segmentation masks and feature statistics

Examples:

# Run only radar segmentation
run_segmentation: true
run_radar_segmentation: true
run_satellite_segmentation: false
run_lightning_segmentation: false

# Run radar + satellite (no lightning)
run_segmentation: true
run_radar_segmentation: true
run_satellite_segmentation: true
run_lightning_segmentation: false

# Skip all segmentation (use existing)
run_segmentation: false
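
The master-toggle behaviour described under "How it works" can be sketched as a small function. The sensors_to_segment name is hypothetical, and treating a missing flag as false is an assumption; the actual defaults live in the pipeline's config loader.

```python
# Per-sensor flags from config.yaml, keyed by a short sensor label
SENSOR_FLAGS = {
    "radar": "run_radar_segmentation",
    "satellite": "run_satellite_segmentation",
    "lightning": "run_lightning_segmentation",
}


def sensors_to_segment(config: dict) -> list:
    """Return the sensors whose TOBAC segmentation should run for this config."""
    if not config.get("run_segmentation", False):
        return []  # master toggle off: reuse existing outputs for every sensor
    # Master toggle on: run only the sensors whose own flag is enabled
    return [sensor for sensor, flag in SENSOR_FLAGS.items() if config.get(flag, False)]


radar_only = {
    "run_segmentation": True,
    "run_radar_segmentation": True,
    "run_satellite_segmentation": False,
    "run_lightning_segmentation": False,
}
print(sensors_to_segment(radar_only))  # ['radar']
```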

Stage 2: Multi-Sensor Coregistration (Optional)

Controls: run_coregistration: true/false in config.yaml

Matches storm objects across sensors using:

  • Spatial overlap (IoU threshold)
  • Centroid distance (<25 km default)
  • Temporal alignment (±5 minutes default)
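
To make the first two criteria concrete, here is a minimal sketch of IoU and centroid distance. It assumes both objects have been placed on a common grid and are represented as sets of (row, col) cells, and it uses the haversine formula for distance; the functions in coregistration/spatial_analysis.py may be implemented differently.

```python
import math


def iou(cells_a: set, cells_b: set) -> float:
    """Intersection over union of two objects given as sets of grid cells."""
    union = cells_a | cells_b
    if not union:
        return 0.0
    return len(cells_a & cells_b) / len(union)


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Illustrative objects: a 2x2 radar core and a satellite object sharing one cell
radar_cells = {(0, 0), (0, 1), (1, 0), (1, 1)}
satellite_cells = {(1, 1), (1, 2)}
print(iou(radar_cells, satellite_cells))  # 1 shared cell / 5 total cells = 0.2
print(haversine_km(34.0, -86.0, 34.1, -86.0))
```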

Outputs:

  • Data/Results/<date>/output_<timestamp>.nc - NetCDF with coregistered objects
  • Data/Results/<date>/output_<timestamp>.pkl - Pickle with Python objects
  • Data/Results/<date>/processing_summary.json - Statistics and metadata

Stage 3: Post-Processing Analysis (Optional)

Controls: run_analysis: true/false in config.yaml

Generates:

  • Match statistics and quality metrics
  • Storm type distributions
  • Spatial/temporal visualizations
  • Validation reports

Configuration Reference

Core Settings

start_date: "2025-08-15"        # Start of processing period
end_date: "2025-08-31"          # End of processing period
base_data_dir: "/path/to/data"  # Root directory for input data
output_dir: "./Data/Results"    # Where to save results

Pipeline Control

# Master toggles
run_segmentation: false           # Run any TOBAC segmentation
run_coregistration: true          # Run spatial matching
run_analysis: true                # Generate post-processing reports

# Granular segmentation control (only applies if run_segmentation: true)
run_radar_segmentation: true      # NEXRAD CAPPI tracking
run_satellite_segmentation: true  # GOES tracking
run_lightning_segmentation: false # NALMA tracking

Matching Parameters

max_centroid_distance: 25.0    # Maximum distance (km)
iou_threshold: 0.01            # Minimum IoU for match
min_overlap_fraction: 0.01     # Minimum overlap fraction
max_area_ratio: 10.0           # Max GOES/NEXRAD area ratio
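
One way to read these four parameters is as gates that a candidate pair must pass. The sketch below assumes all gates must hold at once and that the candidate's metrics are already computed; the passes_match_gates function and the overlap_fraction and area_ratio keys are hypothetical, and the pipeline may combine the criteria differently.

```python
def passes_match_gates(candidate: dict, params: dict) -> bool:
    """Return True if a candidate radar/satellite pair passes every gate."""
    return (
        # Distance gate is strict, following the "<25 km" wording in Stage 2
        candidate["centroid_distance_km"] < params["max_centroid_distance"]
        # Minimum thresholds are treated as inclusive (an assumption)
        and candidate["iou"] >= params["iou_threshold"]
        and candidate["overlap_fraction"] >= params["min_overlap_fraction"]
        # Area ratio is GOES area divided by NEXRAD area
        and candidate["area_ratio"] <= params["max_area_ratio"]
    )


# Defaults copied from the Matching Parameters block
params = {
    "max_centroid_distance": 25.0,
    "iou_threshold": 0.01,
    "min_overlap_fraction": 0.01,
    "max_area_ratio": 10.0,
}
candidate = {"centroid_distance_km": 12.4, "iou": 0.08, "overlap_fraction": 0.3, "area_ratio": 4.2}
print(passes_match_gates(candidate, params))  # True
```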

Storm Classification

deep_storm_min_tb: 242.15      # Deep storm TB threshold (K)
deep_storm_min_vil: 5.0        # Deep storm VIL threshold (kg/m²)
deep_min_ref_max: 50.0         # Deep storm reflectivity (dBZ)
deep_nalma_min_flash_rate: 10  # Deep storm flash rate (min⁻¹)
deep_consensus_min_sensors: 2  # Sensors required for consensus
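
The consensus idea behind deep_consensus_min_sensors can be sketched as a vote count. Everything about how each sensor votes here is an assumption made for illustration: brightness temperature at or below the threshold counts as deep, the radar votes if either VIL or maximum reflectivity reaches its threshold, and a missing value casts no vote. The function names are hypothetical; the variable names follow the NetCDF variables listed under Output Data Format.

```python
def deep_storm_votes(obs: dict, cfg: dict) -> int:
    """Count how many sensors indicate a deep storm (one vote per sensor)."""
    votes = 0
    tb = obs.get("goes_min_tb")
    if tb is not None and tb <= cfg["deep_storm_min_tb"]:  # assumed: colder is deeper
        votes += 1
    vil = obs.get("nexrad_vil")
    ref = obs.get("nexrad_max_ref")
    radar_deep = (vil is not None and vil >= cfg["deep_storm_min_vil"]) or (
        ref is not None and ref >= cfg["deep_min_ref_max"]
    )
    if radar_deep:  # assumed: either radar metric is enough
        votes += 1
    rate = obs.get("nalma_flash_rate")
    if rate is not None and rate >= cfg["deep_nalma_min_flash_rate"]:
        votes += 1
    return votes


def is_deep_storm(obs: dict, cfg: dict) -> bool:
    """Deep if at least deep_consensus_min_sensors sensors agree."""
    return deep_storm_votes(obs, cfg) >= cfg["deep_consensus_min_sensors"]


# Thresholds copied from the Storm Classification block
cfg = {
    "deep_storm_min_tb": 242.15,
    "deep_storm_min_vil": 5.0,
    "deep_min_ref_max": 50.0,
    "deep_nalma_min_flash_rate": 10,
    "deep_consensus_min_sensors": 2,
}
obs = {"goes_min_tb": 230.0, "nexrad_vil": 8.0, "nexrad_max_ref": 45.0, "nalma_flash_rate": None}
print(deep_storm_votes(obs, cfg), is_deep_storm(obs, cfg))  # 2 True
```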

Performance

processing_chunk_size: 10  # Files per chunk (1-50)
debug: false               # Enable detailed logging
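
As an illustration of what processing_chunk_size controls, the sketch below splits a file list into chunks of that size, so only one chunk's worth of files needs to be held in memory at a time. The chunked helper is hypothetical and the file names are placeholders; the range check mirrors the 1-50 limit noted above.

```python
def chunked(items: list, chunk_size: int):
    """Yield consecutive slices of items, each at most chunk_size long."""
    if not 1 <= chunk_size <= 50:
        raise ValueError("processing_chunk_size must be between 1 and 50")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


# Placeholder file names, not real pipeline inputs
files = [f"file_{i:03d}.nc" for i in range(23)]
sizes = [len(chunk) for chunk in chunked(files, 10)]
print(sizes)  # [10, 10, 3]
```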

NALMA Lightning

use_nalma: true                    # Enable lightning integration
nalma_min_flash_count: 1.0         # Minimum flashes for validity
nalma_temporal_tolerance_minutes: 5 # Temporal matching window
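
A temporal matching window of this kind is commonly applied by picking the nearest timestamp and rejecting it if it falls outside the tolerance. A minimal sketch under that assumption (nearest_within_tolerance is a hypothetical helper, and treating the tolerance as inclusive is also an assumption):

```python
from datetime import datetime, timedelta


def nearest_within_tolerance(target, candidates, tolerance_minutes=5):
    """Return the candidate time closest to target, or None if none is close enough."""
    if not candidates:
        return None
    best = min(candidates, key=lambda t: abs(t - target))
    if abs(best - target) <= timedelta(minutes=tolerance_minutes):
        return best
    return None


radar_time = datetime(2025, 8, 15, 14, 0, 0)
lightning_times = [
    datetime(2025, 8, 15, 13, 52, 0),
    datetime(2025, 8, 15, 14, 3, 0),
    datetime(2025, 8, 15, 14, 10, 0),
]
print(nearest_within_tolerance(radar_time, lightning_times))  # 2025-08-15 14:03:00
```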

Output Files

Per-Timestamp Outputs

Data/Results/20250815/
├── output_20250815_140000.nc     # NetCDF with xarray Dataset
├── output_20250815_140000.pkl    # Pickle with Python objects
├── output_20250815_140000.json   # Metadata
├── output_20250815_143000.nc
└── ...

Daily Summaries

Data/Results/20250815/
├── processing_summary.json                    # Statistics
└── height_filtering_validation_report.txt    # Quality report

Batch Summaries (multi-day processing)

Data/Results/
└── summary_20250815_20250831.json  # Cross-day statistics
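
Because the per-timestamp files follow the output_<YYYYMMDD>_<HHMMSS> pattern shown above, the timestamp can be recovered from the file name alone. A small sketch (parse_output_timestamp is a hypothetical helper; the time zone is not stated in this README, so the result is left naive):

```python
from datetime import datetime
from pathlib import Path


def parse_output_timestamp(path) -> datetime:
    """Extract the timestamp from a name like output_20250815_140000.nc."""
    stem = Path(path).stem  # e.g. "output_20250815_140000"
    prefix = "output_"
    if not stem.startswith(prefix):
        raise ValueError(f"unexpected file name: {path}")
    return datetime.strptime(stem[len(prefix):], "%Y%m%d_%H%M%S")


print(parse_output_timestamp("Data/Results/20250815/output_20250815_140000.nc"))
# 2025-08-15 14:00:00
```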

Pipeline Stage Control

The pipeline has three main stages that you can enable or disable independently:

| Stage | Config Flag | What It Does | Required? |
|---|---|---|---|
| Segmentation | run_segmentation | Runs TOBAC object tracking | No - use existing |
| Coregistration | run_coregistration | Matches objects across sensors | No - use existing |
| Analysis | run_analysis | Generates plots and statistics | No - optional |

Segmentation Sub-Stages

When run_segmentation: true, you can control individual sensors:

| Sensor | Config Flag | Data Source |
|---|---|---|
| Radar | run_radar_segmentation | NEXRAD CAPPI reflectivity |
| Satellite | run_satellite_segmentation | GOES brightness temperature |
| Lightning | run_lightning_segmentation | NALMA flash rate |

Common Workflows

Workflow 1: First-Time Run (Full Pipeline - All Sensors)

# config.yaml
run_segmentation: true
run_radar_segmentation: true
run_satellite_segmentation: true
run_lightning_segmentation: true  # If you have NALMA data
run_analysis: true

bash run_pipeline.sh

Workflow 2: Re-run Coregistration Only

# config.yaml
run_segmentation: false  # Skip TOBAC (already done)
run_analysis: true

bash run_pipeline.sh

Workflow 3: Radar + Satellite Only (No Lightning)

# config.yaml
run_segmentation: true
run_radar_segmentation: true
run_satellite_segmentation: true
run_lightning_segmentation: false  # Skip NALMA
run_analysis: true

Workflow 4: Update Only Satellite Segmentation

# config.yaml
run_segmentation: true
run_radar_segmentation: false      # Use existing
run_satellite_segmentation: true   # Re-run this
run_lightning_segmentation: false
run_coregistration: false          # Will run later
run_analysis: false

Workflow 5: Only Segmentation (Generate TOBAC Outputs)

# config.yaml
run_segmentation: true
run_radar_segmentation: true
run_satellite_segmentation: true
run_lightning_segmentation: false
run_coregistration: false          # Skip matching, only create segmentation
run_analysis: false

Workflow 6: Only Analysis (Visualize Existing Results)

# config.yaml
run_segmentation: false
run_coregistration: false          # Use existing coregistration outputs
run_analysis: true                 # Generate new plots/reports

Workflow 7: Quick Test (Single Day, Small Chunk)

# config.yaml
start_date: "2025-08-15"
end_date: "2025-08-15"
processing_chunk_size: 5
debug: true

Workflow 8: Production Run (Month-Long)

# config.yaml
start_date: "2025-08-01"
end_date: "2025-08-31"
processing_chunk_size: 20
debug: false
keep_nexrad_only: false
save_only_deep_storms: true

Troubleshooting

"config.yaml not found"

Solution: Copy config.yaml.example to config.yaml and edit it

"Missing Python packages"

pip install numpy pandas xarray shapely pyyaml tobac trackpy

"No CAPPI files found"

Causes:

  • Wrong base_data_dir path
  • Dates outside available data range
  • Missing data for that date

Solution: Verify data exists at:

{base_data_dir}/nexrad/huntsville/cappi/{YYYYMMDD}/KHTX/*.nc
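
To check that location quickly, the path template can be expanded and globbed from Python. This is a minimal sketch; find_cappi_files is a hypothetical helper, and the KHTX default simply mirrors the template above.

```python
from pathlib import Path


def find_cappi_files(base_data_dir, yyyymmdd, site="KHTX"):
    """Return the sorted CAPPI NetCDF files for one day, or [] if none exist."""
    cappi_dir = Path(base_data_dir) / "nexrad" / "huntsville" / "cappi" / yyyymmdd / site
    if not cappi_dir.is_dir():
        return []  # wrong base_data_dir, or no data for this date
    return sorted(cappi_dir.glob("*.nc"))


# base_data_dir value taken from the Quick Start config example
files = find_cappi_files("/mnt/data0/data/maas", "20250815")
print(f"{len(files)} CAPPI files found")
```

An empty result for a date you expect to have data usually points at base_data_dir or at the date range.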

Low match rates (<10%)

Possible causes:

  • iou_threshold too high (try 0.001)
  • max_centroid_distance too small (try 50 km)
  • Temporal mismatch (check GOES data availability)
  • Using wrong segmentation source (mcit vs cappi)

Diagnosis:

# Enable debug mode
debug: true

Check logs for:

  • "No GOES objects within tolerance"
  • "GOES load failed"
  • "No valid NEXRAD objects after validation"
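
Counting how often each message appears can show which cause dominates. The sketch below does a plain substring search, so it makes no assumption about the log line format; count_diagnostics is a hypothetical helper, and the sample lines are stand-ins containing only the message text, not real pipeline output.

```python
from collections import Counter

# Messages quoted in the list above
DIAGNOSTIC_MESSAGES = (
    "No GOES objects within tolerance",
    "GOES load failed",
    "No valid NEXRAD objects after validation",
)


def count_diagnostics(lines, messages=DIAGNOSTIC_MESSAGES):
    """Count how many lines contain each diagnostic message."""
    counts = Counter({message: 0 for message in messages})
    for line in lines:
        for message in messages:
            if message in line:
                counts[message] += 1
    return counts


# Stand-in lines; with a real log, pass an open file object instead
sample_lines = [
    "No GOES objects within tolerance",
    "GOES load failed",
    "No GOES objects within tolerance",
]
for message, count in count_diagnostics(sample_lines).items():
    print(f"{count:3d}  {message}")
```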

Memory issues

Solutions:

  1. Reduce processing_chunk_size (try 5)
  2. Disable 2D array export: export_2d_arrays: false
  3. Process fewer dates at once

Data Requirements

Required Input Data

  1. NEXRAD CAPPI files: {base_data_dir}/nexrad/huntsville/cappi/{YYYYMMDD}/KHTX/*.nc
  2. GOES-16/19 files: {base_data_dir}/goes16/huntsville/grid/{YYYYMMDD}/*.nc
  3. Static grid file: Domain definition with lat/lon coordinates
  4. TOBAC outputs (if run_segmentation: false):
    • {tobac_dir}/segmentation_mask_cappi_{YYYYMMDD}.nc
    • {tobac_dir}/feature_stats_cappi_{YYYYMMDD}.nc

Optional Input Data

  • NALMA lightning: {base_data_dir}/nalma/huntsville/grid/{YYYYMMDD}/*.nc
  • MCIT files: {base_data_dir}/nexrad/huntsville/mcit/{YYYYMMDD}/*.nc

Performance Benchmarks

Typical processing speeds (Intel Xeon, 32 GB RAM):

  • TOBAC Segmentation: ~2-5 minutes per day
  • Coregistration: ~1-3 minutes per day (1000-2000 objects)
  • Total: ~5-10 minutes per day for full pipeline

Memory usage:

  • Peak: ~4-8 GB (depends on chunk size)
  • Average: ~2-4 GB

Output Data Format

NetCDF Variables

import xarray as xr
ds = xr.open_dataset("output_20250815_140000.nc")

# Available variables
ds.data_vars
# - object_id, timestamp, latitude, longitude
# - nexrad_max_ref, nexrad_vil, nexrad_cth
# - goes_min_tb, goes_cth, goes_area
# - nalma_flash_rate, nalma_flash_count
# - match_confidence, vertical_storm_type
# - overlap_area_km2, centroid_distance_km

Pickle Objects

import pickle
with open("output_20250815_140000.pkl", "rb") as f:
    results = pickle.load(f)

# Access matched objects
for obj in results.enhanced_objects:
    print(f"Object {obj.object_id}:")
    print(f"  GOES matched: {obj.has_goes_match()}")
    print(f"  Storm type: {obj.vertical_storm_type}")
    print(f"  Confidence: {obj.match_confidence:.2f}")

Architecture

project_object_coregistration_v2/
├── run_pipeline.sh          # Main orchestration script (NEW!)
├── run_coregistration.py    # Python coregistration engine
├── config.yaml              # User configuration
├── coregistration/          # Core package
│   ├── core.py              # Matching engine
│   ├── config_loader.py     # Configuration handling
│   ├── logging_utils.py     # Logging setup
│   ├── utils.py             # Data loading utilities
│   ├── data_structures.py   # Pydantic models
│   ├── spatial_analysis.py  # IoU, distances
│   └── quality_assessment.py
├── object_segmentation/     # TOBAC scripts
│   ├── tobac_cappi_tracking.py
│   ├── tobac_maas_goes.py
│   └── tobac_nalma_tracking.py
├── analysis/                # Post-processing
│   └── analyze_results.py
└── tests/
    └── test_*.py

References

TOBAC Framework

Heikenfeld, M., et al. (2019). tobac 1.2: towards a flexible framework for tracking and analysis of clouds in diverse datasets. Geoscientific Model Development, 12, 4551-4570.

Data Sources

  • NEXRAD: WSR-88D Level-II radar data
  • GOES-16/19: Geostationary satellite imagery
  • NALMA: North Alabama Lightning Mapping Array

Support

For issues or questions:

  1. Check the troubleshooting section above
  2. Review log files in {output_dir}/Logs/
  3. Enable debug: true for detailed diagnostics

License

Internal research tool - Oluwafemi Omitusa, 2025
