Multi-sensor storm object tracking and coregistration for NEXRAD, GOES, and NALMA data.
Edit config.yaml with your data paths and parameters:
# Dates to process
start_date: "2025-08-15"
end_date: "2025-08-31"
# Data paths
base_data_dir: "/mnt/data0/data/maas"
grid_file: "/mnt/data0/data/maas/domain/huntsville_grid_new.nc"
output_dir: "/home/femi/PyMOSAIC/Data/Results"
# Pipeline control
run_segmentation: false # Use existing TOBAC outputs
run_analysis: true # Generate plots and statistics
Option 1: Full pipeline (recommended)
bash run_pipeline.sh
Option 2: Python script directly
python run_coregistration.py
Option 3: Individual stages
# Run only segmentation
python object_segmentation/tobac_cappi_tracking.py
python object_segmentation/tobac_maas_goes.py
# Run only coregistration
python run_coregistration.py
# Run only analysis
python analysis/analyze_results.py
- Checks Python version (3.8+)
- Validates config.yaml exists
- Verifies required packages installed
- Automatic, always runs
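The pre-flight checks above run inside run_pipeline.sh. As a rough illustration, the same three checks could be written in Python like this. It is a sketch, not the script's actual logic. The function name preflight is hypothetical. The module list comes from the pip install line under Troubleshooting, and pyyaml is imported as yaml.

```python
import importlib.util
import sys
from pathlib import Path

# Import names for the packages in the Troubleshooting pip install line
# (pyyaml is imported as "yaml"). This list is an assumption.
REQUIRED_MODULES = ["numpy", "pandas", "xarray", "shapely", "yaml", "tobac", "trackpy"]


def preflight(config_path="config.yaml", min_python=(3, 8)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # 1. Python version
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version.split()[0]}"
        )
    # 2. config.yaml exists
    if not Path(config_path).is_file():
        problems.append(f"{config_path} not found (copy config.yaml.example and edit it)")
    # 3. Required packages are importable (find_spec returns None if missing)
    missing = [m for m in REQUIRED_MODULES if importlib.util.find_spec(m) is None]
    if missing:
        problems.append("missing packages: " + ", ".join(missing))
    return problems
```

Calling `preflight()` from the project root and printing each returned message gives a quick manual check before launching the full pipeline.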
Controls: Multiple flags in config.yaml for granular control
run_segmentation: true # Master toggle - enable segmentation stage
run_radar_segmentation: true # Run NEXRAD CAPPI tracking
run_satellite_segmentation: true # Run GOES tracking
run_lightning_segmentation: false # Run NALMA tracking
run_coregistration: true # Run spatial matching
run_analysis: true # Run post-processing
Runs TOBAC tracking on enabled sensors:
- NEXRAD CAPPI - Radar reflectivity storm cores
- GOES-16/19 - Infrared brightness temperature
- NALMA - Lightning flash rate
How it works:
- If run_segmentation: false → skip all segmentation and use existing outputs
- If run_segmentation: true → run only the sensors enabled by the individual flags
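The master-toggle rule can be expressed as a small function that turns a parsed config.yaml dictionary into a list of segmentation scripts. This is a sketch under a few assumptions. The script paths come from the repository layout, but how run_pipeline.sh actually invokes them is not documented here. The name segmentation_plan is hypothetical. A per-sensor flag that is missing from the config is treated as false.

```python
# Per-sensor flag -> segmentation script (paths from the repository layout;
# the exact invocation used by run_pipeline.sh is an assumption).
SENSOR_SCRIPTS = {
    "run_radar_segmentation": "object_segmentation/tobac_cappi_tracking.py",
    "run_satellite_segmentation": "object_segmentation/tobac_maas_goes.py",
    "run_lightning_segmentation": "object_segmentation/tobac_nalma_tracking.py",
}


def segmentation_plan(config):
    """Return the segmentation scripts to run for a parsed config.yaml dict."""
    # Master toggle: when it is off, per-sensor flags are ignored and the
    # existing TOBAC outputs are reused.
    if not config.get("run_segmentation", False):
        return []
    # Otherwise keep only the sensors whose individual flag is true
    # (a missing flag counts as false in this sketch).
    return [script for flag, script in SENSOR_SCRIPTS.items() if config.get(flag, False)]
```

For example, `segmentation_plan({"run_segmentation": True, "run_radar_segmentation": True})` selects only the CAPPI tracking script.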
Outputs: Data/tobac/*.nc - Segmentation masks and feature statistics
Examples:
# Run only radar segmentation
run_segmentation: true
run_radar_segmentation: true
run_satellite_segmentation: false
run_lightning_segmentation: false
# Run radar + satellite (no lightning)
run_segmentation: true
run_radar_segmentation: true
run_satellite_segmentation: true
run_lightning_segmentation: false
# Skip all segmentation (use existing)
run_segmentation: false
Controls: run_coregistration: true/false in config.yaml
Matches storm objects across sensors using:
- Spatial overlap (IoU threshold)
- Centroid distance (<25 km default)
- Temporal alignment (±5 minutes default)
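The three matching criteria can be illustrated with a small, dependency-free sketch. It makes several assumptions. Each object is represented as a set of grid cells on a shared grid; the real engine may use shapely polygons in spatial_analysis.py instead. Centroid distance is measured with the haversine formula. All three tests must pass for a match. StormObject, iou, haversine_km and is_match are hypothetical names, and the defaults mirror iou_threshold, max_centroid_distance and the ±5 minute window.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StormObject:
    """Hypothetical minimal storm object for illustration."""
    cells: frozenset  # (row, col) indices on the shared analysis grid
    lat: float        # centroid latitude (degrees)
    lon: float        # centroid longitude (degrees)
    time: datetime    # valid time of the object


def iou(a, b):
    """Intersection over union of the grid cells covered by two objects."""
    union = len(a.cells | b.cells)
    return len(a.cells & b.cells) / union if union else 0.0


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two lat/lon points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def is_match(nexrad, goes, iou_threshold=0.01, max_centroid_distance=25.0,
             time_tolerance=timedelta(minutes=5)):
    """Treat temporal, distance and overlap criteria as a simple AND (assumed)."""
    if abs(nexrad.time - goes.time) > time_tolerance:
        return False
    if haversine_km(nexrad.lat, nexrad.lon, goes.lat, goes.lon) > max_centroid_distance:
        return False
    return iou(nexrad, goes) >= iou_threshold
```

The cheap time and distance checks run first, so the overlap computation only happens for plausible candidate pairs.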
Outputs:
- Data/Results/<date>/output_<timestamp>.nc - NetCDF with coregistered objects
- Data/Results/<date>/output_<timestamp>.pkl - Pickle with Python objects
- Data/Results/<date>/processing_summary.json - Statistics and metadata
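Because results are written to one directory per day (Data/Results/YYYYMMDD/), gathering them for a date range only requires a loop over days. This is a minimal sketch. The helper name output_files is hypothetical, and it assumes the output_<timestamp> naming shown above.

```python
from datetime import date, timedelta
from pathlib import Path


def output_files(results_dir, start, end, suffix=".nc"):
    """Return output_<timestamp><suffix> files for every day from start to end inclusive."""
    root = Path(results_dir)
    found = []
    day = start
    while day <= end:
        day_dir = root / day.strftime("%Y%m%d")  # e.g. Data/Results/20250815
        # Globbing a missing directory yields nothing, so days without output are skipped.
        found.extend(sorted(day_dir.glob(f"output_*{suffix}")))
        day += timedelta(days=1)
    return found


# Example: all NetCDF outputs for the configured period
# nc_files = output_files("Data/Results", date(2025, 8, 15), date(2025, 8, 31))
```

Passing `suffix=".pkl"` collects the pickle outputs instead. processing_summary.json is never matched because it does not start with `output_`.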
Controls: run_analysis: true/false in config.yaml
Generates:
- Match statistics and quality metrics
- Storm type distributions
- Spatial/temporal visualizations
- Validation reports
start_date: "2025-08-15" # Start of processing period
end_date: "2025-08-31" # End of processing period
base_data_dir: "/path/to/data" # Root directory for input data
output_dir: "./Data/Results" # Where to save results
# Master toggles
run_segmentation: false # Run any TOBAC segmentation
run_coregistration: true # Run spatial matching
run_analysis: true # Generate post-processing reports
# Granular segmentation control (only applies if run_segmentation: true)
run_radar_segmentation: true # NEXRAD CAPPI tracking
run_satellite_segmentation: true # GOES tracking
run_lightning_segmentation: false # NALMA tracking
max_centroid_distance: 25.0 # Maximum distance (km)
iou_threshold: 0.01 # Minimum IoU for match
min_overlap_fraction: 0.01 # Minimum overlap fraction
max_area_ratio: 10.0 # Max GOES/NEXRAD area ratio
deep_storm_min_tb: 242.15 # Deep storm TB threshold (K)
deep_storm_min_vil: 5.0 # Deep storm VIL threshold (kg/m²)
deep_min_ref_max: 50.0 # Deep storm reflectivity (dBZ)
deep_nalma_min_flash_rate: 10 # Deep storm flash rate (min⁻¹)
deep_consensus_min_sensors: 2 # Sensors required for consensus
processing_chunk_size: 10 # Files per chunk (1-50)
debug: false # Enable detailed logging
use_nalma: true # Enable lightning integration
nalma_min_flash_count: 1.0 # Minimum flashes for validity
nalma_temporal_tolerance_minutes: 5 # Temporal matching window
Data/Results/20250815/
├── output_20250815_140000.nc # NetCDF with xarray Dataset
├── output_20250815_140000.pkl # Pickle with Python objects
├── output_20250815_140000.json # Metadata
├── output_20250815_143000.nc
└── ...
Data/Results/20250815/
├── processing_summary.json # Statistics
└── height_filtering_validation_report.txt # Quality report
Data/Results/
└── summary_20250815_20250831.json # Cross-day statistics
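The deep-storm settings in config.yaml (deep_storm_min_tb, deep_storm_min_vil, deep_min_ref_max, deep_nalma_min_flash_rate, deep_consensus_min_sensors) suggest a per-sensor vote with a consensus threshold. The sketch below shows one way such a vote could work. It is an illustration, not the classifier in coregistration/core.py. Several details are assumptions:

- GOES votes when the coldest cloud top is at or below 242.15 K (−31 °C).
- NEXRAD casts a single vote if either VIL or maximum reflectivity passes its threshold.
- NALMA votes when the flash rate reaches its threshold.

The dictionary keys reuse the NetCDF variable names, and the function names are hypothetical.

```python
# Default thresholds copied from the storm-classification settings in config.yaml.
DEEP_STORM_DEFAULTS = {
    "deep_storm_min_tb": 242.15,      # K
    "deep_storm_min_vil": 5.0,        # kg/m²
    "deep_min_ref_max": 50.0,         # dBZ
    "deep_nalma_min_flash_rate": 10,  # flashes per minute
    "deep_consensus_min_sensors": 2,
}


def _present(obj, key):
    """True if the sensor value exists (missing sensors are absent or None)."""
    return obj.get(key) is not None


def deep_storm_votes(obj, cfg=DEEP_STORM_DEFAULTS):
    """Count sensors that flag the object as deep (assumed voting rule)."""
    votes = 0
    # GOES: colder cloud tops indicate deeper convection.
    if _present(obj, "goes_min_tb") and obj["goes_min_tb"] <= cfg["deep_storm_min_tb"]:
        votes += 1
    # NEXRAD: one vote if either VIL or maximum reflectivity passes.
    vil_deep = _present(obj, "nexrad_vil") and obj["nexrad_vil"] >= cfg["deep_storm_min_vil"]
    ref_deep = _present(obj, "nexrad_max_ref") and obj["nexrad_max_ref"] >= cfg["deep_min_ref_max"]
    if vil_deep or ref_deep:
        votes += 1
    # NALMA: lightning flash rate.
    if _present(obj, "nalma_flash_rate") and obj["nalma_flash_rate"] >= cfg["deep_nalma_min_flash_rate"]:
        votes += 1
    return votes


def is_deep_storm(obj, cfg=DEEP_STORM_DEFAULTS):
    """Deep if at least deep_consensus_min_sensors sensors agree."""
    return deep_storm_votes(obj, cfg) >= cfg["deep_consensus_min_sensors"]
```

Under this rule, an object with a 220 K cloud top and 55 dBZ maximum reflectivity but no lightning data still reaches the two-sensor consensus.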
The pipeline has 3 main stages you can independently enable/disable:
| Stage | Config Flag | What It Does | Required? |
|---|---|---|---|
| Segmentation | run_segmentation | Runs TOBAC object tracking | No - use existing |
| Coregistration | run_coregistration | Matches objects across sensors | No - use existing |
| Analysis | run_analysis | Generates plots and statistics | No - optional |
When run_segmentation: true, you can control individual sensors:
| Sensor | Config Flag | Data Source |
|---|---|---|
| Radar | run_radar_segmentation | NEXRAD CAPPI reflectivity |
| Satellite | run_satellite_segmentation | GOES brightness temperature |
| Lightning | run_lightning_segmentation | NALMA flash rate |
# config.yaml
run_segmentation: true
run_radar_segmentation: true
run_satellite_segmentation: true
run_lightning_segmentation: true # If you have NALMA data
run_analysis: true
bash run_pipeline.sh
# config.yaml
run_segmentation: false # Skip TOBAC (already done)
run_analysis: true
bash run_pipeline.sh
# config.yaml
run_segmentation: true
run_radar_segmentation: true
run_satellite_segmentation: true
run_lightning_segmentation: false # Skip NALMA
run_analysis: true
# config.yaml
run_segmentation: true
run_radar_segmentation: false # Use existing
run_satellite_segmentation: true # Re-run this
run_lightning_segmentation: false
run_coregistration: false # Will run later
run_analysis: false
# config.yaml
run_segmentation: true
run_radar_segmentation: true
run_satellite_segmentation: true
run_lightning_segmentation: false
run_coregistration: false # Skip matching, only create segmentation
run_analysis: false
# config.yaml
run_segmentation: false
run_coregistration: false # Use existing coregistration outputs
run_analysis: true # Generate new plots/reports
# config.yaml
start_date: "2025-08-15"
end_date: "2025-08-15"
processing_chunk_size: 5
debug: true
# config.yaml
start_date: "2025-08-01"
end_date: "2025-08-31"
processing_chunk_size: 20
debug: false
keep_nexrad_only: false
save_only_deep_storms: true
Solution: Copy config.yaml.example to config.yaml and edit it.
pip install numpy pandas xarray shapely pyyaml tobac trackpy
Causes:
- Wrong base_data_dir path
- Dates outside available data range
- Missing data for that date
Solution: Verify data exists at:
{base_data_dir}/nexrad/huntsville/cappi/{YYYYMMDD}/KHTX/*.nc
Possible causes:
- iou_threshold too high (try 0.001)
- max_centroid_distance too small (try 50 km)
- Temporal mismatch (check GOES data availability)
- Using the wrong segmentation source (mcit vs cappi)
Diagnosis:
# Enable debug mode
debug: true
Check logs for:
- "No GOES objects within tolerance"
- "GOES load failed"
- "No valid NEXRAD objects after validation"
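With debug enabled, the three messages above can be counted across a run's log files rather than searched by hand. The logs live in {output_dir}/Logs/. This sketch assumes the files match *.log, which is a guess about the naming. scan_logs is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path

# Diagnostic messages listed in this troubleshooting entry.
DIAGNOSTIC_MESSAGES = (
    "No GOES objects within tolerance",
    "GOES load failed",
    "No valid NEXRAD objects after validation",
)


def scan_logs(log_dir, pattern="*.log"):
    """Count how many log lines contain each diagnostic message."""
    counts = Counter()
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                for msg in DIAGNOSTIC_MESSAGES:
                    if msg in line:
                        counts[msg] += 1
    return counts
```

A high "No GOES objects within tolerance" count points toward the temporal-mismatch cause. Frequent "GOES load failed" lines point at the GOES input files themselves.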
Solutions:
- Reduce processing_chunk_size (try 5)
- Disable 2D array export: export_2d_arrays: false
- Process fewer dates at once
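Reducing processing_chunk_size helps because peak memory depends on chunk size, so fewer files held at once lowers the peak. The grouping is assumed to work roughly like the sketch below. The 1-50 bounds come from the config comment, and chunk_files is a hypothetical name, not a function from the coregistration package.

```python
def chunk_files(files, chunk_size=10):
    """Split a file list into consecutive chunks of at most chunk_size files."""
    # Bounds taken from the processing_chunk_size comment in config.yaml.
    if not 1 <= chunk_size <= 50:
        raise ValueError("processing_chunk_size must be between 1 and 50")
    return [files[i:i + chunk_size] for i in range(0, len(files), chunk_size)]
```

With 12 files and a chunk size of 5, the pipeline would hold at most 5 files per step, in chunks of 5, 5 and 2.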
- NEXRAD CAPPI files: {base_data_dir}/nexrad/huntsville/cappi/{YYYYMMDD}/KHTX/*.nc
- GOES-16/19 files: {base_data_dir}/goes16/huntsville/grid/{YYYYMMDD}/*.nc
- Static grid file: domain definition with lat/lon coordinates
- TOBAC outputs (if run_segmentation: false): {tobac_dir}/segmentation_mask_cappi_{YYYYMMDD}.nc and {tobac_dir}/feature_stats_cappi_{YYYYMMDD}.nc
- NALMA lightning: {base_data_dir}/nalma/huntsville/grid/{YYYYMMDD}/*.nc
- MCIT files: {base_data_dir}/nexrad/huntsville/mcit/{YYYYMMDD}/*.nc
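Before running a date range, it can help to count how many input files exist for each source on a given day. That separates "no data" from "no matches". This sketch copies the path templates listed above and leaves out the TOBAC outputs because tobac_dir is configured separately. count_inputs is a hypothetical helper.

```python
import glob

# Path templates copied from the input data list; {ymd} is the YYYYMMDD date.
TEMPLATES = {
    "nexrad_cappi": "{base_data_dir}/nexrad/huntsville/cappi/{ymd}/KHTX/*.nc",
    "goes": "{base_data_dir}/goes16/huntsville/grid/{ymd}/*.nc",
    "nalma": "{base_data_dir}/nalma/huntsville/grid/{ymd}/*.nc",
    "mcit": "{base_data_dir}/nexrad/huntsville/mcit/{ymd}/*.nc",
}


def count_inputs(base_data_dir, day):
    """Return {source: number of matching .nc files} for one date object."""
    ymd = day.strftime("%Y%m%d")
    counts = {}
    for name, template in TEMPLATES.items():
        pattern = template.format(base_data_dir=base_data_dir, ymd=ymd)
        # glob.glob returns an empty list when the directory does not exist.
        counts[name] = len(glob.glob(pattern))
    return counts
```

A zero for nexrad_cappi on a day in your range matches the "wrong base_data_dir path / missing data" causes in Troubleshooting.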
Typical processing speeds (Intel Xeon, 32 GB RAM):
- TOBAC Segmentation: ~2-5 minutes per day
- Coregistration: ~1-3 minutes per day (1000-2000 objects)
- Total: ~5-10 minutes per day for full pipeline
Memory usage:
- Peak: ~4-8 GB (depends on chunk size)
- Average: ~2-4 GB
import xarray as xr
ds = xr.open_dataset("output_20250815_140000.nc")
# Available variables
print(list(ds.data_vars))
# - object_id, timestamp, latitude, longitude
# - nexrad_max_ref, nexrad_vil, nexrad_cth
# - goes_min_tb, goes_cth, goes_area
# - nalma_flash_rate, nalma_flash_count
# - match_confidence, vertical_storm_type
# - overlap_area_km2, centroid_distance_km
import pickle
with open("output_20250815_140000.pkl", "rb") as f:
results = pickle.load(f)
# Access matched objects
for obj in results.enhanced_objects:
    print(f"Object {obj.object_id}:")
    print(f" GOES matched: {obj.has_goes_match()}")
    print(f" Storm type: {obj.vertical_storm_type}")
    print(f" Confidence: {obj.match_confidence:.2f}")
project_object_coregistration_v2/
├── run_pipeline.sh # Main orchestration script (NEW!)
├── run_coregistration.py # Python coregistration engine
├── config.yaml # User configuration
├── coregistration/ # Core package
│ ├── core.py # Matching engine
│ ├── config_loader.py # Configuration handling
│ ├── logging_utils.py # Logging setup
│ ├── utils.py # Data loading utilities
│ ├── data_structures.py # Pydantic models
│ ├── spatial_analysis.py # IoU, distances
│ └── quality_assessment.py
├── object_segmentation/ # TOBAC scripts
│ ├── tobac_cappi_tracking.py
│ ├── tobac_maas_goes.py
│ └── tobac_nalma_tracking.py
├── analysis/ # Post-processing
│ └── analyze_results.py
└── tests/
└── test_*.py
Heikenfeld, M., et al. (2019). tobac 1.2: towards a flexible framework for tracking and analysis of clouds. Geoscientific Model Development, 12, 4551-4570.
- NEXRAD: WSR-88D Level-II radar data
- GOES-16/19: Geostationary satellite imagery
- NALMA: North Alabama Lightning Mapping Array
For issues or questions:
- Check the troubleshooting section above
- Review log files in {output_dir}/Logs/
- Enable debug: true for detailed diagnostics
Internal research tool - Oluwafemi Omitusa, 2025