A comprehensive data processing pipeline that standardizes global, national, and regional methane emission inventories into the rHEALPix Discrete Global Grid System (DGGS). This project addresses the challenge of harmonizing gridded methane emission inventories that differ in source, temporal period, spatial resolution, and reporting units into a unified, spatially consistent framework for climate research and policy analysis.
- Spatial consistency: Equal-area grid cells for accurate spatial analysis
- Unit standardization: All outputs in Mg a⁻¹ (megagrams per year)
- IPCC2006 sector code: Standardized emission categorization
- Scalable processing: Optimized for large-scale datasets with HPC support
- Quality assurance: Area-weighted distribution preserving total emissions
The project is organized into several script categories:
Scripts for creating and preparing DGGS grids for all countries:
- create_global_country_geojson.py: Creates global country boundaries from pygadm
- simplify_global_countries.py: Simplifies country geometries to reduce file size
- convert_country_geojson_to_dggs.py: Converts country boundaries to DGGS grid cells
  - Uses rhealpix grid type at resolution 6
  - Outputs: Individual country DGGS files
- convert_offshore_to_dggs.py: Converts offshore areas to DGGS grid cells
  - Handles marine emission zones
- convert_single_geojson_to_dggs.py: Converts individual GeoJSON files to DGGS format
  - Utility for single-country processing
Scripts for converting NetCDF data to DGGS using pre-calculated grids:
- canada_netcdf_to_dggs_converter.py: Converts Canada NetCDF methane data to DGGS
  - Uses IPCC2006 code aggregation
  - Handles 2018 Canada anthropogenic methane emissions
  - Units: molecules CH₄ cm⁻² s⁻¹ → Mg a⁻¹
- china_SACMS_netcdf_to_dggs_converter.py: Converts China SACMS NetCDF data to DGGS
  - Handles emission rate units (Mg km⁻² a⁻¹)
  - Processes 2011 China anthropogenic methane emissions
  - Uses 0.25° × 0.25° resolution
- cms_netcdf_to_dggs_converter.py: Converts CMS NetCDF files to DGGS
  - Handles flux units (molecules CH₄ cm⁻² s⁻¹)
  - Processes Canada and Mexico files separately
  - Uses external area data for calculations
- global_edgar_netcdf_to_dggs_optimize.py: Converts EDGAR v8.0 global NetCDF data to DGGS
  - Optimized for large-scale processing
  - Handles 1970-2022 EDGAR v8.0 greenhouse gas CH4 emissions
  - Multi-year processing capability
- global_gfei_netcdf_to_dggs_optimize.py: Converts GFEI global NetCDF data to DGGS
  - Handles 2016-2020 Global Fuel Exploitation Inventory
  - Multi-year processing capability
- mexico_netcdf_to_dggs_converter.py: Converts Mexico NetCDF methane data to DGGS
  - Uses area data from Canada files
  - Handles 2015 Mexico anthropogenic methane emissions
- nys_netcdf_to_dggs_converter.py: Converts New York State (GNYS) NetCDF to DGGS
  - Handles flux units (kg m⁻² s⁻¹)
  - Uses EPSG:26918 projection (UTM Zone 18N)
  - Processes 2020 data with 100m resolution
- swiss_netcdf_to_dggs_converter.py: Converts Swiss (SGHGI) NetCDF to DGGS
  - Handles units (g m⁻² yr⁻¹)
  - Uses EPSG:21781 projection (CH1903/LV03)
  - Processes 2011 data with 500m resolution
- us_netcdf_to_dggs_converter.py: Converts US NetCDF methane data to DGGS
  - Handles flux units (molecules CH₄ cm⁻² s⁻¹)
  - Processes 2012-2018 US anthropogenic methane emissions
- us_OG_netcdf_to_dggs_converter.py: Converts US Oil and Gas NetCDF to DGGS
  - Handles flux units (kg/h)
  - Processes 2021 US oil and gas emissions
  - No area calculation needed (total emissions per pixel)
- europe_netcdf_to_dggs_converter.py: Converts CAMS-REG-ANT Europe NetCDF data to DGGS
  - Handles units (Tg) converted to Mg (1 Tg = 1e6 Mg)
  - Processes 2005-2022 European anthropogenic methane emissions
  - Uses 0.05° × 0.10° resolution (lat × lon)
  - Aggregates ~14 variables to IPCC2006 codes via lookup table
  - Uses raster-first area-weighted distribution with parallel processing
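
Several of the converters above aggregate inventory variables to IPCC2006 codes through a lookup table. The following is a minimal sketch of that idea, not the project's implementation: the variable names, the code assignments in `LOOKUP`, and the function name `aggregate_by_ipcc` are all hypothetical, and the real mappings live in the lookup tables under data/lookup/.

```python
import numpy as np

# Hypothetical mapping from inventory variable name to IPCC2006 code.
# The names and assignments are placeholders for illustration only.
LOOKUP = {
    "var_public_power": "1A1",
    "var_refineries": "1A1",
    "var_livestock": "3A",
}


def aggregate_by_ipcc(variables, lookup):
    """Sum the 2-D emission grids that map to the same IPCC2006 code.

    `variables` maps variable name -> 2-D array on a common grid.
    Variables without a lookup entry are skipped in this sketch.
    """
    aggregated = {}
    for name, grid in variables.items():
        code = lookup.get(name)
        if code is None:
            continue
        # Treat missing values as zero so they do not poison the sum.
        clean = np.nan_to_num(np.asarray(grid, dtype=float), nan=0.0)
        if code in aggregated:
            aggregated[code] = aggregated[code] + clean
        else:
            aggregated[code] = clean.copy()
    return aggregated


if __name__ == "__main__":
    grids = {
        "var_public_power": np.ones((2, 2)),
        "var_refineries": np.full((2, 2), 2.0),
        "var_livestock": np.full((2, 2), 5.0),
    }
    for code, grid in aggregate_by_ipcc(grids, LOOKUP).items():
        print(code, grid.sum())
```

Aggregating before the spatial conversion means each IPCC2006 code is distributed to DGGS cells once, rather than once per source variable.
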
Scripts for converting GeoTIFF raster data to DGGS:
- china_tiff_to_dggs_converter.py: Converts China GeoTIFF methane emission rasters to DGGS
  - Handles units (Mg km⁻² a⁻¹)
  - Processes 1990-2020 time series data
  - Uses variable-to-IPCC2006 code mapping
Scripts for converting CSV point data to DGGS:
- ind_aus_csv_to_dggs_converter.py: Converts India/Australia coal methane emissions CSVs to DGGS
  - Handles point data (lat, lon, value in ton/year)
  - Processes 2018 coal mining emissions
  - Uses IPCC2006 code 1B1a (coal mining)
Utility scripts for data processing, combining, and cleanup:
- combine_geojson_folder.py: Combines individual country DGGS files into a single file
  - Generates: data/geojson/global_countries_dggs_merge.geojson
- merge_country_offshore_dggs_geometries.py: Merges country and offshore DGGS grids
  - Handles duplicate zoneID removal
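
Merging country and offshore grids can produce the same DGGS cell twice where the two areas meet. A minimal sketch of duplicate removal follows. It assumes each GeoJSON feature stores its cell ID under `properties["zoneID"]` and that country cells take precedence over offshore cells; both points, and the function name `merge_features`, are assumptions rather than the script's documented behavior.

```python
def merge_features(country_features, offshore_features, id_key="zoneID"):
    """Concatenate two GeoJSON feature lists, keeping the first feature per cell ID."""
    seen = set()
    merged = []
    # Country features come first, so they win when an ID appears in both lists.
    for feature in list(country_features) + list(offshore_features):
        zone_id = feature["properties"][id_key]
        if zone_id in seen:
            continue
        seen.add(zone_id)
        merged.append(feature)
    return merged


if __name__ == "__main__":
    country = [{"type": "Feature", "properties": {"zoneID": "N012"}, "geometry": None}]
    offshore = [
        {"type": "Feature", "properties": {"zoneID": "N012"}, "geometry": None},
        {"type": "Feature", "properties": {"zoneID": "N013"}, "geometry": None},
    ]
    print([f["properties"]["zoneID"] for f in merge_features(country, offshore)])
```
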
Analysis scripts for post-processing and summarizing DGGS-converted datasets:
- create_dggs_coverage_geojsons.py: Creates dissolved and simplified DGGS coverage geometries for each inventory CSV
  - Extracts unique DGGS cell IDs from CSV files and filters corresponding geometries
  - Supports parallel processing for large datasets with automatic CPU detection
- compute_sectoral_breakdown.py: Computes sectoral methane emission totals for selected national datasets
  - Aggregates emissions by four broad IPCC sectors (Energy, Industrial Processes, Agriculture/Forestry, Waste)
  - Processes US, Canada, Mexico, China, and Switzerland datasets
  - Generates temporal breakdowns for China (1990-2020) and US (2012-2018)
- generate_dggs_dataset_summary.py: Generates descriptive summary table for DGGS-converted methane inventory CSVs
  - Computes dataset metadata: number of DGGS cells, resolution, year range, IPCC categories
  - Uses configuration mappings for resolution and year range information
  - Optimized for large files with chunked reading
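
The summary and sectoral-breakdown scripts both come down to streaming a large CSV and accumulating statistics. The sketch below shows one way to do that with the `chunksize` option of `pandas.read_csv`. The column names (`zoneID`, `ipcc_code`, `value_Mg`) are assumptions about the output layout, and the sector assignment uses the leading digit of the IPCC2006 code (1 Energy, 2 Industrial Processes, 3 Agriculture/Forestry, 4 Waste).

```python
import io
from collections import defaultdict

import pandas as pd

# Leading digit of an IPCC2006 code -> broad sector.
SECTORS = {
    "1": "Energy",
    "2": "Industrial Processes",
    "3": "Agriculture/Forestry",
    "4": "Waste",
}


def summarize_csv(source, chunksize=100_000):
    """Stream a CSV in chunks and accumulate cell counts and sector totals."""
    cell_ids = set()
    codes = set()
    sector_totals = defaultdict(float)
    reader = pd.read_csv(
        source,
        chunksize=chunksize,
        dtype={"zoneID": str, "ipcc_code": str},  # hypothetical column names
    )
    for chunk in reader:
        cell_ids.update(chunk["zoneID"])
        codes.update(chunk["ipcc_code"])
        sectors = chunk["ipcc_code"].str[0].map(SECTORS).fillna("Other")
        for sector, total in chunk.groupby(sectors)["value_Mg"].sum().items():
            sector_totals[sector] += float(total)
    return {
        "n_cells": len(cell_ids),
        "ipcc_codes": sorted(codes),
        "sector_totals_Mg": dict(sector_totals),
    }


if __name__ == "__main__":
    demo = io.StringIO(
        "zoneID,ipcc_code,value_Mg\n"
        "N012,1B1a,1.5\n"
        "N012,4A,0.5\n"
        "N013,1B1a,2.0\n"
    )
    print(summarize_csv(demo, chunksize=2))
```

Only one chunk is held in memory at a time, so peak memory depends on `chunksize` rather than on the file size.
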
HPC job scripts for running conversions on cluster systems:
- run_*_conversion.sh: Individual conversion job scripts
- combine_*.sh: Data combination job scripts
- create_global_dggs_geom.sh: DGGS grid creation job script
- Create global country boundaries (create_global_country_geojson.py)
- Simplify country geometries (simplify_global_countries.py)
- Convert to DGGS grid cells (convert_country_geojson_to_dggs.py)
- Convert offshore areas to DGGS (convert_offshore_to_dggs.py)
- Merge offshore grids and country grids (merge_country_offshore_dggs_geometries.py)
- Combine all grids to one single GeoJSON (combine_geojson_folder.py)
- Create local grids as needed (convert_single_geojson_to_dggs.py)
All conversion processes output standardized CSV files with DGGS cell values in Mg a⁻¹ units:
Input Units: Various (molecules CH₄ cm⁻² s⁻¹, Mg km⁻² a⁻¹, kg m⁻² s⁻¹, kg/h, g m⁻² yr⁻¹)

Output Units: Mg a⁻¹ (megagrams per year)
- Load NetCDF data and extract variables
- Apply IPCC2006 code aggregation using lookup tables
- Convert NetCDF data to raster format
- Calculate pixel areas from coordinate reference system
- Convert input units to Mg a⁻¹ using appropriate formulas:
  - Flux units (molecules CH₄ cm⁻² s⁻¹): mass_Mg = (flux × area × seconds_per_year / AVOGADRO) × M_CH4 × (1e-6)
  - Emission rate units (Mg km⁻² a⁻¹): mass_Mg = emission_rate × area_km2
  - Mass flux units (kg m⁻² s⁻¹): mass_Mg = flux × pixel_area_m2 × seconds_per_year / 1000
  - Mass rate units (kg/h): mass_Mg = flux_kg_h × hours_per_year × (1e-3)
  - Mass per area per year (g m⁻² yr⁻¹): mass_Mg = (value × pixel_area_m2) / 1e6
- Apply area-weighted distribution to DGGS cells
- Apply scaling to preserve total emissions
- Output CSV files with DGGS cell values
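
The unit conversions in this workflow can be written as small functions. This is a sketch under stated assumptions, not the project's code: the function names are hypothetical, a 365-day year is assumed, the molar mass of CH₄ is rounded to 16.04 g mol⁻¹, and the molecular-flux formula takes the pixel area in cm² so that the 1e-6 factor converts grams to megagrams. The `latlon_cell_area_m2` helper illustrates the pixel-area step for a latitude/longitude grid using a spherical Earth approximation.

```python
import math

AVOGADRO = 6.02214076e23            # molecules per mole
M_CH4 = 16.04                       # g mol^-1 (rounded; assumption)
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year
HOURS_PER_YEAR = 365 * 24           # assumption: 365-day year
EARTH_RADIUS_M = 6_371_008.8        # mean radius; spherical approximation


def latlon_cell_area_m2(lat_center_deg, dlat_deg, dlon_deg):
    """Area of a lat/lon grid cell on a sphere, in m^2."""
    south = math.radians(lat_center_deg - dlat_deg / 2)
    north = math.radians(lat_center_deg + dlat_deg / 2)
    return EARTH_RADIUS_M ** 2 * math.radians(dlon_deg) * (math.sin(north) - math.sin(south))


def molecules_flux_to_mg(flux, area_cm2):
    """molecules CH4 cm^-2 s^-1 -> Mg a^-1 (area in cm^2)."""
    grams_per_year = flux * area_cm2 * SECONDS_PER_YEAR / AVOGADRO * M_CH4
    return grams_per_year * 1e-6


def emission_rate_to_mg(rate, area_km2):
    """Mg km^-2 a^-1 -> Mg a^-1."""
    return rate * area_km2


def mass_flux_to_mg(flux, area_m2):
    """kg m^-2 s^-1 -> Mg a^-1."""
    return flux * area_m2 * SECONDS_PER_YEAR / 1000


def mass_rate_to_mg(flux_kg_h):
    """kg h^-1 -> Mg a^-1 (no area needed)."""
    return flux_kg_h * HOURS_PER_YEAR * 1e-3


def mass_per_area_to_mg(value, area_m2):
    """g m^-2 yr^-1 -> Mg a^-1."""
    return value * area_m2 / 1e6


if __name__ == "__main__":
    # A 0.1 degree cell centred at 45 N; 1 m^2 = 1e4 cm^2.
    area_m2 = latlon_cell_area_m2(45.0, 0.1, 0.1)
    print(molecules_flux_to_mg(1e12, area_m2 * 1e4))
```
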
Input Units: Mg km⁻² a⁻¹ (megagrams per square kilometer per year)

Output Units: Mg a⁻¹ (megagrams per year)
- Load GeoTIFF raster data
- Map variable names to IPCC2006 codes using lookup tables
- Rasterize DGGS cells to zone index raster aligned with GeoTIFF grid
- Calculate pixel areas in km² from raster CRS and transform
- Convert pixel values to total emissions: mass_Mg = pixel_value × pixel_area_km2
- Aggregate to DGGS cells using numpy.bincount
- Apply scaling to preserve total emissions
- Output CSV files with DGGS cell values
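
The aggregation and scaling steps of this workflow can be sketched with `numpy.bincount`. The code assumes a zone index raster in which each pixel holds the integer index of its DGGS cell and -1 marks pixels outside every cell. It also assumes a north-up raster in a projected CRS with metre units, so the pixel area follows from the affine transform's pixel width and height. The function names and the particular scaling rule (rescale cell totals so they sum to the input raster total) are assumptions made for illustration.

```python
import numpy as np


def pixel_area_km2(pixel_width_m, pixel_height_m):
    """Pixel area for a north-up raster in a projected CRS with metre units.

    The arguments correspond to the `a` and `e` terms of the affine
    transform; `e` is usually negative, hence the absolute value.
    """
    return abs(pixel_width_m * pixel_height_m) / 1e6


def aggregate_to_cells(values, zone_index, area_km2, n_cells):
    """Sum per-pixel emissions into DGGS cells and preserve the raster total.

    values      : 2-D array in Mg km^-2 a^-1 (NaN = no data)
    zone_index  : 2-D integer array, cell index per pixel, -1 = no cell
    area_km2    : scalar or 2-D array of pixel areas
    n_cells     : number of DGGS cells
    """
    mass = values * area_km2                      # Mg a^-1 per pixel
    finite = np.isfinite(mass)
    total_input = mass[finite].sum()

    valid = finite & (zone_index >= 0)
    cell_totals = np.bincount(
        zone_index[valid], weights=mass[valid], minlength=n_cells
    )

    # Rescale so that the cell totals add up to the input total.
    assigned = cell_totals.sum()
    if assigned > 0:
        cell_totals = cell_totals * (total_input / assigned)
    return cell_totals


if __name__ == "__main__":
    values = np.array([[1.0, 2.0], [3.0, 4.0]])
    zones = np.array([[0, 0], [1, -1]])
    print(aggregate_to_cells(values, zones, pixel_area_km2(1000, -1000), 3))
```

`numpy.bincount` with `weights` does the grouped sum in a single vectorized pass, which avoids looping over DGGS cells.
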
Input Units: ton/year (tons per year)

Output Units: Mg a⁻¹ (megagrams per year)
- Load CSV point data (lat, lon, value)
- Create regular grid raster from point data in EPSG:4326
- Rasterize DGGS polygons to label raster aligned with the grid
- Aggregate per-pixel values to DGGS cells via numpy.bincount
- Convert units: mass_Mg = value_ton × 1.0 (1 ton = 1 Mg)
- Apply scaling to preserve total emissions
- Output CSV files with DGGS cell values
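
The first step of this workflow, turning point records into a regular grid in EPSG:4326, can be sketched as follows. The grid resolution of 0.1° is taken from the dataset table. Snapping the grid origin to a multiple of the resolution and summing points that fall in the same pixel are assumptions, as is the function name `points_to_grid`. Rasterizing the DGGS polygons onto the same grid is not shown.

```python
import numpy as np


def points_to_grid(lats, lons, values, res=0.1):
    """Accumulate point values onto a regular lat/lon grid.

    Returns the grid (row 0 = northernmost) and its upper-left
    corner as (west, north) in degrees.
    """
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    values = np.asarray(values, dtype=float)

    # Snap the grid origin to a multiple of the resolution.
    west = np.floor(lons.min() / res) * res
    north = np.ceil(lats.max() / res) * res

    cols = np.floor((lons - west) / res).astype(int)
    rows = np.floor((north - lats) / res).astype(int)

    grid = np.zeros((rows.max() + 1, cols.max() + 1))
    # np.add.at sums repeated (row, col) pairs; plain fancy-index
    # assignment would keep only one value per pixel.
    np.add.at(grid, (rows, cols), values)
    return grid, (west, north)


if __name__ == "__main__":
    grid, origin = points_to_grid(
        lats=[10.05, 10.05, 10.25],
        lons=[20.05, 20.05, 20.15],
        values=[1.0, 2.0, 3.0],  # ton/year; 1 ton = 1 Mg
    )
    print(grid.shape, grid.sum())
```
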
The project processes various gridded methane emission inventories from multiple sources. Below is a comprehensive table of all datasets used:
| Dataset Name | Spatial Coverage | CRS | Resolution | Temporal Coverage | Files | Category Code Scheme | Sector | Unit | URL |
|---|---|---|---|---|---|---|---|---|---|
| EDGAR v8.0 Greenhouse Gas CH4 Emissions | Global | EPSG 4326 | 0.1° × 0.1° | 1970-2022 | 1272 | IPCC 2006 and IPCC 1996 | Agriculture, chemical, fuel, energy, natural gas, petroleum, waste | ton/year | EDGAR |
| U.S. Anthropogenic Methane Emissions | U.S. | EPSG 4326 | 0.1° × 0.1° | 2012-2018 | 7 | CRT | Coal mines, oil and gas, residential combustion, solid waste, wastewater | molecules CH₄ cm⁻² s⁻¹ | US EPA |
| Mexico Anthropogenic Methane Emissions | Mexico | EPSG 4326 | 0.1° × 0.1° | 2015 | 1 | IPCC 2006 code | Coal mines, oil and gas, residential combustion, solid waste, wastewater | Mg a⁻¹ km⁻² | Mexico Inventory |
| Global Fuel Exploitation Inventory GFEI | Global | EPSG 4326 | 0.1° × 0.1° | 2016(v1) 2019(v2) 2020(v3) | 21(v1) 20(v2) 20(v3) | IPCC 2006 | Coal mines, oil and gas | Mg a⁻¹ km⁻² | GFEI |
| Canada Anthropogenic Methane Emissions | Canada | EPSG 4326 | 0.1° × 0.1° | 2018 | 1 | CRT | Coal mines, oil and gas, residential combustion, solid waste, wastewater | Mg a⁻¹ km⁻² | Canada Inventory |
| Gridded New York State methane emissions inventory (GNYS) | New York State | UTM Zone 18N projection EPSG:26918 | 100m × 100m | 2020 | 1 | IPCC 1996 | Coal mines, oil and gas, residential combustion, solid waste, wastewater | kg m⁻² s⁻¹ | NYS Inventory |
| US oil and gas methane emissions | U.S. | EPSG 4326 | 0.1° × 0.1° | 2021 | 1 | - | Oil and gas | kg h⁻¹ | US OG |
| Carbon Monitoring System (CMS) data sets on Methane (CH₄) Flux | Canada and Mexico | EPSG 4326 | 0.1° × 0.1° | 2013, 2010 | 2 | - | Oil and gas | molecules CH₄ cm⁻² s⁻¹ | CMS Mexico, CMS Canada |
| Swiss Greenhouse Gas Inventory (SGHGI) | Switzerland | CH1903/LV03 EPSG:21781 | 500 m × 500 m | 2011 | 1 | - | Coal mines, oil and gas, residential combustion, solid waste, wastewater | g m⁻² a⁻¹ | SGHGI |
| CAMS-REG-ANT European Anthropogenic Methane Emissions | Europe | EPSG 4326 | 0.05° × 0.10° | 2005-2022 | 1 | GNFR14 | Multiple anthropogenic sectors | Tg or kg m⁻² s⁻¹ | CAMS-REG |
| China coal mine methane emissions | China | EPSG 4326 | 0.25° × 0.25° | 2011 | 1 | - | Coal mines | Mg a⁻¹ km⁻² | China SACMS |
| India and Australia coal mine methane emissions | India and Australia | EPSG 4326 | 0.1° × 0.1° | 2018 | 2 (csv) | - | Coal mines | metric ton a⁻¹ | India/Australia |
| CHN-CH₄ Anthropogenic Methane Emission Inventory of China | China | Krasovsky 1940 Albers projection | ~10 × 10 km | 1990-2020 | 8×31 (tiff) | - | Coal mines, oil and gas, residential combustion, solid waste, wastewater | Mg a⁻¹ km⁻² | China CHN-CH₄ |
- Spatial Coverage: Ranges from subnational and national (New York State, Switzerland) to global coverage
- Resolution: Varies from high-resolution (100m × 100m) to coarser resolution (0.25° × 0.25°)
- Temporal Coverage: Spans from 1970 to 2022, with most datasets covering 2010 or later
- Units: Diverse units including flux (molecules CH₄ cm⁻² s⁻¹), mass per area per time (Mg km⁻² a⁻¹), and total emissions (ton/year)
- File Formats: Primarily NetCDF files, with some CSV and GeoTIFF formats
- Source Categories: Mixed category systems including IPCC 2006 codes, CRT codes, and some datasets without standardized codes
- NetCDF files: Various methane emission datasets (EDGAR, GFEI, country-specific, local inventories, etc.)
- GeoTIFF files: Raster methane emission data (China time series)
- CSV files: Point-based emission data (India/Australia coal mining)
- Lookup tables: IPCC2006 code mappings in data/lookup/
- Area data: Grid cell area information in data/area_npy/
- GeoJSON files: Pre-calculated rHEALPix DGGS grid geometries
- CSV files: DGGS cell values with emission data
- NetCDF: Multi-dimensional scientific data format
- GeoTIFF: Georeferenced raster images
- CSV: Point data with latitude, longitude, and values
- Pre-calculated grids: Efficient processing using pre-computed DGGS grids
- Area-weighted distribution: Accurate spatial allocation of emission values
- IPCC2006 aggregation: Standardized emission categorization
- Multi-source support: Handles various data formats and units (NetCDF, GeoTIFF, CSV)
- Parallel processing: Optimized for large-scale data processing with multiprocessing
- Resume capability: Can restart from intermediate results
- Unit conversion: Automatic conversion between different emission units and final output unit as Mg a⁻¹
- HPC support: SLURM job scripts for cluster computing
- Comprehensive logging: Detailed processing logs for debugging and monitoring
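
Resume capability is commonly implemented by skipping work whose output file already exists. The sketch below shows that pattern only; it is not taken from the project's scripts. The function name `run_with_resume`, the one-CSV-per-task layout, and the temporary-file rename are all assumptions.

```python
import tempfile
from pathlib import Path


def run_with_resume(tasks, out_dir, process):
    """Run `process(name)` for each task unless its output already exists.

    `process` returns the CSV text for one task. Output is written to a
    temporary file first and then renamed, so an interrupted run does not
    leave a partial file that would be mistaken for a finished one.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    done, skipped = [], []
    for name in tasks:
        out_path = out_dir / f"{name}.csv"
        if out_path.exists():
            skipped.append(name)
            continue
        tmp_path = out_dir / f"{name}.csv.tmp"
        tmp_path.write_text(process(name))
        tmp_path.replace(out_path)
        done.append(name)
    return done, skipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_csv = lambda name: "zoneID,value_Mg\n"
        print(run_with_resume(["2019", "2020"], tmp, make_csv))
        # A second run only processes the new task.
        print(run_with_resume(["2019", "2020", "2021"], tmp, make_csv))
```
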
# Create global country boundaries and simplify the geometries
python scripts/dggs_grid_creation/create_global_country_geojson.py
python scripts/dggs_grid_creation/simplify_global_countries.py
# Convert to DGGS format
python scripts/dggs_grid_creation/convert_country_geojson_to_dggs.py
# Convert offshore areas
python scripts/dggs_grid_creation/convert_offshore_to_dggs.py
# Merge offshore grids and country grids
python scripts/utilities/merge_country_offshore_dggs_geometries.py
# Combine all grids to one single geojson
python scripts/utilities/combine_geojson_folder.py
# Create local grids
python scripts/dggs_grid_creation/convert_single_geojson_to_dggs.py

# Global datasets
python scripts/netcdf_conversion/global_edgar_netcdf_to_dggs_optimize.py
python scripts/netcdf_conversion/global_gfei_netcdf_to_dggs_optimize.py
# Country-specific datasets
python scripts/netcdf_conversion/canada_netcdf_to_dggs_converter.py
python scripts/netcdf_conversion/us_netcdf_to_dggs_converter.py
python scripts/netcdf_conversion/mexico_netcdf_to_dggs_converter.py
python scripts/netcdf_conversion/swiss_netcdf_to_dggs_converter.py
python scripts/netcdf_conversion/china_SACMS_netcdf_to_dggs_converter.py
python scripts/netcdf_conversion/europe_netcdf_to_dggs_converter.py
# Specialized datasets
python scripts/netcdf_conversion/cms_netcdf_to_dggs_converter.py
python scripts/netcdf_conversion/us_OG_netcdf_to_dggs_converter.py
python scripts/netcdf_conversion/nys_netcdf_to_dggs_converter.py

# China time series data
python scripts/geotiff_conversion/china_tiff_to_dggs_converter.py

# India/Australia coal mining data
python scripts/csv_conversion/ind_aus_csv_to_dggs_converter.py

# Submit individual conversion jobs
sbatch SLURM_job_scripts/run_canada_netcdf_conversion.sh
sbatch SLURM_job_scripts/run_edgar_netcdf_conversion.sh
sbatch SLURM_job_scripts/run_china_tiff_conversion.sh