This repo contains our code for mapping 7 major landcover types in India's Open Natural Ecosystems (ONE): saline areas, sand dunes, ravines, sparsely vegetated areas, open savannas, shrub savannas and woodland savannas. A visualization of this map is available here.
This probabilistic land cover map was produced using an explicit hierarchical classification approach. See below for more on this approach.
NOTE: This code base produced our current map, and is being published here in the spirit of openness and transparency. Expect it to contain a few code chunks that are related to, but not directly involved in, producing this map; they were useful while we iterated through earlier versions of the map.
The building blocks of our approach, and how they have been organized into modules in our code, are depicted in the schematic below. The steps involved in our processing workflow are described below. More details about how the code and modules are organized are in the sections that follow.
The rest of this document is arranged as follows:
- Information about the output
- Steps to reproduce our analysis
- Additional module-level details
- Funding and support
|   | Description |
|---|---|
| ![]() | This dataset provides a probabilistic land cover classification for semi-arid India, covering 18 of its non-Himalayan states. It maps 7 different types of Open Natural Ecosystems and 5 classes of other land cover types. Earth Engine asset ID: `ee.Image('projects/ee-open-natural-ecosystems/assets/publish/onesWith7Classes/landcover_hier')` (link) |
See here for an Earth Engine sample script to get started.
Resolution: 30 meters per pixel.
Index | Name | Scaling | Min | Max | Description |
---|---|---|---|---|---|
0 | l1LabelNum | None | 100 | 200 | Numeric Code of Level 1 Label. 100: Non-ONE, 200: ONE. |
1 | l2LabelNum | None | 1 | 12 | Numeric Code of Level 2 Label. 1: agri_hiBiomass, 2: agri_loBiomass, 3: bare, 4: built, 5: dune, 6: forest, 7: ravine, 8: saline, 9: savanna_open, 10: savanna_shrub, 11: savanna_woodland, 12: water_wetland. |
2 | probL1Label | 10000 | 0 | 10000 | Probability Value of 'Winning' Level 1 Label. |
3 | probL2Label | 10000 | 0 | 10000 | Probability Value of 'Winning' Level 2 Label. |
4 | prob_nonone | 10000 | 0 | 10000 | Probability that a pixel is a Non-ONE. |
5 | prob_one | 10000 | 0 | 10000 | Probability that a pixel is an ONE. |
6 | prob_one_bare | 10000 | 0 | 10000 | Probability that a pixel is Bare or Sparsely-Vegetated. |
7 | prob_one_dune | 10000 | 0 | 10000 | Probability that a pixel is a Dune. |
8 | prob_one_ravine | 10000 | 0 | 10000 | Probability that a pixel is a Ravine. |
9 | prob_one_saline | 10000 | 0 | 10000 | Probability that a pixel is a Saline area. |
10 | prob_one_savanna_open | 10000 | 0 | 10000 | Probability that a pixel is an Open Savanna. |
11 | prob_one_savanna_shrub | 10000 | 0 | 10000 | Probability that a pixel is a Shrub Savanna. |
12 | prob_one_savanna_woodland | 10000 | 0 | 10000 | Probability that a pixel is a Woodland Savanna. |
13 | prob_nonone_agri_hiBiomass | 10000 | 0 | 10000 | Probability that a pixel is under 'high biomass' agriculture (e.g., orchards, groves, tree-crops, agroforestry). |
14 | prob_nonone_agri_loBiomass | 10000 | 0 | 10000 | Probability that a pixel is under 'low biomass' open agriculture (e.g., cereals, pulses, vegetables & oilseeds). |
15 | prob_nonone_built | 10000 | 0 | 10000 | Probability that a pixel is a built-up area. |
16 | prob_nonone_forest | 10000 | 0 | 10000 | Probability that a pixel is a forest. |
17 | prob_nonone_water_wetland | 10000 | 0 | 10000 | Probability that a pixel is a waterbody, or seasonal wetland. |
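For instance, a minimal Python sketch for loading this asset (separate from the sample script linked above) might look like the following; it assumes an authenticated `earthengine-api` installation and uses the asset ID, band names, and 10000 scaling from the table above:

```python
import ee

ee.Initialize()

# Published hierarchical land cover map (asset ID from the table above).
lc = ee.Image('projects/ee-open-natural-ecosystems/assets/publish/onesWith7Classes/landcover_hier')

# Level 2 label codes (1-12), and the Level 1 ONE vs non-ONE code (100/200).
l2Label = lc.select('l2LabelNum')
l1Label = lc.select('l1LabelNum')

# Probability bands are scaled by 10000; divide to get probabilities in [0, 1].
probOne = lc.select('prob_one').divide(10000)

# Example: keep Level 2 labels only where ONE is the winning Level 1 outcome.
l2LabelOnes = l2Label.updateMask(l1Label.eq(200))
```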
The map is generated by running 3 Python notebooks that call functions from 3 different modules, each of which requires a variety of input parameters. The sections below cover these input parameters and notebooks, followed by additional module-level details.
Config file: `config.ini`.

Values of nearly all running configuration and algorithm input parameters of the various code modules are stored as variables in the config file, and are read from it at runtime. Change their values in `config.ini` before re-running the code.
An exception to this rule is the classification module, where a few key parameters determining the configuration of the final classification run can be specified as function arguments. See `classifyHierarch.ipynb` for how to do so.
When starting a fresh round of analysis: create a new folder in your Earth Engine asset space, and set its path as the value of `assetFolderWithFeatures` in `config.ini`.

When refreshing labeled points data for modeling: upload a points table (from, e.g., a CSV or a Shapefile) into an Earth Engine Feature Collection, and set its path as the value of `lulcLabeledPoints` in `config.ini`. Ensure that the column with labels is named `label_2024`. The table used in our analysis is at `trainingData/trPts_2024.csv`.
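As an illustration of how these parameters flow into the code, here is a minimal sketch of reading them from `config.ini` with Python's built-in `configparser`; the section name used here is an assumption, and the actual section and key layout is in `config.ini` itself:

```python
import configparser

# Read running configuration and algorithm parameters at runtime.
config = configparser.ConfigParser()
config.read('config.ini')

# 'DEFAULT' is an assumed section name, for illustration only; keys such as
# assetFolderWithFeatures and lulcLabeledPoints are described above.
params = config['DEFAULT']
asset_folder = params.get('assetFolderWithFeatures')
labeled_points = params.get('lulcLabeledPoints')
label_column = 'label_2024'
```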
Python notebook: `generateAOI.ipynb`.

Module used: `areaOfInterestMask`.

The `generateAOI.ipynb` notebook uses the `areaOfInterestMask` module (see here for additional module-level details) to take area-of-interest and biogeographic / geomorphological / geological classification information as inputs and produce corresponding mask rasters, in numeric and one-hot-encoded formats.
All inputs and running parameters for this step are set in, and used from, `config.ini`.
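As a rough illustration of this step (not the module's actual implementation), a zones Feature Collection can be rasterized into a numeric mask and then one-hot encoded roughly as follows; the asset path, property name, and zone codes below are placeholders:

```python
import ee

ee.Initialize()

# Placeholder zones Feature Collection with an integer property 'zoneNum';
# the real asset paths and property names come from config.ini.
zones = ee.FeatureCollection('users/your_username/classificationZones')

# Numeric mask: a single band whose pixel value is the zone number.
zonesNumeric = zones.reduceToImage(properties=['zoneNum'],
                                   reducer=ee.Reducer.first())

# One-hot-encoded masks: one 0/1 band per zone code.
zoneCodes = [1, 2, 3]  # placeholder list of zone codes
zonesOneHot = ee.Image.cat([zonesNumeric.eq(code).rename(f'zone_{code}')
                            for code in zoneCodes])
```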
Python notebook: `calcFeatureRasterAndPoints.ipynb`.

Module used: `featuresForModeling`.

The `calcFeatureRasterAndPoints.ipynb` notebook uses the `featuresForModeling` module (see here for additional module-level details) to take historical satellite imagery and various other gridded geospatial datasets and generate several features capturing the biophysical characteristics of landscapes at the pixel scale. It also uses `labelHierarchy.json`, which contains a representation of the label hierarchy.
It then attaches these features and the zonation masks from the previous step to the labeled points, thus producing a table of labeled points with their corresponding feature vectors.
All inputs and running parameters for this step are set in, and used from, `config.ini`.
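A rough sketch of what this attachment step looks like in Earth Engine (a simplified stand-in for the module's own sampling code; asset paths here are placeholders) is:

```python
import ee

ee.Initialize()

# Placeholder asset paths; the real ones are set in config.ini.
featureRaster = ee.Image('users/your_username/featureComposite')
labeledPoints = ee.FeatureCollection('users/your_username/lulcLabeledPoints')

# Attach the feature bands (and zonation mask bands) to each labeled point.
# The label column is expected to be named 'label_2024'.
labeledPointsWithFeatures = featureRaster.sampleRegions(
    collection=labeledPoints,
    properties=['label_2024'],
    scale=30,
    geometries=True,
)
```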
Python notebook: `classifyHierarch.ipynb`.

Module used: `classification`.

The `classifyHierarch.ipynb` notebook uses the `classification` module (see here for more on how it is organized) to train classifiers hierarchically using the table of feature-vector-attached labeled points. It then predicts with the hierarchical classifiers to produce multiple intermediate probabilistic predictions. These intermediate predictions are then combined to produce a final map containing, for each hierarchical level, the pixel-wise probability of each land cover type and the top-ranking land cover type label for that pixel.
Four ways of building hierarchical classification are considered here:
- implicit
- dependent
- explicit
- using multiplicative rule
- using step-wise rule
After evaluation, the final map we published is the result of the explicit hierarchical classification approach using the multiplicative rule. See the `classifyHierarch.ipynb` notebook for how to perform each of these classifications.
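As a sketch of the multiplicative rule only (the band names below are hypothetical; the actual combination is implemented in the classification module), the final probability of a Level 2 class is the product of its parent Level 1 probability and the conditional Level 2 probability:

```python
import ee

ee.Initialize()

# Hypothetical intermediate predictions, for illustration only.
# level1Probs: P(ONE) and P(non-ONE) per pixel.
# level2Probs: conditional probabilities of each ONE class, given ONE.
level1Probs = ee.Image('users/your_username/level1Probs')
level2Probs = ee.Image('users/your_username/level2ProbsWithinOne')

# Multiplicative rule: P(savanna_open) = P(ONE) * P(savanna_open | ONE).
probSavannaOpen = (level1Probs.select('prob_one')
                   .multiply(level2Probs.select('prob_savanna_open_given_one'))
                   .rename('prob_one_savanna_open'))
```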
Many inputs and running parameters for this step are set in, and used from, `config.ini`. However, unlike in the previous steps, some of the key running parameters in this step can be specified in-line in the notebook and passed as arguments into the training & prediction routine. These include:
- The type of classifier to use (Random Forests / Gradient Boosted Trees).
- The set of features to use for training the classifier (and, hence, while predicting with it).
- The Earth Engine Asset folder to save the results in.
This programmatic flexibility makes it relatively easy and reliable to produce maps under different processing configurations, even in parallel. Hence, it becomes possible to systematically and iteratively refine the classification towards better maps, at scale.
The following are details about the modules, the code they contain, and how they are strung together into our processing workflow.
- `areaOfInterestMask/semiarid.py`: Produces 0/1 masks of zones based on states, biogeographic, geomorphological and geological zones, in both numeric and one-hot encoding modes, as well as a legacy mask for India's semi-arid zone used in a previous work. The following functions produce these masks, based on Feature Collections that the user needs to provide for states, biogeographic, geomorphological and geological zones:
  - `classificationZonesFromStatesNumeric()` and `classificationZonesFromStatesOneHotEncoded()`
  - `classificationZonesFromBiomesNumeric()` and `classificationZonesFromBiomesOneHotEncoded()`
  - `classificationZonesFromGeologicalAgeNumeric()` and `classificationZonesFromGeologicalAgeOneHotEncoded()`
  - `maskWithClassLabels()`

  Run these, from the notebook `generateAOI.ipynb`, to generate and store these rasters.
- `featuresForModeling/generateFeatures.py`: Uses satellite imagery and other gridded geospatial datasets to generate feature rasters, and samples these feature rasters to attach labeled points with their features. The following functions produce the features and perform the sampling:
  - `seasonalityParamsL8()`
  - `tasseledCalCoeffsL8()`
  - `multiTemporalInterPercentileDifferencesL8()`
  - `palsarAggregation()`
  - ...
  - `geomorphologyTerrainRuggednessAggregation()`
  - `assembleFeatureBandsAndExport()`

  First, run the functions to generate the feature rasters individually. Once those are completed, run the function `assembleFeatureBandsAndExport()` to assemble them all into a composite raster and sample it to attach feature vectors to all the labeled points. Run all these functions from the notebook `calcFeatureRasterAndPoints.ipynb`.
- `classification/classifyAndAssess.py`: Performs training of hierarchical classifiers, predicts with each of them, and then combines these predictions into a final land cover map and calculates the classification performance metrics.
  - `trainAndPredictHierarchical_master()`

  To run the classification with different classifiers, choices of features to use, etc., define these as variables and pass them appropriately into `trainAndPredictHierarchical_master()` as arguments. See `classifyHierarch.ipynb` for how to do this; a hypothetical sketch of such a call follows below.
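The sketch below is hypothetical: the import path, argument names, and argument order are assumptions, not the function's actual signature; consult `classifyHierarch.ipynb` for the real call and accepted values.

```python
# Argument names and order below are hypothetical, for illustration only;
# see classifyHierarch.ipynb for the actual call.
from classification import classifyAndAssess

classifierChoice = 'randomForest'                     # vs. gradient boosted trees
featureBandNames = ['seasonality_amp', 'palsar_hv']   # hypothetical feature names
outputAssetFolder = 'users/your_username/oneMapsRun01'

classifyAndAssess.trainAndPredictHierarchical_master(
    classifierChoice,
    featureBandNames,
    outputAssetFolder,
)
```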
Financial support for various aspects of this mapping work came from:
- The Habitats Trust
- National Centre for Biological Sciences, its Archives, TNQ Technologies
- The Nadathur Foundation
- Azim Premji University as part of the Research Funding Programme
- ATREE
Technical and logistical support came from:
Further, our work would not be possible without the creativity and generosity of efforts behind many free, public and open source scientific computation resources and software tools, chief among them being:
- geemap by Qiusheng Wu
- Spatial Thoughts by Ujaval Gandhi
- awesome-gee-community-catalog by Samapriya Roy
- Google Earth Engine Developers Group
- Google Earth Engine on Stack Exchange
- QGIS
- Yoni Gavish of Gavish et al. (2018)
- Multiple publicly-funded central and state government portals and repositories.
These analyses were carried out on the Google Earth Engine cloud computing platform.