This repository contains Python code and sample data developed for a multi-tier land use/land cover mapping project in the Vogeler Lab during 2021-2024. The code is tailored to that specific project's needs, but it provides a complete walkthrough of all the steps of remote sensing data download, processing, model building, testing, and validation. No installation is needed; the modules required to run each script are listed in its header.
There are four main Python scripts whose file names start with the numbers 1. to 4., indicating their position in the workflow. Two additional files starting with the word 'Modules' are general reusable modules for processing Sentinel-1 and Sentinel-2 data, and are used throughout the numbered scripts. Each of the four main scripts contains a variable declaration block at the top of the file, which should be reviewed and updated with user-specific settings and file locations before running.
The SampleData folder contains sample files that are generated during this workflow; more detail is given in the workflow description below.
The workflow starts by running 1.MakeSampleData.py, which uses a predefined set of points in a Google Earth Engine (GEE) repository to sample a variety of remote sensing sources such as Sentinel-1, Sentinel-2, SRTM, and WorldClim. The results are saved in a structured binary file in Python pickle format (.p), along with a text file (.txt) describing the data export parameters. A training and a validation dataset for points in South Africa are provided in the SampleData folder, named SA_urbanTrainSet and SA_urbanValidationSet. The training dataset is used for model development in step 2, and the validation dataset is set aside for model validation in step 3.
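The pickled sample files can be inspected directly in Python. A minimal sketch, assuming the training set is stored as SampleData/SA_urbanTrainSet.p (the internal structure depends on the export parameters recorded in the accompanying .txt file):

```python
import pickle

# Load the pickled training samples exported in step 1.
# The path and internal structure are assumptions; check the
# accompanying .txt file for the actual export parameters.
with open("SampleData/SA_urbanTrainSet.p", "rb") as f:
    train_samples = pickle.load(f)

print(type(train_samples))  # inspect the container type and its contents
```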
The second step is to analyze the extracted features, compute their correlations, and determine the best feature set for the mapping classifier (a Random Forest in this example), which is done by running 2.AnalyzeFeatures.py. Before the features can be analyzed, trained interpreters need to manually assign reference labels to the training and validation points. Two sample .csv files for this purpose are placed in the SampleData folder, named SA_urbanTrainSetLabels.csv and SA_urbanValidationSetLabels.csv. Script 2 first calculates the correlations of the extracted features over the observed points, clusters the features based on their distance in feature space, and creates a dendrogram. The sample correlation matrix and dendrogram are saved in the SampleData folder as correlations.csv and Dendrogram.png, respectively. Next, the code cuts the dendrogram at different levels and randomly selects one feature from each cluster, resulting in a reduced feature set. A Random Forest model is fit using the reduced features and the reference labels, and its score is calculated. The code repeats this process over many iterations and reports the outcomes in a text file (SampleData\clustering_iter30x20.txt).
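A minimal sketch of this cluster-then-select loop, assuming the extracted features sit in a pandas DataFrame features_df and the reference labels in a Series labels (the cut height, tree count, and use of the out-of-bag score as the model score are illustrative assumptions, not the script's actual settings):

```python
import random

from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestClassifier

def reduced_feature_score(features_df, labels, cut_height, n_trees=100):
    """Cluster correlated features, keep one per cluster, and score an RF."""
    # Distance between two features: 1 - |Pearson correlation|.
    dist = 1.0 - features_df.corr().abs()
    # Condense the square distance matrix and build the linkage tree.
    Z = linkage(squareform(dist.values, checks=False), method="average")
    # Cut the dendrogram at the requested height into flat clusters.
    clusters = fcluster(Z, t=cut_height, criterion="distance")

    # Randomly pick one representative feature per cluster.
    chosen = [random.choice([f for f, c in zip(features_df.columns, clusters)
                             if c == k])
              for k in set(clusters)]

    # Fit a Random Forest on the reduced set; the out-of-bag score stands
    # in for the model score here (an assumption, not the script's metric).
    model = RandomForestClassifier(n_estimators=n_trees, oob_score=True)
    model.fit(features_df[chosen], labels)
    return chosen, model.oob_score_
```

Repeating this call over several cut heights and random draws, and logging each (feature set, score) pair, reproduces the spirit of the iteration report in clustering_iter30x20.txt.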
The text file produced in the previous step can be converted to an Excel-compatible format for easier sorting and identification of the best feature combination (the one giving the best score). The best model can also be examined further by other means, such as cross-validation or creating actual maps with the trained models and visually inspecting them, as sketched below.
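For example, a candidate feature set can be checked with k-fold cross-validation (a hedged sketch; best_features is a placeholder for the winning combination read from the clustering report, and features_df and labels are as in the previous sketch):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# best_features: placeholder list of the candidate feature names.
scores = cross_val_score(RandomForestClassifier(n_estimators=100),
                         features_df[best_features], labels, cv=5)
print(f"5-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```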
The third step is to create a full model trained on all the training data and tested against the validation data. Script 3.BuildandTestModel.py performs these tasks, but the user must manually enter the final selected feature set in the script (as determined by investigating the results of step 2). The script then generates the final RF model and validates it. The model is saved as a .joblib file (SampleData\SA_finalModel.joblib), accompanied by a .txt file (SampleData\SA_finalModel.txt) describing the model parameters, performance data, and feature rankings. The script also creates the model's evaluation report separately for each year using the validation data, writing a .txt file (SampleData\SA_finalModel_validation.txt) with the report and a .csv file (SampleData\SA_finalModel_validation.csv) with the reference and predicted labels for each point. These last two files are used in the fourth step.
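The saved model can later be reloaded for prediction without retraining, for example (a sketch; new_features_df is a placeholder, and its columns must match the feature set and order recorded in SA_finalModel.txt):

```python
import joblib

# Reload the final Random Forest saved by script 3.
model = joblib.load("SampleData/SA_finalModel.joblib")

# new_features_df: a placeholder DataFrame of the same features for new
# points, in the order documented in SampleData/SA_finalModel.txt.
predicted = model.predict(new_features_df)
```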
The fourth and last step is an area-adjusted map accuracy assessment, which evaluates the model for the real application of map generation. This step is done with 4.MapAccuracyAssessment.py, which implements the method described in Stehman (2014), DOI: 10.1080/01431161.2014.930207. Before running it, we need data from the original map that was used to create the validation points (the same points mentioned in step 1). The sample data provided for this purpose are placed in the SampleData\ValidationData folder and include the reference map (SampleData\ValidationData\SA_urbanCentersMap.tif), the validation points (SA_urbanValidationSet shapefiles), the reference labels obtained from the map at the validation points (SampleData\ValidationData\SampleStrata.csv), and the number of pixels in each map class (SampleData\ValidationData\PixelCounts.txt). The SampleStrata.csv data should be added as a new column to the point validation data created in the previous step (SampleData\SA_finalModel_validation.csv), taking care to match the point IDs when inserting the strata data. A sample of the resulting file is provided at SampleData\ValidationData\SA_finalModel_validation_w_strata.csv. With this CSV file and the data in PixelCounts.txt, manually update 4.MapAccuracyAssessment.py, specify the assessment year, and run the script. It calculates the map's overall accuracy, producer's accuracy, and user's accuracy, along with their 95% confidence intervals, and writes the results to a .csv file (SampleData\ValidationData\map_assessment_results.csv).
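To make the required inputs concrete, here is a minimal sketch of the ID-matched strata merge and the stratified overall-accuracy estimator with its 95% confidence interval. The column names ("ID", "reference", "predicted", "stratum") and pixel counts are assumptions for illustration, each stratum is assumed to hold at least two samples, and producer's and user's accuracies require the ratio estimators from Stehman (2014), which the script itself handles:

```python
import numpy as np
import pandas as pd

# Attach the map stratum to each validation point, matching on point ID
# (column names are assumptions; adjust to the actual CSV headers).
points = pd.read_csv("SampleData/SA_finalModel_validation.csv")
strata = pd.read_csv("SampleData/ValidationData/SampleStrata.csv")
merged = points.merge(strata, on="ID", how="inner")

# Pixel counts per stratum, taken from PixelCounts.txt (placeholder values).
pixel_counts = {1: 120000, 2: 45000}
N = sum(pixel_counts.values())

# Stratified estimate of overall accuracy: weight each stratum's sample
# agreement rate by its share of map pixels (N_h / N).
merged["agree"] = (merged["reference"] == merged["predicted"]).astype(float)
oa, var = 0.0, 0.0
for h, group in merged.groupby("stratum"):
    w = pixel_counts[h] / N                    # stratum weight N_h / N
    oa += w * group["agree"].mean()            # weighted agreement rate
    var += w**2 * group["agree"].var(ddof=1) / len(group)

print(f"Overall accuracy: {oa:.3f} +/- {1.96 * np.sqrt(var):.3f} (95% CI)")
```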
Shahriar Shah Heydari, Jody C. Vogeler, Orion Cardenas-Ritzert, Steven K. Filippelli, Melissa McHale, and Melinda Laituri, "Multi-tier land use and land cover mapping framework and its application in urbanization analysis in three African countries", submitted for publication in the Remote Sensing journal (www.mdpi.com/journal/remotesensing).