GBMI is an open project on systematising, computing, and storing individual and aggregated building form metrics, which may be useful for researchers and practitioners across multiple domains, e.g. urban data science.
This repository contains a list of hundreds of indicators derived from building footprints, which we identified in our research, and the code to implement them in a database. For an overview of the entire research (method, systematic literature review, data, code, examples of analyses...), please visit the GBMI project website or the research paper published in CEUS. For the repository of ready-to-use pre-computed datasets covering dozens of selected urban areas around the world, please visit the website.
The code is a collection of bash scripts and SQL scripts that are run to perform the database setup, data ingestion, data transformation and analysis that yields the GBMI output, and the final data export. It takes OpenStreetMap (OSM) as input, but it may be adapted for other sources of data. The aggregation of the indicators works both for administrative areas and according to a regular grid (raster).
The scope of GBMI datasets can be global, extracting building data from the planet OSM data, or it can be focused on a selection of geographical regions such as countries, states, or cities.
Each defined scope will be contained in its own database. The raster system (grid) is adopted from WorldPop, but it is also something that can be defined and swapped as it fits the research purpose.
The GBMI implementation requires a PostgreSQL 12 (or newer) database running on cloud services such as Amazon Web Services (AWS) RDS or a self-managed server. PostGIS and a few other extensions are also necessary for the spatial analysis. These extensions are: postgis, hstore, fuzzystrmatch, postgis_tiger_geocoder, postgis_topology.
An open access paper describing the project, and from which the description above was adapted, was published in Computers, Environment and Urban Systems. Please refer to the paper for detailed information, while this website summarises the project and provides the links to the datasets and code.
If you use GBMI in a scientific context, please cite the paper:
Biljecki F, Chow YS (2022): Global Building Morphology Indicators. Computers, Environment and Urban Systems 95: 101809. doi: 10.1016/j.compenvurbsys.2022.101809
```bibtex
@article{2022_ceus_gbmi,
  author = {Biljecki, Filip and Chow, Yoong Shin},
  doi = {10.1016/j.compenvurbsys.2022.101809},
  journal = {Computers, Environment and Urban Systems},
  pages = {101809},
  title = {Global Building Morphology Indicators},
  volume = {95},
  year = {2022}
}
```
All aspects of the project are licenced according to CC BY 4.0. That means that you can use our work for pretty much anything as long as you attribute it (i.e. cite our paper above). The paper has been released under the same licence and it is open access.
The GBMI dataset generation process generates more than 50 tables per country per raster system, and the number of tables increases when more than one raster system is implemented. To keep the analysis pipeline consistent across the selected geographical scopes, many parts of the scripts are repetitive. For ease of maintenance, this repository offers a framework that generates these bash scripts and SQL queries from templates.
The Python package uses the Jinja2 templating engine to produce bash scripts and SQL queries from predefined templates. The scripts are then executed for each of the pre-defined geographical scopes of study. The package aims to achieve the following goals:
- reuse scripts as much as possible
- maintain consistency of the analysis across all databases (each of which represents a geographical scope)
- manage/apply/adopt changes with ease across all databases
The package consists of these modules:
- `configurations.py`: this module reads and validates the `config.json` file
- `core.py`: this module consists of the `QueryParamExpander` and `QueryGenerator` classes. The `QueryParamExpander` expands the parameters defined in `config.json` for all combinations of databases, rasters and aggregation levels, whereas the `QueryGenerator` calls the Jinja2 template engine API to generate all the bash scripts and query scripts
- `logging.py`: this module consists of a `logging` class that logs feedback on the running/output status of the Python package while generating those scripts
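The parameter expansion step can be sketched as follows. This is a minimal illustration of the cartesian-product expansion that `QueryParamExpander` is described as performing, not the actual implementation; the function name, key names and example values are assumptions.

```python
from itertools import product

# Illustrative sketch (not the actual GBMI code): expand config parameters
# into one parameter set per (database, raster, aggregation level) combination,
# so that each combination can be rendered against the Jinja2 templates.
def expand_params(databases, raster_names, agg_levels):
    return [
        {"database": db, "raster_name": raster, "agg_level": level}
        for db, raster, level in product(databases, raster_names, agg_levels)
    ]

params = expand_params(
    databases=["gbmi_singapore"],
    raster_names=["worldpop_100m", "worldpop_1km"],
    agg_levels=["country", "admin_div1"],
)
# 1 database x 2 rasters x 2 aggregation levels = 4 parameter sets
```

Each resulting parameter set would then be passed to the template engine to render one script or query per combination.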
We run the package with `python main.py`. The generation of all queries and scripts takes more than 5 minutes. To speed up the process for updates and/or bug fixes in specific sections, the `python main.py` command also takes one or more section key arguments (namely `a-db-setup`, `b-osm-rasters-gadm`, `c0-misc`, `c1-gbmi` and/or `d-export`). This way, we can selectively generate the scripts and queries for one or more specific sections.
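The section filtering could be handled as in the following sketch. This is a hypothetical illustration of how `main.py` might accept the optional section keys, not the package's actual argument handling; only the section names themselves are taken from the documentation.

```python
import argparse

# Section keys as documented for the GBMI generator
ALL_SECTIONS = ["a-db-setup", "b-osm-rasters-gadm", "c0-misc", "c1-gbmi", "d-export"]

def select_sections(argv):
    """Return the sections to generate: all by default, or the requested subset."""
    parser = argparse.ArgumentParser(description="Generate GBMI scripts and queries")
    parser.add_argument("sections", nargs="*", default=[],
                        help="optional section keys, e.g. c1-gbmi d-export")
    args = parser.parse_args(argv)
    unknown = set(args.sections) - set(ALL_SECTIONS)
    if unknown:
        parser.error(f"unknown section(s): {', '.join(sorted(unknown))}")
    return args.sections or ALL_SECTIONS

# e.g. `python main.py c1-gbmi d-export` would regenerate only those two sections
```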
The Python package relies on the configurations in `config.json` to expand and generate the bash scripts and SQL query scripts for each database, targeting the corresponding raster systems and aggregation levels.
The following are the highlights of key configurations:
- `template_dirname`: the directory name where the templates are stored, relative to the root of the package
- `output_dirname`: the directory name where the scripts are output, relative to the root of the package
- `parameters` > `common`: the commonly shared database/server instance related properties:
  - `host_address`: database host address
  - `db_script_dir`: directory path where the db setup queries are stored
  - `public_schema`: public schema name
  - `public_script_dir`: directory path where the queries for generating tables in the public schema are stored (generally GADM and raster related)
  - `misc_schema`: schema name for miscellaneous tables
  - `misc_script_dir`: directory path where the queries for generating tables in the miscellaneous schema are stored
  - `gbmi_schema`: schema name for gbmi tables
  - `gbmi_script_dir`: directory path where the queries for generating tables in the gbmi schema are stored
  - `qa_schema`: schema name for quality analysis related tables
  - `qa_script_dir`: directory path where the queries for generating tables in the quality analysis schema are stored
  - `base_source_dir`: base directory path where base sources such as GADM and country codes are stored
  - `site_source_dir`: base directory path where the gbmi source data are stored; currently, each geographical scope has its own sub-directory that contains the source OSM and rasters
  - `country_codes_dir`: directory name where the country codes csv is stored
  - `country_codes_file`: file name of the country codes file
  - `gadm_source_dir`: directory name where the gadm source files are stored
  - `gadm_source_file`: source file name of the gadm shapefile
  - `gadm_target_table`: target table when loading the gadm shapefile
  - `export_base_dir`: base directory path when exporting gbmi databases/tables
  - `export_script_dir`: directory path where the export query templates are stored
- `parameters` > `a-db-setup`: parameters that are expanded to generate the database setup scripts for each database:
  - `databases`: an array of databases to be created/set up
  - `users`: an array of users/roles to be created, who are granted connect and read permissions to databases, schemas, tables and views
  - `superusers`: an array of superusers to be created, who will be granted superuser privileges to create/drop databases, schemas, tables and views
- `parameters` > `b-osm-rasters-gadm`: parameters that are expanded for each database to generate the scripts that load osm, gadm and rasters, and the queries that generate the subsequent related tables:
  - `osm_source_files`: an array of osm source files and their corresponding databases
  - `raster_names`: an array of raster names and corresponding file suffixes; the `raster_population` is optional, as it is only available with the WorldPop rasters
  - `agg_levels`: an array of aggregation levels for merging the GADM areas and geometries
- `parameters` > `c0-misc`: parameters that are expanded to generate the queries in the miscellaneous schema, which is a prerequisite before running the GBMI scripts:
  - `databases`: an array of databases
  - `raster_names`: an array of raster names
  - `agg_levels`: an array of aggregation levels applied for analysing building height and levels
- `parameters` > `c1-gbmi`: parameters that are expanded to generate the bash scripts and queries for the GBMI pipeline:
  - `databases`: an array of databases
  - `raster_names`: an array of rasters with optional `raster_population` and `limit_buffer` where applicable; for our research we apply the buffer limit of 50 to the global study
  - `buffers`: an array of buffers for the neighbour computations
  - `agg_levels`: an array of aggregation levels
- `parameters` > `d-export`: parameters that are expanded to generate the bash scripts and query templates for the GBMI data export:
  - `databases`: an array of databases
  - `raster_names`: an array of rasters with optional `raster_population` and `limit_buffer` where applicable
  - `agg_levels`: an array of aggregation levels
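Putting the keys above together, a `config.json` might look roughly like the following. This fragment is illustrative only: the key names are taken from the list above, but the nesting, value shapes and example values are assumptions; consult the `config.json` shipped with the repository for the exact structure.

```json
{
  "template_dirname": "templates",
  "output_dirname": "output",
  "parameters": {
    "common": {
      "host_address": "localhost",
      "public_schema": "public",
      "misc_schema": "misc",
      "gbmi_schema": "gbmi"
    },
    "a-db-setup": {
      "databases": ["gbmi_singapore"],
      "users": ["gbmi_reader"],
      "superusers": ["gbmi_admin"]
    },
    "c1-gbmi": {
      "databases": ["gbmi_singapore"],
      "raster_names": {"worldpop_100m": {"limit_buffer": 50}},
      "buffers": [25, 50, 100],
      "agg_levels": ["country", "admin_div1"]
    }
  }
}
```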
Before starting the GBMI process, please configure the PostgreSQL credentials in `.pgpass` and/or `pg_hba.conf` so that the scripts can be run without being prompted for a password.
The scripts are organized according to the following sections:
- Database setup (a-db-setup)
- OSM loading, global administrative boundaries and raster setup (b-osm-rasters-gadm)
- Misc scripts that analyze the building height and levels validity, and building tag value frequencies (c0-misc)
- GBMI tables (c1-gbmi)
- GBMI export templates and scripts (d-export)
The first step of the GBMI process is to create the necessary databases for each geographical scope and to set up the schemas, users and user privileges. The bash scripts and queries under this section are generated based on the configurations in `config.json`.
The bash scripts and queries under this section are to perform the following tasks:
- create a database to host the OpenStreetMap data and install PostGIS and other necessary extensions
- install these extensions: postgis, hstore, fuzzystrmatch, postgis_tiger_geocoder, postgis_topology
- set up necessary schemas as configured in the configuration
- create new users, if specified in the configuration, and set up their privileges accordingly
In this section, the scripts load the OpenStreetMap data, the GADM shapefiles and the selected raster systems from the source directory specified in `config.json`.
The OSM data are downloaded from the Geofabrik server. If the scope/area of interest is not directly available, the larger area of the corresponding city/region is downloaded and the area of interest is extracted using the `osmium-tool` program.
The extracted OSM data is then loaded into PostGIS via the `osm2pgsql` command.
To map the rasters and OSM data to country, province/state, city/town etc. more comprehensively, we use GADM, the Database of Global Administrative Areas. The GADM data is available in GeoPackage and Shapefile format and can be downloaded from here.
The shapefile is then loaded into the PostGIS database via the `shp2pgsql` command.
After the GADM shapefile is loaded, we compute the aggregated area and geometry of the administrative boundaries at admin levels zero (country) through 5.
In this research, we used the WorldPop 2020 rasters at two different resolutions: the 100 m and the (globally aggregated) 1 km resolutions. The GeoTIFF files for WorldPop can be downloaded from here and here.
Since the WorldPop rasters are only available by country, these GeoTIFF files are usually further processed with QGIS to extract only the areas of interest.
After that, these rasters are loaded into the corresponding PostGIS database, based on the configurations in `config.json`, via the `raster2pgsql` command.
The loaded rasters are then vectorised and mapped against the previously loaded GADM data to obtain the country codes and admin division names.
There are also a few tables that are helpful to inspect the quality and state of the OSM building data before proceeding to the GBMI data. These tables are:
- `osm_polygon_attr_freqs`: shows the frequency of values of building (osm_polygon) related tags; this is a prerequisite table before starting the GBMI pipeline
- `agg_buildings_height_levels_qa_by_agg_level_raster_name`: shows aggregated statistics on the completeness of building height and building levels
This section contains the bash scripts and queries that create the building morphology indicator tables. The workflow and steps for computing the indicators are as follows:
Building Level Indicators:
- `buildings`: extract the mass majority of the buildings from the OSM polygon table based on the 12 most frequent 'building' tag values
- `building_by_raster`: map the buildings to each raster we use in the system so that we can identify the country, province, state and city of the buildings
- `bga_by_raster`: extract the geometric attributes needed for calculating the geometric building indicators from the `building_by_raster` tables
- `bgi_by_raster`: calculate the geometric building indicators from the `building_geom_attributes_by_raster` tables
- `bn_by_raster`: extract the neighbours for each building within the maximum buffer of our study defined in `config.json`. By the definition of a neighbour, the neighbour's centroid has to fall within the ring buffer computed from the centroid of each respective building. `ST_Distance` is then used to compute the polygon-to-polygon distance.
- `bn_by_raster_centroid`: extract the neighbours for each building within the maximum buffer of our study defined in `config.json`. By the definition of a neighbour, the neighbour's centroid has to fall within the ring buffer computed from the centroid of each respective building. `ST_Distance` is then used to compute the centroid-to-centroid distance.
- `bn_buffer_by_raster`: calculate the neighbour indicators for each building within 3 different buffers (25, 50 and 100), using the `bn_by_raster` table
- `bn_buffer_by_raster_centroid`: calculate the neighbour indicators for each building within 3 different buffers (25, 50 and 100), using the `bn_by_raster_centroid` table
- `bni_by_raster`: joint table of `bn_buffer_by_raster` at the 3 different buffers
- `bni_by_raster_centroid`: joint table of `bn_buffer_by_raster_centroid` at the 3 different buffers
- `buildings_indicators_by_raster`: joint table of `bgi_by_raster` and `bni_by_raster`
- `buildings_indicators_by_raster_centroid`: joint table of `bgi_by_raster` and `bni_by_raster_centroid`
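The centroid-based neighbour definition above can be illustrated with a simplified, self-contained sketch. This is not the actual PostGIS implementation (which operates on geometries in the database via `ST_Distance`); it only demonstrates the rule that a neighbour's centroid must fall within the buffer radius of the reference building's centroid, using planar coordinates as an assumption.

```python
from math import hypot

# Simplified illustration of the bn_by_raster_centroid neighbour rule:
# a candidate building is a neighbour if its centroid lies within
# buffer_radius of the reference centroid (the building itself is excluded).
def centroid_neighbours(reference, candidates, buffer_radius):
    """Return (id, distance) pairs for candidate centroids within buffer_radius."""
    rx, ry = reference
    result = []
    for building_id, (cx, cy) in candidates.items():
        distance = hypot(cx - rx, cy - ry)  # centroid-to-centroid distance
        if 0 < distance <= buffer_radius:
            result.append((building_id, distance))
    return sorted(result, key=lambda pair: pair[1])

centroids = {"a": (0.0, 0.0), "b": (30.0, 40.0), "c": (200.0, 0.0)}
# with a buffer of 100, only "b" (50 units away) is a neighbour of "a"
neighbours = centroid_neighbours((0.0, 0.0), centroids, buffer_radius=100)
```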
Aggregated Building Indicators:
The building indicators are aggregated at the GADM admin levels (0 through 5) and at raster cell level. The aggregation levels are:
- raster cell
- country
- province or state (admin division 1)
- county or district (admin division 2)
- town or city (admin division 3)
- urban commune or municipality (admin division 4)
- admin division 5
Three types of aggregated tables will be generated:
- `agg_bgi_by_agg_level_by_raster`: the aggregation of the geometric building indicators from the `bgi_by_raster` tables
- `agg_bni_agg_level_by_raster`: these tables combine the aggregation of the geometric building indicators and the building neighbour indicators from the `bni_by_raster_name` tables
- `agg_bni_agg_level_by_raster_centroid`: these tables combine the aggregation of the geometric building indicators and the building neighbour indicators from the `bni_by_raster_name_centroid` tables
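The kind of grouping these aggregation tables perform can be sketched as follows. This is a minimal illustration in Python rather than the actual SQL; the indicator name, aggregation key and example values are assumptions.

```python
from statistics import mean

# Illustrative sketch of aggregating a building-level indicator by an
# aggregation key (e.g. an admin division or raster cell id), reporting
# the count and mean per group, as the agg_* tables do in SQL.
def aggregate_indicator(rows, key, value):
    """Group rows by `key` and return count and mean of `value` per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    return {
        group: {"count": len(values), "mean": mean(values)}
        for group, values in groups.items()
    }

buildings = [
    {"admin_div1": "Bavaria", "footprint_area": 120.0},
    {"admin_div1": "Bavaria", "footprint_area": 80.0},
    {"admin_div1": "Hesse", "footprint_area": 200.0},
]
stats = aggregate_indicator(buildings, key="admin_div1", value="footprint_area")
```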
With the configuration in `config.json` of the target scope/datasets to be exported, the Python script also generates a list of bash scripts, one for each dataset, to export the GBMI indicators.
The current export process supports ESRI Shapefile (`shp`), GeoPackage (`gpkg`) and Comma-Separated Values (`csv`).
The exported files are organised under the following directory hierarchy:
- {city}
- {aggregation-level}
- agg_bgi_by_{agg_level}_{raster_name}
- agg_bni_by_{agg_level}_{raster_name}
- agg_bni_by_{agg_level}_{raster_name}_centroid
- bldg
- buildings_indicators_{raster_name}
- buildings_indicators_{raster_name}_centroid
- {aggregation-level}
The exports in the various formats are stored under the `{aggregation-level}` folders.
The computed GBMI is a dataset with more than 380 fields, and the export functions use abbreviated field names. The following is therefore the data dictionary for all the exported fields.
Yoong Shin Chow, Urban Analytics Lab, National University of Singapore, Singapore
GBMI is made possible by the efforts of many others, primarily developers of PostgreSQL/PostGIS and the OpenStreetMap community.
This research is part of the project Large-scale 3D Geospatial Data for Urban Analytics, which is supported by the National University of Singapore under the Start Up Grant R-295-000-171-133 and by the AWS Cloud Credits for Research.
For more information, please see the aforementioned paper.