Procedures for data extraction, analysis, and visualization for the paper
Grinberger, A. Y., Schott, M., Raifer, M. & Zipf, A. (2021). An analysis of the spatial and temporal distribution of large‐scale data production events in OpenStreetMap. Transactions in GIS. https://doi.org/10.1111/tgis.12746
This repository consists of three sections:
- A Java OSHDB query that extracts data on OpenStreetMap edits at a monthly temporal resolution for each cell of a custom grid. It can be found under the Extraction folder. The output of this procedure is a CSV file named 'months_results', stored under Extraction\Target, which holds the data for each combination of cell and month (a zipped version of this file is stored under Outputs). For more details, see the README file for this section.
- A procedure written in R (see fit_curve.R in the CurveFitting folder) that fits a logistic curve to the data available for each cell in the 'months_results' file. It produces one file per cell of the custom grid, stored in the Predictions subdirectory of the Outputs folder. Each file contains the observed values (number of contribution operations) and the expected values for each month, which are later used to analyze patterns and produce outputs.
- A collection of Python scripts (stored under ProduceOutputs) that use the 'months_results' file and the files in the Predictions folder to produce the outputs and visualizations used in the paper. The main script is process_events.py, which calls functions for:
  - Identifying events, computing normalized RMSE values per cell, and visualizing these (identify_events.py)
  - Correcting the product of the above by identifying outliers (correct_events.py)
  - Creating a boxplot figure describing the distribution of cells by event frequency and size (boxplot_figure.py)
  - Identifying clusters of events and labeling them (cluster.py)
  - Creating a table characterizing each cluster (events_table.py)
  - Creating a table analyzing the share of all contribution operations attributed to each type of event (weights_table.py)
  - Creating a figure depicting the change over time in the share of all operations attributed to each event type (time_figure.py)
  - Creating a figure depicting the effects of events, i.e. the change in activity following an event relative to non-event periods (event_effects.py)
  - Creating a table depicting how often the first event in a cell is followed by events of each type, broken down by the type of the first event (next_events.py)
The main script also processes and cleans the data and produces a file containing the weight of each event type for each cell, which is later used to produce static and dynamic online maps. The outputs of all of the above are stored under the Outputs folder.
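To make the relation between the curve-fitting step and the normalized RMSE values more concrete, here is a minimal Python sketch. It is not the code in fit_curve.R or identify_events.py. The three-parameter form of the logistic curve, the grid-search fit, the normalization of the RMSE by the mean observed value, and all function names are assumptions made for illustration only, and the input is a synthetic series for one hypothetical cell rather than real data.

```python
import math


def logistic(t, k, r, t0):
    """Three-parameter logistic curve: asymptote k, growth rate r, midpoint t0."""
    return k / (1.0 + math.exp(-r * (t - t0)))


def fit_logistic_grid(months, observed, rates, midpoints):
    """Crude least-squares fit (a stand-in for a proper nonlinear solver).

    For each candidate (r, t0) the best asymptote k has a closed form,
    because the curve is linear in k. Returns the (k, r, t0) with the
    smallest sum of squared errors.
    """
    best = None
    for r in rates:
        for t0 in midpoints:
            shape = [logistic(t, 1.0, r, t0) for t in months]
            k = sum(y * s for y, s in zip(observed, shape)) / sum(s * s for s in shape)
            sse = sum((y - k * s) ** 2 for y, s in zip(observed, shape))
            if best is None or sse < best[0]:
                best = (sse, k, r, t0)
    return best[1], best[2], best[3]


def normalized_rmse(observed, expected):
    """RMSE divided by the mean observed value (this normalization is an assumption)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, expected)) / n)
    return rmse / (sum(observed) / n)


# Synthetic series for one hypothetical cell: 48 months of logistic growth.
months = list(range(48))
observed = [logistic(t, 1000.0, 0.25, 24) for t in months]

k, r, t0 = fit_logistic_grid(
    months,
    observed,
    rates=[i / 20 for i in range(1, 21)],  # candidate growth rates 0.05 .. 1.00
    midpoints=range(48),                   # candidate midpoints, one per month
)
expected = [logistic(t, k, r, t0) for t in months]
print(k, r, t0, normalized_rmse(observed, expected))
```

On real data the observed series would deviate from the fitted curve, and months with large deviations are the kind of signal the event-identification step works from. The grid search is used here only to keep the sketch dependency-free.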
The gh-pages branch contains an interactive visualization of the detected events, which can be accessed at the following link: https://giscience.github.io/OSM_Events/