Giora Kosoi edited this page Jan 7, 2023 · 1 revision

Overview

OpenStreetMap (OSM) is a free geographical database maintained by a community of volunteers. The database can be accessed and updated through the OSM website, and the software that runs the website is published as openstreetmap-website.

It is therefore possible to install openstreetmap-website, populate its database with a planet.osm dump, and use it internally for commercial or research purposes.

Unfortunately, because the dataset is so large, existing tools take a long time to populate the apidb schema that openstreetmap-website requires.

osm-admin is specifically designed to deal efficiently with these large datasets.

OSM Model

The OSM model defines three element types: nodes, ways and relations. Any element can carry key/value tags.
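As an illustration, the three element types can be sketched as Python dataclasses. The field and class names below are assumptions chosen for readability, not osm-admin's or the PBF format's definitions; the relationships they encode follow the OSM model: a node has a position, a way is an ordered list of node ids, and a relation lists typed members with roles.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical field names chosen for illustration only.
Tags = Dict[str, str]


@dataclass
class Node:
    id: int
    lat: float  # degrees
    lon: float  # degrees
    tags: Tags = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_refs: List[int]  # ordered ids of the nodes that form the way
    tags: Tags = field(default_factory=dict)


@dataclass
class Member:
    member_type: str  # "node", "way" or "relation"
    ref: int          # id of the referenced element
    role: str         # e.g. "outer" or "inner" in a multipolygon


@dataclass
class Relation:
    id: int
    members: List[Member]
    tags: Tags = field(default_factory=dict)


# A closed way (first node == last node) tagged as a park,
# and a relation that uses it as the outer ring of a multipolygon.
nodes = [Node(1, 51.50, -0.10), Node(2, 51.60, -0.10), Node(3, 51.60, 0.00)]
park = Way(10, [1, 2, 3, 1], {"leisure": "park"})
area = Relation(100, [Member("way", 10, "outer")], {"type": "multipolygon"})
```

Note that ways and relations refer to other elements only by id, which is what allows each element type to be streamed and stored independently.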

PBF Structure

OSM data is packed into *.osm.pbf files for efficient storage and transfer. Entries in a *.osm.pbf file are kept in sorted order, which further improves packing efficiency, because ids are stored as differences from the previous entry and sorted ids produce small differences.
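Why ordering helps can be shown with a small sketch. In the PBF format, DenseNodes blocks store node ids (and coordinates) as deltas from the previous entry, encoded as zigzag varints (protobuf `sint64`). The code below is not a PBF writer: it only counts varint bytes for raw versus delta-encoded ids and ignores the zlib compression applied to whole blocks. The function names and the example ids are hypothetical.

```python
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Replaces each value with its difference from the previous one."""
    out, prev = [], 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas: List[int]) -> List[int]:
    """Inverse of delta_encode: running sum of the deltas."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def zigzag(n: int) -> int:
    """Maps a signed 64-bit integer to unsigned: 0->0, -1->1, 1->2, -2->3."""
    return (n << 1) ^ (n >> 63)


def varint_len(u: int) -> int:
    """Bytes needed to store u as a protobuf varint (7 bits per byte)."""
    length = 1
    while u >= 0x80:
        u >>= 7
        length += 1
    return length


def packed_size(values: List[int]) -> int:
    """Total varint bytes for a list of signed values after zigzag mapping."""
    return sum(varint_len(zigzag(v)) for v in values)


ids = [7_000_000_001, 7_000_000_002, 7_000_000_005]  # sorted node ids
print(packed_size(ids))                # 15: each raw id needs 5 bytes
print(packed_size(delta_encode(ids)))  # 7: 5 bytes, then 1 byte per delta
```

With unsorted ids the deltas would be large (and often negative), so most of the saving would disappear.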

Design Considerations

The OSM dataset for the entire planet is very large: as of this writing it contains more than 7,000,000,000 nodes and around 1,000,000,000 ways. All of them, together with their tags, have to be extracted from a PBF file and inserted into the database. The apidb schema makes this harder still because it stores the data twice: once in the current tables and once in the historical tables, for example current_nodes and nodes. This was the main driver of the osm-admin design.
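A back-of-the-envelope calculation makes the scale concrete. The node and way counts come from the paragraph above; the factor of two reflects the current and historical copies. The insert rate of 200,000 rows per second is a purely hypothetical figure used to show how the total translates into time, and the count deliberately ignores tag rows and way-node rows, which add substantially more.

```python
# Figures from the text: "more than" 7e9 nodes and "around" 1e9 ways.
NODES = 7_000_000_000
WAYS = 1_000_000_000
COPIES = 2  # one row in the current table, one in the historical table


def rows_to_insert(nodes: int, ways: int, copies: int = COPIES) -> int:
    """Element rows only; tag and way-node rows come on top of this."""
    return (nodes + ways) * copies


def hours_at(rows: int, rows_per_second: float) -> float:
    """Wall-clock hours needed at a constant insert rate."""
    return rows / rows_per_second / 3600


rows = rows_to_insert(NODES, WAYS)
print(f"{rows:,} rows")                              # 16,000,000,000 rows
print(f"{hours_at(rows, 200_000):.1f} h at 200k/s")  # 22.2 h at 200k/s
```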

Limit Memory Usage

Because the dataset is so large and grows daily, it was important to put a bound on memory usage. The basic elements (nodes, ways and relations) are processed as they are read, using no more memory than the underlying PBF reader needs. User and changeset data are buffered in memory up to a fixed bound and then written to a persistent, file-backed B-tree.
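The two techniques described above, streaming elements one at a time and buffering user data up to a bound before spilling it to disk, can be sketched as follows. This is a minimal sketch under stated assumptions: Python's built-in `sqlite3` stands in for the file-backed B-tree (SQLite stores its tables as on-disk B-trees), and the `(element_id, user_id, user_name)` tuples, `BoundedUserStore` and `stream` are hypothetical names, not osm-admin's store or reader API.

```python
import sqlite3
from typing import Dict, Iterable, Iterator, Tuple


class BoundedUserStore:
    """Keeps at most `capacity` (user_id, name) pairs in memory and flushes
    them to an SQLite file (a stand-in for a file-backed B-tree)."""

    def __init__(self, path: str, capacity: int = 10_000) -> None:
        self.capacity = capacity
        self.buffer: Dict[int, str] = {}
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add(self, user_id: int, name: str) -> None:
        self.buffer[user_id] = name
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        # Ids already on disk are skipped; the buffer is then emptied,
        # so the in-memory buffer never holds more than `capacity` entries.
        self.db.executemany(
            "INSERT OR IGNORE INTO users (id, name) VALUES (?, ?)",
            list(self.buffer.items()),
        )
        self.db.commit()
        self.buffer.clear()

    def count(self) -> int:
        self.flush()
        return self.db.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def stream(elements: Iterable[Tuple[int, int, str]],
           users: BoundedUserStore) -> Iterator[int]:
    """Handles one (element_id, user_id, user_name) tuple at a time, so
    elements are never accumulated in memory."""
    for element_id, user_id, user_name in elements:
        users.add(user_id, user_name)
        yield element_id  # a real pipeline would write the element out here


users = BoundedUserStore(":memory:", capacity=2)
source = ((i, uid, f"user{uid}") for i, uid in enumerate([1, 2, 1, 3, 2]))
processed = list(stream(source, users))
```

Because `stream` is a generator fed by another generator, only the current element is alive at any time; the store's buffer is the only structure whose size depends on the input, and `capacity` caps it.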

pg_restore

Loading a large dataset into a database has two main costs: transferring and storing the raw data, and updating every index on each insert. pg_restore handles both for us: it loads table data with COPY and creates indexes only after the data has been loaded, and it can also run the load in parallel jobs. osm-admin import converts the input PBF file into a pg_restore dump in directory format and then invokes pg_restore to do the rest.
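In a directory-format dump, each table's data is stored in its own (optionally compressed) data file as rows in PostgreSQL's COPY text format, while `toc.dat` is a separate binary file describing the objects and the restore order. The sketch below covers only the row encoding: tab-separated fields, `\N` for NULL, and backslash escapes for backslash, tab, newline and carriage return. It is not osm-admin's writer, it does not produce `toc.dat`, and the function names and the example row are hypothetical.

```python
from typing import Iterable, Optional


def copy_field(value: Optional[object]) -> str:
    """Encodes one value in PostgreSQL COPY text format."""
    if value is None:
        return r"\N"  # default NULL marker
    text = str(value)
    # Escape the backslash first so the later escapes are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def copy_line(values: Iterable[Optional[object]]) -> str:
    """Joins fields with tabs and terminates the row with a newline."""
    return "\t".join(copy_field(v) for v in values) + "\n"


# Hypothetical tag row (element id, key, value) whose value contains a tab.
line = copy_line([42, "name", "Main\tStreet"])
```

Escaping matters here because OSM tag values are free text and may contain tabs or newlines, which would otherwise be read as field or row separators.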

Container Usage

Because osm-admin depends on third-party software, it is provided for use only as a container, sparing users a lengthy installation with potentially conflicting dependencies.

Parallel Processing

The pg_restore dump is generated sequentially; it is not yet clear whether parallelizing its generation would reduce the processing time. Loading into the database uses the pg_restore parallel jobs feature.
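A parallel load of a directory-format dump can be started with pg_restore's `--jobs` option, which is supported only for the custom and directory archive formats. The sketch below builds such a command line; the helper names, the dump path and the database name are hypothetical, and the flags osm-admin actually passes are not documented here. Using one job per CPU is a common starting point rather than a tuned value.

```python
import os
import subprocess
from typing import List


def restore_command(dump_dir: str, dbname: str, jobs: int) -> List[str]:
    """Builds a pg_restore invocation for a directory-format dump."""
    return [
        "pg_restore",
        "--format=directory",
        f"--dbname={dbname}",
        f"--jobs={jobs}",  # number of concurrent restore jobs
        dump_dir,
    ]


def restore(dump_dir: str, dbname: str) -> None:
    """Runs pg_restore with one job per CPU; raises if it fails."""
    jobs = os.cpu_count() or 1
    subprocess.run(restore_command(dump_dir, dbname, jobs), check=True)


cmd = restore_command("/data/dump", "openstreetmap", 8)
```

Passing the command as a list to `subprocess.run` avoids shell quoting issues with paths or database names.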
