Home
OpenStreetMap is a free geographical database maintained by a community of volunteers. This database can be accessed and updated at the OSM website. The software running the website is published at openstreetmap-website.
So it is possible to install openstreetmap-website, populate its database with a planet.osm dump, and use it internally for commercial or research purposes.
Unfortunately, due to the large size of the dataset, available tools take a long time to populate the apidb schema needed for openstreetmap-website. osm-admin is specifically designed to deal efficiently with these large datasets.
The OSM model defines three element types: nodes, ways and relations. All elements have associated key/value tags.
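The three element types can be pictured with a minimal Python sketch. The class and field names below are illustrative assumptions, not the types used by osm-admin or by the apidb schema; they only show how nodes, ways and relations relate to each other and that each carries its own tags.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A single point with coordinates (field names are illustrative)."""
    id: int
    lat: float
    lon: float
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of references to node ids."""
    id: int
    node_refs: list[int] = field(default_factory=list)
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Member:
    """One member of a relation: a typed reference plus a role."""
    type: str  # "node", "way" or "relation"
    ref: int
    role: str = ""


@dataclass
class Relation:
    """A group of nodes, ways or other relations."""
    id: int
    members: list[Member] = field(default_factory=list)
    tags: dict[str, str] = field(default_factory=dict)


# Hypothetical sample data: two nodes joined by a way, grouped into a relation.
a = Node(id=1, lat=52.0, lon=13.0)
b = Node(id=2, lat=52.1, lon=13.1, tags={"amenity": "cafe"})
street = Way(id=10, node_refs=[a.id, b.id], tags={"highway": "residential"})
route = Relation(id=100, members=[Member("way", street.id, "forward")],
                 tags={"type": "route"})
```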
The OSM data is packed into *.osm.pbf files for efficient storage and transfer. The order of the entries is maintained in the *.osm.pbf file, which further improves the packing efficiency.
The OSM dataset for the entire planet is very large. As of this writing it has more than 7,000,000,000 nodes and around 1,000,000,000 ways. All of these, together with their associated tags, have to be extracted from a PBF file and inserted into the database. What makes it even more difficult is that the apidb schema stores the data twice: one copy in the current tables and another copy in the historical tables, for example nodes (historical) and current_nodes (current). This was the main driver of the osm-admin design.
Because the dataset is so large and grows daily, it was important to put a bound on memory usage. All the basic elements (nodes, ways and relations) are processed as they arrive, taking no more memory than the underlying PBF reader needs. User and changeset elements are buffered in memory up to a bound and written to a file-persistent BTree.
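The bounded-buffer idea can be sketched as follows. This is not the osm-admin implementation: the class name, the table layout and the limit are hypothetical, and SQLite (whose tables are stored as B-trees on disk) stands in for the file-persistent BTree mentioned above.

```python
import sqlite3


class BoundedUserStore:
    """Keep at most `limit` records in memory; spill the rest to disk.

    Hypothetical stand-in for a file-persistent BTree, using SQLite.
    """

    def __init__(self, path, limit=10_000):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )
        self.limit = limit
        self.buffer = {}

    def put(self, user_id, name):
        """Buffer one record and flush once the bound is reached."""
        self.buffer[user_id] = name
        if len(self.buffer) >= self.limit:
            self.flush()

    def flush(self):
        """Write all buffered records to disk and empty the buffer."""
        self.db.executemany(
            "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
            list(self.buffer.items()),
        )
        self.db.commit()
        self.buffer.clear()

    def get(self, user_id):
        """Look in the in-memory buffer first, then on disk."""
        if user_id in self.buffer:
            return self.buffer[user_id]
        row = self.db.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


# Usage with a tiny bound so the spill to disk is visible.
store = BoundedUserStore(":memory:", limit=2)
store.put(1, "alice")
store.put(2, "bob")    # bound reached: both records are flushed
store.put(3, "carol")  # stays in the in-memory buffer
```

With this layout, memory use depends on the chosen bound rather than on the number of users or changesets in the input.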
Loading large datasets into a database has two main issues: the transfer and storage of the raw data, and the rebuilding of each index on each insert. pg_restore handles both for us, and in addition it can insert the data in parallel jobs. osm-admin import converts the input PBF file into a pg_restore dump in directory format and triggers the pg_restore program to do the rest.
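For illustration, a parallel restore of a directory-format dump could be launched as in the sketch below. The exact arguments that osm-admin passes to pg_restore are not stated here, so the helper names, the dump path, the database name and the job count are all assumptions; only the pg_restore options themselves (`--format`, `--jobs`, `--dbname`) are standard.

```python
import subprocess


def pg_restore_command(dump_dir, dbname, jobs=4):
    """Build a pg_restore command line for a directory-format dump.

    Hypothetical helper; the values passed in are placeholders.
    """
    return [
        "pg_restore",
        "--format=directory",  # the dump is a directory, not a single file
        f"--jobs={jobs}",      # number of parallel restore jobs
        f"--dbname={dbname}",  # target database
        dump_dir,
    ]


def run_restore(dump_dir, dbname, jobs=4):
    """Run pg_restore and raise if it exits with a non-zero status."""
    subprocess.run(pg_restore_command(dump_dir, dbname, jobs), check=True)


# Placeholder path and database name, for illustration only.
cmd = pg_restore_command("/var/lib/osm/dump", "openstreetmap", jobs=8)
```

pg_restore accepts parallel jobs only for the custom and directory archive formats, which fits the choice of a directory-format dump.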
Because the osm-admin program depends on third-party software, it was decided to support usage through a container only, saving the user from a lengthy and conflict-prone installation.
The pg_restore dump is generated sequentially. It is not clear at this stage whether parallelizing it would reduce the processing time. The load into the database uses the pg_restore parallel jobs feature.