Commit ac050b4

📝 Add file formats for geodata
1 parent d393491 commit ac050b4

1 file changed

+108
-0
lines changed

docs/data-processing/geodata.rst

Lines changed: 108 additions & 0 deletions
@@ -5,6 +5,110 @@
Geodata
=======

File formats
------------

.. _pmtiles:

PMTiles
~~~~~~~

`PMTiles <https://docs.protomaps.com>`_ is a single-file archive format for
tile data addressed by Z/X/Y coordinates. The tiles can be cartographic vector
tiles, :ref:`remote sensing data <remote-sensing>`, JPEG images or similar.
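
The Z/X/Y addressing can be illustrated with a short sketch. It assumes the
Web Mercator (XYZ) tiling scheme that is common for web maps; the function name
``lonlat_to_tile`` is purely illustrative and not part of any PMTiles library.

.. code-block:: python

   import math


   def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int, int]:
       """Return the (z, x, y) address of the tile containing a WGS 84 point."""
       n = 2**zoom  # number of tiles per axis at this zoom level
       x = int((lon + 180.0) / 360.0 * n)
       lat_rad = math.radians(lat)
       y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
       # Clamp so that lon = 180 or extreme latitudes stay inside the grid
       return zoom, min(max(x, 0), n - 1), min(max(y, 0), n - 1)


   # Example point (assumed for illustration): central Berlin at zoom level 10
   print(lonlat_to_tile(13.405, 52.52, 10))

Each additional zoom level doubles the number of tiles per axis, so the same
point falls into a different, smaller tile at every level.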

Clients read an archive with `HTTP Range Requests
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests>`_ so that
only the relevant tiles or metadata within a PMTiles archive are retrieved.
The arrangement of tiles and directories is designed to minimise the number of
requests when panning and zooming.
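
How such a read might begin is sketched below, using only the standard
library. The sketch assumes the header layout of the PMTiles Version 3
specification (a 127-byte header that starts with the magic number ``PMTiles``,
a version byte and little-endian 64-bit offsets). The helper names and the
sample values are hypothetical, and the bytes are built locally rather than
fetched from a server.

.. code-block:: python

   import struct

   HEADER_LENGTH = 127  # fixed header size assumed from the v3 specification


   def range_header(offset: int, length: int) -> dict[str, str]:
       """Build the HTTP header for reading ``length`` bytes from ``offset``."""
       # Byte ranges are inclusive on both ends
       return {"Range": f"bytes={offset}-{offset + length - 1}"}


   def parse_header_start(data: bytes) -> dict[str, int]:
       """Parse magic number, version and the first four offsets of a header."""
       if data[:7] != b"PMTiles":
           raise ValueError("not a PMTiles archive")
       # Four little-endian unsigned 64-bit integers follow the version byte
       root_offset, root_length, meta_offset, meta_length = struct.unpack_from(
           "<4Q", data, 8
       )
       return {
           "version": data[7],
           "root_directory_offset": root_offset,
           "root_directory_length": root_length,
           "metadata_offset": meta_offset,
           "metadata_length": meta_length,
       }


   # Hypothetical header bytes standing in for the response to a range request
   fake = b"PMTiles" + bytes([3]) + struct.pack("<4Q", 127, 246, 373, 40)
   fake = fake.ljust(HEADER_LENGTH, b"\x00")

   print(range_header(0, HEADER_LENGTH))
   print(parse_header_start(fake))

With a real archive, the dictionary returned by ``range_header`` would be sent
as request headers, and a server that supports range requests answers with
``206 Partial Content``.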

However, PMTiles is a read-only format: it is not possible to update part of the
archive without rewriting the entire file. If you need transactional updates,
you should use a database such as SQLite, or :doc:`postgresql/postgis/index`
with `ST_AsMVT <https://postgis.net/docs/ST_AsMVT.html>`_, which generates
vector tiles from query results.

.. seealso::
   * `GitHub Repository <https://github.com/protomaps/PMTiles>`_
   * `PMTiles Version 3 Specification
     <https://github.com/protomaps/PMTiles/blob/main/spec/v3/spec.md>`_

Mapbox Vector Tiles (MVT)
~~~~~~~~~~~~~~~~~~~~~~~~~

`Mapbox Vector Tiles
<https://docs.mapbox.com/data/tilesets/guides/vector-tiles-standards/>`_ are
usually stored as one file per tile in a directory tree like
:file:`/Z/X/Y.mvt`. This works well for small tile sets, but writing or
updating an entire global pyramid means handling ~300 million individual files,
which is very inefficient. :ref:`pmtiles`, on the other hand, is a single file
with tiles de-duplicated, reducing the size of global vector basemaps by ~70%.
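
The order of magnitude of a global pyramid can be checked with a little
arithmetic. The sketch assumes a square pyramid in which zoom level ``z``
contains ``4**z`` tiles, and a maximum zoom level of 14; that maximum is an
assumption for illustration and is not stated above.

.. code-block:: python

   def tiles_in_pyramid(max_zoom: int) -> int:
       """Count all tiles from zoom level 0 up to and including ``max_zoom``."""
       # Each zoom level z has 2**z columns and 2**z rows, i.e. 4**z tiles
       return sum(4**z for z in range(max_zoom + 1))


   for max_zoom in (10, 12, 14):
       print(max_zoom, f"{tiles_in_pyramid(max_zoom):,}")

For a maximum zoom level of 14 this gives roughly 358 million tiles, which is
the same order of magnitude as the figure quoted above.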

To write MVT with :ref:`GDAL <gdal>`, the library must be built with `SQLite
<https://www.sqlite.org>`_ and `GEOS <https://libgeos.org>`_ support. MVT tile
sets can also be stored in an SQLite container as :ref:`mbtiles` and then be
processed with the GDAL MBTiles driver.

.. seealso::
   * `Mapbox Vector Tile specification
     <https://github.com/mapbox/vector-tile-spec>`_
   * `MVT: Mapbox Vector Tiles
     <https://gdal.org/en/stable/drivers/vector/mvt.html>`_

.. _mbtiles:

MBTiles
~~~~~~~

`MBTiles <https://docs.mapbox.com/help/glossary/mbtiles/>`_ is a container
format for tile data based on SQLite. It is optimised for local access and,
unlike :ref:`pmtiles`, not for access via HTTP.
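
Because an MBTiles file is an ordinary SQLite database, it can be queried with
the :mod:`sqlite3` module from the standard library. The following sketch
assumes the ``tiles`` table of the MBTiles specification and its TMS row
numbering. The function name ``read_tile`` and the dummy tile are hypothetical,
and an in-memory database stands in for a real :file:`.mbtiles` file.

.. code-block:: python

   import sqlite3
   from typing import Optional


   def read_tile(con: sqlite3.Connection, z: int, x: int, y: int) -> Optional[bytes]:
       """Return the tile blob for an XYZ address, or None if it is missing."""
       # MBTiles numbers rows in the TMS scheme, so the y axis has to be flipped
       tms_y = 2**z - 1 - y
       row = con.execute(
           "SELECT tile_data FROM tiles "
           "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
           (z, x, tms_y),
       ).fetchone()
       return row[0] if row else None


   # Build a tiny in-memory stand-in instead of opening a real .mbtiles file
   con = sqlite3.connect(":memory:")
   con.execute(
       "CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER, "
       "tile_row INTEGER, tile_data BLOB)"
   )
   con.execute("INSERT INTO tiles VALUES (1, 0, 1, ?)", (b"dummy tile",))

   print(read_tile(con, 1, 0, 0))

For a file on disk, ``sqlite3.connect`` would receive the path to the archive
instead of ``":memory:"``.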

.. seealso::
   * `MBTiles specification <https://github.com/mapbox/mbtiles-spec>`_

.. _cog:

Cloud Optimized GeoTIFF (COG)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`Cloud Optimized GeoTIFF <https://cogeo.org>`_ is a raster TIFF file that, like
:ref:`pmtiles`, is optimised for reading from cloud storage. :ref:`pmtiles`
can also deliver other tile data, for example vector tiles. However, COG is
backwards compatible with most GIS programs that work with GeoTIFF.

.. seealso::
   * `OGC Cloud Optimized GeoTIFF Standard
     <https://docs.ogc.org/is/21-026/21-026.html>`_
82+
.. _geoparquet:
83+
84+
GeoParquet
85+
~~~~~~~~~~
86+
87+
`Parquet <https://parquet.apache.org>`_ is an open-source, column-orientated
88+
data file format that was developed for the efficient storage and retrieval of
89+
data. It offers efficient data compression and encoding methods with optimised
90+
processing of large, complex data. `GeoParquet <https://geoparquet.org>`_
91+
extends Parquet with interoperable geodata types (point, line, polygon).
92+
93+
94+
* :doc:`pyviz:matplotlib/geopandas/index` supports the `reading
95+
<https://geopandas.org/en/stable/docs/reference/api/geopandas.read_parquet.html>`_
96+
and `writing
97+
<https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_parquet.html>`_
98+
of GeoParquet.
99+
* `GeoParquet Downloader Plugin
100+
<https://plugins.qgis.org/plugins/qgis_plugin_gpq_downloader/>`_ for `QGIS
101+
<https://qgis.org>`_ enables streaming downloads of large GeoParquet datasets.
102+
* `DuckDB <https://duckdb.org>`_ allows the reading and writing of GeoParquet
103+
files with the `Spatial Extension
104+
<https://duckdb.org/docs/stable/extensions/spatial/overview.html>`_.
105+
106+
.. seealso::
107+
* `GeoParquet specification <https://github.com/opengeospatial/geoparquet>`_
108+
* `GeoParquet Software <https://geoparquet.org/#implementations>`_
109+
* `validate_geoparquet.py
110+
<https://github.com/OSGeo/gdal/blob/master/swig/python/gdal-utils/osgeo_utils/samples/validate_geoparquet.py>`_
111+
8112
.. _geodata-repositories:
9113

10114
Data repositories
@@ -30,6 +134,8 @@ Software
Reading and writing
~~~~~~~~~~~~~~~~~~~

.. _gdal:

`Geospatial Data Abstraction Library (GDAL) <https://gdal.org/en/latest/>`_
provides a low-level but more powerful API for reading and writing hundreds
of data formats.
@@ -137,6 +243,8 @@ Reading and writing
.. seealso::
   :ref:`geo-wrappers`

.. _remote-sensing:

Remote sensing
~~~~~~~~~~~~~~