Description
Add a module for working with geodata in DataFrame.
Essentially all geodata formats store a table of geometric objects: regular records with attribute data, plus a geometry that is defined in one way or another. It is therefore natural to read geodata into a DataFrame: a simple, linear table (possibly with nested columns) with a geometry column. We also need to transform this data and run calculations on it (see the examples below). Working name: GeoDataFrame.
Formats
To start, let's support the two most popular data formats, GeoJSON and Shapefile (the rest will be easy to support later if necessary):
GeoJSON has higher priority because it is more modern and easier to share (unlike Shapefile, it is textual and comes as a single file). Nevertheless, Shapefile is definitely worth supporting because it is widely used (and if we use GeoTools, supporting it will cost almost nothing).
Possible APIs:
GeoDataFrame.read("country.json"): GeoDataFrame
GeoDataFrame.read("country.geojson"): GeoDataFrame
GeoDataFrame.read("country.geo.json"): GeoDataFrame
GeoDataFrame.readGeoJSON("country.json"): GeoDataFrame
GeoDataFrame.readGeoJSON("country.geojson"): GeoDataFrame
GeoDataFrame.readGeoJSON("country.geo.json"): GeoDataFrame
GeoDataFrame.readJSON("country.json"): GeoDataFrame
GeoDataFrame.readJSON("country.geojson"): GeoDataFrame
GeoDataFrame.readJSON("country.geo.json"): GeoDataFrame
GeoDataFrame.read("country.shp"): GeoDataFrame
GeoDataFrame.readShapefile("country.shp"): GeoDataFrame
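For the generic read variant, the format has to be chosen from the file name. A minimal sketch of that dispatch in plain Kotlin follows; GeoFormat and detectGeoFormat are hypothetical names used only for illustration, not part of any existing API.

```kotlin
// Hypothetical sketch: none of these names exist in a real library.
enum class GeoFormat { GEOJSON, SHAPEFILE }

// Picks a format from the file name, covering the extensions listed above.
// ".geo.json" is handled by the ".json" check.
fun detectGeoFormat(path: String): GeoFormat {
    val name = path.substringAfterLast('/').lowercase()
    return when {
        name.endsWith(".geojson") || name.endsWith(".json") -> GeoFormat.GEOJSON
        name.endsWith(".shp") -> GeoFormat.SHAPEFILE
        else -> throw IllegalArgumentException("Unsupported geodata file: $path")
    }
}
```

Under this assumption, the format-specific variants (readGeoJSON, readShapefile) would skip the detection step and remain available for files with non-standard extensions.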
Reading result:
Reading the following GeoJSON should produce a DataFrame with the schema shown after it:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "MultiPolygon",
"coordinates": [102.0, 0.5]
},
"properties": {
"name": "Netherlands",
"gdp": {
"in_2021": 34121.1,
"in_2022": 65463.1
}
}
},
...
| name | gdp               | geometry |
|      |-------------------|          |
|      | in_2021 | in_2022 |          |
---------------------------------------
name: String
gdp:
    in_2021: Double
    in_2022: Double
geometry: Geometry
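To illustrate how the nested "properties" object could map onto column groups, here is a small sketch that renders a schema from one feature's properties. renderSchema is a hypothetical helper written for this illustration; it works on a plain Kotlin Map and does not touch the geometry column, which would be added separately.

```kotlin
// Hypothetical helper: renders the schema of one feature's "properties" map.
// Nested maps become column groups; leaf values report their runtime type.
fun renderSchema(properties: Map<String, Any?>, indent: String = ""): List<String> =
    properties.flatMap { (key, value) ->
        when (value) {
            is Map<*, *> -> {
                @Suppress("UNCHECKED_CAST")
                val nested = value as Map<String, Any?>
                listOf("$indent$key:") + renderSchema(nested, "$indent    ")
            }
            else -> listOf("$indent$key: ${value?.let { it::class.simpleName } ?: "Any?"}")
        }
    }
```

Applied to the properties of the sample feature, this reports the gdp values as Double, since 34121.1 and 65463.1 are not integers.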
Implementation:
Reading the most popular formats (including GeoJSON and Shapefile) is supported in GeoTools. The internal representation in GeoTools is similar to a DataFrame, so converting it to a GeoDataFrame should not be difficult.
Geometry
Next, to work with geometry we need a type for it: Geometry. Here the only sensible choice seems to be the JTS library. It is the standard solution not only in Java; C++ and Python use it too, through the GEOS port and Shapely, which builds on GEOS. Moreover, JTS is used in GeoTools, so if we use GeoTools to implement reading, no additional conversions are needed.
Processing
Examples of possible actions with GeoDataFrame:
- Filter geometries in bounds
- Move something (for example, move Iceland to plot a prettier map of Europe)
- Transform coordinate system
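The first two actions can be sketched in plain Kotlin as follows. Point, Bounds, Feature, filterInBounds and translate are hypothetical stand-ins so that the example has no dependencies; a real implementation would more likely rely on JTS, for instance its Envelope and AffineTransformation classes. Coordinate system transformation is left out because it needs a real projection library.

```kotlin
// Hypothetical stand-ins for geometry types, used only in this sketch.
data class Point(val x: Double, val y: Double)

data class Bounds(val minX: Double, val minY: Double, val maxX: Double, val maxY: Double) {
    // Enables the `point in bounds` syntax.
    operator fun contains(p: Point) = p.x in minX..maxX && p.y in minY..maxY
}

data class Feature(val name: String, val outline: List<Point>)

// Keeps only the features whose outline lies entirely inside the bounds.
fun List<Feature>.filterInBounds(bounds: Bounds): List<Feature> =
    filter { feature -> feature.outline.all { it in bounds } }

// Returns a copy of the feature with every vertex shifted by (dx, dy).
fun Feature.translate(dx: Double, dy: Double): Feature =
    copy(outline = outline.map { Point(it.x + dx, it.y + dy) })
```

With these helpers, moving Iceland closer to the mainland before plotting would amount to translating that single feature and leaving the others untouched.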