feat: replace logic in OsmPbfLoader from osmium to QuackOSM #405

Merged
merged 163 commits into from
Feb 2, 2024
Changes from 33 commits
d0fbd70
chore: add duckdb dependency
RaczeQ Nov 14, 2023
a45754a
feat: add first working pipeline
RaczeQ Nov 18, 2023
238682b
feat: add geoarrow-python dependency
RaczeQ Nov 20, 2023
fef8fbf
feat: modify tests and add geoparquet functionality
RaczeQ Nov 20, 2023
784ae19
fix(pre-commit.ci): auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 20, 2023
ea2e138
chore: remove osm pbf files clipping
RaczeQ Nov 20, 2023
3b14eac
Merge branch 'duckdb_osm_native_reader' of https://github.com/kraina-…
RaczeQ Nov 20, 2023
991c37b
chore: add geoarrow-python dependency
RaczeQ Nov 20, 2023
2560fda
fix: modify tests
RaczeQ Nov 21, 2023
eaab0ac
chore: remove prints
RaczeQ Nov 21, 2023
a5960fa
chore: add duckdb connection closing
RaczeQ Nov 21, 2023
c5cb5cc
feat: scale up PbfFileHandler for big files
RaczeQ Nov 28, 2023
73462e2
chore: apply refurb suggestions
RaczeQ Nov 28, 2023
c043099
Merge branch 'main' into duckdb_osm_native_reader
RaczeQ Nov 28, 2023
5d3e471
fix: change sql query for ways intersections
RaczeQ Nov 28, 2023
f41aa78
Merge branch 'duckdb_osm_native_reader' of https://github.com/kraina-…
RaczeQ Nov 28, 2023
6083e1d
chore: add debugging message for tests
RaczeQ Nov 28, 2023
0a89035
chore: add debugging message for tests
RaczeQ Nov 28, 2023
dab044f
fix: change empty required ways case
RaczeQ Nov 28, 2023
806b2da
chore: remove debugging message from tests
RaczeQ Nov 28, 2023
b12cf6f
chore: add in-sql features grouping
RaczeQ Nov 28, 2023
026f257
chore: modify docstrings
RaczeQ Nov 28, 2023
b5f8eeb
chore: add automatic directories removal
RaczeQ Nov 28, 2023
ab841eb
chore: modify docstrings
RaczeQ Nov 28, 2023
d9e09b8
chore: remove comments from SQL
RaczeQ Nov 28, 2023
9ccae64
chore: add changelog entry
RaczeQ Nov 28, 2023
15421a9
fix: change optional imports and directory removal
RaczeQ Nov 29, 2023
a97b8ec
chore: change default download source from protomaps to geofabrik
RaczeQ Nov 29, 2023
1c3ba65
chore: add option to load data to geoparquet only
RaczeQ Nov 29, 2023
3e4523b
chore: change default download source from protomaps to geofabrik
RaczeQ Nov 29, 2023
7c3f3f6
Merge branch 'main' into duckdb_osm_native_reader
RaczeQ Nov 29, 2023
7f1812d
fix(pre-commit.ci): auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 29, 2023
f5652cd
Merge branch 'main' into duckdb_osm_native_reader
RaczeQ Nov 29, 2023
f63d327
chore: lock geoarrow-pyarrow commit ref
RaczeQ Nov 30, 2023
12bf153
refactor: simplify the osm pbf loader code
RaczeQ Nov 30, 2023
41c7078
feat: add more options to merge OsmTagsFilters
RaczeQ Dec 2, 2023
0e352e6
chore: removed PbfFileClipper
RaczeQ Dec 4, 2023
0f8b666
Merge branch 'main' into duckdb_osm_native_reader
RaczeQ Dec 4, 2023
6589068
chore: add explode_tags parameter
RaczeQ Dec 17, 2023
5cc6372
Merge branch 'main' into duckdb_osm_native_reader
RaczeQ Dec 17, 2023
03bc05b
chore: add pyogrio for testing osm reader
RaczeQ Dec 17, 2023
d5f76f9
Merge branch 'duckdb_osm_native_reader' of https://github.com/kraina-…
RaczeQ Dec 17, 2023
8f31b89
chore: update pyarrow and ruff versions
RaczeQ Dec 18, 2023
a7ec353
feat: add logic for explode_tags parameter
RaczeQ Dec 18, 2023
7d0e28c
fix(pre-commit.ci): auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 18, 2023
baa7cdb
chore: add new parameter to tests
RaczeQ Dec 18, 2023
4ad0575
Merge branch 'duckdb_osm_native_reader' of https://github.com/kraina-…
RaczeQ Dec 18, 2023
bd9094d
chore: lock pyarrow minimal version
RaczeQ Dec 18, 2023
6c865e4
fix: change explode_tags value in tests
RaczeQ Dec 18, 2023
4d2f654
chore: add osmconf.ini file to test files
RaczeQ Dec 18, 2023
ab1a6c4
Merge branch 'duckdb_osm_native_reader' of https://github.com/kraina-…
RaczeQ Dec 18, 2023
cb43c7c
feat: add metadata tags filtering
RaczeQ Dec 18, 2023
1a1f7b9
chore: add pghstore for testing purposes
RaczeQ Dec 18, 2023
9b88469
chore: change osmconf file
RaczeQ Dec 18, 2023
d6eb207
chore: change osmconf file
RaczeQ Dec 18, 2023
f7dc198
test: add gdal testing with geometries
RaczeQ Dec 18, 2023
b7a91ac
chore: paste way polygon features configs
RaczeQ Dec 19, 2023
ee9a9ba
feat: add dynamic osm way polygon filter generation
RaczeQ Dec 20, 2023
243177c
chore: add first tests for gdal parity
RaczeQ Dec 20, 2023
cdb3922
chore: apply refurb changes
RaczeQ Dec 20, 2023
e553cd7
chore: apply refurb changes
RaczeQ Dec 20, 2023
1270de5
chore: change pghstore source
RaczeQ Dec 20, 2023
bddba85
chore: remove pghstore dependency
RaczeQ Dec 20, 2023
7e183a0
chore: change gdal loading logic
RaczeQ Dec 20, 2023
ce40eae
chore: change tests logic with ogr2ogr
RaczeQ Dec 21, 2023
b14dfc9
chore: add option to check all geometries during single test
RaczeQ Dec 21, 2023
7e238bb
chore: add GDAL installation step
RaczeQ Dec 21, 2023
209c8d8
chore: change GDAL installation step
RaczeQ Dec 21, 2023
effe3f5
chore: change GDAL installation step
RaczeQ Dec 21, 2023
dd110de
chore: change GDAL installation step
RaczeQ Dec 21, 2023
f0da70a
chore: change GDAL installation step
RaczeQ Dec 21, 2023
13ed2e9
chore: change GDAL installation step
RaczeQ Dec 21, 2023
e0f67a1
chore: add GDAL installation step
RaczeQ Dec 21, 2023
20e2432
chore: change GDAL installation step
RaczeQ Dec 21, 2023
f2ad317
chore: change GDAL installation step
RaczeQ Dec 21, 2023
871bfe3
chore: change GDAL installation step
RaczeQ Dec 21, 2023
cac7241
chore: change GDAL installation step
RaczeQ Dec 21, 2023
e718fe1
chore: change GDAL installation step
RaczeQ Dec 21, 2023
0ce1658
chore: add GDAL version checking
RaczeQ Dec 21, 2023
8f6a332
chore: add GDAL version checking
RaczeQ Dec 21, 2023
8089980
chore: add GDAL version checking
RaczeQ Dec 21, 2023
684fe90
chore: add GDAL version checking
RaczeQ Dec 21, 2023
935541d
chore: add GDAL version checking
RaczeQ Dec 21, 2023
f1bdef6
chore: add GDAL version checking
RaczeQ Dec 21, 2023
a95ed2d
chore: add GDAL version checking
RaczeQ Dec 21, 2023
df08a05
chore: add GDAL version checking
RaczeQ Dec 21, 2023
a0931c5
chore: add ogr2ogr to path
RaczeQ Dec 21, 2023
08a6f3a
chore: add ogr2ogr to path
RaczeQ Dec 21, 2023
c69f35a
chore: add ogr2ogr to path
RaczeQ Dec 21, 2023
cdc72bb
chore: add ogr2ogr to path
RaczeQ Dec 21, 2023
05f43a8
chore: add ogr2ogr to path
RaczeQ Dec 21, 2023
74d3aac
chore: add ogr2ogr to path
RaczeQ Dec 21, 2023
be64c54
chore: add ogr2ogr to path
RaczeQ Dec 21, 2023
87d36f3
chore: add ogr2ogr to path
RaczeQ Dec 21, 2023
d3c6d23
chore: add ogr2ogr to path
RaczeQ Dec 21, 2023
3768260
fix: change ogr2ogr execution on windows
RaczeQ Dec 21, 2023
830bff7
chore: add skipping if ogr2ogr is not found
RaczeQ Dec 21, 2023
8399560
chore: remove prints and echos
RaczeQ Dec 21, 2023
6301f3e
fix: change job config
RaczeQ Dec 21, 2023
a22ece4
fix: change retry command
RaczeQ Dec 21, 2023
8a5d878
fix: change timeout config
RaczeQ Dec 21, 2023
7b913d2
fix: add required config value
RaczeQ Dec 21, 2023
83d2207
chore: change timeout length
RaczeQ Dec 21, 2023
f1351bd
chore: revert retry for gdal installation
RaczeQ Dec 21, 2023
768eceb
chore: change tests
RaczeQ Dec 21, 2023
1930d60
chore: fix duckdb way geometries
RaczeQ Dec 21, 2023
f9aa607
chore: modify gdal parity test
RaczeQ Dec 21, 2023
0ef257d
fix: skip invalid relations from evaluation
RaczeQ Dec 21, 2023
770a393
chore: start changing relations parsing
RaczeQ Dec 22, 2023
9da8e92
chore: save todo note
RaczeQ Dec 22, 2023
f3e0408
fix: change geometries difference logic
RaczeQ Dec 23, 2023
ae5fc1c
chore: cut test examples
RaczeQ Dec 23, 2023
be47403
fix: change pbf_reader logic
RaczeQ Dec 23, 2023
9005ece
chore: speed up sql left join
RaczeQ Dec 23, 2023
d960202
feat: add logic to parse relations without outer geometries
RaczeQ Dec 23, 2023
f03f112
feat: update remove_interiors function
RaczeQ Dec 23, 2023
bd755b2
chore: modify geometry checking flow
RaczeQ Dec 23, 2023
6e08f60
chore: add new logic to geometry checking logic
RaczeQ Dec 23, 2023
8fdcce0
chore: extract comparation logic to another function
RaczeQ Dec 23, 2023
aab80c8
chore: remove tqdm progress bar
RaczeQ Dec 23, 2023
752cf80
fix: change comapration logic
RaczeQ Dec 23, 2023
79bc092
chore: break relations geometry building into steps
RaczeQ Dec 24, 2023
a70b2c8
chore: add more in-depth tags comparation
RaczeQ Dec 26, 2023
3ed2558
chore; modify hstore parsing
RaczeQ Dec 26, 2023
ab18fbb
chore: add check for valid geometries in relations
RaczeQ Dec 26, 2023
0d5b57a
chore: remove same_number_of_points condition
RaczeQ Dec 27, 2023
fcd3a65
chore: vectorize pandas operations for tags
RaczeQ Dec 27, 2023
871690c
chore: merge branch 'duckdb_osm_native_reader' of https://github.com/…
RaczeQ Dec 27, 2023
20d4a6d
fix: change pandas bool condition
RaczeQ Dec 27, 2023
6363e96
fix: change tags comparison
RaczeQ Dec 27, 2023
acea5c4
fix: change tags comparison
RaczeQ Dec 27, 2023
6c3eb0d
chore: vectorize geometry comparison
RaczeQ Dec 27, 2023
debaedb
chore: change few examples for gdal parity test
RaczeQ Dec 27, 2023
cf9d3b9
chore: upload monaco pbf file
RaczeQ Dec 27, 2023
8a022cd
chore: change test values for a new monaco extract
RaczeQ Dec 27, 2023
6560eee
chore: change monaco osmpbfloader tests
RaczeQ Dec 27, 2023
0eea4fe
chore: change tests and add new ignore_cache flag
RaczeQ Dec 27, 2023
5980a04
chore: remove testing notebooks
RaczeQ Dec 27, 2023
e3c68ab
chore: add geometry fixing for polygons in relations
RaczeQ Dec 27, 2023
2830a81
Merge branch 'main' into duckdb_osm_native_reader
RaczeQ Jan 3, 2024
c1666c4
feat: add quackosm dependency
RaczeQ Jan 3, 2024
3d5df3d
Merge branch 'duckdb_osm_native_reader' of https://github.com/kraina-…
RaczeQ Jan 3, 2024
0dff419
feat: replace PbfFileReader with QuackOSM implementation
RaczeQ Jan 3, 2024
244ec4b
chore: remove osmconf.ini
RaczeQ Jan 3, 2024
28c1801
chore: merge branch 'main' of https://github.com/kraina-ai/srai into …
RaczeQ Jan 3, 2024
6914b92
chore: remove gdal installation
RaczeQ Jan 3, 2024
9a5fe3b
chore: change osm pbf loader example notebook
RaczeQ Jan 3, 2024
fbf8704
chore: remove osm way polygon config
RaczeQ Jan 4, 2024
c47a7ad
chore: change docstring
RaczeQ Jan 4, 2024
8797ef6
chore: change optional imports
RaczeQ Jan 4, 2024
30ffde5
chore: bumped QuackOSM version
RaczeQ Jan 10, 2024
2836d2c
refactor: removed pbf related classes from srai
RaczeQ Feb 1, 2024
fbaf865
chore: bumped quackosm version and moved osmnx to main dependencies
RaczeQ Feb 1, 2024
e048c61
chore: change default download source for pbf files
RaczeQ Feb 1, 2024
5d2d983
fix: change osm pbf loader example
RaczeQ Feb 1, 2024
e98d4f0
chore: modified changelog entries
RaczeQ Feb 1, 2024
973e434
fix: add new error from osmnx
RaczeQ Feb 1, 2024
87db89e
refactor: add OsmExtractSource typing
RaczeQ Feb 2, 2024
35fb94c
chore: add geoparquet related tests
RaczeQ Feb 2, 2024
649dfec
chore: change filters for test with geoparquet osm pbf
RaczeQ Feb 2, 2024
0966687
chore: bump quackosm version
RaczeQ Feb 2, 2024
af5f50b
Merge branch 'main' into duckdb_osm_native_reader
RaczeQ Feb 2, 2024
98aa6b5
fix: typo in changelog
RaczeQ Feb 2, 2024
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

- Refactored `PbfFileHandler` to use `DuckDB` with `spatial` extension instead of `osmium` and `GDAL` [#405](https://github.com/kraina-ai/srai/pull/405)
- Changed the default pbf download source from `protomaps` download service to `geofabrik`.

### Deprecated

### Removed
192 changes: 192 additions & 0 deletions examples/loaders/osm_pbf_loader.ipynb
Collaborator


I don't see the need for the DuckDB examples. I understand the part about loading data into GeoParquet, but not the steps that follow. I'm not saying we should drop them completely; I'm just wondering whether we should expect our users to use raw DuckDB and whether we need an example of that.

Collaborator Author


I wanted to emphasize that srai can be used as a tool just for parsing `*.osm.pbf` files to GeoParquet at large scale, since that's a major use case in cloud computing today. Do you think we should add a new example notebook just for that, focused solely on the `PbfFileHandler`?
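
The notebook cells in this diff pass a tags filter such as `highways_filter` to `PbfFileHandler` before converting a file to GeoParquet. As a rough illustration of what such a filter expresses, here is a minimal pure-Python sketch. It is not srai's or QuackOSM's implementation: the `TagsFilter` alias and the `matches_tags_filter` helper are hypothetical names, and the handling of `bool` values (`True` accepts any value, `False` never matches) is an assumption made for this example.

```python
from typing import Mapping, Sequence, Union

# Hypothetical alias for illustration only: an OSM key mapped to a single
# value, a list of accepted values, or a bool (assumed: True = any value).
TagsFilter = Mapping[str, Union[str, Sequence[str], bool]]


def matches_tags_filter(tags: Mapping[str, str], tags_filter: TagsFilter) -> bool:
    """Return True if at least one filter key accepts the feature's tag value."""
    for key, accepted in tags_filter.items():
        value = tags.get(key)
        if value is None:
            # The feature does not carry this key at all.
            continue
        if isinstance(accepted, bool):
            if accepted:
                return True
        elif isinstance(accepted, str):
            if value == accepted:
                return True
        elif value in accepted:
            return True
    return False


# Same filter as in the notebook diff.
highways_filter = {
    "highway": ["motorway", "trunk", "primary", "secondary", "tertiary"]
}

# Made-up features, represented only by their tags.
features = [
    {"highway": "primary", "name": "A"},
    {"highway": "residential"},
    {"building": "yes"},
]

kept = [f for f in features if matches_tags_filter(f, highways_filter)]
print(len(kept))
```

In the real pipeline this filtering happens while the file is parsed, which is why the notebook notes that parsing without a tags filter needs much more memory.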

@@ -303,6 +303,198 @@
"\n",
"ax.set_axis_off()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using OSMPbfLoader to download data for a specific area and transform it into a GeoParquet file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download all grouped features based on Geofabrik layers in Reykjavík, Iceland"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = OSMPbfLoader()\n",
"reykjavik_gdf = geocode_to_region_gdf(\"Reykjavík, IS\")\n",
"reykjavik_features_gpq = loader.load_to_geoparquet(reykjavik_gdf, GEOFABRIK_LAYERS)\n",
"reykjavik_features_gpq"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Read those features using DuckDB"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import duckdb\n",
"\n",
"connection = duckdb.connect()\n",
"\n",
"connection.load_extension(\"parquet\")\n",
"connection.load_extension(\"spatial\")\n",
"\n",
"features_relation = connection.read_parquet([str(path) for path in reykjavik_features_gpq]).project(\n",
" \"* REPLACE (ST_GeomFromWKB(geometry) AS geometry)\"\n",
")\n",
"features_relation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Count all buildings"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"features_relation.filter(\"buildings IS NOT NULL\").count(\"feature_id\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download main roads for Estonia using `PbfFileHandler`\n",
"\n",
"`PbfFileHandler` is a class dedicated to reading `*.osm.pbf` files and is used by `OSMPbfLoader` to load features.\n",
"It can parse `pbf` files without the tags or geometry filtering that `OSMPbfLoader` applies automatically."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"highways_filter = {\n",
" \"highway\": [\n",
" \"motorway\",\n",
" \"trunk\",\n",
" \"primary\",\n",
" \"secondary\",\n",
" \"tertiary\",\n",
" ]\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from srai.loaders import download_file\n",
"from srai.loaders.osm_loaders.pbf_file_handler import PbfFileHandler\n",
"\n",
"estonia_pbf_url = \"http://download.geofabrik.de/europe/estonia-latest.osm.pbf\"\n",
"estonia_pbf_file = \"estonia-latest.osm.pbf\"\n",
"download_file(estonia_pbf_url, estonia_pbf_file, force_download=False)\n",
"\n",
"handler = PbfFileHandler(\n",
" geometry_filter=None, tags_filter=highways_filter\n",
") # parsing pbf file without tags filtering requires a lot of memory available in the system\n",
"estonia_features_gpq = handler.convert_pbf_to_gpq(estonia_pbf_file)\n",
"estonia_features_gpq"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"features_relation = connection.read_parquet(str(estonia_features_gpq)).project(\n",
" \"* REPLACE (ST_GeomFromWKB(geometry) AS geometry)\"\n",
")\n",
"features_relation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Count loaded roads"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"features_relation.count(\"feature_id\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calculate roads length\n",
"We will transform the geometries to the Estonian CRS - [EPSG:3301](https://epsg.io/3301)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"length_in_meters = (\n",
" features_relation.project(\n",
" \"ST_Length(ST_Transform(geometry, 'EPSG:4326', 'EPSG:3301')) AS road_length\"\n",
" )\n",
" .sum(\"road_length\")\n",
" .fetchone()[0]\n",
")\n",
"length_in_km = length_in_meters / 1000\n",
"length_in_km"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plot the roads using GeoPandas\n",
"\n",
"With fast loading of GeoParquet files using the `geoarrow.pyarrow` library."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import geoarrow.pyarrow as ga\n",
"from geoarrow.pyarrow import io\n",
"\n",
"from srai.constants import GEOMETRY_COLUMN\n",
"\n",
"parquet_table = io.read_geoparquet_table(estonia_features_gpq)\n",
"ga.to_geopandas(parquet_table.column(GEOMETRY_COLUMN)).plot()"
]
}
],
"metadata": {