Description
Each geometry is split into 'shards', where each shard contains a maximum of n vertices.
This is done to keep point-in-polygon (PIP) operations fast over large polygons.
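To show why a vertex cap helps, here is a textbook even-odd ray-casting PIP test that also counts how many edges it examines. This is an illustrative sketch only, not the project's PIP implementation (which is handled by its geometry/database layer). The helper names `point_in_polygon` and `regular_polygon` are hypothetical. The point is that a single test costs one edge check per vertex, so bounding the vertex count of each shard bounds the cost of each test.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: List[Point]) -> Tuple[bool, int]:
    """Even-odd ray casting. Returns (inside, number_of_edges_tested)."""
    x, y = pt
    inside = False
    edges_tested = 0
    j = len(ring) - 1  # previous vertex index; starts at the closing edge
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        edges_tested += 1
        # Does the edge (j -> i) straddle the horizontal line through pt?
        if (yi > y) != (yj > y):
            # x coordinate where the edge crosses that line
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            # Count only crossings to the right of pt (ray towards +x)
            if x < x_cross:
                inside = not inside
        j = i
    return inside, edges_tested


def regular_polygon(k: int, r: float = 1.0) -> List[Point]:
    """Hypothetical test geometry: a regular k-gon centred on the origin."""
    return [(r * math.cos(2 * math.pi * i / k), r * math.sin(2 * math.pi * i / k))
            for i in range(k)]
```

For example, `point_in_polygon((0.0, 0.1), regular_polygon(200))` returns `(True, 200)`, while the same test against a 2000-vertex ring checks 2000 edges. A larger shard limit therefore means fewer rows but more work per candidate shard.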
The current default is 200. This results in many shards being created, and as a result the shard table can account for over 50% of the entire database file.
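As a rough illustration of how the limit drives the number of shard rows, the sketch below computes a lower bound of `ceil(V / n)` shards for a geometry with `V` vertices and a limit of `n`. This assumes a naive split where every shard is filled to the limit. A geometric split (cutting polygons into smaller polygons) will typically produce more shards and may add vertices along the cut lines. The function name and the 50,000-vertex polygon are hypothetical.

```python
import math


def min_shard_count(vertex_count: int, max_vertices: int) -> int:
    """Lower bound on shards when each shard holds at most max_vertices vertices."""
    if max_vertices <= 0:
        raise ValueError("max_vertices must be positive")
    return math.ceil(vertex_count / max_vertices)


# Hypothetical polygon with 50,000 vertices, using the settings compared in this issue
for limit in (200, 500, 2000, 100000):
    print(limit, min_shard_count(50_000, limit))
```

Under these assumptions the limit of 200 yields at least 250 shards for that polygon, versus 25 for a limit of 2000 and a single shard for 100000.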
The official sqlite3_analyzer tool can be used to display the byte usage per table/index. The following listing compares the database file sizes produced with the default setting of 200 against settings of 500, 2000 and 100000:
```
.rw-r--r--@ 2.1Gi peter 11 Jul 16:22 -I zcta.spatial.shard200.db
.rw-r--r--@ 1.8Gi peter 11 Jul 16:21 -I zcta.spatial.shard500.db
.rw-r--r--@ 1.7Gi peter 11 Jul 16:03 -I zcta.spatial.shard2000.db
.rw-r--r--@ 1.6Gi peter 11 Jul 16:20 -I zcta.spatial.shard100000.db
```
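The savings relative to the default can be read off the listing directly. The sketch below takes the sizes exactly as listed (already rounded to 0.1 GiB, so the percentages are approximate) and computes each setting's reduction versus the 200 baseline. The function name is hypothetical.

```python
# File sizes in GiB, copied from the listing above (rounded to 0.1 GiB)
LISTING_GIB = {200: 2.1, 500: 1.8, 2000: 1.7, 100000: 1.6}


def savings_vs_default(sizes: dict, default: int = 200) -> dict:
    """Percentage size reduction of each setting relative to the default setting."""
    baseline = sizes[default]
    return {setting: round((baseline - size) / baseline * 100, 1)
            for setting, size in sizes.items()}


print(savings_vs_default(LISTING_GIB))
```

This gives reductions of roughly 14.3% (500), 19.0% (2000) and 23.8% (100000) for this dataset, with most of the gain already realised by moving from 200 to 2000.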
This issue is open to explore this default setting further. Each dataset will have a different 'sweet spot', and datasets containing many large polygons will be impacted the most.
If increasing this setting is shown to have a negligible impact on PIP performance, then it would be ideal to increase it in order to reduce disk usage and make page caching more effective.