tune module.shard.complexity #106

@missinglink

Each geometry is split into 'shards', where each shard contains a maximum of n vertices.
This ensures that point-in-polygon (PIP) operations stay fast over large polygons.
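To illustrate why vertex count matters for PIP, here is a minimal ray-casting sketch. It is not the implementation used by this project (which is not shown in this issue); it only demonstrates that a naive test visits every edge of the ring, so the cost of one query grows linearly with the number of vertices. The function name `point_in_polygon` is hypothetical.

```python
def point_in_polygon(point, ring):
    """Naive ray-casting test against a single ring of (x, y) vertices.

    Illustrative only: every edge is visited, so the cost is O(len(ring)).
    """
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point
        # can be crossed by a ray cast to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square))
print(point_in_polygon((5, 2), square))
```

A shard with at most n vertices bounds the work of each such test, which is the motivation for splitting large geometries.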

The current default is 200. This results in many shards being created; as a result, the shard table can account for over 50% of the entire database file.
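As a rough sketch of how the setting drives shard count: if every original vertex ends up in at least one shard, a geometry with v vertices needs at least ceil(v / n) shards. This is only a lower bound, since the cuts made when splitting introduce extra vertices, and the actual splitting algorithm is not described in this issue. The helper name `min_shards` and the 50,000-vertex polygon are hypothetical.

```python
import math


def min_shards(vertex_count, max_vertices):
    """Lower bound on shards for one geometry (assumes each original
    vertex lands in at least one shard; real counts will be higher)."""
    return max(1, math.ceil(vertex_count / max_vertices))


# Hypothetical polygon with 50,000 vertices.
for setting in (200, 500, 2000, 100000):
    print(setting, min_shards(50_000, setting))
```

Under these assumptions, raising the setting from 200 to 2000 cuts the minimum shard count for that polygon from 250 to 25.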

The official sqlite3_analyzer tool can be used to display the byte usage per table/index. The following diff shows the difference between using the default setting of 200 and a setting of 2000:

.rw-r--r--@ 2.1Gi peter 11 Jul 16:22 -I  zcta.spatial.shard200.db
.rw-r--r--@ 1.8Gi peter 11 Jul 16:21 -I  zcta.spatial.shard500.db
.rw-r--r--@ 1.7Gi peter 11 Jul 16:03 -I  zcta.spatial.shard2000.db
.rw-r--r--@ 1.6Gi peter 11 Jul 16:20 -I  zcta.spatial.shard100000.db
Image
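The relative savings can be worked out from the file sizes listed above. This is a small arithmetic sketch; the sizes are copied from the listing and are rounded to one decimal place there, so the percentages are approximate. The names `sizes_gib` and `reduction_pct` are illustrative.

```python
# File sizes in GiB, keyed by shard setting, as shown in the listing.
sizes_gib = {200: 2.1, 500: 1.8, 2000: 1.7, 100000: 1.6}
baseline = sizes_gib[200]


def reduction_pct(setting):
    """Size reduction relative to the default setting of 200, in percent."""
    return (baseline - sizes_gib[setting]) / baseline * 100


for setting in (500, 2000, 100000):
    print(f"{setting}: {reduction_pct(setting):.1f}% smaller than the default")
```

Relative to the default, that is roughly 14% smaller at 500, 19% at 2000 and 24% at 100000, so most of the saving is already reached by 2000.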

This issue is open to explore this default setting further. Each dataset will have a different 'sweet spot', with datasets containing many large polygons being more affected.

If increasing this setting is shown to have negligible impact on PIP performance, then it would be ideal to increase it to reduce disk usage and make page caching more effective.

Labels: enhancement