Architecture
This is a description of the Tilezen architecture as it was running at Mapzen in production. Of note is that this is meant to capture the various systems as they were running before rawr tiles and global builds.
The system that emerged reflects the choices and trade-offs that were made given our goals. Typically, the variables that affected most of the decisions were acceptable latency (both typical and worst case), data freshness, and cost. We wanted to serve tiles that were usually just a few hours out of date, with most of the service being reasonably fast, while accepting that a small percentage of tiles would be slow to render.
The way we chose to balance these goals was to create a concept called the TOI, or tiles of interest. This represented the set of tiles that should be fast: the tiles most requested within a particular time window. This is the set of tiles that was pre-generated. Requests for tiles outside of this set would be served live.
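As a rough illustration of the TOI idea, the sketch below derives a tiles-of-interest set from a list of requested tile coordinates over a window of request logs; the function and parameter names are hypothetical and not the actual tilequeue implementation.

```python
# Hypothetical sketch: build a tiles-of-interest set from request logs.
# Coordinates are (z, x, y) tuples; top_n is an illustrative cutoff.
from collections import Counter

def compute_toi(requested_coords, top_n=100_000):
    """Return the top_n most-requested (z, x, y) tile coordinates."""
    counts = Counter(requested_coords)
    return {coord for coord, _ in counts.most_common(top_n)}

# Usage: toi = compute_toi(window_of_requests)
# Tiles in the TOI are pre-generated; everything else is rendered live.
```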
Requests are satisfied by first trying a cache, then checking whether the tile is pre-generated on S3, and finally rendering it on demand if necessary. fastly would run this logic using a custom VCL script: the tapalcatl service would return a 404 if the tile didn't exist on S3, which would signal to fastly that the request should be directed to tileserver to generate the tile on demand.
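The decision flow described above lived in Fastly's VCL; the Python sketch below only illustrates the same fallback order, with the three callables standing in for the CDN cache, tapalcatl, and tileserver.

```python
# Illustrative sketch (not the production VCL) of the request flow:
# cache, then pre-generated tile on S3 via tapalcatl, then on-demand
# rendering by tileserver. The callables are hypothetical stand-ins.

def serve_tile(coord, cache_get, tapalcatl_get, tileserver_render):
    tile = cache_get(coord)            # 1. cache hit at the edge?
    if tile is not None:
        return tile
    tile = tapalcatl_get(coord)        # 2. pre-generated on S3? (404 -> None)
    if tile is not None:
        return tile
    return tileserver_render(coord)    # 3. generate the tile on demand
```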
osm2pgsql applies OpenStreetMap planet file diffs to the database and generates a tile expiry list. tilequeue (the tilequeue intersect command) would read this list, and the tiles that were also in the TOI (tiles of interest list) would get enqueued onto Amazon SQS for processing. A number of instances running tilequeue process would read tiles from SQS, generate them, and store them on Amazon S3.
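A hedged sketch of the intersect-and-enqueue step is shown below; the `z/x/y` line format matches what osm2pgsql emits for expiry lists, but the message body and other details are assumptions rather than the exact tilequeue code.

```python
# Sketch: read the expiry list produced by osm2pgsql, keep only tiles that
# are in the TOI, and enqueue them on SQS for re-rendering. Uses boto3's SQS
# client; the JSON message format is an assumption for illustration.
import json
import boto3

def enqueue_expired_tiles(expiry_path, toi, queue_url):
    sqs = boto3.client("sqs")
    with open(expiry_path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            z, x, y = (int(part) for part in line.split("/"))
            if (z, x, y) in toi:
                sqs.send_message(
                    QueueUrl=queue_url,
                    MessageBody=json.dumps({"z": z, "x": x, "y": y}),
                )
```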
To manage the TOI, a "gardener" process (the tilequeue prune-tiles-of-interest command) would run periodically to add and remove tiles from the TOI based on a rolling window of requests (usually two months of average service usage). Newly added tiles would get enqueued for processing, and removed tiles would get deleted from S3.
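Conceptually the gardener performs a set difference between the current TOI and the tiles requested in the rolling window; the sketch below shows that idea with hypothetical callables, not the real prune-tiles-of-interest command.

```python
# Sketch of the "gardener" idea: grow the TOI with newly popular tiles and
# shrink it by dropping tiles that fell out of the rolling request window.
# The enqueue and delete callables are illustrative stand-ins.

def prune_toi(current_toi, requested_in_window, enqueue, delete_from_s3):
    to_add = requested_in_window - current_toi
    to_remove = current_toi - requested_in_window

    for coord in to_add:
        enqueue(coord)            # newly popular tiles get pre-generated
    for coord in to_remove:
        delete_from_s3(coord)     # stale tiles are removed from S3

    return (current_toi | to_add) - to_remove
```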
Low-zoom tile content doesn't change much, so it is kept around longer, while "max zoom" 15 and 16 tiles are updated at a faster cadence to create a virtuous cycle: edit on OSM.org, go live in vector tiles, review, and iterate. Zoom 17+ tiles are excessive but popular for generic low-value uses, so they are kept in cache the longest. The TTL configuration reflects this policy:
```yaml
ttls:
  "0-10":
    ttl: 12h
    max-age: 43200
    grace: 13h
  "11-12":
    ttl: 8h
    max-age: 43200
    grace: 9h
  "13-14":
    ttl: 4h
    max-age: 43200
    grace: 5h
  "15-16":
    ttl: 2h
    max-age: 43200
    grace: 3h
  "17-20":
    ttl: 1w
    max-age: 604800
    grace: 2w
  default:
    ttl: 4h
    max-age: 43200
    grace: 5h
```
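For illustration, the snippet below shows one way the zoom-range keys above could be resolved to a TTL band for a given tile; the lookup and defaulting behaviour are assumptions about the config shown, not the production implementation.

```python
# Sketch: resolve a tile's zoom to one of the TTL bands from the config
# above, falling back to the default band when no range matches.

TTLS = {
    (0, 10):  {"ttl": "12h", "max-age": 43200,  "grace": "13h"},
    (11, 12): {"ttl": "8h",  "max-age": 43200,  "grace": "9h"},
    (13, 14): {"ttl": "4h",  "max-age": 43200,  "grace": "5h"},
    (15, 16): {"ttl": "2h",  "max-age": 43200,  "grace": "3h"},
    (17, 20): {"ttl": "1w",  "max-age": 604800, "grace": "2w"},
}
DEFAULT = {"ttl": "4h", "max-age": 43200, "grace": "5h"}

def ttl_for_zoom(zoom):
    for (low, high), band in TTLS.items():
        if low <= zoom <= high:
            return band
    return DEFAULT

# e.g. ttl_for_zoom(16) -> {"ttl": "2h", "max-age": 43200, "grace": "3h"}
```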