This is a port of tilezen/vector-datasource developed by Mapzen. It converts OpenStreetMap (OSM) data directly into GeoJSON with properties that are understood by Mapzen house styles. See the tile server example for a demo.
A Postgres database is not required to evaluate the logic, which was originally defined in a combination
of SQL and Python. This allows for the quick mapping of any OSM element(s) to a kind/kind_detail
normalization. Such a normalization is non-trivial given the "diversity" of OSM tagging, so projects
like tilezen/vector-datasource (and many others) are necessary.
The port currently implements almost all features applicable to evaluating zoom 14+ tile data. These features include:
- all filter, min_zoom and output logic defined in the yaml/*.yaml files,
- all transforms that apply (implementation-specific data transforms are skipped),
- the CSV matcher post processor to set the scale_rank and sort_rank properties,
- geometry clipping and label placement logic.
A lot of post processors still need to be ported, but only a few of the missing ones apply to zooms 14+. Missing post processors include: landuse_kind intercuts, merging line strings, merging buildings with building parts, and any admin area matching used to get accurate country codes for highways and other objects.
It would also be nice to port some of the integration tests as they would give confidence that things are really working as expected. Right now there are just some unit tests and some high level sanity checks.
The goal is for there to be no functional differences for zooms 14+. The YAML definition files are
unchanged; there are just a few minor changes to the post processor filtering in queries.yaml.
The port is based on roughly v1.8.0 of vector-datasource.
- Load and compile the queries.yaml, yaml/*.yaml and spreadsheets/*_rank/*.csv files. This can be done by loading the files directly using the implied directory structure:

      config, err := osmzen.Load("config/queries.yaml")

  or, if you want to use the "official" ported config files that are embedded into the binary using the embed package from the standard library:

      config, err := osmzen.LoadDefaultConfig()

  If there are mistakes in the YAML, the error will contain a lot of information to help debug:

      if cerr, ok := errors.Cause(err).(*filter.CompileError); ok {
          log.Printf("error: %v", cerr.Error())
          log.Printf("cause: %v", cerr.Cause)
          log.Printf("yaml:\n%s", cerr.YAML()) // chunk of marshalled YAML with the issue
      } else if err != nil {
          log.Printf("other err: %v", err)
      }

- Process some OSM data:

      data := osm.OSM{}
      layers, err := config.Process(
          data,
          orb.Bound{Min: orb.Point{-180, -90}, Max: orb.Point{180, 90}},
          zoom,
      )
      // layers is defined as `map[string]*geojson.FeatureCollection`

  Layers can also be processed individually:

      featureCollection, err := config.Layers["buildings"].Process(
          data,
          orb.Bound{Min: orb.Point{-180, -90}, Max: orb.Point{180, 90}},
          zoom,
      )

  The bound is necessary for clipping. It is typically set to the bound of the requested tile.

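To make "the bound of the requested tile" concrete, here is a minimal standard-library sketch of the usual web mercator (slippy map) tile-to-bound calculation. The `Bound` type and `tileBound` function are hypothetical names for illustration only; in practice the orb/maptile package provides this via `tile.Bound()`, as used in the complete example in this README.

```go
package main

import (
	"fmt"
	"math"
)

// Bound is a hypothetical stand-in for orb.Bound: a lon/lat bounding box.
type Bound struct {
	MinLon, MinLat, MaxLon, MaxLat float64
}

// tileBound returns the lon/lat bound of a web mercator (slippy map) tile.
func tileBound(x, y, z uint32) Bound {
	n := math.Exp2(float64(z)) // number of tiles along each axis at this zoom

	lon := func(x float64) float64 { return x/n*360.0 - 180.0 }
	lat := func(y float64) float64 {
		return math.Atan(math.Sinh(math.Pi*(1-2*y/n))) * 180.0 / math.Pi
	}

	return Bound{
		MinLon: lon(float64(x)),
		MinLat: lat(float64(y + 1)), // tile y grows southward
		MaxLon: lon(float64(x + 1)),
		MaxLat: lat(float64(y)),
	}
}

func main() {
	// the same tile used in the complete example
	fmt.Printf("%+v\n", tileBound(19613, 29310, 16))
}
```

Passing a bound larger than the tile, such as the whole-world bound in the snippets above, effectively disables clipping.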
The result is a GeoJSON feature collection with kind, kind_detail, etc. properties that
are understood by Mapzen house styles.
A more complete example that loads a zoom 16 area from the OSM API and then processes the tile (minus error checking):
package main

import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/paulmach/orb/maptile"
	"github.com/paulmach/osm"
	"github.com/paulmach/osm/osmapi"
	"github.com/paulmach/osmzen"
)

func main() {
	tile := maptile.New(19613, 29310, 16)

	// load osmzen config
	config, _ := osmzen.LoadDefaultConfig()

	// get osm data for a tile from the official api.
	bounds, _ := osm.NewBoundsFromTile(tile)
	data, _ := osmapi.Map(context.Background(), bounds)

	// process the data
	// The tile bound is used to exclude interesting nodes
	// and labels outside the tile.
	layers, _ := config.Process(data, tile.Bound(), tile.Z)

	// pretty print the json
	pretty, _ := json.MarshalIndent(layers, "", " ")
	fmt.Println(string(pretty))
}
At a high level tilezen/vector-datasource filters and processes its data using the following steps:
- find relevant elements for a layer using the SQL queries defined in data/{layer_name}.jinja,
- filter the elements using filter conditions defined in yaml/{layer_name}.yaml,
- generate properties for each element using the matching filter's output expressions,
- apply transforms to each element independently,
- apply post processes to all the layers together.
The transforms and post processes that apply to each layer and zoom are defined in queries.yaml.
For a lot more details see the official tilezen/vector-datasource project overview.
As this package is a port of that code it follows the same steps, except for step 1 since the data is passed in directly.
During the loading of the YAML+CSV config files everything is compiled to make sure all the expressions and function references are known. If there is a typo, or something new/unsupported, an error will be returned. See above for how to get useful information from the error. The initial compile step allows for the checking of config errors at startup. Also since the types are converted up front there is a nice performance boost of about 10x.
The filters and outputs defined in the yaml/*.yaml files are basically a set of statements that
act like: "if the element tags look like this, output these kind, kind_detail, etc. properties".
The filters define a condition, yes/no matching, that evaluates into a boolean value. During the compile
step these are converted into concrete types that implement the filter.Condition interface. The
interface is defined as:
type filter.Condition interface {
	Eval(*filter.Context) bool
}
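As an illustration of what such compiled conditions could look like, here is a minimal, self-contained sketch. The `Context`, `tagEquals`, `tagExists` and `all` types are hypothetical simplifications written for this example and are not the actual types in the filter package.

```go
package main

import "fmt"

// Context is a hypothetical, simplified stand-in for filter.Context
// that only carries the element's OSM tags.
type Context struct {
	Tags map[string]string
}

// Condition mirrors the shape of the interface described above.
type Condition interface {
	Eval(*Context) bool
}

// tagEquals matches when a tag has an exact value.
type tagEquals struct{ Key, Value string }

func (c tagEquals) Eval(ctx *Context) bool { return ctx.Tags[c.Key] == c.Value }

// tagExists matches when a tag is present with any value.
type tagExists struct{ Key string }

func (c tagExists) Eval(ctx *Context) bool {
	_, ok := ctx.Tags[c.Key]
	return ok
}

// all matches when every child condition matches.
type all []Condition

func (a all) Eval(ctx *Context) bool {
	for _, c := range a {
		if !c.Eval(ctx) {
			return false
		}
	}
	return true
}

func main() {
	// roughly "highway is residential and a name tag is present"
	cond := all{tagEquals{"highway", "residential"}, tagExists{"name"}}

	ctx := &Context{Tags: map[string]string{"highway": "residential", "name": "Main St"}}
	fmt.Println(cond.Eval(ctx))
}
```

Because each YAML condition becomes a small concrete type up front, evaluation at runtime is just a chain of method calls with no parsing involved.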
The output for each filter defines what properties should be assigned to the element's GeoJSON feature. They output things such as booleans (is_tunnel), strings (kind), numbers (area) or nil to be ignored. The interface is defined as:
type filter.Expression interface {
	Eval(*filter.Context) interface{}
}
type filter.NumExpression interface {
	filter.Expression
	EvalNum(*filter.Context) float64
}
The filter.NumExpression interface is also implemented by expressions that must be a number (e.g. area,
building height). Using it helps avoid a type indirection when we know we need numbers, for example
in the min and max expressions.
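The benefit of a separate numeric interface can be sketched as follows. All types here (`Context`, `literal`, `area`, `minExpr`) are hypothetical and written only for this example; the point is that a min expression can call `EvalNum` on its arguments directly instead of unwrapping `interface{}` values.

```go
package main

import "fmt"

// Context is a hypothetical stand-in for filter.Context.
type Context struct {
	Area float64
}

// Expression and NumExpression mirror the shape of the interfaces above.
type Expression interface {
	Eval(*Context) interface{}
}

type NumExpression interface {
	Expression
	EvalNum(*Context) float64
}

// literal is a constant number.
type literal float64

func (l literal) Eval(*Context) interface{} { return float64(l) }
func (l literal) EvalNum(*Context) float64  { return float64(l) }

// area returns the element's area from the context.
type area struct{}

func (area) Eval(ctx *Context) interface{} { return ctx.Area }
func (area) EvalNum(ctx *Context) float64  { return ctx.Area }

// minExpr returns the smallest of its arguments. It assumes at least one
// argument, something a compile step would be expected to verify.
type minExpr []NumExpression

func (m minExpr) EvalNum(ctx *Context) float64 {
	result := m[0].EvalNum(ctx)
	for _, e := range m[1:] {
		// no type assertion needed, the arguments are known to be numeric
		if v := e.EvalNum(ctx); v < result {
			result = v
		}
	}
	return result
}

func (m minExpr) Eval(ctx *Context) interface{} { return m.EvalNum(ctx) }

func main() {
	expr := minExpr{literal(100), area{}}
	fmt.Println(expr.EvalNum(&Context{Area: 42.5}))
}
```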
The filter.Context is passed in at runtime and contains info about the element being evaluated,
like the OSM tags and geometry. It also caches "expensive" things like the area and volume that can
be used by multiple filters.
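The caching idea can be sketched as a lazily computed value stored on the context. This is a simplified illustration under assumptions: the field names are hypothetical and the area here is a planar shoelace calculation, whereas a real implementation would need to account for the projection.

```go
package main

import (
	"fmt"
	"math"
)

// Context is a hypothetical, simplified stand-in for filter.Context.
type Context struct {
	Tags map[string]string
	Ring [][2]float64 // closed polygon ring in planar coordinates

	area     float64
	areaDone bool
}

// Area computes the polygon area on first use and reuses it afterwards,
// so many filters can ask for it without repeating the work.
func (ctx *Context) Area() float64 {
	if !ctx.areaDone {
		ctx.area = ringArea(ctx.Ring)
		ctx.areaDone = true
	}
	return ctx.area
}

// ringArea is the planar shoelace formula for a closed ring.
func ringArea(ring [][2]float64) float64 {
	sum := 0.0
	for i := 0; i+1 < len(ring); i++ {
		sum += ring[i][0]*ring[i+1][1] - ring[i+1][0]*ring[i][1]
	}
	return math.Abs(sum) / 2
}

func main() {
	ctx := &Context{Ring: [][2]float64{{0, 0}, {4, 0}, {4, 3}, {0, 3}, {0, 0}}}
	fmt.Println(ctx.Area(), ctx.Area()) // the second call hits the cache
}
```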
After elements for a layer are matched and GeoJSON features are created, a set of transforms is applied. The transforms edit the element properties based on some logic, sometimes requiring the set of relations the original OSM element is a member of.
While loading the config the transforms are matched to functions of the form:
func(*filter.Context, *geojson.Feature)
Transforms can only change a feature, they can't remove a feature if it's "bad" for any reason, like too small for the zoom. Transforms also don't know about other features, so they can't be used to remove duplicates or merge features, like parts of the same road. However, transforms can be used to do things like fix one-way direction, abbreviate road names, etc.
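A minimal sketch of how transform names from the config could be matched to functions at load time, with unknown names reported as errors. The `Context` and `Feature` types, the `registry` map and the `abbreviate_street` name are all hypothetical and chosen for this example; they are not taken from queries.yaml or the package itself.

```go
package main

import (
	"fmt"
	"strings"
)

// Context and Feature are hypothetical stand-ins for filter.Context
// and geojson.Feature.
type Context struct{ Tags map[string]string }
type Feature struct{ Properties map[string]interface{} }

// Transform has the same shape as the function form described above.
type Transform func(*Context, *Feature)

// abbreviateStreet edits a single feature in place. It cannot remove the
// feature and knows nothing about other features.
func abbreviateStreet(_ *Context, f *Feature) {
	if name, ok := f.Properties["name"].(string); ok && strings.HasSuffix(name, " Street") {
		f.Properties["name"] = strings.TrimSuffix(name, " Street") + " St"
	}
}

// registry maps config names to functions (illustrative name only).
var registry = map[string]Transform{"abbreviate_street": abbreviateStreet}

// compileTransforms resolves names up front so typos fail at startup.
func compileTransforms(names []string) ([]Transform, error) {
	out := make([]Transform, 0, len(names))
	for _, name := range names {
		fn, ok := registry[name]
		if !ok {
			return nil, fmt.Errorf("unknown transform: %q", name)
		}
		out = append(out, fn)
	}
	return out, nil
}

func main() {
	transforms, err := compileTransforms([]string{"abbreviate_street"})
	if err != nil {
		panic(err)
	}

	f := &Feature{Properties: map[string]interface{}{"name": "Main Street"}}
	for _, t := range transforms {
		t(&Context{}, f)
	}
	fmt.Println(f.Properties["name"])
}
```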
The post processes are compiled to check the parameters and data files. They are mapped to an
object implementing the postprocess.Function interface, defined as:
type postprocess.Function interface {
	Eval(*postprocess.Context, map[string]*geojson.FeatureCollection)
}
The function takes all the layers as input. Some examples of post processing are clipping to the tile bounds, setting sort_rank and scale_rank, removing duplicate features, removing small areas, merging lines, etc.
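Unlike a transform, a post process sees whole layers and can therefore drop features. The sketch below removes small areas from one layer. It is a simplified illustration: the `Feature`, `Layers` and `dropSmallAreas` types are hypothetical, the context argument is omitted, and an `area` property on each feature is an assumption of this example.

```go
package main

import "fmt"

// Feature and Layers are hypothetical stand-ins for geojson.Feature and
// map[string]*geojson.FeatureCollection.
type Feature struct{ Properties map[string]interface{} }
type Layers map[string][]*Feature

// Function is a simplified version of the interface described above.
type Function interface {
	Eval(Layers)
}

// dropSmallAreas removes features below MinArea from a single layer.
type dropSmallAreas struct {
	Layer   string
	MinArea float64
}

func (d dropSmallAreas) Eval(layers Layers) {
	features, ok := layers[d.Layer]
	if !ok {
		return
	}

	kept := make([]*Feature, 0, len(features))
	for _, f := range features {
		// features without a numeric area are kept untouched
		if area, ok := f.Properties["area"].(float64); ok && area < d.MinArea {
			continue
		}
		kept = append(kept, f)
	}
	layers[d.Layer] = kept
}

func main() {
	layers := Layers{"buildings": {
		{Properties: map[string]interface{}{"kind": "building", "area": 4.0}},
		{Properties: map[string]interface{}{"kind": "building", "area": 250.0}},
	}}

	var fn Function = dropSmallAreas{Layer: "buildings", MinArea: 10}
	fn.Eval(layers)
	fmt.Println(len(layers["buildings"]))
}
```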
Once everything is set up we can start evaluating data against the filters and apply the
transforms and post processes. The input is OSM data, a bound, plus a zoom. The bound is used to
clip geometry and check if a label should be included. The zoom is used to filter out
things that are "too small" as defined by the min_zoom output in the yaml/*.yaml files. To
include everything, use a high zoom, such as 20.
The evaluation proceeds in the following steps:
- Convert OSM data to GeoJSON

  The data is run through osm/osmgeojson, which is a port of the osmtogeojson node.js library. This groups nodes into ways and ways into polygons. For example, we don't care about the 4 nodes that define a building, we just want the building polygon.

- Run each OSM element GeoJSON feature through the filters

  We find the first filter in each layer to match and then compute the filter's outputs. Note that an element can match in multiple layers, for example a building polygon and a POI. The input and output are both GeoJSON; however, the input contains properties based on OSM tags, while the output has properties from the filter, like kind and kind_detail.

- Apply the transforms

  The new GeoJSON object is updated a bit. This can include reversing the geometry or simplifying the name.

- Apply the post processes to all the layers.

  The end result is a layer, or set of layers, that matches those produced by tilezen.

Note that this whole process can be applied to a single element.
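The filtering step for a single element in a single layer can be sketched as a first-match loop followed by a min_zoom check. This is a simplified illustration, not the package's implementation: the `Filter` struct, `evalLayer` and `hasTag` are hypothetical, and it assumes that an element whose first matching filter has a min_zoom above the requested zoom is dropped rather than tried against later filters.

```go
package main

import "fmt"

// Filter is a hypothetical, flattened view of one compiled YAML filter.
type Filter struct {
	Match   func(tags map[string]string) bool
	MinZoom float64
	Kind    string
}

// evalLayer returns the output properties of the first matching filter,
// or false if nothing matches or the element is too small for the zoom.
func evalLayer(filters []Filter, tags map[string]string, zoom float64) (map[string]interface{}, bool) {
	for _, f := range filters {
		if !f.Match(tags) {
			continue
		}
		if zoom < f.MinZoom {
			return nil, false // assumption: first match wins, even if filtered out
		}
		return map[string]interface{}{"kind": f.Kind, "min_zoom": f.MinZoom}, true
	}
	return nil, false
}

// hasTag builds a simple exact-match condition.
func hasTag(key, value string) func(map[string]string) bool {
	return func(tags map[string]string) bool { return tags[key] == value }
}

func main() {
	filters := []Filter{
		{Match: hasTag("building", "yes"), MinZoom: 13, Kind: "building"},
		{Match: hasTag("amenity", "cafe"), MinZoom: 17, Kind: "cafe"},
	}
	tags := map[string]string{"amenity": "cafe"}

	_, ok16 := evalLayer(filters, tags, 16)
	props, ok20 := evalLayer(filters, tags, 20)
	fmt.Println(ok16, ok20, props["kind"])
}
```

Running the same loop once per layer is what lets one element, such as a building that is also a POI, appear in several layers.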
The first two benchmarks evaluate a single element against ALL the filters and outputs in that layer. Normally you can stop after the first match and only evaluate that one output. The third benchmark is more typical of normal usage and converts data from a zoom 16 tile. The last benchmark leaves out the OSM data to GeoJSON step and just does the filtering and processing unique to this package.
BenchmarkBuildings-4 200000 9969 ns/op 1040 B/op 42 allocs/op
BenchmarkPOIs-4 10000 171457 ns/op 6816 B/op 450 allocs/op
BenchmarkFullTile-4 100 11292314 ns/op 3611916 B/op 26555 allocs/op
BenchmarkProcessGeoJSON-4 200 8091129 ns/op 1978560 B/op 18319 allocs/op
New benchmarks using v1.5.1 and Go version 1.11.2
BenchmarkBuildings-4 300000 5525 ns/op 536 B/op 42 allocs/op
BenchmarkPOIs-4 20000 80353 ns/op 8264 B/op 546 allocs/op
BenchmarkFullTile-4 200 8736833 ns/op 2639975 B/op 22412 allocs/op
BenchmarkProcessGeoJSON-4 200 6367984 ns/op 1198285 B/op 12874 allocs/op
These benchmarks were run on a 2017 MacBook Pro with a 3.1 GHz processor and 8 GB of RAM. No concurrency is used in this package.
- github.com/pkg/errors - for rich errors with stack traces
- gopkg.in/yaml.v2 - YAML parsing
- github.com/paulmach/orb - geometry area, centroid, clipping, etc.
- github.com/paulmach/osm