Skip to content

Commit

Permalink
faster Intersects queries
Browse files Browse the repository at this point in the history
In systemed#635, we introduced a way
to avoid calling Boost's R-Tree for `Intersects` when we knew the
underlying z14 tile had no shapes in it. This was a perf win (although
not as big as was claimed in that PR -- there was a bug, fixed in
systemed#641, which was the source
of much of the speedup).

This improves on that work for the case of large, irregularly shaped
polygons.

Currently, for each shape in the shapefile/GeoJSON file, we compute
the set of z14 tiles in its bounding box, and set a bit in a bitset
for each of those tiles. This is fast to compute, but lossy.

For example, the bounding box of
https://www.openstreetmap.org/relation/9848163#map=12/39.9616/-105.2197
covers much of the city of Boulder, even though the park itself is
entirely outside of Boulder.

Instead, this PR now uses 2 bits per z14 tile. The loading of shapes
is as before: the first bit is set to 1 if the z14 tile is contained in
a bounding box of a shape. The second bit is initialized to 0.

At `Intersects` time, we'll lazily refine the index, but only for those
tiles that are actually being queried. We'll actually check if the z14
tile intersects any of the shapes, and then either set the second bit to
1, or clear the first bit.

This doubles the memory requirement for indexing from 32MB to 64MB
per layer.

I've left some counters in - I'm going to fiddle with some other ideas
to see if there are other speedups. I'll remove the counters before
making a PR.

Related: the code previously treated the lats as non-projected values.
This was wrong, but didn't affect correctness, since both the indexing
and querying code made the same error. This commit changes it to
correctly use latp.
  • Loading branch information
cldellow committed Oct 1, 2024
1 parent 142b38b commit 4ee8dc5
Show file tree
Hide file tree
Showing 3 changed files with 68 additions and 14 deletions.
7 changes: 6 additions & 1 deletion include/shp_mem_tiles.h
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,12 @@ class ShpMemTiles : public TileDataSource
std::map<uint, std::string> indexedGeometryNames; // | optional names for each one
std::map<std::string, RTree> indices; // Spatial indices, boost::geometry::index objects for shapefile indices
std::mutex indexMutex;
std::map<std::string, std::vector<bool>> bitIndices; // A bit it set if the z14 (or base zoom) tiles at x*width + y contains a shape. This lets us quickly reject negative Intersects queryes
// A bit is set if the z14 (or base zoom) tiles at 2 * (x*width + y) might contain at least one shape.
// This is approximated by using the bounding boxes of the shapes. For large, irregular shapes, or
// shapes with holes, the bounding box may result in many false positives. The first time the index
// is consulted for a given tile, we'll do a more expensive intersects query to refine the index.
// This lets us quickly reject negative Intersects queryes
mutable std::map<std::string, std::vector<bool>> bitIndices;
};

#endif //_OSM_MEM_TILES
Expand Down
12 changes: 11 additions & 1 deletion src/osm_lua_processing.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,9 @@ thread_local OsmLuaProcessing* osmLuaProcessing = nullptr;

std::mutex vectorLayerMetadataMutex;

std::atomic<uint64_t> bitmapLookups;
std::atomic<uint64_t> bitmapHits;

void handleOsmLuaProcessingUserSignal(int signum) {
osmLuaProcessing->handleUserSignal(signum);
}
Expand Down Expand Up @@ -355,8 +358,15 @@ double OsmLuaProcessing::AreaIntersecting(const string &layerName) {
template <typename GeometryT>
std::vector<uint> OsmLuaProcessing::intersectsQuery(const string &layerName, bool once, GeometryT &geom) const {
Box box; geom::envelope(geom, box);
if (!shpMemTiles.mayIntersect(layerName, box))
bitmapLookups++;

if (bitmapLookups.load() % 10000 == 0) {
std::cout << "lookup: " << std::to_string(bitmapHits) << " saved calls of " << std::to_string(bitmapLookups) << " lookups" << std::endl;
}
if (!shpMemTiles.mayIntersect(layerName, box)) {
bitmapHits++;
return std::vector<uint>();
}

std::vector<uint> ids = shpMemTiles.QueryMatchingGeometries(layerName, once, box,
[&](const RTree &rtree) { // indexQuery
Expand Down
63 changes: 51 additions & 12 deletions src/shp_mem_tiles.cpp
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
#include "shp_mem_tiles.h"
#include "coordinates_geom.h"
#include <iostream>
#include <mutex>

Expand Down Expand Up @@ -57,7 +58,7 @@ void ShpMemTiles::CreateNamedLayerIndex(const std::string& layerName) {
indices[layerName]=RTree();

bitIndices[layerName] = std::vector<bool>();
bitIndices[layerName].resize((1 << indexZoom) * (1 << indexZoom));
bitIndices[layerName].resize(2 * (1 << indexZoom) * (1 << indexZoom));
}

bool ShpMemTiles::mayIntersect(const std::string& layerName, const Box& box) const {
Expand All @@ -66,19 +67,52 @@ bool ShpMemTiles::mayIntersect(const std::string& layerName, const Box& box) con
auto& bitvec = bitIndices.at(layerName);

double lon1 = box.min_corner().x();
double lat1 = box.min_corner().y();
double latp1 = box.min_corner().y();
double lon2 = box.max_corner().x();
double lat2 = box.max_corner().y();
double latp2 = box.max_corner().y();

uint32_t x1 = lon2tilex(lon1, indexZoom);
uint32_t x2 = lon2tilex(lon2, indexZoom);
uint32_t y1 = lat2tiley(lat1, indexZoom);
uint32_t y2 = lat2tiley(lat2, indexZoom);
uint32_t y1 = latp2tiley(latp1, indexZoom);
uint32_t y2 = latp2tiley(latp2, indexZoom);

for (int x = std::min(x1, x2); x <= std::min((1u << indexZoom) - 1u, std::max(x1, x2)); x++) {
for (int y = std::min(y1, y2); y <= std::min((1u << indexZoom) - 1u, std::max(y1, y2)); y++) {
if (bitvec[x * (1 << indexZoom) + y])
return true;

uint32_t index = 2 * (x * (1 << indexZoom) + y);
if (bitvec[index]) {
if (bitvec[index + 1])
return true;
else {
// When we loaded the shapefiles, we did a rough index based on a bounding
// box. For large, irregularly shaped polygons like national forests, this
// can give false positives.
//
// We lazily do a more exacting check here, intersecting the index zoom tile.
// Afterwards, we eitehr set bitvec[index + 1] or clear bitvec[index].
TileBbox bbox(TileCoordinates(x, y), indexZoom, false, false);
std::vector<uint> intersections = QueryMatchingGeometries(
layerName,
true,
bbox.clippingBox,
[&](const RTree &rtree) { // indexQuery
vector<IndexValue> results;
rtree.query(geom::index::intersects(bbox.clippingBox), back_inserter(results));
return results;
},
[&](OutputObject const &oo) { // checkQuery
return geom::intersects(bbox.clippingBox, retrieveMultiPolygon(oo.objectID));
}
);

if (intersections.empty()) {
bitvec[index] = false;
} else {
bitvec[index + 1] = true;
return true;
}
}
}
}
}

Expand Down Expand Up @@ -159,18 +193,23 @@ void ShpMemTiles::StoreGeometry(
// this shape.
auto& bitvec = bitIndices.at(layerName);
double lon1 = box.min_corner().x();
double lat1 = box.min_corner().y();
double latp1 = box.min_corner().y();
double lon2 = box.max_corner().x();
double lat2 = box.max_corner().y();
double latp2 = box.max_corner().y();

uint32_t x1 = lon2tilex(lon1, indexZoom);
uint32_t x2 = lon2tilex(lon2, indexZoom);
uint32_t y1 = lat2tiley(lat1, indexZoom);
uint32_t y2 = lat2tiley(lat2, indexZoom);
uint32_t y1 = latp2tiley(latp1, indexZoom);
uint32_t y2 = latp2tiley(latp2, indexZoom);

uint32_t hits = 0;
for (int x = std::min(x1, x2); x <= std::min((1u << indexZoom) - 1u, std::max(x1, x2)); x++) {
for (int y = std::min(y1, y2); y <= std::min((1u << indexZoom) - 1u, std::max(y1, y2)); y++) {
bitvec[x * (1 << indexZoom) + y] = true;
uint32_t index = 2 * (x * (1 << indexZoom) + y);
if (!bitvec[index]) {
hits++;
}
bitvec[index] = true;
}
}
}

0 comments on commit 4ee8dc5

Please sign in to comment.