Smarter search radius formula for map matching #3184
Thanks for this great analysis! 👍 I need some time to look at this in more detail, but I think we can use this as a basis not only to fix the search radius, but also to provide a real empirical correction to the original measurements in the Newson and Krumm paper. Right off the bat, I'm skeptical about exposing this as a query-time parameter. Basically, what we are defining here are properties of an empirical distribution. Hard-coding that would be fine with me, if we can ensure that the measurements are good. The 200m limit is also something that was enforced in the Newson and Krumm paper. My main concern is about documenting this behavior. A user might expect increasing the value for … BTW, if you are looking for more performance improvements, there are a few things you can do. One would be to incorporate a bearing filter (or, the more advanced version, modify the emission probability in the hidden Markov model to use the bearing value). Speeding up the Viterbi algorithm with a less naive version that limits the amount of memory used would also be possible. Hit me up by email if you have questions.
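A bearing filter of the simpler kind suggested above could look roughly like this minimal Python sketch. It is not OSRM's implementation: the candidate shape `(candidate_id, edge_bearing_deg)`, the helper names, and the 45° default tolerance are all hypothetical.

```python
def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two compass bearings, in degrees (0-180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


def filter_by_bearing(candidates, observed_bearing, tolerance_deg=45.0, bidirectional=True):
    """Keep candidates whose edge bearing roughly matches the observed bearing.

    candidates: iterable of (candidate_id, edge_bearing_deg) pairs (hypothetical shape).
    bidirectional: if True, an edge may be travelled either way, so the reversed
    bearing (edge_bearing + 180) is accepted as well.
    """
    kept = []
    for cand_id, edge_bearing in candidates:
        diff = angular_difference(observed_bearing, edge_bearing)
        if bidirectional:
            diff = min(diff, angular_difference(observed_bearing, edge_bearing + 180.0))
        if diff <= tolerance_deg:
            kept.append(cand_id)
    return kept


# A device heading due north (0°): the roughly east-west edge "b" is dropped.
print(filter_by_bearing([("a", 10.0), ("b", 100.0), ("c", 190.0)], 0.0))  # ['a', 'c']
```

A hard cutoff like this is the simpler of the two options; the emission-probability variant would down-weight, rather than drop, candidates with a large bearing difference.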
I'm not sure why the AppVeyor build is failing; could you help me debug that? Thanks for your response :) I'm working on getting the … I'd strongly prefer to keep the … I'll update the docs to indicate the … We've actually experimented with using the phone-reported bearing to modify the emission probability! The distribution of … We're definitely interested in any performance gains we can get. I'll start an email thread so we can discuss there.
I understand this is great for experimenting and debugging. But when it comes to including something in our API, my primary concerns are always two factors: …
From experience, once you include something in your API that is not completely obvious, people will misunderstand and misuse it (:wave: …). I know this is unsatisfying, but let's keep production and prototyping code separate for now.
That would be great.
There were a number of problems with Windows unrelated to your PR; there is a good chance that a rebase will fix it.
Great! I'm currently trying to find a test data set that includes device bearings for some experiments of my own. Proposal: …
Here is the accuracy_to_distance.csv data file. Keep in mind that this is just based on our best estimate of the driver's position, using OSRM 5.2.6 with the OSRM v4 radius formula. Sounds good, I'll make those changes and then ping this PR.
I've implemented the changes you suggested and rebased; the PR is ready for review again.
@kerrick-lyft looks good to me. 👍 Merging.
Hi,
I dug back through my emails, and discovered why this was reverted:
The bad news is that we never got around to revisiting that. The core problem is that the foot/bike networks are quite a bit denser. I don't remember the exact details, but I bet it was causing slowdowns in urban areas where there were sidewalks present. In those cases, you really want to use the minimum radius you can get away with, so as not to include too many candidate points and slow things down. The PR does greatly increase the default radius, which overall makes things slower.
Thanks @danpat for the reply.
+1 to Masoud's comment, I think it would be great if there were a way to customize the search radius based on the profile. I think the formula I gave in the PR description works well for the driving use-case (and we still use that same formula internally at Lyft). But it may make sense to use the current formula for biking/walking (or come up with a smart formula for those cases too, but that's probably further down the road).
I'm unfortunately not familiar enough with OSRM to know whether this is easy to implement in a robust way or not. Is it possible to tell from the profile whether it's considered a driving, walking, or biking profile? What happens down the road when other modalities are added?
If the linear relationship you found holds for other modalities, but with a different gradient, then we could allow setting the factors in the …
I think the current heuristic is (3 * gps_accuracy), unless it changed since I last looked (that was 4 years ago, so there's a good chance). This is a linear relationship so it should be fine. We could have default values for the slope and y-intercept (3 and 0, respectively) and optionally override them in the .lua file (e.g. driving profile would set slope = 3.45 and y-intercept = 45). |
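To make the proposal concrete, here is a Python sketch of per-profile coefficients. In OSRM the override would live in the Lua profile; the `SearchRadiusParams` class, its field names, and the `PROFILES` mapping are hypothetical and only model the idea. The defaults (slope 3, intercept 0) reproduce the 3 * gps_accuracy heuristic, and the driving entry uses the 3.45 / 45 values proposed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchRadiusParams:
    """Hypothetical per-profile coefficients: radius = slope * gps_accuracy + intercept."""
    slope: float = 3.0      # default reproduces the current 3 * gps_accuracy heuristic
    intercept: float = 0.0  # metres added regardless of the reported accuracy

    def radius(self, gps_accuracy: float) -> float:
        return self.slope * gps_accuracy + self.intercept


# Hypothetical profile table: driving overrides the defaults, walking keeps them.
PROFILES = {
    "driving": SearchRadiusParams(slope=3.45, intercept=45.0),
    "walking": SearchRadiusParams(),
}

print(PROFILES["walking"].radius(10.0))  # 30.0
```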
This is a long commit message, sorry! The gist is that I found a better formula for the map match search radius that gives good results but is still small enough to be performant.
OSRM’s /match endpoint can take a long time to finish a request, even on a powerful, modern server, which has led to a lot of additional latency as we deploy map matching in more places.
The running time of the map-matching algorithm is approximately quartic in the candidate search radius r (O(r^4)). Because the number of candidate points grows with the search area, a 2x increase in the search radius for each input point will tend to lead to a 4x increase in the number of candidate points; and because the Viterbi algorithm considers every transition between the candidates (states) of consecutive input points, a 4x increase in candidate points will lead to a 16x increase in the number of operations it performs.
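The quartic estimate rests on two assumptions, which this small Python sketch makes explicit: candidate count grows with the search area (r^2), and each Viterbi step scores every candidate of one point against every candidate of the next (candidates^2). The function name is illustrative only.

```python
def relative_viterbi_work(radius_ratio: float) -> float:
    """Estimated factor by which Viterbi work grows when the search radius is scaled.

    Assumes candidate count ~ search area ~ radius**2, and per-step work ~
    (candidates of previous point) * (candidates of next point) ~ candidates**2.
    """
    candidate_ratio = radius_ratio ** 2  # area grows with the square of the radius
    return candidate_ratio ** 2          # all-pairs transitions between consecutive points


print(relative_viterbi_work(2.0))     # 16.0: doubling the radius
print(relative_viterbi_work(10 / 3))  # roughly 123: going from 3x to 10x gps_accuracy
```

Dense networks such as sidewalks and footpaths push the candidate count up faster than this uniform-density model suggests, which is why a smaller radius matters most there.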
So decreasing the search radius can dramatically improve the running time. However, the formula OSRM v5 uses, which is search_radius = 3 * gps_accuracy, doesn’t find the correct point in many cases. We internally patched OSRM to use the same formula as OSRM v4, search_radius = 10 * gps_accuracy, which gives good results.
I suspected the optimal search radius isn’t directly proportional to the gps_accuracy, and so I pulled ~1 million data points from Lyft drivers and compared the distance from the raw point to the map-matched point output by our current system (data file here). I bucketed the points by rounding their phone-reported accuracy (Location.getAccuracy() on Android or CLLocation.horizontalAccuracy on iOS) down to the nearest integer. To handle sparsity and to smooth out the data I also added each point to the two neighboring buckets in each direction. For example, a point with accuracy 3.7 would be included in the buckets 1, 2, 3, 4, and 5. Finally, I computed the 99.9th percentile raw->map-match distance for each bucket. See this script for the specific computation: https://drive.google.com/file/d/0B30B6-L__QYKbXZwUl9DbkVsbDA/view
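The bucketing and smoothing step can be sketched in Python as follows. This is not the linked script: the input shape, the function names, and the nearest-rank percentile definition are assumptions (the original may interpolate differently). Negative buckets, which would arise for accuracies below 2 m, are skipped.

```python
import math
from collections import defaultdict


def percentile_nearest_rank(values, per_mille=999):
    """Nearest-rank percentile; per_mille=999 gives the 99.9th percentile.

    Integer arithmetic avoids floating-point error in the rank computation.
    """
    ordered = sorted(values)
    rank = -(-len(ordered) * per_mille // 1000)  # ceil(n * per_mille / 1000), 1-based
    return ordered[max(rank, 1) - 1]


def p999_by_bucket(points, spread=2):
    """points: iterable of (reported_accuracy_m, raw_to_matched_distance_m) pairs.

    Each point is added to bucket floor(accuracy) and to `spread` neighboring
    buckets on each side (3.7 -> buckets 1..5 when spread=2).
    """
    buckets = defaultdict(list)
    for accuracy, distance in points:
        center = math.floor(accuracy)
        for b in range(center - spread, center + spread + 1):
            if b >= 0:  # accuracies are non-negative, so skip negative buckets
                buckets[b].append(distance)
    return {b: percentile_nearest_rank(ds) for b, ds in sorted(buckets.items())}


print(p999_by_bucket([(3.7, 12.0)]))  # {1: 12.0, 2: 12.0, 3: 12.0, 4: 12.0, 5: 12.0}
```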
Here’s a graph of the results: https://drive.google.com/file/d/0B30B6-L__QYKU0RmZjI0ZGxYZ1E/view
The upward trend stops at around bucket = 47. Removing buckets >= 48, we see a clear linear trend: https://drive.google.com/file/d/0B30B6-L__QYKNTF5bm1YWGNaOGc/view?usp=sharing
We can fit a trendline to this data: https://drive.google.com/file/d/0B30B6-L__QYKeF8yb3ZiTkhjV1E/view?usp=sharing
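Dropping the buckets above 47 and fitting a line can be sketched as below, assuming the trendline is an ordinary least-squares fit; the helper names are illustrative. On the Lyft data, the fit described here gave a slope of 3.45 and an intercept of 44.4 m.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_search_radius(p999, max_bucket=47):
    """Fit P99.9 distance against accuracy bucket, ignoring buckets above
    max_bucket (where the upward trend stops). p999: {bucket: distance_m}."""
    xs = [b for b in sorted(p999) if b <= max_bucket]
    ys = [p999[b] for b in xs]
    return fit_line(xs, ys)
```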
This gives the formula search_radius = 3.45 * gps_accuracy + 44.4. Since the P99.9 distance was essentially never more than 200 meters, we cap the search radius at 200 meters and round up the coefficients, giving the final formula search_radius = min(3.5 * gps_accuracy + 45, 200). This formula should yield a search radius that contains the correct point 99.9% of the time.
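As a Python function, the final formula is a one-liner. The keyword parameters mirror the configurability described in this PR, but their names are illustrative and are not the PR's actual option names.

```python
def search_radius(gps_accuracy: float,
                  slope: float = 3.5,
                  intercept: float = 45.0,
                  max_radius: float = 200.0) -> float:
    """Candidate search radius in metres: min(3.5 * gps_accuracy + 45, 200) by default."""
    return min(slope * gps_accuracy + intercept, max_radius)


print(search_radius(10.0))  # 80.0
print(search_radius(50.0))  # 200.0 (capped)
```

With the default coefficients, the cap takes effect once the reported accuracy exceeds (200 - 45) / 3.5 ≈ 44.3 m.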
We allow the caller to configure these parameters so they can tweak the performance / accuracy tradeoff. The 200 meter cap might be a quirk of the data I processed, but the caller can change it if they want.
As a result of this change, the latency of our map match calls dropped significantly (vs our patched OSRM that used 10 * gps_accuracy) without any degradation in accuracy on our test dataset. OSRM currently uses the formula 3 * gps_accuracy, so for mainline OSRM this change means a modest increase in latency, but the map match results should be more accurate. The new formula provides a good tradeoff between latency and accuracy.
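To put the tradeoff in numbers, this Python sketch evaluates the three formulas mentioned in this description at a few arbitrary sample accuracies and, under the rough radius^4 cost model, estimates the new formula's Viterbi work relative to the 10 * gps_accuracy patch. Real latency also depends on the distribution of reported accuracies and on local road density.

```python
def compare_radii(accuracies):
    """Return (accuracy, v5, v4, proposed, proposed_work_vs_v4) rows, radii in metres.

    v5: 3 * acc (mainline OSRM v5); v4: 10 * acc (the patched formula);
    proposed: min(3.5 * acc + 45, 200). Relative work assumes cost ~ radius**4.
    """
    rows = []
    for acc in accuracies:
        v5 = 3.0 * acc
        v4 = 10.0 * acc
        proposed = min(3.5 * acc + 45.0, 200.0)
        rows.append((acc, v5, v4, proposed, (proposed / v4) ** 4))
    return rows


for acc, v5, v4, new, work in compare_radii([5.0, 10.0, 20.0, 40.0]):
    print(f"acc={acc:>4} m  3x={v5:>5} m  10x={v4:>5} m  new={new:>5} m  work vs 10x={work:.2f}")
```

At 5 m the new radius (62.5 m) is actually wider than 10x (50 m); the two cross at 45 / 6.5 ≈ 6.9 m, so the savings over the v4 formula come from points with larger reported accuracy. Against 3x, the new radius is wider for every accuracy below about 66.7 m, where 3x reaches the 200 m cap.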