You need to have the following Python packages installed:
numpy
scipy
scikit-learn
matplotlib
shapely
utm
pygeoj
overpass
- Run detect_bus_stops.py and wait until it terminates.
- Run start_webserver.py.
- Open a web browser (preferably not IE) and go to http://localhost:8000/.
- Explore the results.
I implemented two different algorithms to detect bus stops and will explain both approaches in the following paragraphs.
Input: Activity points, bus routes, bus stops from OSM
- Extract bus-stop-related previous/current activity patterns by examining the surroundings of known bus stops.
- Perform spatial clustering on the activity points.
- Extract the previous/current activity pattern for each cluster.
- Compare the patterns with the bus stop related patterns.
- If pattern similarity is above a certain threshold, chances are good that there is a bus stop nearby.
- Calculate the centroids for each cluster.
- If the distance between the centroid and the closest bus route is below a certain threshold, project the centroid to the closest bus route.
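The pattern comparison in the steps above can be sketched as follows. This is only an illustration under assumptions: a "pattern" is taken to be the relative frequency of each (previous, current) activity pair, similarity is measured with cosine similarity, and the activity names, the dictionary layout and SIMILARITY_THRESHOLD are hypothetical. The actual script may define and compare patterns differently.

```python
from collections import Counter
from math import sqrt


def activity_pattern(points):
    """Relative frequency of each (previous, current) activity pair."""
    counts = Counter((p["previous"], p["current"]) for p in points)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse frequency dictionaries."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical activity points observed around known OSM bus stops.
near_known_stops = [
    {"previous": "on_bus", "current": "on_foot"},
    {"previous": "on_bus", "current": "on_foot"},
    {"previous": "still", "current": "on_bus"},
]
# Hypothetical activity points belonging to one spatial cluster.
cluster = [
    {"previous": "on_bus", "current": "on_foot"},
    {"previous": "still", "current": "on_bus"},
]

reference = activity_pattern(near_known_stops)
candidate = activity_pattern(cluster)

SIMILARITY_THRESHOLD = 0.8  # assumed value, needs tuning
similarity = cosine_similarity(reference, candidate)
is_candidate = similarity >= SIMILARITY_THRESHOLD
```

A cluster whose pattern passes the threshold is then kept as a bus stop candidate, and the remaining steps (centroid and projection) only run for those clusters.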
For each cluster, take the bearing and speed of the contained activity points into account and shift the centroid accordingly. Then project it to the closest route.
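A minimal sketch of the shift-and-project idea, under several assumptions: coordinates are already in a metric projection (for example UTM), bearings are compass degrees clockwise from north, speeds are in metres per second, and the centroid is moved along the mean bearing by mean speed times a hypothetical lag_seconds (a negative value shifts backwards). The function names and the max_distance parameter are illustrative and not taken from the script.

```python
from math import atan2, cos, hypot, radians, sin


def mean_bearing(bearings_deg):
    """Circular mean of compass bearings, returned in radians."""
    s = sum(sin(radians(b)) for b in bearings_deg)
    c = sum(cos(radians(b)) for b in bearings_deg)
    return atan2(s, c)


def shifted_centroid(points, lag_seconds):
    """points: list of (x, y, bearing_deg, speed_mps) in metric coordinates."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    theta = mean_bearing([p[2] for p in points])
    offset = lag_seconds * sum(p[3] for p in points) / n
    # Bearing 0 points north (+y), bearing 90 points east (+x).
    return cx + offset * sin(theta), cy + offset * cos(theta)


def project_to_segment(p, a, b):
    """Closest point to p on the segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return a[0] + t * dx, a[1] + t * dy


def project_to_closest_route(p, routes, max_distance):
    """Snap p to the nearest route; keep p if no route is close enough."""
    candidates = [
        project_to_segment(p, a, b)
        for route in routes
        for a, b in zip(route, route[1:])
    ]
    best = min(candidates, key=lambda q: hypot(q[0] - p[0], q[1] - p[1]))
    if hypot(best[0] - p[0], best[1] - p[1]) <= max_distance:
        return best
    return p


# Hypothetical cluster: two points heading east at 2 m/s and 4 m/s.
cluster = [(10.0, 5.0, 90.0, 2.0), (12.0, 7.0, 90.0, 4.0)]
routes = [[(0.0, 0.0), (20.0, 0.0)]]

shifted = shifted_centroid(cluster, lag_seconds=-2.0)
stop = project_to_closest_route(shifted, routes, max_distance=10.0)
```

In this toy example the centroid (11, 6) is moved 6 m against the direction of travel and then snapped onto the route at roughly (5, 0).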
Input: Activity points, bus routes, bus stops from OSM
- Extract bus-stop-related previous/current activity patterns by examining the surroundings of known bus stops.
- Assign a score to each combination.
- Define common-sense previous/current bus-stop-related activity combinations and also give them a score.
- Traverse each bus route with a certain step length and check for surrounding activity points at each step.
- For each step, sum the score of each activity point and optionally penalize by distance.
- Detect local maxima on the series of scores for each route.
- Spatially aggregate the found maxima because bus routes can have route segments in common.
- Calculate an average score for the aggregated local maxima and optionally exclude those that are below a certain threshold.
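The traversal, scoring and maxima steps above can be sketched roughly like this. It is a simplified, pure-Python illustration that assumes metric coordinates; the linear distance penalty, the greedy aggregation, the score table and all names and parameter values are assumptions and not the script's actual implementation.

```python
from math import hypot


def walk_route(route, step):
    """Yield positions spaced `step` apart along a polyline."""
    carried = 0.0  # distance from the segment start to the next sample
    for (ax, ay), (bx, by) in zip(route, route[1:]):
        seg_len = hypot(bx - ax, by - ay)
        if seg_len == 0:
            continue
        d = carried
        while d <= seg_len:
            t = d / seg_len
            yield ax + t * (bx - ax), ay + t * (by - ay)
            d += step
        carried = d - seg_len


def step_score(position, activity_points, scores, radius, penalize=True):
    """Sum the scores of activity points within `radius` of a position."""
    total = 0.0
    for x, y, previous, current in activity_points:
        dist = hypot(x - position[0], y - position[1])
        if dist > radius:
            continue
        score = scores.get((previous, current), 0.0)
        if penalize:
            score *= 1.0 - dist / radius  # assumed linear distance penalty
        total += score
    return total


def local_maxima(values):
    """Indices of interior values larger than their left neighbour
    and at least as large as their right neighbour."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    ]


def aggregate(maxima, merge_radius, min_score=0.0):
    """Greedily merge (x, y, score) maxima that lie close together and
    average position and score per group."""
    groups = []
    for m in maxima:
        for g in groups:
            if hypot(m[0] - g[0][0], m[1] - g[0][1]) <= merge_radius:
                g.append(m)
                break
        else:
            groups.append([m])
    merged = [tuple(sum(m[i] for m in g) / len(g) for i in range(3))
              for g in groups]
    return [m for m in merged if m[2] >= min_score]


# Hypothetical score table and activity points (x, y, previous, current).
scores = {("on_bus", "on_foot"): 1.0, ("still", "on_bus"): 0.8}
activity_points = [
    (48.0, 3.0, "on_bus", "on_foot"),
    (52.0, -2.0, "still", "on_bus"),
    (50.0, 1.0, "on_bus", "on_foot"),
]
route = [(0.0, 0.0), (100.0, 0.0)]

positions = list(walk_route(route, step=10.0))
series = [step_score(p, activity_points, scores, radius=15.0)
          for p in positions]
peaks = local_maxima(series)
stops = aggregate([(*positions[i], series[i]) for i in peaks],
                  merge_radius=20.0)
```

With several routes, the maxima of all routes would be collected first and passed to aggregate together, so that peaks on shared route segments collapse into one candidate.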
+ Fast.
+ Scales well with larger amounts of data.
- Does not work so well with small data sets.
- Results depend on many parameters.
+ Flexible due to the ability to define scores for each activity combination.
- Rather slow.
- Does not scale very well with larger amounts of data.
- Results depend on many parameters.
Both algorithms depend heavily on several parameters, and choosing good values is not trivial. For that reason, I wrote a small function that tests various parameter settings and calculates the average distance between the detected bus stops and the closest known bus stop. However, there is always a trade-off between the number of detected bus stops and the average distance to the ground truth. Unfortunately, I did not have enough time to evaluate the best parameter settings in depth. Instead, I chose two different settings for each algorithm and provided the option to view the different results in the map visualization.
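As a rough illustration of such a parameter test (not the function from the repository), the sketch below runs a detector over a grid of settings and reports, for each setting, the number of detected stops and their mean distance to the nearest known stop. The names evaluate_settings and toy_detect, the threshold parameter, and all coordinates are hypothetical; coordinates are assumed to be metric.

```python
from itertools import product
from math import hypot


def mean_distance_to_known(detected, known):
    """Mean distance from each detected stop to its nearest known stop."""
    if not detected or not known:
        return float("inf")
    nearest = [min(hypot(dx - kx, dy - ky) for kx, ky in known)
               for dx, dy in detected]
    return sum(nearest) / len(nearest)


def evaluate_settings(detect, known, grid):
    """Run `detect(**params)` for every combination in `grid`.

    grid maps a parameter name to the list of values to try.
    Returns a list of (params, number_of_stops, mean_distance).
    """
    names = sorted(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        detected = detect(**params)
        results.append(
            (params, len(detected), mean_distance_to_known(detected, known))
        )
    return results


# Hypothetical stand-in for one of the detection algorithms: it keeps
# every candidate whose score reaches the threshold.
def toy_detect(threshold):
    candidates = [((10.0, 10.0), 0.9), ((55.0, 48.0), 0.6),
                  ((300.0, 300.0), 0.3)]
    return [xy for xy, score in candidates if score >= threshold]


known_stops = [(12.0, 10.0), (50.0, 50.0)]
results = evaluate_settings(toy_detect, known_stops,
                            {"threshold": [0.2, 0.5, 0.8]})
for params, count, distance in results:
    print(params, count, round(distance, 2))
```

Even in this toy setup the trade-off is visible: a stricter threshold lowers the mean distance to the ground truth but also reduces the number of detected stops.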
Furthermore, I've created some charts that helped me understand the data. They will show up after the bus stop detection has finished.