..

Route Discovery: Reverse Engineering Wandrer

Spoiled by great and invisible technologies, beyond the very conspicuous achievements of self-driving cars or generative AIs, we have stopped being amazed by all the other less flashy things that technology unlocks. Live OCR or visual similarity search on our smart phones? No big deal. Instagram or Snapchat’s augmented reality filters? Sure, that’s a given.

Machine learning practitioners know how complex those actually are, from the computer vision algorithms making it even possible in the first place to all the software engineering and architecture needed for serving and providing a smooth experience. But even so, we sometimes don’t dwell on what’s offered to us.

My latest experience of this phenomenon started by this unusual description of a Strava activity:

Wandrer on Strava
Once again, Strava is giving me homework

The task is rather straightforward to understand: for any athlete’s new activity, how many new kilometers of road (s)he has explored. Of course there are already a multitude of great tools proposing this type of features like wandrer.earth or stats hunter but I wanted to give it a go.

New Path Detection Algorithm, Geometrical Approach

The first step is to preprocess all the historical routes of an athlete into an appropriate format. Starting from a list of successive GPS coordinates describing a given route, we resample it such that all consecutive pairs of points are at least 500m apart1. We then index all those path segments.

Now for a new route, we proceed similarly and then compute the intersection between its path segments and the historical ones. We then take the complementary of the union of those intersection in the new route.

Let’s give a bit more details to the above. Firstly, to get path intersection if we can do the math, computing the parametric equation of both segments, if they are the same, the two segments are colinear and we can extract their intersection2.
Or we can simply leverage Python’s computational geometric toolbox Shapely that exposes an intersection function between geometries. There are two other Python packages worth mentioning for dealing with geospatial data even though they were not used for this project: GeoPandas and rtree3.

A second-order matter is the indexing of the path segments. Of course, we could index this 4-dimensional data but then search over the index would become complex, we would need to define a custom distance metric between two segments. Instead, we simply index each segment by its midpoint.

Rather than doing brute force comparison between all the new route paths segments and the historical ones, we can leverage a nearest-neighbor algorithm to filter historical segments that are worth checking for intersection. We use scikit-learn’s BallTree together with the haversine distance appropriate for this spatial indexing. When searching for nearby historical path segments, we choose a search radius of 500m/6371e34 to be coherent with our step size. And that’s it !

It turns out that Shapely offers a spatial indexing solution out of the box, the so-called STR Tree(the paper for the Sort-Tile Recursive tree). Not only I’ve found out about it much later, but my intution is that for a geometry made only of points, and not more complex arbitrary geometry, it’s not clear whether they would be a real difference in performance with the BallTree.

You can check the underlying code here, the client code5 looks like this when comparing my first Strava activity with the rest:

from wandrer.activity import HistoricalActivities, Activity

history = HistoricalActivities(strava_credentials)
new_activity = Activity(activity_id =activity_id, strava_client=strava_credentials)

history.fetch_path_segments()
history.index()

new_segments = new_activity.get_new_path_segments(history)

which gives the following plot of all the new paths discovered in that activity

New Segment

The Strava Segment Approach

A perhaps less elegant approach but fun nonetheless is to leverage directly Strava segments instead of relying on computational geometry. Every Strava activities comes with a set of Strava segments, they can overlap and are not necessarily covering the full route though. Here instead of computing segments intersection, we choose to simply do a string comparison on their names, which gives (thankfully) something rather similar:

New Strava Segment

If you are curious to know more, you can check out the implementation of the code explained below in my wandrer toy package.


  1. This is a reasonable step value, although there is a tradeoff between precision (lower step value) and the route’s representing list size. 

  2. If the two segments only intersect in one point, we consider there is no overlap in their route. 

  3. It’s also worth mentioning this intro course on Automating Gis-Processes

  4. 6371e3 is the Earth radius of course. 

  5. Once again and as always, this is not production-ready code in anyway. It’s just a demo project so don’t judge the code too harshly.