The nice thing about geographical data is how much information it condenses. From a single visualization you can infer a lot: you can easily identify regions and specific locations where I stayed longer, and it is just as easy to see when I moved within a region or hopped from one to another.

It gets even better when you zoom in and focus on specific locations.

Pokhara Lakeside

This is a close-up of my time in Pokhara, Nepal, where many of my favourite accommodations and restaurants were.

Another interesting takeaway is how bad the GPS noise gets. Many of the single blue dots are quite far from anywhere I have actually been. I do have to be careful here, though, because some single blue dots are responsible for some of my most memorable experiences. The blue dots in the lake, for example, are from the time I went on a spontaneous boat ride attempting to cross it.

Noise-Cancelling the Data

Generally with GPS, erroneous points are isolated, meaning that your location is “teleported” to a completely different position for a moment. This noise is affected by many things and has, interestingly, even been used to detect things like the water content of the ground.

To deal with this problem there are a few solutions, such as removing low-accuracy points, points that moved too quickly, and “lonely” points. Removing GPS noise has already been shown to be highly effective in general geographic-data algorithms, e.g. detecting modes of transportation.

Starting with low-accuracy points, let’s see what happens when we filter out what Google tags as inaccurate.
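For reference, here is a minimal sketch of what that filter could look like, assuming the Records.json layout from a Google Takeout export. The field names and the 100 m accuracy cutoff are assumptions for illustration, not necessarily the exact values used here.

```python
import json

import pandas as pd

# Load the Takeout dump. Field names below follow the Records.json
# format; older exports use e.g. "timestampMs" instead of "timestamp".
with open("Records.json") as f:
    locations = json.load(f)["locations"]

df = pd.DataFrame(locations)
df["lat"] = df["latitudeE7"] / 1e7  # coordinates are stored as degrees * 1e7
df["lon"] = df["longitudeE7"] / 1e7
df["time"] = pd.to_datetime(df["timestamp"], utc=True)

# "accuracy" is Google's estimated error radius in metres; drop anything
# it does not consider precise. The 100 m cutoff is my own assumption.
clean = df[df["accuracy"] <= 100].copy()
print(f"{len(df):,} -> {len(clean):,} locations")
```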

This already looks much better: many of the lonely dots on the heat map have disappeared. It’s interesting to notice that we dropped from an initial 777,022 locations to just 31,542 after this first cleaning pass. That is a whopping 96% drop; geolocation data is by nature very noisy.

Sometimes locations jump around and “throw” your position an unimaginable distance. To get rid of those points that move too quickly, we can filter by speed. Unfortunately there is no speed in the dataset, but we can calculate it ourselves. Speed is a good metric because there are limits to how fast a human can move: unless you are a Formula 1 driver, it is unlikely you would regularly move faster than 200 km/h. The speed is simply the distance in metres between two consecutive points divided by the time that passed between the samples. The Earth is not flat, so regular Euclidean distance does not work at all. But great-circle distance does.
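Sketching that in Python with the haversine formula, continuing from the clean DataFrame above (a geodesy library such as geopy would be more exact, but great-circle distance is close enough at these scales):

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


# Speed = distance between consecutive samples / elapsed time.
clean = clean.sort_values("time")
dist_m = haversine_m(clean["lat"].shift(), clean["lon"].shift(),
                     clean["lat"], clean["lon"])
elapsed_s = clean["time"].diff().dt.total_seconds()
clean["speed_kmh"] = dist_m / elapsed_s * 3.6  # NaN for the first sample
```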

Personally, I believe that other than flights I haven’t moved faster than 80 km/h during my entire trip; the roads are terrible in this part of the world. But maybe I am wrong, so let’s test this (filtering outliers out of the graph).
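Something along these lines produces the histogram below (the 250 km/h plotting cutoff is arbitrary, just to keep the tail readable):

```python
import matplotlib.pyplot as plt

speeds = clean["speed_kmh"].dropna()
speeds = speeds[speeds < 250]  # clip gross outliers for plotting only

plt.hist(speeds, bins=100)
plt.yscale("log")  # long tail: most samples are near-stationary
plt.xlabel("speed (km/h)")
plt.ylabel("number of geolocations")
plt.show()
```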

x = speed (km/h), y = number of geolocations (log scale)

Interestingly enough, memory does not serve me right. I spent about as much time at ~100 km/h as at 80 km/h, and there is an odd spike at 160 km/h, probably the 14-hour night buses. The limit is therefore set at 160 km/h (~44.4 m/s). Sadly this attempt does not improve our accuracy much: the filter only dropped gross misdirections, a mere 38 data points, leaving 31,504 locations.
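The cutoff itself is then a one-line mask, keeping the first sample, whose speed is undefined:

```python
MAX_SPEED_KMH = 160  # the night-bus ceiling read off the histogram (~44.4 m/s)

before = len(clean)
clean = clean[clean["speed_kmh"].isna() | (clean["speed_kmh"] <= MAX_SPEED_KMH)]
print(f"dropped {before - len(clean)} points, {len(clean):,} remain")
```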

While we still have more work to do before we are ready to build more layers on our data, at this point in the process you can already use the map to show lots of interesting things. For example, it is very easy to spot the times I was lazy and stayed in one place for too long.