Introduction

At Sentiance, we developed an AI platform that learns to detect and predict a person’s behavior, routines and profile from mobile sensor data such as accelerometer, gyroscope and GPS. Many low-level algorithms, such as our transport mode detector, are trained with supervised learning techniques, for which we gathered labeled data by closely collaborating with our partners, customers and test groups. However, supervised learning comes with at least two major drawbacks:

1. Gathering labeled data in a natural setting can be difficult and very expensive.
2. The algorithm does not improve over time, even when millions of unlabeled data points are captured every day.

A concrete example that would benefit from unsupervised or semi-supervised learning is venue mapping, i.e. figuring out which venue a user is visiting given a noisy location estimate. Since the type of place a user is currently visiting highly depends on their recent and long-term past behavior, this kind of data needs to be gathered in a realistic setting, where the temporal behavior of the user actually corresponds to natural human behavior. While gathering labeled data for venue mapping is a tedious process, unlabeled data comes for free as more users join our platform. The question then becomes: can we extract meaningful information from population-wide human visiting patterns without requiring access to the actual labels?

The main motivation for this project is therefore to enable us to learn from these large unlabeled datasets, with minimal preprocessing and feature engineering. More specifically, we want to learn a representation of a location that embeds different types of meaningful information, for example the category of the place present at that location (e.g. shop, school) and some information about population-wide behavior at that location. We can then use these representations in our venue mapping and other models.



To achieve this, we adapt the DeepCity deep learning framework (Pang, 2017), which is inspired by word2vec word embeddings (Mikolov, 2013) and graph embeddings (Perozzi, 2014).

DeepCity

DeepCity (Pang, 2017) is a deep-learning-based feature learning framework for profiling users and locations. It uses check-in data obtained from online social networks such as Facebook, Twitter and Foursquare, where users frequently share their whereabouts in the form of so-called check-ins. Two use cases are considered: user profiling and location profiling. In the context of user profiling, it is assumed that the whereabouts of a user reflect who they are. User profiling includes tasks such as age or gender prediction and is the focus of Pang et al. Location profiling, on the other hand, aims to learn more about locations and includes tasks such as predicting the category of a location. The latter is the focus of this blog post, because of its direct link to venue mapping.
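To make the check-in structure concrete, here is a minimal sketch of how a handful of check-ins naturally form the two sides of a user–location graph. The user and venue identifiers are made up for illustration; they are not the actual DeepCity or Sentiance data schema.

```python
from collections import defaultdict

# Hypothetical check-in records: (user_id, venue_id).
# Identifiers are illustrative only.
checkins = [
    ("u1", "coffee_bar"), ("u1", "gym"),
    ("u2", "coffee_bar"), ("u2", "office"),
    ("u3", "gym"), ("u3", "office"),
]

# Users and locations form the two sides of a bipartite graph:
# an edge connects a user to each location they checked in at.
user_to_locs = defaultdict(set)
loc_to_users = defaultdict(set)
for user, loc in checkins:
    user_to_locs[user].add(loc)
    loc_to_users[loc].add(user)

print(sorted(user_to_locs["u1"]))   # locations visited by u1
print(sorted(loc_to_users["gym"]))  # users who checked in at the gym
```

Note that both directions of the mapping are kept: user profiling asks what a user's locations say about them, while location profiling asks what a location's visitors say about it.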

User and location profiling tasks are typically handled by machine learning algorithms that rely on hand-engineered features. Hand-engineering features is very time-consuming and requires domain-specific knowledge, and even with expert domain knowledge it is difficult to capture all relevant features that are applicable in all scenarios. Deep learning provides a promising alternative, in which features are no longer hand-engineered but are learned by the model itself. DeepCity (Pang, 2017) can be considered such an approach, tailored specifically to user and location profiling.

DeepCity represents check-ins as a graph, performs random walks on this graph, and outputs node embeddings. More specifically, location and user nodes are organized in a bipartite graph, and task-specific random walks are used to explore the graph. The resulting walks are then used to learn the embeddings, much like word2vec learns word embeddings from sentences. But let’s take a step back first, and put everything into context.
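The walk-generation step can be sketched in a few lines. This is a simplified, uniform random walk on a toy bipartite graph; DeepCity biases the step probabilities per task, which is omitted here, and the graph data is illustrative rather than a real dataset.

```python
import random
from collections import defaultdict

# Toy bipartite graph: edges connect users to the locations they visited.
edges = [
    ("u1", "coffee_bar"), ("u1", "gym"),
    ("u2", "coffee_bar"), ("u2", "office"),
    ("u3", "gym"), ("u3", "office"),
]
adj = defaultdict(list)
for user, loc in edges:
    adj[user].append(loc)
    adj[loc].append(user)  # edges are undirected

def random_walk(start, length, rng):
    """Uniform random walk; DeepCity replaces the uniform choice
    with task-specific transition probabilities."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

rng = random.Random(42)
walks = [random_walk(node, 5, rng) for node in adj for _ in range(2)]

# Because the graph is bipartite, every walk alternates between
# user nodes and location nodes. These walks are then treated like
# sentences and fed to a word2vec-style skip-gram model, so that
# nodes appearing in similar contexts get similar embeddings.
print(walks[0])
```

Feeding the walks to an off-the-shelf skip-gram implementation (e.g. a word2vec library) then yields one embedding vector per user and per location.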