
“Rather than spending a month figuring out an unsupervised machine learning problem, just label some data for a week and train a classifier.” (Richard Socher, Chief Data Scientist, Salesforce, 2017)

The TensorBoard projector features t-distributed Stochastic Neighbor Embedding (t-SNE) for visualizing high-dimensional datasets, since it is a well-balanced dimensionality reduction algorithm that requires no labels yet reveals latent structure in many types of data. What happens when t-SNE can use partial labeling to recreate pairwise similarities in a lower-dimensional embedding?

IBM Research AI implemented semi-supervision in TensorBoard t-SNE and contributed the components required for interactive supervision, to demonstrate cognitive-assisted labeling. In addition to semi-supervised t-SNE, a metadata editor, distance metric/space selection, neighborhood function selection, and t-SNE perturbation were added to TensorBoard. Together, these components let a partial labeling inform semi-supervised t-SNE, which clarifies the embedding and progressively eases the labeling burden.

Semi-supervised t-SNE

Available sample class labels can be used to calculate the Bayesian priors of McInnes et al. [1], which are applied to the high-dimensional similarities to promote greater attraction between same-label pairs. Yang et al. [2] balance the attractive and repulsive forces in weighted t-SNE with a connection scalar; we instead normalize the gradient magnitude by dividing by the sum of prior probabilities, leaving the repulsion normalization unaffected.
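To make "priors applied to high-dimensional similarities" concrete, here is a minimal sketch under stated assumptions. It does not reproduce the actual Bayesian prior of McInnes et al. [1] or the TensorBoard code: the functions `label_prior` and `reweight` and the weights `same` and `diff` are hypothetical. It simply multiplies each pairwise affinity p_ij by a label-dependent weight (boosting same-label pairs, damping different-label pairs, leaving unlabeled pairs neutral) and renormalizes.

```python
# Hypothetical label prior for semi-supervised t-SNE (illustrative only; not the
# exact Bayesian prior of McInnes et al. or the TensorBoard implementation).
from typing import List, Optional

Matrix = List[List[float]]


def label_prior(labels: List[Optional[str]], same: float = 2.0, diff: float = 0.5) -> Matrix:
    """Pairwise prior weights: boost same-label pairs, damp different-label pairs.

    Unlabeled samples (None) keep a neutral weight of 1.0 with every other sample.
    """
    n = len(labels)
    w = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or labels[i] is None or labels[j] is None:
                continue
            w[i][j] = same if labels[i] == labels[j] else diff
    return w


def reweight(p: Matrix, w: Matrix) -> Matrix:
    """Multiply symmetric affinities p_ij by prior weights w_ij and renormalize to sum to 1."""
    n = len(p)
    weighted = [[p[i][j] * w[i][j] for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in weighted)
    return [[v / total for v in row] for row in weighted]
```

For three points with uniform affinities where only the first two share a label, the labeled pair's affinity doubles relative to the others before renormalization, so it pulls harder during optimization.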

It’s like a liquid thinking process that fluidly adapts to the user’s definition of structure. The user gets to compose a perspective of the data that is useful.

As expected, same-label samples form tighter, merged clusters, which clears space in the embedding and makes outliers and unlabeled points stand out. This may incrementally reduce the effort of labeling a dataset, as the embedding is progressively organized into compact clusters. t-SNE is extremely useful for providing an initial view of the data structure; supervision can then be injected into its objective so that iterative gradient descent composes a user perspective of the data.
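The sketch below shows where supervision enters gradient descent, assuming the standard exact t-SNE gradient with O(n²) cost (no Barnes-Hut approximation, momentum, or early exaggeration, all of which a real implementation would add). The names `tsne_gradient` and `gradient_step` are hypothetical; `p` is assumed to be a symmetric affinity matrix summing to 1 that may already carry label priors, so boosting a pair's p_ij directly strengthens its attraction.

```python
from typing import List

Matrix = List[List[float]]


def tsne_gradient(p: Matrix, y: Matrix) -> Matrix:
    """Exact t-SNE gradient: dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + |y_i - y_j|^2).

    `p` is a symmetric high-dimensional affinity matrix summing to 1 (it may already
    carry label priors); `y` holds the current low-dimensional coordinates.
    """
    n, dim = len(y), len(y[0])
    # Student-t kernel (1 + squared distance)^-1 between every pair of embedded points.
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((y[i][k] - y[j][k]) ** 2 for k in range(dim))
                kernel[i][j] = 1.0 / (1.0 + d2)
    z = sum(sum(row) for row in kernel)  # normalizer that turns the kernel into q_ij
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Positive coefficient pulls i toward j (attraction), negative pushes it away.
            coeff = 4.0 * (p[i][j] - kernel[i][j] / z) * kernel[i][j]
            for k in range(dim):
                grad[i][k] += coeff * (y[i][k] - y[j][k])
    return grad


def gradient_step(p: Matrix, y: Matrix, learning_rate: float = 100.0) -> Matrix:
    """One plain gradient-descent update y <- y - learning_rate * grad (no momentum or exaggeration)."""
    grad = tsne_gradient(p, y)
    return [[yk - learning_rate * gk for yk, gk in zip(row, g)] for row, g in zip(y, grad)]
```

Because every pairwise term is antisymmetric in i and j, the gradients of all points sum to zero, so a step reshapes the embedding without translating it as a whole.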

Supervising t-SNE imposes additional constraints that could make it harder to escape local optima, as is required, for example, to join two separated same-label clusters, especially when the Barnes-Hut approximation localizes attractive forces. Labeling also becomes harder when same-label clusters collapse, so a method is needed to kick the embedding out of its local optimum.

We propose perturbing t-SNE with random walks: each point iteratively receives independent offsets drawn from within a small hypersphere, to a user-specified extent. The perturb function can be applied at any time, either to reduce sprite occlusion so that selections can be refined, or to help join separated same-label clusters.
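A minimal sketch of such a random-walk perturbation follows. It is not TensorBoard's code: the functions `random_offset_in_ball` and `perturb` and the parameters `radius`, `steps`, and `seed` are hypothetical, with `radius` and `steps` standing in for the "user-specified extent". Offsets are drawn uniformly inside a ball by normalizing a Gaussian direction and scaling it by `radius * u**(1/dim)`.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def random_offset_in_ball(dim: int, radius: float, rng: random.Random) -> List[float]:
    """Sample a point uniformly inside a dim-dimensional ball of the given radius."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in direction)) or 1.0
    # r = radius * u^(1/dim) makes samples uniform in volume rather than bunched at the center.
    r = radius * rng.random() ** (1.0 / dim)
    return [r * c / norm for c in direction]


def perturb(y: Matrix, radius: float = 0.05, steps: int = 10, seed: int = 0) -> Matrix:
    """Random walk: at each step, every point moves by an independent offset inside a small ball."""
    rng = random.Random(seed)
    out = [list(row) for row in y]  # copy so the caller's embedding is left untouched
    for _ in range(steps):
        for row in out:
            offset = random_offset_in_ball(len(row), radius, rng)
            for k in range(len(row)):
                row[k] += offset[k]
    return out
```

By the triangle inequality no point moves farther than `steps * radius`, which keeps the kick local; resuming t-SNE afterwards lets the objective re-settle the jostled points.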

Interactive supervision

Metadata in TensorBoard provides information on tensors, such as a class label for each sample. Existing metadata can now be edited in TensorBoard, which effectively allows labels to be applied to selected samples. The Projector switches into a metadata context when the user starts labeling, showing a label histogram that helps to quickly identify and apply a desired label.

Previously, neighborhoods could only be selected with cosine and Euclidean metrics in the high-dimensional input space. These metrics can now also be computed in the PCA and t-SNE embedding spaces, which is required for multi-sample labeling in the semi-supervised setting.
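The idea that the same metrics apply in any space can be sketched as a k-nearest-neighbor query that is agnostic to where the vectors come from. The functions `euclidean`, `cosine_distance`, and `nearest_neighbors` are hypothetical illustrations, not the Projector's API; `points` could hold raw inputs, PCA projections, or t-SNE coordinates.

```python
import math
from typing import Callable, List

Vector = List[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as maximally distant, since its direction is undefined.
    return 1.0 - dot / (na * nb) if na and nb else 1.0


def nearest_neighbors(points: List[Vector], query: int, k: int,
                      metric: Callable[[Vector, Vector], float] = euclidean) -> List[int]:
    """Indices of the k nearest neighbors of points[query], excluding the query itself.

    `points` may hold input vectors, PCA projections or t-SNE coordinates; the
    selection logic is identical, only the space changes.
    """
    others = [i for i in range(len(points)) if i != query]
    others.sort(key=lambda i: metric(points[query], points[i]))
    return others[:k]
```

In the semi-supervised setting, querying in t-SNE space tends to select what the user actually sees clustered on screen, which is what makes multi-sample labeling practical.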

We propose geodesic neighborhood selection to grab smaller clusters by respecting the discontinuities that k-nearest-neighbor selection disregards. Geodesic neighborhoods are computed in a greedy, approximate manner and normally provide good candidates for multi-sample labeling.
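The post does not spell out the greedy procedure, so the following is only one plausible sketch, assuming a Prim-like expansion: starting from a seed, repeatedly add the unselected point closest to any already-selected point, and stop when the shortest available hop exceeds a gap threshold. The function `greedy_geodesic_neighborhood` and its parameters `max_size` and `max_step` are hypothetical and are not TensorBoard's implementation.

```python
import math
from typing import List

Vector = List[float]


def greedy_geodesic_neighborhood(points: List[Vector], seed: int, max_size: int,
                                 max_step: float) -> List[int]:
    """Grow a neighborhood outward from `seed`, one hop at a time.

    At each step the unselected point closest to *any* selected point is added
    (a Prim-like walk along the data), so membership follows chains of nearby
    points instead of raw distance to the seed. Growth stops at a discontinuity,
    i.e. when the shortest available hop exceeds `max_step`, or at `max_size`.
    """
    selected = [seed]
    # best[i] = shortest hop from the current selection to unselected point i
    best = {i: math.dist(points[seed], points[i]) for i in range(len(points)) if i != seed}
    while best and len(selected) < max_size:
        nxt = min(best, key=best.get)
        if best[nxt] > max_step:
            break  # gap found: the cluster ends here
        selected.append(nxt)
        del best[nxt]
        for i in best:
            best[i] = min(best[i], math.dist(points[nxt], points[i]))
    return selected
```

On the points (0,0), (1,0), (2,0), (10,0), (11,0), a 3-nearest-neighbor query from the first point would also take (10,0) across the gap, whereas this expansion stops at the first three points.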

Cognitive-assisted labeling of EMNIST Letters

How many interactions are required to obtain a sufficient labeling for an image dataset like EMNIST Letters (26 classes) or CIFAR-100 (100 classes)?

Labeling datasets is normally a very time-consuming, unenviable task, but one that usually cannot be avoided. Labeling enables the use of supervised machine learning, so why not use machine learning to facilitate minimum-supervision labeling? Transfer learning, zero-shot or one-shot learning could of course be used to circumvent the need for labels altogether, but these rely on assumptions that typically do not hold for most real-world data.

The provided labels could also be used explicitly to train a feature extractor and classifier that makes increasingly confident label recommendations. Note, however, that t-SNE already presents the user with an initial view that is amenable to clustering, and that its single global objective function can be harnessed to help solve the minimum-supervision problem in an elegant, self-contained manner, in keeping with a philosophy of simplicity.

Fig 1 shows a snippet of a longer labeling session, sped up by 4x. It turns out that a lot of interactions are required and that labeling really is a painful task! However, the clarification provided by semi-supervised t-SNE in conjunction with geodesic neighborhood selection clearly increases the number of labels applied per interaction. Once a sample is labeled, semi-supervised t-SNE often pulls it into the cluster of its label, which clears up the embedding and makes the remaining unlabeled samples easier to notice and handle.

EMNIST Letters is a 26-class dataset with 411,302 samples for which an 85.15% accuracy is achieved with an OPIUM-based classifier [3], though we use only about 2000 stratified samples for the labeling exercise. This is a good dataset for demonstrating labeling, as the sample images are small, familiar and easily distinguishable by the human eye. The bottleneck thus becomes the labeling system, and the challenge is to learn as much as possible from every human click or keypress, so that the fewest interactions are needed to obtain a decent labeled sample size for every class.
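Drawing "about 2000 stratified samples" can be sketched as picking an equal number of random indices per class. The function `stratified_sample` and the per-class count are assumptions for illustration: with 26 classes, 77 samples per class would give 2002, but the post does not state the exact split used.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List


def stratified_sample(labels: List[Hashable], per_class: int, seed: int = 0) -> List[int]:
    """Return sorted indices with (up to) `per_class` randomly chosen samples from each class."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen: List[int] = []
    # Iterate classes in a fixed order so a given seed always yields the same sample.
    for label in sorted(by_class, key=str):
        members = by_class[label]
        chosen.extend(rng.sample(members, min(per_class, len(members))))
    return sorted(chosen)
```

Classes smaller than `per_class` are taken whole, so the result stays as balanced as the source data allows.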

Categorizing Radio Frequency Interference

The SETI Institute operates a formidable radio telescope, the Allen Telescope Array, which listens to the sky in the hope of detecting signals from extraterrestrial intelligence. Unfortunately, most signals come from human-made sources: unwanted radio frequency interference (RFI) that has to be filtered out. There are, however, natural categories of RFI appearing in millions of captured signal events, and it would be much easier to remove this noise if it could be accurately classified.

We represent signals as small square images of spectrograms: time-versus-frequency plots that reveal the frequency content and possible nature of a signal. Because the signals can be visualized, TensorBoard interactive labeling can be used to good effect, since sample similarity is easy to see, which makes it easy to delineate good clusters.
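As an illustration of what such a time-versus-frequency image contains, here is a minimal magnitude spectrogram (short-time Fourier transform) in plain Python. The post does not say how the SETI spectrograms were computed, so the function `spectrogram`, the Hann window, and the `window`/`hop` values are assumptions; the naive DFT is only suitable for small illustrative inputs.

```python
import cmath
import math
from typing import List


def spectrogram(signal: List[float], window: int = 32, hop: int = 16) -> List[List[float]]:
    """Magnitude short-time Fourier transform: one row per time frame, one column per frequency bin.

    A Hann window tapers each frame; only the non-negative frequency bins
    (0 .. window // 2) are kept because the input is real-valued.
    """
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / window) for n in range(window)]
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        chunk = [signal[start + n] * hann[n] for n in range(window)]
        row = []
        for k in range(window // 2 + 1):
            # Naive DFT of the windowed frame at bin k (fine for small illustrative windows).
            coeff = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / window) for n in range(window))
            row.append(abs(coeff))
        frames.append(row)
    return frames
```

A steady tone shows up as a bright vertical line at a fixed frequency bin across all time rows, while drifting or bursty interference traces other shapes, which is what makes spectrogram images easy to cluster by eye.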

Some 14 million archived measurements were processed with spectral feature extraction followed by autoencoding to generate a balanced sample of 2000 measurements with a good diversity of signal activity. In the above video we inspect these samples with TensorBoard and progressively label geodesic clusters with user-defined terms.

The remaining unlabeled samples can be explored as possible anomalies that may require follow-up measurements. You will notice some strange-looking signals in the latter part of the video.
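One simple way to prioritize which unlabeled samples to inspect first is to rank them by how far they sit from any labeled sample in the embedding. This is a hypothetical heuristic, not something the post describes: the function `rank_unlabeled_by_isolation` is illustrative, and `labels` uses `None` for unlabeled samples.

```python
import math
from typing import List, Optional, Tuple


def rank_unlabeled_by_isolation(points: List[List[float]],
                                labels: List[Optional[str]]) -> List[Tuple[int, float]]:
    """Unlabeled indices paired with the distance to their nearest labeled point, farthest first."""
    labeled = [i for i, lab in enumerate(labels) if lab is not None]
    scores = []
    for i, lab in enumerate(labels):
        if lab is None:
            # With nothing labeled yet, every unlabeled point is maximally isolated.
            nearest = min((math.dist(points[i], points[j]) for j in labeled), default=math.inf)
            scores.append((i, nearest))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores
```

Points at the top of this list are the ones farthest from every labeled cluster, making them natural first candidates for follow-up.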

Is it useful?

Note the utility provided by semi-supervised t-SNE in assisting the labeling process:

An initial cluster-like view is presented that makes it easy to pick homogeneous clusters for labeling. With every labeling operation, more samples are compacted into labeled clusters, which organizes the representation so that the remaining unlabeled samples are much easier to see and reach. Because t-SNE has to fit high-dimensional structure into two or three dimensions, embedding space comes at a premium, and compacting labeled clusters recovers it. After sufficient labeling, the remaining unlabeled samples are likely outliers that can be explored in terms of content and context relative to the common classes.

The demonstrations above suggest that the labeling process can be simplified by harnessing a global weighted objective that is solved iteratively with gradient descent. The obvious limitation is that points have to move through the embedding, and with the Barnes-Hut approximation it becomes very difficult for separated same-label clusters to merge into a perfect clustering. Future work may consider alternative approaches that make better use of labeling to elegantly obtain the best clusters.

UPDATE: This research was recently presented at the 2018 TensorFlow Dev Summit (TFDevSummit). Watch the video below, starting at 1:50.

[1] Leland McInnes, Alexander Fabisch, Christopher Moody, and Nick Travers. “Semi-Supervised t-SNE using a Bayesian prior based on partial labelling.” https://github.com/lmcinnes/sstsne (2016).

[2] Zhirong Yang, Jaakko Peltonen, and Samuel Kaski. “Optimization equivalence of divergences improves neighbor embedding.” International Conference on Machine Learning (2014).

[3] Gregory Cohen, Saeed Afshar, Jonathan Tapson, and André van Schaik. “EMNIST: an extension of MNIST to handwritten letters.” arXiv preprint arXiv:1702.05373 (2017).