So how well does the model work? One way to probe it is to retrieve the closest sentence to a query sentence; here are some examples:

Query: “I’m sure you’ll have a glamorous evening,” she said, giving an exaggerated wink.

Retrieved: “I’m really glad you came to the party tonight,” he said, turning to her.

And:

Query: Although she could tell he hadn’t been too interested in any of their other chitchat, he seemed genuinely curious about this.

Retrieved: Although he hadn’t been following her career with a microscope, he’d definitely taken notice of her appearance.

The retrieved sentences are in fact very similar to the queries in both structure and meaning (and a bit salacious, as I warned earlier), so the model appears to be doing a good job.
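This probe is easy to reproduce once you have a trained encoder. Here is a minimal sketch, assuming a hypothetical `encode` function that maps a list of sentences to a matrix of skip-thought vectors (one row per sentence):

```python
import numpy as np

def nearest_sentence(query, corpus, encode):
    """Return the corpus sentence whose skip-thought vector has the
    highest cosine similarity to the query's vector."""
    vectors = np.asarray(encode([query] + corpus))   # (1 + len(corpus), dim)
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = vectors[1:] @ vectors[0]                  # cosine similarity to the query
    return corpus[int(np.argmax(sims))]
```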

For more rigorous experimentation, and to test the value of skip-thought vectors as a generic sentence feature extractor, the authors run the model through a series of tasks, training simple linear classifiers on top of the encoded vectors.
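In other words, the encoder is frozen and only a linear model is trained on its output. A minimal sketch of that recipe with scikit-learn, again with a hypothetical `encode` function standing in for the skip-thought encoder:

```python
from sklearn.linear_model import LogisticRegression

def linear_probe(encode, train_sents, train_labels, test_sents, test_labels):
    """Fit a simple linear classifier on frozen skip-thought vectors;
    the encoder itself is never fine-tuned."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(encode(train_sents), train_labels)
    return clf.score(encode(test_sents), test_labels)
```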

They find that the generic skip-thought representation performs very well at detecting the semantic relatedness of two sentences and at detecting whether one sentence paraphrases another. Skip-thought vectors perform reasonably well for image retrieval and captioning (where the authors use VGG to extract image feature vectors). They perform poorly for sentiment analysis, matching various bag-of-words models but at much higher computational cost.
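For the pairwise tasks (relatedness, paraphrase detection), the two sentence vectors have to be combined into a single feature vector before the linear model sees them; if I remember the paper correctly, it uses the component-wise product together with the absolute difference, roughly:

```python
import numpy as np

def pair_features(u, v):
    """Combine two sentence vectors into one feature vector:
    the component-wise product captures agreement between the
    sentences, the absolute difference captures contrast."""
    return np.concatenate([u * v, np.abs(u - v)])
```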

We have used skip-thought vectors a little at the Lab, most recently for the Pythia challenge. We found them useful for novelty detection, but incredibly slow: encoding a corpus of about 20,000 documents took many hours, whereas simpler (and equally effective) methods took seconds or minutes. I will update with a link to their blog post when it comes online.