Experiment

All experiment results were generated with NeuralTalk, which takes an image and predicts a sentence description for it with a Recurrent Neural Network. The NeuralTalkAnimator was used to process video files.

NeuralTalk is fascinating overall. With the right selection of inputs, it works with astounding accuracy and generates informative sentences. When it fails... Inputs and outputs are cherry-picked, balancing accuracy vs. comedy.

Model

NeuralTalk's model generates natural language descriptions of images. It leverages large datasets of images and their sentence descriptions to learn about the correspondences between language and visual data.

The model is based on a combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities. For more insights, read this great blog post: Image captioning for mortals.
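To make the idea of "aligning the two modalities" a bit more concrete, here is a toy sketch in plain Python. It assumes image regions and words have already been embedded into the same vector space (by the CNN and the bidirectional RNN respectively), scores an image-sentence pair by matching each word to its best region, and applies a max-margin ranking loss. The function names, the exact score and the margin value are assumptions for illustration and are not taken from NeuralTalk's code.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def image_sentence_score(region_vecs, word_vecs):
    """Score a pair: each word contributes its best-matching region similarity."""
    return sum(max(dot(r, w) for r in region_vecs) for w in word_vecs)


def ranking_loss(scores, margin=1.0):
    """Max-margin ranking loss over a square matrix of pair scores.

    scores[k][l] is the score of image k with sentence l; matching pairs
    sit on the diagonal and should outscore mismatched pairs by `margin`.
    """
    n = len(scores)
    loss = 0.0
    for k in range(n):
        for l in range(n):
            if l == k:
                continue
            # Image k should prefer its own sentence over sentence l.
            loss += max(0.0, scores[k][l] - scores[k][k] + margin)
            # Sentence k should prefer its own image over image l.
            loss += max(0.0, scores[l][k] - scores[k][k] + margin)
    return loss


# Toy data: two regions and two words in a 2-d embedding space.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[2.0, 0.0], [0.0, 3.0]]
print(image_sentence_score(regions, words))
```

With this kind of loss, training pushes the score of a true image-sentence pair above the scores of mismatched pairs, which is one way a model can learn which words go with which regions.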

NeuralTalkAnimator

The NeuralTalkAnimator is a Python helper that creates captioned videos. It takes a folder of videos and returns a folder of processed videos. It's open source on GitHub. Thanks to @karpathy for releasing NeuralTalk! Send input video requests to @samim (<3min, YouTube 720p).
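The folder-in, folder-out flow of the helper can be sketched roughly as follows. This is not the NeuralTalkAnimator's actual code: `process_folder`, `VIDEO_EXTENSIONS` and the `caption_video` callback are hypothetical names, and the callback stands in for whatever step extracts frames, captions them with NeuralTalk and re-encodes the video.

```python
from pathlib import Path

# Assumed set of video file extensions; adjust as needed.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi"}


def process_folder(input_dir, output_dir, caption_video):
    """Call caption_video(src, dst) for each video in input_dir.

    Returns the list of output paths, in sorted order of the inputs.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(input_dir.iterdir()):
        # Skip subfolders and anything that is not a known video type.
        if src.is_file() and src.suffix.lower() in VIDEO_EXTENSIONS:
            dst = output_dir / src.name
            caption_video(src, dst)  # hypothetical captioning step
            written.append(dst)
    return written
```

Passing the captioning step in as a function keeps the folder handling separate from the model, so the loop can be tried out with a stand-in such as a plain file copy.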