The PredNet is a deep convolutional recurrent neural network inspired by the principles of predictive coding from the neuroscience literature [1, 2]. It is trained for next-frame video prediction, on the premise that prediction is an effective objective for unsupervised (or "self-supervised") learning [e.g., 3-11]. The PredNet architecture is illustrated below. An animation of the flow of information in the network can be found here.
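In predictive coding, units signal the mismatch between what a layer receives and what it predicted. The sketch below illustrates one way to form such error units, assuming (as an illustration, not as a statement of the released code) that the error is split into rectified positive and negative parts stacked along the channel axis. The function name `error_units` and the toy arrays are hypothetical.

```python
import numpy as np

def error_units(actual: np.ndarray, predicted: np.ndarray) -> np.ndarray:
    """Hypothetical error-unit computation: rectified positive and negative
    prediction errors, concatenated along the last (channel) axis."""
    pos = np.maximum(actual - predicted, 0.0)  # input exceeds the prediction
    neg = np.maximum(predicted - actual, 0.0)  # prediction exceeds the input
    return np.concatenate([pos, neg], axis=-1)

# Toy 2x2 single-channel "frame" (height, width, channels) and a prediction of it
frame = np.array([[[0.5], [0.2]], [[0.9], [0.0]]])
pred = np.array([[[0.4], [0.3]], [[0.9], [0.1]]])

err = error_units(frame, pred)
print(err.shape)  # (2, 2, 2): one positive and one negative channel per input channel

# A next-frame prediction loss could then be, e.g., the mean error activity
# (an assumption for illustration; both parts are non-negative by construction).
loss = err.mean()
```

Splitting the error into two rectified channels keeps every error unit non-negative while still letting the network distinguish over-prediction from under-prediction.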

Next-frame predictions on the Caltech Pedestrian dataset [12] are shown below; the model that produced them was trained on the KITTI dataset [13]. See the repository for instructions on downloading the trained model.

Multi-timestep-ahead predictions can be made by recursively feeding the model's predictions back in as inputs. Below are several examples from a PredNet model fine-tuned for this task.
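The recursive scheme can be sketched as a simple rollout loop: after the observed context frames, each predicted frame is appended to the input sequence and used to predict the next one. This is a minimal sketch under stated assumptions: `rollout` and `linear_extrapolator` are hypothetical names, and the model is abstracted as a function of the frame history, whereas the actual PredNet is recurrent and would carry its internal state forward instead of re-reading the whole history.

```python
from typing import Callable, List

import numpy as np

Frame = np.ndarray

def rollout(model: Callable[[List[Frame]], Frame],
            context: List[Frame], n_future: int) -> List[Frame]:
    """Predict n_future frames by feeding each prediction back in as input."""
    history = list(context)  # copy so the caller's context is not modified
    predictions = []
    for _ in range(n_future):
        next_frame = model(history)   # predict one step ahead
        predictions.append(next_frame)
        history.append(next_frame)    # the prediction becomes the next input
    return predictions

# Hypothetical stand-in for a trained model: linear extrapolation
# from the last two frames in the history.
def linear_extrapolator(history: List[Frame]) -> Frame:
    return 2 * history[-1] - history[-2]

ctx = [np.full((2, 2), 0.0), np.full((2, 2), 1.0)]
future = rollout(linear_extrapolator, ctx, n_future=3)
print([float(f[0, 0]) for f in future])  # [2.0, 3.0, 4.0]
```

Because later steps consume the model's own outputs, errors can compound over the horizon, which is one motivation for fine-tuning a model specifically for multi-step prediction.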