NVIDIA researchers have developed a deep learning-based system that can produce high-quality slow-motion video from a standard 30 fps video clip. In NVIDIA's demonstration video, the results are far smoother than manually produced slow-motion footage. The technique generates intermediate frames to achieve the super slow-motion effect, and because it can generate an arbitrary number of such frames, there is in principle no limit to how far a video can be slowed down.

Video demo 1:

The paper Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation and NVIDIA’s presentation at last summer’s Conference on Computer Vision and Pattern Recognition (CVPR 2018) represent the latest research from NVIDIA on such AI-powered video transformation techniques.

CVPR spotlight video:

https://people.cs.umass.edu/~hzjiang/projects/superslomo/superslomo_cvpr18_spotlight_v4.mp4

The paper introduces an end-to-end convolutional neural network for variable-length multi-frame video interpolation, which generates intermediate frame(s) between two consecutive frames to form both spatially and temporally coherent video sequences.
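To make "variable-length" concrete, here is a minimal sketch of how the time positions of the intermediate frames could be computed for a given slowdown. The function name `intermediate_timesteps` and the 240 fps target are illustrative assumptions, not something taken from the paper; only the 30 fps source rate comes from the article.

```python
def intermediate_timesteps(source_fps, target_fps):
    """Return the normalized times t in (0, 1) at which frames must be
    synthesized between two consecutive source frames.

    Hypothetical helper for illustration: assumes target_fps is an
    integer multiple of source_fps.
    """
    if source_fps <= 0 or target_fps % source_fps != 0:
        raise ValueError("target_fps must be a positive multiple of source_fps")
    factor = target_fps // source_fps
    # factor - 1 new frames sit evenly spaced between frame 0 (t=0) and frame 1 (t=1).
    return [i / factor for i in range(1, factor)]


if __name__ == "__main__":
    # 30 fps footage played back as if shot at 240 fps needs 7 new frames per pair.
    print(intermediate_timesteps(30, 240))
```

Because each time step is handled independently, a larger slowdown only means evaluating more values of t between the same pair of input frames.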

NVIDIA method network architecture

To address the challenge of generating multiple intermediate video frames, the researchers first compute bi-directional optical flow between the input images using a U-Net architecture. The flows are then linearly combined at each time step to approximate the intermediate bi-directional optical flows. Although these approximate flows work well in locally smooth regions, they can produce artifacts around motion boundaries. To address this, an additional U-Net is employed to refine the approximated flows and predict soft visibility maps. The two input images are then warped and linearly fused to form each intermediate frame. To avoid artifacts, the team applies the visibility maps to the warped images before fusion, which excludes the contribution of occluded pixels from the interpolated intermediate frames.
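The two linear steps in this pipeline, flow approximation and visibility-weighted fusion, can be sketched in a few lines of NumPy. This is a simplified illustration rather than NVIDIA's implementation: the combination coefficients follow the formulation in the paper as we understand it, the function names are hypothetical, and both U-Nets and the warping step are left out, so the warped frames and visibility maps are assumed to be given as arrays.

```python
import numpy as np


def approximate_intermediate_flows(flow_01, flow_10, t):
    """Approximate the flows from time t back to frame 0 and forward to frame 1
    by linearly combining the two input flows (arrays of shape H x W x 2).

    Coefficients are assumed from the paper's formulation; before the
    refinement network runs, this is all the approximation consists of.
    """
    flow_t0 = -(1.0 - t) * t * flow_01 + t * t * flow_10
    flow_t1 = (1.0 - t) ** 2 * flow_01 - t * (1.0 - t) * flow_10
    return flow_t0, flow_t1


def fuse_warped_frames(warped_0, warped_1, vis_0, vis_1, t, eps=1e-8):
    """Blend two already-warped frames (H x W x C) into one intermediate frame.

    Each frame is weighted by temporal proximity and by its soft visibility
    map (H x W x 1, values in [0, 1]), so occluded pixels contribute little.
    """
    w0 = (1.0 - t) * vis_0
    w1 = t * vis_1
    # eps guards against division by zero where both maps are zero.
    return (w0 * warped_0 + w1 * warped_1) / (w0 + w1 + eps)


if __name__ == "__main__":
    h, w = 4, 4
    # Toy scene: everything moves 2 pixels to the right between the frames.
    flow_01 = np.tile(np.array([2.0, 0.0]), (h, w, 1))
    flow_10 = -flow_01
    f_t0, f_t1 = approximate_intermediate_flows(flow_01, flow_10, t=0.5)
    print(f_t0[0, 0], f_t1[0, 0])

    # A pixel fully occluded in frame 0 takes its value from frame 1 only.
    fused = fuse_warped_frames(
        np.ones((h, w, 3)), np.zeros((h, w, 3)),
        np.zeros((h, w, 1)), np.ones((h, w, 1)), t=0.5,
    )
    print(fused.max())
```

In the toy scene with constant motion, the midpoint flows come out as one pixel back toward frame 0 and one pixel forward toward frame 1, which is the behavior the linear approximation is meant to capture in locally smooth regions.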

Snapshot of training data

The NVIDIA multi-frame approach outperforms state-of-the-art single-frame methods on the Middlebury, UCF101, Slowflow, and High-framerate Sintel datasets. The paper Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation is available on arXiv.