Have you ever shot a perfect video only to have it spoiled by an unsightly pedestrian, truck, or other moving object passing through the frame? Although most people know how to crop or Photoshop a photo to remove unwanted elements, doing so with video is another matter entirely.

Conventional object-removal techniques for video require specialized editing skills and software, and are time-consuming. A new GitHub project, however, promises to make the task much easier: simply outline an unwanted object and you can effectively “erase” it from your video.

Let’s see how it works below. First, draw a typical bounding box around the object you want to purge — in this case a pedestrian:

The system will track and remove the visual information inside the box, then perform “inpainting” — a technique that uses inference to reconstruct lost or corrupted parts of an image — to fill in the “hole” left by our departed pedestrian.
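To build intuition for what inpainting does, here is a minimal sketch of the classical, non-learned version of the idea: repeatedly averaging each missing pixel with its neighbors until the surrounding content “diffuses” into the hole. This toy NumPy routine is purely illustrative — the project itself uses a deep network, not diffusion — but it shows why simple backgrounds fill in well while complex textures do not.

```python
import numpy as np

def diffuse_inpaint(img, mask, iters=200):
    """Fill masked pixels by repeatedly averaging their 4 neighbours.

    img  : 2-D float array (a grayscale frame)
    mask : boolean array, True where the object was removed
    """
    out = img.copy()
    out[mask] = 0.0  # the "hole" left by the removed object
    for _ in range(iters):
        padded = np.pad(out, 1, mode="edge")
        avg = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
               padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        out[mask] = avg[mask]  # update only the hole, keep known pixels
    return out

# Toy frame: a smooth horizontal gradient with a square hole punched out.
frame = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
hole = np.zeros_like(frame, dtype=bool)
hole[12:20, 12:20] = True
filled = diffuse_inpaint(frame, hole)
```

On this smooth gradient the hole is reconstructed almost perfectly; on a textured region the same diffusion would smear detail into a blur, which is exactly the failure mode (ghosting, soft patches) visible in the GIFs.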

Here are some more examples:

Although on closer inspection the object-removal technique is clearly far from perfect, the results are still impressive. The GIFs above show that the simpler the background, the better the inpainting results. However, problems such as misalignment or ghosting can emerge, especially with complex textures.

The shadows of the tennis player and race car remain visible in the above GIFs

This GitHub project draws its inspiration from two CVPR papers: Fast Online Object Tracking and Segmentation: A Unifying Approach (SiamMask) and Deep Video Inpainting.

SiamMask is a simple multi-task learning approach that can be used to address both visual object tracking and semi-supervised video object segmentation. A trained SiamMask can produce object segmentation masks and rotated bounding boxes at 55 frames per second, relying solely on a single initial bounding box. The system established a new state-of-the-art result in real-time object tracking on the VOT-2018 (Visual Object Tracking) challenge.
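The tracking interface is easy to picture: the user supplies one bounding box on the first frame, and the tracker emits a mask and an updated box for every subsequent frame. The toy NumPy tracker below mimics that interface on a synthetic bright blob — it is not SiamMask (which uses a learned siamese network), just a sketch of the initialize-once, track-every-frame contract.

```python
import numpy as np

def track_blob(frames, init_box, thresh=0.5, margin=4):
    """Toy tracker: given one initial box (x0, y0, x1, y1), segment the
    bright object near the previous box in each frame and return a
    per-frame (mask, box) pair. Illustrative only, not SiamMask."""
    x0, y0, x1, y1 = init_box
    results = []
    for frame in frames:
        h, w = frame.shape
        # Search window: the previous box expanded by a margin.
        sx0, sy0 = max(x0 - margin, 0), max(y0 - margin, 0)
        sx1, sy1 = min(x1 + margin, w), min(y1 + margin, h)
        mask = np.zeros_like(frame, dtype=bool)
        mask[sy0:sy1, sx0:sx1] = frame[sy0:sy1, sx0:sx1] > thresh
        ys, xs = np.nonzero(mask)
        if len(xs):  # tighten the box around the segmented object
            x0, x1 = xs.min(), xs.max() + 1
            y0, y1 = ys.min(), ys.max() + 1
        results.append((mask, (x0, y0, x1, y1)))
    return results

# Synthetic clip: a bright 4x4 square drifting right one pixel per frame.
frames = []
for t in range(5):
    f = np.zeros((24, 24))
    f[10:14, 5 + t:9 + t] = 1.0
    frames.append(f)
tracks = track_blob(frames, init_box=(5, 10, 9, 14))
```

The per-frame masks produced by the real tracker are exactly what the inpainting stage consumes: they mark which pixels must be removed and synthesized.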

Deep Video Inpainting, meanwhile, is designed to fill spatiotemporal holes in a video with plausible content. The framework synthesizes the missing regions with an image-based encoder-decoder model, producing results that are semantically accurate and temporally smooth.
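What makes *video* inpainting different from image inpainting is that a pixel hidden in one frame is often visible in a neighboring frame. The sketch below shows the crudest form of that idea — copying each hole pixel from the nearest frame where it is visible — which the deep model effectively learns to do in a smooth, end-to-end way. This is a conceptual illustration, not the paper's architecture.

```python
import numpy as np

def temporal_fill(frames, masks):
    """Fill each frame's hole with pixels from the nearest frame where
    that location is visible. frames: list of 2-D arrays; masks: list of
    boolean arrays (True = hole). Illustrative only."""
    frames = [f.copy() for f in frames]
    T = len(frames)
    for t in range(T):
        hole = masks[t].copy()
        for dt in range(1, T):              # look 1, 2, ... frames away
            for s in (t - dt, t + dt):
                if 0 <= s < T:
                    usable = hole & ~masks[s]   # visible in frame s
                    frames[t][usable] = frames[s][usable]
                    hole &= ~usable
            if not hole.any():
                break
    return frames

# Static background, with a hole that moves right across three frames,
# so every hole pixel is visible in some other frame.
bg = np.tile(np.linspace(0.0, 1.0, 16), (16, 1))
masks, frames = [], []
for t in range(3):
    m = np.zeros_like(bg, dtype=bool)
    m[4:8, 4 + 4 * t:8 + 4 * t] = True
    f = bg.copy()
    f[m] = 0.0
    masks.append(m)
    frames.append(f)
filled = temporal_fill(frames, masks)
```

When the camera or object moves, as here, temporal borrowing recovers the background exactly; stationary shadows, which stay hidden in every frame, are precisely what this cannot recover — hence the shadow artifacts noted above.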

Find out more about the new project on GitHub. The tool is easy to get going: all the code has been tested on Ubuntu 16.04 with Python 3.5, PyTorch 0.4.0, CUDA 8.0, and a GTX 1080 Ti GPU.

After downloading the pretrained SiamMask and inpainting models and placing them in the “cp/” folder, users can draw the miraculous bounding box and use it to hide anything they like in the frame (or, more likely, anything they dislike).

The papers Fast Online Object Tracking and Segmentation: A Unifying Approach & Deep Video Inpainting are on arXiv.