Computer-generated imagery (CGI) is playing an increasingly important role in contemporary culture. Filming a battle scene between warriors and mystic creatures in Wakanda, for example, involves much more than dressing and directing actors. The action must also be placed realistically in the fictional world the characters inhabit, using a technique called “compositing.”

Compositing is the process of merging foreground and background images. Creating convincing transitions between the two by hand is time-consuming and expensive for visual artists, especially where the edges involve fine, fuzzy detail such as hair.
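To make the idea concrete, here is a minimal sketch of the standard alpha-blending formula behind compositing, in which each output pixel is a weighted mix of a foreground and a background pixel. The function name and the pixel values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def composite(foreground, background, alpha):
    """Blend two RGB pixels; alpha is the foreground opacity in [0, 1]."""
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))

# Hypothetical pixel values, for illustration only.
hair = (40.0, 30.0, 20.0)     # dark foreground pixel
sky = (120.0, 180.0, 240.0)   # bright background pixel

solid = composite(hair, sky, 1.0)    # fully opaque foreground
wisp = composite(hair, sky, 0.25)    # thin strand: mostly background shows

print(solid)
print(wisp)
```

Fractional alpha values like the one used for `wisp` are what make hair and similar edges so laborious: an artist has to get the opacity right pixel by pixel along the boundary.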

In the new paper Semantic Soft Segmentation, MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) Visiting Researcher Yagiz Aksoy and colleagues show how machine learning can automate this photo-editing process. Aksoy presented the paper at the annual SIGGRAPH computer graphics conference in Vancouver last week.

Semantic Soft Segmentation (SSS) decomposes an image into layers, each covering a meaningful region of the image, with “soft transitions” between neighboring layers.
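One way to picture soft layers is as per-pixel opacities that sum to one across the layers, so that adding the layers back together reproduces the original image. The sketch below is a toy under that assumption, not the authors' code: the four-pixel grayscale "image", the layer names, and the `recombine` helper are all hypothetical.

```python
# A 1x4 grayscale "image" and two hypothetical soft layers.
pixels = [200.0, 160.0, 80.0, 40.0]
alpha_subject = [1.0, 0.75, 0.25, 0.0]              # opacity of the "subject" layer
alpha_backdrop = [1.0 - a for a in alpha_subject]   # remainder goes to the backdrop

def recombine(pixels, alphas_per_layer, gains):
    """Scale each layer's contribution by its gain, then sum the layers."""
    out = []
    for i, p in enumerate(pixels):
        out.append(sum(g * a[i] * p for a, g in zip(alphas_per_layer, gains)))
    return out

layers = [alpha_subject, alpha_backdrop]
unchanged = recombine(pixels, layers, gains=[1.0, 1.0])        # no edit
darker_backdrop = recombine(pixels, layers, gains=[1.0, 0.5])  # dim one layer only

print(unchanged)
print(darker_backdrop)
```

Because the two middle pixels belong partly to each layer, dimming the backdrop changes them only in proportion to their backdrop opacity, which is what keeps the edit free of hard seams.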

“Once these soft segments are computed, the user doesn’t have to manually change transitions or make individual modifications to the appearance of a specific layer of an image,” says Aksoy. Machine learning enables the system to analyze the image’s texture and color. A neural network then identifies the objects in the image and supplies that information to the system.

Segmenting the image beforehand is essential for editing, and the segmentation has to meet three requirements. The segments should be clearly delineated while the soft transitions between them remain accurate; no segment should extend beyond its meaningful region of the image, so that edits can be targeted; and the analysis should run automatically.

Aksoy’s team used spectral decomposition to tackle the semantic soft segmentation problem. They combined texture and color information from the input image with semantic cues generated by a convolutional neural network trained for scene analysis.
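As a rough illustration of what spectral decomposition means here, the toy below links neighboring pixels by how similar they are in both color and a semantic feature, builds a graph Laplacian from those links, and reads a grouping off its eigenvectors. It is a heavily simplified sketch, not the paper's pipeline: the six-pixel row, the feature values, and the two bandwidth constants are assumptions made for the example, and it relies on NumPy.

```python
import numpy as np

# Six pixels in a row, each with a color value and a semantic feature.
# All values are hypothetical, chosen for illustration.
color = np.array([0.10, 0.15, 0.20, 0.80, 0.85, 0.90])
semantic = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

n = len(color)
W = np.zeros((n, n))
for i in range(n - 1):
    j = i + 1
    # Neighbors are strongly linked only if color AND semantics agree.
    color_term = np.exp(-((color[i] - color[j]) ** 2) / 0.1)
    semantic_term = np.exp(-((semantic[i] - semantic[j]) ** 2) / 0.5)
    W[i, j] = W[j, i] = color_term * semantic_term

laplacian = np.diag(W.sum(axis=1)) - W
# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(laplacian)

# The eigenvector of the second-smallest eigenvalue changes sign at the
# weakest link in the graph, splitting the row into two groups.
second = eigenvectors[:, 1]
group = second > 0
print(group)
```

In this toy the sign of one eigenvector gives a hard split into two groups; producing soft layers with fractional opacities from such eigenvectors takes further processing that the sketch does not attempt.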

Although SSS research is currently focused on still images, the technique is expected to be developed for filmmaking in the near future.