Every March, US chip giant NVIDIA hosts its GPU Technology Conference (GTC) in San Jose, California, rolling out new chip designs and products while showcasing its latest tech and AI breakthroughs. GTC 2019 runs next Monday through Thursday (March 18–21), and while we can only speculate about what surprises NVIDIA CEO Jensen Huang might have in store, we can get some sense of where the company is headed by looking at what it has been up to over the last 12 months.

NVIDIA is steadily expanding its foothold in artificial intelligence, and in 2018 it contributed numerous noteworthy research results to the machine learning community, including StyleGAN, video-to-video translation, WaveGlow and more.

The company’s AI-related efforts make sense, as NVIDIA is a major supplier of graphics cards to AI researchers and developers. Because GPUs remain the dominant hardware for training machine learning models, NVIDIA’s sales stand to benefit directly from AI development and deployment. NVIDIA also applies the knowledge gained from its AI experiments to further evolve and polish its chip designs. Its latest GPU architecture, Turing, which powers its RTX graphics cards, is a good example: like the preceding Volta architecture, it includes Tensor Cores designed to accelerate the large matrix operations at the heart of deep learning.
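Tensor Cores perform a fused matrix multiply-accumulate, D = A × B + C, on small matrix tiles, typically taking FP16 inputs and accumulating in FP16 or FP32. The plain-Python sketch below only emulates that numerical behavior, rounding the inputs to half precision and the running sum to single precision. It is not how Tensor Cores are actually programmed (that happens through CUDA libraries such as cuBLAS and cuDNN, or CUDA's WMMA API), and the function names are our own.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mma_fp16_fp32(a, b, c):
    """Emulate D = A x B + C with FP16 inputs and FP32 accumulation.

    a is m x k, b is k x n, c is m x n (lists of lists of floats).
    """
    m, k, n = len(a), len(b), len(b[0])
    # Inputs are stored in half precision, as on the hardware.
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = to_fp32(c[i][j])
            for p in range(k):
                # Each FP16 x FP16 product is added into an FP32 accumulator.
                acc = to_fp32(acc + a16[i][p] * b16[p][j])
            row.append(acc)
        d.append(row)
    return d
```

Accumulating at higher precision than the inputs is what lets this kind of mixed-precision arithmetic keep long sums of products accurate while the operands themselves stay compact.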

NVIDIA is also building AI capabilities into its chips as an additional selling point, and recently launched its DLSS (deep learning supersampling) technology in the video game Battlefield V on RTX GPUs. DLSS uses a trained neural network to upscale frames rendered at a lower resolution, enabling higher output resolutions and settings while maintaining solid frame rates.

Below is Synced’s overview of NVIDIA’s main AI-related research and products since the last GTC.

StyleGAN

NVIDIA researchers took a big step towards photorealistic image generation with StyleGAN, introduced in the paper A Style-Based Generator Architecture for Generative Adversarial Networks. The team proposed a novel generator architecture for GANs that borrows ideas from style transfer. Without supervision, it learns to separate high-level attributes of an image (such as pose and identity in human faces) from stochastic variation (such as freckles and hair), and it enables intuitive, scale-specific control of the synthesis.

Previous GANs offered little control over the specific features they generated. The new generator can adjust the effect of a particular style, for example high-level facial attributes such as pose, identity and face shape, while leaving other features largely intact. This enables finer control over specific features such as eyes and hairstyles.
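StyleGAN's style control is built on adaptive instance normalization (AdaIN): after each convolution, every feature map is normalized to zero mean and unit variance, then scaled and shifted by a per-channel style (y_s, y_b) that a learned affine layer computes from the intermediate latent code w. The sketch below shows only that AdaIN step on a single channel in plain Python; the mapping network, the affine layer and the convolutions are omitted, and the function name is ours.

```python
import math


def adain(feature_map, style_scale, style_bias, eps=1e-8):
    """Adaptive instance normalization for one channel of a feature map.

    feature_map: flat list of activations for a single channel.
    style_scale, style_bias: the (y_s, y_b) pair that, in StyleGAN, a learned
    affine layer derives from the latent w; here they are plain numbers.
    """
    n = len(feature_map)
    mean = sum(feature_map) / n
    var = sum((x - mean) ** 2 for x in feature_map) / n
    std = math.sqrt(var + eps)
    # Remove the channel's own statistics, then impose the style's.
    return [style_scale * (x - mean) / std + style_bias for x in feature_map]
```

Because the normalization discards the incoming statistics, each layer's style fully determines that layer's per-channel mean and spread. Since every resolution receives its own style, changing the style at coarse layers affects attributes like pose, while changing it at fine layers affects details.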

Paper: https://arxiv.org/abs/1812.04948

GitHub: https://github.com/NVlabs/stylegan

WaveGlow

WaveGlow is NVIDIA’s latest work on text-to-speech synthesis, tackling the final stage of the pipeline: generating the audio waveform. In the paper WaveGlow: A Flow-based Generative Network for Speech Synthesis, NVIDIA researchers showed that auto-regression is unnecessary for synthesizing speech. Instead, they combined ideas from Glow and WaveNet in a flow-based network that generates high-quality speech from mel-spectrograms, a representation of the short-term power spectrum of a sound on the perceptually motivated mel frequency scale.

WaveGlow is implemented using only a single network, trained with a single cost function: maximizing the likelihood of the training data. While most models struggle to synthesize audio faster than 16 kHz without sacrificing quality, WaveGlow can produce audio samples at more than 500 kHz on an NVIDIA V100 GPU, more than 25 times faster than real time, while delivering audio quality comparable to the best publicly available WaveNet implementations.
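WaveGlow borrows Glow's core building block, the affine coupling layer. Half of the channels pass through unchanged and feed a network (WaveNet-like in WaveGlow, and also conditioned on the mel-spectrogram) that predicts a scale and shift for the other half. That transform is trivially invertible, and its log-determinant is just a sum, so one network can be trained by maximizing likelihood and then run in reverse to turn Gaussian noise into audio in parallel. The sketch below is a toy plain-Python version of one coupling layer; `toy_coupling_net` and its formulas are hypothetical placeholders, and WaveGlow's invertible 1×1 convolutions and layer stacking are left out.

```python
import math


def toy_coupling_net(xa, cond):
    """Hypothetical stand-in for WaveGlow's WaveNet-like coupling network.

    Computes (log_s, t) from the untouched half xa and conditioning values
    (one per element of xa). It never needs to be inverted itself.
    """
    log_s = [0.1 * math.tanh(a + c) for a, c in zip(xa, cond)]
    t = [0.5 * a - 0.2 * c for a, c in zip(xa, cond)]
    return log_s, t


def coupling_forward(x, cond):
    """Audio -> latent direction (used in training). Returns (z, log_det)."""
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    log_s, t = toy_coupling_net(xa, cond)
    zb = [math.exp(s) * b + tt for s, b, tt in zip(log_s, xb, t)]
    # The Jacobian is triangular, so log|det J| is simply the sum of log_s.
    return xa + zb, sum(log_s)


def coupling_inverse(z, cond):
    """Latent -> audio direction (used at synthesis time)."""
    half = len(z) // 2
    za, zb = z[:half], z[half:]
    # za equals the original xa, so the network reproduces the same (log_s, t).
    log_s, t = toy_coupling_net(za, cond)
    xb = [(b - tt) * math.exp(-s) for s, b, tt in zip(log_s, zb, t)]
    return za + xb
```

In a full flow, the training loss combines the Gaussian log-likelihood of z with the summed log-determinants of every layer, which is what allows a single cost function.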

Paper: https://arxiv.org/pdf/1811.00002.pdf

GitHub: https://github.com/NVIDIA/waveglow

Video-to-video translation

Video synthesis technology has a wide range of applications in gaming, automotive, virtual reality and more. Traditional graphics rendering engines, however, require cumbersome and time-consuming detailing of scene geometry, materials, lighting and dynamics. Many researchers are therefore turning to end-to-end deep learning models.

In the paper Video-to-Video Synthesis, NVIDIA researchers introduced a GAN-based model to synthesize high-quality videos. The approach can generate photorealistic 2K resolution videos up to 30 seconds long, which significantly advances the state of the art in video synthesis. The paper was accepted by NeurIPS 2018.

Building on that work, NVIDIA researchers trained the model on videos of cities to render an interactive 3D urban environment. While still a proof of concept, the technique could push development in several areas; in autonomous driving, for example, it could provide a simpler way to simulate road conditions.

Paper: https://arxiv.org/pdf/1808.06601.pdf

GitHub: https://github.com/NVIDIA/vid2vid

Super SloMo

Slow motion is common in today’s television and film industries, used to reveal fleeting details in sports broadcasts and home videos or to create artistic effects in movies. In the paper Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation, NVIDIA researchers proposed an end-to-end convolutional neural network that can produce high-quality slow-motion video from standard 30-frame-per-second footage.

The slow-motion task essentially amounts to generating multiple intermediate video frames. NVIDIA’s method can interpolate a frame at any arbitrary time step between two input frames, and experimental results show it outperforming other state-of-the-art methods. The paper was accepted by CVPR 2018.
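The key to arbitrary time steps is that intermediate frames are not predicted directly. Given bidirectional optical flow F_0→1 and F_1→0 between the two input frames, the paper approximates the flow from any time t in (0, 1) back to each input frame, warps both frames to time t, and blends them with weights that depend on t and on learned visibility maps that handle occlusion; CNNs compute and refine these quantities. The plain-Python sketch below covers only the closed-form parts (time steps, flow approximation and fusion) on 1-D per-pixel values. The image warping and the networks are omitted, and the function names are ours.

```python
def time_steps(num_intermediate):
    """Evenly spaced interpolation times strictly between two input frames."""
    return [i / (num_intermediate + 1) for i in range(1, num_intermediate + 1)]


def intermediate_flows(f01, f10, t):
    """Approximate the flows from time t to frame 0 and to frame 1.

    f01, f10: bidirectional flow between the input frames, one value per
    pixel (1-D motion). Combines both directions under a locally
    linear-motion assumption.
    """
    ft0 = [-(1 - t) * t * a + t * t * b for a, b in zip(f01, f10)]
    ft1 = [(1 - t) ** 2 * a - t * (1 - t) * b for a, b in zip(f01, f10)]
    return ft0, ft1


def fuse(warped0, warped1, vis0, vis1, t):
    """Blend frames already warped to time t.

    vis0, vis1: per-pixel visibility of each input frame at time t, assumed
    to sum to 1 per pixel, so the two weights never both vanish for 0 < t < 1.
    """
    out = []
    for a, b, v0, v1 in zip(warped0, warped1, vis0, vis1):
        # Closer frames in time and more visible pixels get more weight.
        w0, w1 = (1 - t) * v0, t * v1
        out.append((w0 * a + w1 * b) / (w0 + w1))
    return out
```

For example, turning 30 fps footage into 240 fps means inserting seven frames between each pair of inputs, at t = 1/8, 2/8, ..., 7/8. A pixel occluded in frame 1 (visibility 0) is then taken entirely from frame 0.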

Paper: https://arxiv.org/pdf/1712.00080.pdf

Noise2Noise

Poor image quality hampers many visual editing and presentation tasks. In the paper Noise2Noise: Learning Image Restoration without Clean Data, NVIDIA researchers introduced a deep learning approach that learns to remove noise and artifacts from images.

Previous learning-based image restoration techniques required training data containing both noisy and clean images. NVIDIA researchers showed that clean images are unnecessary for training a denoiser: a network trained only on pairs of independently noisy images of the same scene can match, and sometimes exceed, one trained with clean targets. This spares researchers the time-consuming work of collecting and preprocessing clean data. The paper was accepted by ICML 2018.
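The statistical argument behind Noise2Noise is that, under an L2 loss, the best prediction is the mean of the training targets. If each noisy target is the clean signal plus zero-mean noise, the mean of many such targets is the clean signal, so training on noisy targets pushes a network toward the same answer as training on clean ones. The toy sketch below reduces the "network" to a single learnable value for one pixel and trains it by gradient descent on noisy targets alone; the function name, Gaussian noise model and hyperparameters are illustrative assumptions.

```python
import random


def train_pixel_on_noisy_targets(clean_value, noise_std, num_targets,
                                 steps=200, lr=0.1, seed=0):
    """Fit one output value by gradient descent on an L2 loss.

    clean_value is used only to simulate independent noisy observations;
    the training loop never sees it. Returns the learned estimate.
    """
    rng = random.Random(seed)
    targets = [clean_value + rng.gauss(0.0, noise_std)
               for _ in range(num_targets)]
    estimate = 0.0
    for _ in range(steps):
        # Gradient of mean((estimate - y)^2) over all noisy targets y.
        grad = 2.0 * sum(estimate - y for y in targets) / num_targets
        estimate -= lr * grad
    return estimate
```

Calling `train_pixel_on_noisy_targets(0.7, 0.5, 10000)` returns a value within a few thousandths of 0.7, even though every individual target is off by 0.5 on average. The same reasoning breaks down for noise that is not zero-mean, which is why the loss must match the noise statistics.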

Paper: https://arxiv.org/pdf/1803.04189.pdf

GitHub: https://github.com/NVlabs/noise2noise

Image inpainting

Image inpainting is an image restoration task that fills holes in an image. Photo editing software, for example, uses this technique when removing unwanted content, filling the resulting holes with realistic computer-generated content that matches its surroundings. The task becomes more difficult when the holes are irregularly shaped.

In the paper Image Inpainting for Irregular Holes Using Partial Convolutions, NVIDIA researchers proposed a model that fills irregularly shaped holes with generated content that blends seamlessly with the rest of the image. The paper was accepted by ECCV 2018.
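A partial convolution convolves only the valid (non-hole) pixels under each window and rescales the result by the ratio of the window size to the number of valid pixels. It then marks the output location as valid if at least one input pixel was valid, so stacking such layers shrinks the hole until it disappears. The sketch below applies that rule to a 1-D signal in plain Python; the paper's model uses 2-D multi-channel partial convolutions in a U-Net-like network, and the function name and "valid"-only window handling here are our own choices.

```python
def partial_conv1d(x, mask, weights, bias=0.0):
    """1-D partial convolution over "valid" windows with stride 1.

    x: input values; mask: 1 for valid pixels, 0 for holes.
    Returns (output, updated_mask).
    """
    k = len(weights)
    out, new_mask = [], []
    for i in range(len(x) - k + 1):
        xs, ms = x[i:i + k], mask[i:i + k]
        valid = sum(ms)
        if valid > 0:
            # Convolve only the valid pixels, then rescale by k / valid so the
            # output does not shrink when part of the window is a hole.
            masked = sum(w * xv * m for w, xv, m in zip(weights, xs, ms))
            out.append(masked * k / valid + bias)
            new_mask.append(1)
        else:
            # Window lies entirely inside a hole: output 0 and stay a hole.
            out.append(0.0)
            new_mask.append(0)
    return out, new_mask
```

With a fully valid mask this reduces to an ordinary convolution; the mask update is what lets information propagate inward from the hole's edges layer by layer.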

Paper: https://arxiv.org/pdf/1804.07723.pdf

Synced will be reporting from GTC 2019 throughout the week.