http://cs.cmu.edu/~aayushb/hallucinate/

PixelNN: Example-based Image Synthesis

Abstract

We present a simple nearest-neighbor (NN) approach that synthesizes high-frequency photorealistic images from an "incomplete" signal such as a low-resolution image, a surface normal map, or edges. Current state-of-the-art deep generative models designed for such conditional image synthesis lack two important things: (1) they are unable to generate a large set of diverse outputs, due to the mode collapse problem. (2) they are not interpretable, making it difficult to control the synthesized output. We demonstrate that NN approaches potentially address such limitations, but suffer in accuracy on small datasets. We design a simple pipeline that combines the best of both worlds: the first stage uses a convolutional neural network (CNN) to maps the input to a (overly-smoothed) image, and the second stage uses a pixel-wise nearest neighbor method to map the smoothed output to multiple high-quality, high-frequency outputs in a controllable manner. We demonstrate our approach for various input modalities, and for various domains ranging from human faces to cats-and-dogs to shoes and handbags.

Paper

PixelNN: Example-based Image Synthesis.



A. Bansal, Y. Sheikh, and D. Ramanan



arXiv | bibtex





Comparison with Pix-to-Pix

Multiple Outputs

Edges-to-Faces

Normals-to-Faces

Edges-to-Cats-&-Dogs

Normals-to-Cats-&-Dogs

Example Frequency Analysis

We did frequency analysis via FFT to understand the frequency content in the output of our images.

Comments, questions to Aayush Bansal.