With the end of 2018 approaching and preparations for Danbooru2018 underway, I’d like to highlight the initial uses of the dataset. The release of Danbooru2017 made a minor splash on Reddit, and it has been downloaded many times: several dozen copies have been seeded via BitTorrent (the total number of downloads is unknown), and there have been ~600 downloads via rsync (or at least attempts; the logs suggest many people did not download full copies).

What uses have been made of it so far? Primarily generative models: no one seems to have attempted a full-strength tagger yet; instead, the dataset has mostly been applied to GANs and various kinds of image transformation.

Perhaps the most exciting & impressive application so far is the style2paints project, which trains a CNN to colorize anime images with remarkably high quality, and even provides an in-browser interface for easy use (although, due to its popularity, it is often too overloaded to use). (Discussion: 1/2/Twitter samples.) The style2paints V3 paper explains how this is done: regular anime images are ‘corrupted’ by superimposing random colors, and the CNN is trained to undo the corruption. It is quite a useful tool for hobbyist artists, and even professionals might find it helpful.
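To make the training scheme concrete, here is a minimal PyTorch sketch of the corruption-and-reconstruction idea; the tiny stand-in CNN, the rectangle-blob corruption details, and the plain L1 loss are my own illustrative assumptions, not the actual style2paints architecture or objectives:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def corrupt(images: torch.Tensor, n_blobs: int = 8) -> torch.Tensor:
    """Superimpose random color rectangles on a batch of images (B, 3, H, W)."""
    corrupted = images.clone()
    b, _, h, w = images.shape
    for _ in range(n_blobs):
        color = torch.rand(b, 3, 1, 1, device=images.device)  # one random color per image
        y = torch.randint(0, h // 2, (1,)).item()
        x = torch.randint(0, w // 2, (1,)).item()
        corrupted[:, :, y:y + h // 4, x:x + w // 4] = color
    return corrupted

# Stand-in CNN; the real style2paints network is far larger and more elaborate.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Toy stand-in data; in practice, batches of ground-truth anime images.
dataloader = [torch.rand(4, 3, 64, 64) for _ in range(10)]

for clean in dataloader:
    pred = model(corrupt(clean))   # the network tries to undo the corruption
    loss = F.l1_loss(pred, clean)  # simple reconstruction loss (an assumption)
    opt.zero_grad()
    loss.backward()
    opt.step()
```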

Projects:

Publications:

“Improving Shape Deformation in Unsupervised Image-to-Image Translation”, Gokaslan et al 2018: Unsupervised image-to-image translation techniques are able to map local texture between two domains, but they are typically unsuccessful when the domains require larger shape change. Inspired by semantic segmentation, we introduce a discriminator with dilated convolutions that is able to use information from across the entire image to train a more context-aware generator. This is coupled with a multi-scale perceptual loss that is better able to represent error in the underlying shape of objects. We demonstrate that this design is more capable of representing shape deformation in a challenging toy dataset, plus in complex mappings with significant dataset variation between humans, dolls, and anime faces, and between cats and dogs.
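The key architectural idea here, a discriminator whose dilated convolutions give it a wide receptive field and hence more global context, can be sketched as follows; this is a hypothetical PyTorch illustration with made-up layer sizes, not the paper’s exact model:

```python
import torch
import torch.nn as nn

class DilatedDiscriminator(nn.Module):
    def __init__(self, in_ch: int = 3, base: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            # Dilated convolutions widen the receptive field without further
            # downsampling, letting the discriminator use information from
            # across the whole image rather than only local texture.
            nn.Conv2d(base, base * 2, 3, padding=2, dilation=2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, base * 4, 3, padding=4, dilation=4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 4, 1, 3, padding=1),  # patch-level real/fake map
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```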

“Two Stage Sketch Colorization”, Zhang et al 2018: (on style2paints, version 3) Sketch or line art colorization is a research field with significant market demand. Unlike photo colorization, which relies heavily on texture information, sketch colorization is more challenging, as sketches may lack texture entirely; color, texture, and gradient must all be generated from the abstract sketch lines. In this paper, we propose a semi-automatic learning-based framework to colorize sketches with proper color, texture, and gradient. Our framework consists of two stages. In the first, drafting stage, our model guesses color regions and splashes a rich variety of colors over the sketch to obtain a color draft. In the second, refinement stage, it detects unnatural colors and artifacts and tries to fix and refine the result. Compared to existing approaches, this two-stage design effectively divides the complex colorization task into two simpler subtasks with clearer goals, which eases learning and raises the quality of colorization. Our model resolves artifacts such as water-color blurring, color distortion, and dull textures. We build an interactive software tool based on our model for evaluation; users can iteratively edit and refine the colorization. We evaluate our learning model and the interactive system through an extensive user study. Statistics show that our method outperforms state-of-the-art techniques and industrial applications in several aspects, including visual quality, degree of user control, user experience, and other metrics.
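The two-stage decomposition can be sketched as a simple pipeline; `draft_net` and `refine_net` below are hypothetical stand-ins for the paper’s drafting and refinement networks, and the exact conditioning inputs are an assumption:

```python
import torch
import torch.nn as nn

class TwoStageColorizer(nn.Module):
    def __init__(self, draft_net: nn.Module, refine_net: nn.Module):
        super().__init__()
        self.draft_net = draft_net    # stage 1: splash rough colors over the sketch
        self.refine_net = refine_net  # stage 2: fix unnatural colors & artifacts

    def forward(self, sketch: torch.Tensor, hints: torch.Tensor) -> torch.Tensor:
        # Stage 1: guess color regions from the sketch plus optional user hints.
        draft = self.draft_net(torch.cat([sketch, hints], dim=1))
        # Stage 2: refine the draft, conditioned on the original sketch so line
        # structure is preserved while colors are corrected.
        return self.refine_net(torch.cat([sketch, draft], dim=1))
```

Splitting the task this way lets each network optimize a simpler objective: the drafting network only has to get colors roughly right, while the refinement network only has to correct local artifacts.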

“Application of Generative Adversarial Network on Image Style Transformation and Image Processing”, Wang 2018: Image-to-image translation is a collection of computer vision problems that aim to learn a mapping between two different domains or between multiple domains. Recent research in computer vision and deep learning has produced powerful tools for the task. Conditional adversarial networks serve as a general-purpose solution for image-to-image translation problems, and deep convolutional neural networks can learn an image representation applicable to recognition, detection, and segmentation. Generative Adversarial Networks (GANs) have achieved success in image synthesis. However, traditional models that require paired training data may not be applicable in most situations due to a lack of paired data. Here we review and compare two different models for unsupervised image-to-image translation: CycleGAN and Unsupervised Image-to-Image Translation Networks (UNIT). Both models adopt cycle consistency, which enables unsupervised learning without paired data. We show that both models can successfully perform image style translation. The experiments reveal that CycleGAN generates more realistic results, while UNIT generates more varied images and better preserves the structure of input images.
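The cycle-consistency objective shared by both models is straightforward to write down; here is a minimal PyTorch sketch, with the generator names and the weight of 10 (the CycleGAN default) as the only assumptions:

```python
import torch.nn.functional as F

def cycle_consistency_loss(G_ab, G_ba, real_a, real_b, weight: float = 10.0):
    """Mapping A->B->A (and B->A->B) should reproduce the input image."""
    fake_b = G_ab(real_a)   # translate A -> B
    rec_a = G_ba(fake_b)    # translate back B -> A
    fake_a = G_ba(real_b)   # translate B -> A
    rec_b = G_ab(fake_a)    # translate back A -> B
    # L1 reconstruction in both directions, weighted as in CycleGAN.
    return weight * (F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b))
```

Combined with the usual adversarial losses for each generator, this reconstruction term is what lets both models learn from unpaired image collections.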

Danbooru2018 will add another >350k images to the archive, along with a large number of additional tags. With any luck, 2019 will see an application as exciting as a style2paints V4!