Recreating 3D texture details from a single 2D texture image with artificial intelligence. Using a neural network to generate a normal map and a displacement map from one given 2D color texture.

Introduction:

This is my second deep learning project. In it, I built and trained a neural network to generate a normal map from a given 2D color texture. These maps can be used in a 3D rendering engine to recreate 3D surface details and increase the realism of a rendered image. My model offers the computer graphics (CG) industry a cheaper way of generating normal maps: it creates results similar to those of 3D scanning, the manual method that currently has the best outcome in the industry, while skipping the week-long scanning and postprocessing required to create a high-quality 3D texture pack. Compared to other automatic methods of generating normal maps, such as the Normal Map effect in Adobe Photoshop, my method delivers a more realistic result.

Below is a comparison between a sphere rendered with only the color texture applied and a sphere rendered with the color texture, normal map, and displacement map applied. The normal map and displacement map provide 3D lighting and spatial information to the render engine, which recreates 3D details such as bumps and cracks.

Click the Demo link to try the generator in your browser

(currently doesn’t support mobile)

Motivation:

As a designer and an artist, I am very interested in creating realistic computer-rendered 3D images. Mimicking the materials of real physical objects is important for producing a convincing 3D rendering. To best recreate the surface of an object, we need a good shader with high-quality texture maps, including a color texture, normal map, displacement map, specular map, etc. However, good normal maps and displacement maps are hard to find on the internet and time-consuming to create.

Poliigon is a website that sells high-quality material packs to designers and artists. It uses photo-scanning techniques to create those textures: hundreds of photos are taken from different angles, and software calculates a 3D replica of the material. Finally, that 3D model is baked into the different maps of the material. This method creates impressive results, but according to Poliigon it can take a skilled artist 4 to 7 hours to create one material this way. The time-consuming nature of the process makes those materials relatively expensive, so a non-commercial artist cannot easily afford multiple materials.

Although Poliigon offers hundreds of incredible material packs, ranging from dirt to fabrics, an artist sometimes still cannot find the specific material he or she is looking for. A high-quality 2D color texture is very easy to find online, and there are some options to automatically generate a normal map and a displacement map from it. However, the existing applications' outcomes are not ideal: the linear way of calculating the normal map cannot produce very realistic results in a rendering.

Poliigon’s material library is a perfect dataset for training a neural network to generate normal maps and displacement maps, so I downloaded some materials from the Poliigon website and trained my normal map generator with those images.

Normal Map:

A normal map is a purplish image that adds 3D shading details to a low-poly surface, saving the computational power required to produce realistic 3D renderings in games and movies. The way the bumps on a surface react to light is precalculated into the color channels of a normal map, and the computer interprets those colors as a guide for how the surface should react to light. This link explains the algorithm behind normal maps very well.
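To make the encoding concrete, here is a minimal NumPy sketch (not part of my pipeline) of how a renderer decodes a normal-map pixel. Each channel maps one component of the tangent-space normal from [0, 255] back to [-1, 1], which is why a flat surface is that characteristic purplish (128, 128, 255):

```python
import numpy as np

def decode_normal(rgb):
    """Map an 8-bit normal-map pixel back to a tangent-space
    normal vector: each channel goes [0, 255] -> [-1, 1]."""
    n = np.asarray(rgb, dtype=np.float32) / 255.0 * 2.0 - 1.0
    return n / np.linalg.norm(n)  # renormalize to unit length

# The flat purplish color (128, 128, 255) decodes to a normal
# pointing straight out of the surface, roughly (0, 0, 1).
print(decode_normal([128, 128, 255]))
```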

Displacement Map:

A displacement map is a grayscale image that tells a render engine how to deform a mesh surface. Compared to a normal map or a bump map, a displacement map is more demanding on computational power: the engine divides the surface into smaller pieces and physically moves them up and down depending on the brightness of the map. The brightness of the image determines the height of the deformed geometry.
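Here is a minimal NumPy sketch of the idea (the scale and midlevel parameters are illustrative stand-ins for common render-engine settings):

```python
import numpy as np

def displace_grid(height_map, scale=1.0, midlevel=0.5):
    """Turn a grayscale height map in [0, 1] into a displaced grid:
    one vertex per pixel, pushed up or down from the flat plane
    depending on how bright the pixel is."""
    h, w = height_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    zs = (height_map - midlevel) * scale  # brighter pixels rise higher
    return np.stack([xs, ys, zs], axis=-1).astype(np.float32)
```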

Dataset:

I downloaded a very high-quality rock material library from Poliigon. Each material includes a color map, normal map, displacement map, and specular map. I focused on generating the normal map first, so I only used the color map and the normal map: the color map is the input x, and the normal map is the ground truth.

Each image is around 6000 pixels wide. To increase the dataset size, I cropped each image into 224×224-pixel tiles with OpenCV and saved them, ending up with 4,000 samples of training data.
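A minimal sketch of that cropping step (the stride, file layout, and naming here are illustrative):

```python
import os
import cv2

TILE = 224  # training sample size in pixels

def crop_to_tiles(color_path, normal_path, out_dir):
    """Slice a large color/normal map pair into aligned 224x224
    tiles, so every color crop keeps its matching ground truth."""
    color = cv2.imread(color_path)
    normal = cv2.imread(normal_path)
    assert color.shape == normal.shape, "maps must be pixel-aligned"
    h, w = color.shape[:2]
    idx = 0
    for y in range(0, h - TILE + 1, TILE):
        for x in range(0, w - TILE + 1, TILE):
            cv2.imwrite(os.path.join(out_dir, f"color_{idx:04d}.png"),
                        color[y:y + TILE, x:x + TILE])
            cv2.imwrite(os.path.join(out_dir, f"normal_{idx:04d}.png"),
                        normal[y:y + TILE, x:x + TILE])
            idx += 1
```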

Model:

First I experimented with an autoencoder. The encoder is a stack of convolution, batch normalization, ReLU, and downsampling layers; the decoder is a stack of convolution, batch normalization, ReLU, and upsampling layers. I trained the model for 100 epochs, which took about half an hour on a 1080 Ti. The result is promising, but the inference looks like a blurred version of the ground truth, a common symptom of training with a pixel-wise loss, which rewards averaging over all plausible outputs. Unfortunately, a blurred normal map produces a barely noticeable effect in a rendering.
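In Keras, the structure looks roughly like the sketch below; the filter counts and depth are my placeholders, since only the convolution + batch normalization + ReLU + down/upsampling pattern matters here:

```python
from tensorflow.keras import layers, models

def conv_block(x, filters):
    """Convolution -> batch norm -> ReLU: the repeating unit
    of both the encoder and the decoder."""
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def build_autoencoder():
    inp = layers.Input(shape=(224, 224, 3))    # color texture
    # Encoder: conv blocks with downsampling
    x = conv_block(inp, 64)
    x = layers.MaxPooling2D()(x)               # 112x112
    x = conv_block(x, 128)
    x = layers.MaxPooling2D()(x)               # 56x56
    x = conv_block(x, 256)
    # Decoder: conv blocks with upsampling
    x = layers.UpSampling2D()(x)               # 112x112
    x = conv_block(x, 128)
    x = layers.UpSampling2D()(x)               # 224x224
    x = conv_block(x, 64)
    out = layers.Conv2D(3, 3, padding="same",
                        activation="tanh")(x)  # normal map
    return models.Model(inp, out)

model = build_autoencoder()
model.compile(optimizer="adam", loss="mae")    # pixel loss blurs the output
```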

Then I experimented with a GAN. I used the same autoencoder structure for the generator, and strided convolutional layers followed by a dense layer for the discriminator. Training for 100 epochs took about one hour on a 1080 Ti. The result is very good: the discriminator allows the GAN to generate crisp images, and at a glance the inference on the training set is extremely close to the ground truth.
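Below is a sketch of the discriminator and one adversarial training step, assuming a plain unconditional GAN; the layer sizes and the exact losses are my assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_discriminator():
    """Strided convolutions followed by a dense layer, scoring a
    normal map as real (ground truth) or fake (generated)."""
    inp = layers.Input(shape=(224, 224, 3))
    x = inp
    for filters in (64, 128, 256):
        x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
    x = layers.Flatten()(x)
    return models.Model(inp, layers.Dense(1)(x))  # real/fake logit

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def train_step(color, real_normal, gen, disc, g_opt, d_opt):
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_normal = gen(color, training=True)
        real_logits = disc(real_normal, training=True)
        fake_logits = disc(fake_normal, training=True)
        # Discriminator learns to separate real from generated maps.
        d_loss = (bce(tf.ones_like(real_logits), real_logits)
                  + bce(tf.zeros_like(fake_logits), fake_logits))
        # The adversarial loss pushes the generator toward sharp,
        # realistic-looking normal maps instead of blurry averages.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(
        d_tape.gradient(d_loss, disc.trainable_variables),
        disc.trainable_variables))
    g_opt.apply_gradients(zip(
        g_tape.gradient(g_loss, gen.trainable_variables),
        gen.trainable_variables))
```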

When compared with the Normal Map effect in Adobe Photoshop, the AI model creates a superior normal map. Linear methods focus on the small details of the texture and ignore the large structures and overall relations in the texture.

The neural network was trained only on a very limited range of rock textures. Surprisingly, it generates acceptable results for many other materials, such as wood and fabric. Even when a picture of a human face is thrown into the network, it generates a promising result that shows some of the spatial structure in the image.

Discussion:

Using a neural network to generate normal maps looks very promising to me. Please check out the real-time AI normal map generator powered by TensorFlow.js. I would love to hear opinions from CG enthusiasts and professionals.

I also experimented with using a neural network to generate a displacement map. A displacement map expresses more abstract information about the texture, so it requires a deeper network to produce a usable map.

Normally, the texture maps used in 3D rendering are large (more than 1200 pixels). The model used here is an FCN (fully convolutional network); without a dense layer, it can accept any input image larger than the convolution kernel. After experimenting with different input sizes, I found that the generated normal map becomes overly focused on small details and looks flat when the input is larger than about 800 pixels. I believe this is caused by two factors. First, the training samples are fixed at 224×224 pixels; training on randomly sized samples may resolve the problem. Second, different sizes of convolution kernels could also improve the network's performance on large input images.
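Declaring the spatial dimensions as None is what makes a Keras model fully convolutional; a minimal sketch with simplified layers:

```python
import numpy as np
from tensorflow.keras import layers, models

# With the spatial dimensions declared as None, the same weights run
# on any input larger than the convolution kernels.
inp = layers.Input(shape=(None, None, 3))
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
out = layers.Conv2D(3, 3, padding="same", activation="tanh")(x)
fcn = models.Model(inp, out)

print(fcn.predict(np.zeros((1, 224, 224, 3))).shape)    # (1, 224, 224, 3)
print(fcn.predict(np.zeros((1, 1200, 1200, 3))).shape)  # (1, 1200, 1200, 3)
```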

In the future, I will train the model with a larger dataset and a greater variety of textures to make it more stable across different materials. As a designer and an amateur AI trainer, I have limited knowledge of programming and machine learning, so please let me know about any problems with my project. I recently found a paper published this year that uses AI to solve the same problem; please check it out.