Computers have been able to produce pictures that look like real life for years now, but the hurdle comes when we want them to do it quickly. In a video game, for instance, we expect the machine to produce these pictures sixty times per second, consistently. This article covers how, with current hardware, we can make this “real-time” (read: fast) rendering a little more realistic.

Example of a rendered scene

The first thing to understand is how computers draw complex pictures that quickly to begin with. It’s done with “physically based rendering” (PBR), an approach developed more than a decade ago that approximates how light interacts with physical surfaces. It uses 3D models (shapes), lights, and “surface materials” to draw a picture. The surface materials are what we will focus on: numeric values describing the color, roughness, and metalness of a surface at given points.

For instance, say we wanted to describe an apple (like the one above). You might say this apple’s surface is “red and shiny”. In our rendering model, we would give the surface materials high red values and low roughness values (“red” and “shiny”), but low blue, green, and metalness values.

A cartoonish apple render

Previously, I made a device (link here, with some discussion here and here) that derived one of the images used in rendering, the “normal map”, to see how easy it is to capture this data from real life. Deriving the normal map worked fine, but normal maps have no real-life equivalent, so we don’t need them for this article. However, emailing several researchers about the project made one thing apparent: deriving the other images (the ones that do have real-life equivalent effects) from photographs, or by any other means, either can’t be done or would at least be very difficult. This is probably because physically based rendering wasn’t based on real life to begin with; it just provides a fast estimate, loosely based on real-life optics.

Summary of images used in Physically Based Rendering. For the purposes of this article, disregard “NORMAL” and “AO” images, which have to do more with the shape of an object than the material.

However, we can still derive the “right” values, in a way. Suppose someone asked you to guess the number between one and one hundred that they’re thinking of. If you kept guessing, eventually you’d get pretty close. The same idea applies to computer graphics: out of all the possible rendering values, which we will go over in a bit, one of these renders should be “closer” to a real-life photograph than all the rest. Because the qualities we usually treat as qualitative (color, roughness, and metalness) are actually numbers in rendering, this search is possible.

Those of you familiar with rendering might throw this idea out immediately, and for good reason: a lot of variables go into rendering, such as 3D shapes, lighting, and different material maps. The idea becomes practical if most of these variables are taken out, or “held constant”. To stop worrying about shape, for instance, scan a 3D model of the real-life object. To stop worrying about lighting or placement, recreate the scene in 3D and control for lighting and distance by using, say, one point light. In essence, we would replicate everything in a controlled real-life scene in a matching, controlled 3D scene, except for the surface materials: our rendering values.

Courtesy of a paper from the Technical University of Denmark, linked below

A previous implementation of this idea, where variables such as distances and background are held constant between rendering and a photograph.

Let’s say we get to this point — and it can certainly be done, as it has been done before — what then? How difficult would it be to compute the most realistic rendering values possible? How many possible renders are there to compute?

To find the number of renders we need, we can think of the problem in terms of a single pixel. We can do this because, in our controlled 3D render (unlike the image above), we can assume the output of one pixel is not affected by the output of any other, which may be easier said than done. With that assumption, we can apply our changes uniformly: to test whether a certain shade of blue is correct somewhere, change the entire surface of the object to those values and compare the result with the real-life equivalent. Since no pixel affects any other, doing this for all inputs exhausts all the possibilities. So the question becomes: how many possible inputs are there for one pixel?

The RGB color range, every color your screen can represent

Let’s think about that for a bit. First, there is color. Color accounts for three of our five rendering values: numbers from zero to 255 for red, green, and blue (a blue value of zero is not blue at all, and a blue value of 255 is the most blue a color can be, and the same applies to red and green). That gives 256 ^ 3 possibilities, or about 16.7 million, represented above as the RGB color space. We will look at how long this might take to compute later; for now we just want a rough estimate of the number of possibilities.
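The counting here is simple arithmetic; a quick sketch in Python:

```python
# Each of the three color channels takes an integer value from 0 to 255,
# so there are 256 choices per channel.
values_per_channel = 256
color_channels = 3  # red, green, blue

color_possibilities = values_per_channel ** color_channels
print(color_possibilities)  # 16777216, about 16.7 million
```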

An increasing metal value — various aspects of reflection change

An increasing roughness value — light scatters more

How the range of possibilities for this type of rendering was calculated

Next, there are the other two rendering values in our shading model: metalness and roughness. Metalness defines how “metal” something is at a given point, because metallic objects in particular behave differently with light. Roughness defines how light is distributed at a given point. Is the object really shiny, or really dull? More like an apple or sandpaper? These two, metalness and roughness, are also defined by numbers from zero to 255 per pixel. That increases our total number of possibilities from 256 ^ 3 to 256 ^ 5, or about 1.1 trillion rendering possibilities, as pictured above. That is a very large number, but there are ways to skirt around computing that much, or at least to compute it in a reasonable amount of time, as I will explain later.
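The same arithmetic extends from three per-pixel inputs to five:

```python
# Five per-pixel inputs: red, green, blue, metalness, roughness,
# each an integer from 0 to 255.
values_per_input = 256
inputs = 5

total_possibilities = values_per_input ** inputs
print(total_possibilities)  # 1099511627776, about 1.1 trillion
```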

A geometrical representation of the distance formula

A diagram of how the distance formula applies

For now, let’s look at the function that determines how close a given render is to a real-life photograph. If you have taken high-school math, say geometry or calculus, you may already be familiar with it: the distance formula. Given two points in 3D space, it tells us their distance from each other. You might object, “but that’s a distance between 3D points, and we are talking about a render and a photograph!”, and you would be correct. However, every point or pixel in the render, and in the photograph, corresponds to a color (one of the 16.7 million values we mentioned previously), and a color is itself a point in a 3D space, the RGB color space: instead of (x, y, z), points are (R, G, B). So, to measure the difference between two colors as a distance between points, the distance formula absolutely applies. The lower this number, the more “real” we can consider a render to be at a given point; in other words, we want to minimize it at every point. This should address the concern about measuring how “real” a render is, raised by one of the creators of “Physically Based Rendering” when I emailed them about my previous project.
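As a small sketch of colors-as-points (the color tuples here are just illustrative):

```python
import math

def color_distance(a, b):
    """Euclidean distance between two (R, G, B) colors,
    treating each color as a point in 3D space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two reds that differ only in the red channel:
print(color_distance((255, 0, 0), (200, 0, 0)))  # 55.0

# An identical pair has distance zero, the "most real" possible match:
print(color_distance((10, 20, 30), (10, 20, 30)))  # 0.0
```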

From a high level, then, the program that would generate the renders and produce the closest possible “surface materials” is simple. Start with a model, a light, and a corresponding real-life photograph of the object. Make a render with one of the 1.1 trillion possible sets of rendering values applied uniformly to the surface of the object. For each render, compare the output color values with the photograph’s color values, and keep the material values whenever the colors are closer to the photograph’s than in any previous render. Repeat for all possible rendering values.
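The loop above can be sketched as follows. The `render` function here is a hypothetical stand-in: in a real implementation it would render the controlled scene on the GPU with the given surface materials and read back a pixel.

```python
import itertools
import math

def color_distance(a, b):
    # Euclidean distance between two (R, G, B) colors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_material(render, photo_color, step=4):
    """Exhaustively try (R, G, B, metalness, roughness) inputs, keeping
    whichever render lands closest to the photographed color.

    `render` is a hypothetical callable taking the five material values
    and returning the resulting (R, G, B) pixel. `step=4` samples 64
    values per input (the reduced 64^5 search); step=1 would be the
    full 256^5 search.
    """
    best, best_dist = None, float("inf")
    values = range(0, 256, step)
    for material in itertools.product(values, repeat=5):
        d = color_distance(render(*material), photo_color)
        if d < best_dist:
            best, best_dist = material, d
    return best
```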

The logo of OpenGL, a popular graphics library

From a low level, however, there are a number of hurdles. I, for one, don’t have enough knowledge of the available graphics libraries (such as OpenGL and Vulkan) to implement this. Just look at how much code is needed to draw a triangle! If anyone works with graphics routinely and is interested in helping to build a working demonstration of this, I encourage you to contact me (my email is at the bottom of this article). Otherwise, anyone is free to try this themselves.

Our test render, with one object and a few point lights, which changes rendering inputs uniformly every frame (every render)

Our apparent rendering rate (number of renders per second), which gives us some idea of how long this would take

Secondly, there is the issue of the sheer number of renders that need to be computed. Once they are computed, the materials are available for real-time use, but getting there takes a lot of computational power. To roughly estimate how long 1.1 trillion renders might take, I ran a little experiment on my machine. I loaded a simple model with changing PBR (physically based rendering) materials and one point light, and let it sit in a bare-bones renderer (pictured above; Raylib, the library shown, is available at https://www.raylib.com/). It produced around 3,000 frames, or renders, per second, assuming the counter is accurate. That’s a lot, but keep in mind that the logic needed to compare each render against the photograph has not been implemented yet, so the real render rate may be much slower. Even at 3,000 renders per second, making 1.1 trillion renders would take about 11 years. Solutions to this problem include using some sort of server farm, a faster GPU (this system used AMD’s Vega 64), or making fewer renders to begin with. The last option seems the most reasonable: instead of computing 256 ^ 5 possibilities, computing 64 ^ 5 would take much less time and still be accurate to within plus or minus two units of each rendering value. 64 ^ 5, or about 1.07 billion renders, would take around four days on the same machine. This would at least be sufficient to prove the concept.
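The back-of-the-envelope timing works out like this, taking the measured 3,000 renders per second at face value:

```python
renders_per_second = 3000

full_search = 256 ** 5     # every possible input: ~1.1 trillion renders
reduced_search = 64 ** 5   # 64 values per input instead of 256: ~1.07 billion

seconds_per_day = 60 * 60 * 24
print(full_search / renders_per_second / (seconds_per_day * 365))  # about 11.6 years
print(reduced_search / renders_per_second / seconds_per_day)       # about 4.1 days
```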

A visual representation of the pigeonhole principle — notice that there are two pigeons in the first pigeonhole

The last hurdle is a logical one. Suppose you had ten things and nine boxes to place them in; at least one box must contain at least two things. This is known as the pigeonhole principle. By the same logic, because there are far more possible inputs than outputs (256⁵ input rendering values vs. 256³ output color values), more than one set of input values must be able to produce the same output color. This means we would not have obtained the single most accurate set of rendering values, only one set that matches under that lighting angle. Because the output of the same input values changes with the angle of the light (this is the purpose of physically based rendering to begin with, and is easiest to picture with metallic objects), it would make sense to repeat the process as is, but with a different lighting angle. That way, we can deduce which values are correct under not just one lighting angle, but two. It remains to be seen whether this would be enough, or whether even more lighting conditions would be needed.
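One way to picture the fix: keep the set of input values whose renders matched the photograph under each lighting angle, then intersect the sets. The material tuples below are made up purely for illustration:

```python
# Hypothetical candidate sets of (R, G, B, metalness, roughness) values
# whose renders matched the photograph under each lighting angle.
matches_light_a = {(200, 30, 30, 0, 40), (200, 30, 30, 255, 40)}
matches_light_b = {(200, 30, 30, 0, 40), (180, 30, 30, 0, 90)}

# Only values that match under BOTH lighting angles survive:
print(matches_light_a & matches_light_b)  # {(200, 30, 30, 0, 40)}
```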

Lastly, I’d like to note that this is an oversimplified version of what the real-life experiment would need. Details I’ve skipped include an object-telecentric camera lens, mapping locations on the render to UV-map locations, and the use of a material like Black 3.0. Still, the general idea is there. The end result would be the most true-to-life rendering values possible given our real-time shading models, which will either give us an amazing real-time render or tell us how the shading model can be improved. Either way, it’s a win-win.

If you thought this idea was interesting, or you have in-depth graphics knowledge that would make this possible, or have seen experiments that are even remotely similar, I can be contacted at mriadzaky [at] fordham [dot] edu.