MIP Mapping

Multum in Parvo

First, a bit of discussion about what mip mapping is and why it’s a good thing. It exists to solve the problem of texture aliasing during minification. In simpler terms, it reduces flickering, jaggies, and the complete loss of some image details when an image is displayed smaller than its original resolution.

A GPU renders a texture by sampling it once at the center of each screen pixel. Since the texture is only being sampled once at each pixel, when the texture’s resolution is higher than the number of screen pixels it covers, some texels won’t be “seen”. This results in parts of the image effectively disappearing.

Now it’s possible to figure out how many texels are within the bounds of each pixel, sample all of those, and then output the average color. This would be a form of Supersampling. Offline rendering tools often do this, and it yields extremely high quality results, but it is also stupendously expensive. The below image, which I’ll be using as a ground truth reference, was generated using 6,400 anisotropic samples per pixel.
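To make the cost concrete, here’s a minimal sketch (in Python with NumPy, not part of any real renderer) of what brute-force supersampled minification looks like. `supersample_downscale` and the checkerboard texture are illustrative names of my own, and it uses simple nearest-texel samples rather than the anisotropic samples used for the reference image:

```python
import numpy as np

def supersample_downscale(tex, out_size, samples_per_axis):
    """Brute-force supersampling: average many texture samples inside
    each output pixel's footprint (nearest-texel samples for brevity)."""
    h = tex.shape[0]
    out = np.zeros((out_size, out_size))
    s = samples_per_axis
    for py in range(out_size):
        for px in range(out_size):
            acc = 0.0
            for sy in range(s):
                for sx in range(s):
                    # Evenly spaced sample positions inside the pixel.
                    u = (px + (sx + 0.5) / s) / out_size
                    v = (py + (sy + 0.5) / s) / out_size
                    acc += tex[min(int(v * h), h - 1), min(int(u * h), h - 1)]
            out[py, px] = acc / (s * s)
    return out

tex = np.indices((64, 64)).sum(0) % 2.0   # checkerboard test texture
small = supersample_downscale(tex, 8, 16)  # 256 samples per output pixel
print(small.shape)  # (8, 8)
```

Even at this toy scale, each output pixel costs hundreds of texture reads, which is exactly why this approach stays in offline rendering.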

Extreme Supersampling (200% Pixel Scale)

If rendered at 1920x1080 on an Nvidia GTX 970, this scene takes roughly 350 ms on the GPU (as measured with Nvidia Nsight), or about a third of a second per frame. And even this isn’t quite perfect. Really I’d need to use closer to 15,000 samples per pixel, but that brings the rendering time to multiple seconds per frame. That is the kind of quality that we’re aiming for, and it’s way too expensive for real time. So what if we skip that and just sample the full resolution texture once per pixel? It can’t be that bad, right?

No mipmaps, Bilinear Filtering (200% Pixel Scale)

Ouch.

Lines are popping in and out of existence in the foreground, and in the background it’s just a noisy mess. It would be even worse if the texture had more than just some white lines. This is what we mean when we say aliasing in the context of image sampling.

Mip mapping seeks to avoid both the aliasing problem and the cost of Supersampling by prefiltering the image into multiple successively half resolution versions, each built from the previous size. These are the mipmaps. Each texel in a mip is the average of the corresponding 2x2 block of texels in the next larger mip level. The idea is that as long as you pick the correct mip level, all of the original image details are accounted for. This successive halving is also why power of 2 texture sizes are a thing; halving other sizes can produce resolutions that aren’t whole numbers, which makes generating the mipmaps messier.
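As a sketch of how a mip chain is generated (a hypothetical `build_mip_chain` helper in Python/NumPy; real texture tools often use better filters than a plain 2x2 box average):

```python
import numpy as np

def build_mip_chain(image):
    """Build a mip chain by repeatedly averaging 2x2 texel blocks.

    `image` is a square, power-of-two, (H, W, channels) float array.
    Returns a list: [full res, half res, ..., 1x1].
    """
    mips = [image]
    while mips[-1].shape[0] > 1:
        prev = mips[-1]
        # Each new texel is the average of a 2x2 block of the larger mip.
        half = (prev[0::2, 0::2] + prev[1::2, 0::2] +
                prev[0::2, 1::2] + prev[1::2, 1::2]) / 4.0
        mips.append(half)
    return mips

chain = build_mip_chain(np.ones((256, 256, 3), dtype=np.float32))
print(len(chain))  # 9 levels: 256, 128, 64, ..., 1
```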

The other benefit of mip mapping is memory bandwidth usage. While the overall texture memory usage is now roughly 33% larger due to the inclusion of the mipmaps, when the GPU goes to read the texture, it only has to read the data from the appropriate mip level. If you’ve got a very large texture that’s being displayed very small, the GPU doesn’t have to read the full texture, only the smaller mip, reducing the amount of data being passed around. This is especially important on mobile devices where memory bandwidth is in very short supply.

Technically a GPU never loads the full resolution version of the texture out of its memory at once, only a small chunk of it depending on what’s needed by the shader. The full resolution texture exists in the GPU’s main memory, and when a shader tries to sample from that texture the GPU pulls a small section of it into the L1 cache for the TMU (Texture Mapping Unit, the physical part of the GPU that does the work of sampling textures) to read from. If multiple pixels all sample from a small region, the GPU can reuse the chunk already loaded instead of fetching another part of the texture later. That’s what saves memory bandwidth.
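The roughly 33% figure comes from a geometric series: each mip level has a quarter the texels of the one above it, so the whole chain sums to 4/3 of the base level. A quick sketch:

```python
# Memory cost of a full mip chain relative to the base level alone.
# Each level is 1/4 the texel count of the one above, so the total is
# a geometric series: 1 + 1/4 + 1/16 + ... -> 4/3 of the base size.
def mip_chain_texels(size):
    total = 0
    while size >= 1:
        total += size * size
        size //= 2
    return total

base = 256 * 256
print(mip_chain_texels(256) / base)  # ~1.333, i.e. ~33% extra
```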

So now, all a GPU has to do to prevent aliasing is pick the best mip level to display. It does this by calculating the expected texel to pixel ratio along both the horizontal and vertical screen axes at each pixel¹, then taking the larger of the two ratios and picking the mip level that keeps it as close to 1 texel per pixel as possible.
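That selection can be sketched as follows. `select_mip_level` is a simplified model of my own, loosely following the LOD equations in the OpenGL and Direct3D specifications; the inputs are the screen-space derivatives of the texel coordinates:

```python
import math

def select_mip_level(dudx, dvdx, dudy, dvdy):
    """Simplified model of the hardware mip level (LOD) calculation.

    dudx etc. are the change in texel coordinates per screen pixel
    along the horizontal (x) and vertical (y) screen axes.
    """
    # Texel footprint length along each screen axis.
    ratio_x = math.hypot(dudx, dvdx)
    ratio_y = math.hypot(dudy, dvdy)
    # Use the larger (worst case) ratio; log2 turns a 2:1 texel-to-pixel
    # ratio into mip 1, 4:1 into mip 2, and so on.
    return max(0.0, math.log2(max(ratio_x, ratio_y)))

print(select_mip_level(2.0, 0.0, 0.0, 2.0))  # 1.0 -> the half-res mip
```

Note the `max(ratio_x, ratio_y)`: the worst-case axis wins, which is exactly what makes angled surfaces blurry later on.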

The below image is a 256x256 texture with custom colored mipmaps being rendered at 256x256.

Colored Mipmaps, Bilinear Filtering (200% Pixel Scale)

The only time the full 256x256 top mip is sampled is when the quad is very close to the camera. Notice the floor never shows the top mip. If this was rendered at a higher resolution, or the texture was a lower resolution, the top mip would continue to be shown until further away. Again, this is due to that 1:1 texel to pixel ratio the mip level calculations are trying to achieve. As for those game analysis videos online that complain about texture filtering quality being reduced when the screen resolution is lowered or a game uses dynamic resolution: the filtering quality is exactly the same, it’s just the resolution ratio that’s changing.

Isotropic Filtering

Let’s go over the basics of texture filtering. Really there are two main kinds of texture filtering, point and linear. There’s also anisotropic, but we’ll come back to that. When sampling a texture you can tell the GPU what filter to use for the “MinMag” filter and the “Mip” filter.

The “MinMag” filter is how to handle blending between texels themselves. Point sampling chooses the closest texel to the position being sampled and returns that color. Linear finds the 4 closest texels and returns a bilinear interpolation of the four colors.
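A minimal sketch of that linear MinMag filter (a hypothetical `sample_bilinear` of my own, assuming texel centers at integer + 0.5 and clamp addressing, which are common GPU conventions):

```python
import numpy as np

def sample_bilinear(tex, u, v):
    """Linear 'MinMag' filter: blend the 4 closest texels.

    tex is a (H, W) grayscale array; u, v are texel-space coordinates
    with texel centers at integer + 0.5.
    """
    h, w = tex.shape
    x, y = u - 0.5, v - 0.5
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0

    def texel(ix, iy):  # clamp addressing at the edges
        return tex[np.clip(iy, 0, h - 1), np.clip(ix, 0, w - 1)]

    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bot = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bot * fy

tex = np.array([[0.0, 1.0], [0.0, 1.0]])
print(sample_bilinear(tex, 1.0, 1.0))  # 0.5, halfway between the texels
```

Point sampling would instead just return `tex` at the nearest texel, with no blending at all.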

The “Mip” filter determines how to blend between mip levels. Point simply picks the closest mip level and uses that. Linear blends between the colors of the two closest mip levels.

Most people reading this are likely familiar with Point, Bilinear, and Trilinear filtering. Point filtering uses a point filter for both MinMag and Mip. Bilinear, as you probably guessed, uses a linear filter for MinMag and a point filter for Mip.

Bilinear Filtering (200% Pixel Scale)

As you can see, Bilinear shows clear jumps between the mip levels, both on the floor and as the quad moves forward and back. The point at which the changes happen is important: the switch occurs not when the texture is scaled down to exactly the next mip’s size, but while it’s still roughly 40% larger. This leads to the changes not only being abrupt, but the texture being obviously blurry when the change occurs.

Trilinear uses the same linear MinMag filter as Bilinear, with the addition of a linear filter for Mip, hence the name. This hides the harsh transitions between mip levels, but the blurring still remains, as the next mip still starts being faded in early.
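The Mip blend itself is just a linear interpolation by the fractional part of the computed mip level. A tiny sketch (hypothetical `mip_blend`, taking already bilinearly filtered colors from the two closest mips):

```python
import math

def lerp(a, b, t):
    return a + (b - a) * t

def mip_blend(color_lo, color_hi, lod):
    """Trilinear's linear 'Mip' filter: blend the two closest mip
    levels by the fractional part of the computed mip level (lod).
    color_lo / color_hi are the bilinearly filtered colors sampled
    from mip floor(lod) and mip floor(lod) + 1."""
    return lerp(color_lo, color_hi, lod - math.floor(lod))

print(mip_blend(1.0, 0.0, 2.25))  # 0.75: mostly mip 2, a bit of mip 3
```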

Trilinear Filtering (200% Pixel Scale)

This is the downside of mip mapping. Mip maps only accurately represent the image at exactly those half sizes. In between those perfectly scaled mip levels, the GPU has to pick which mip level to display, or blend between two levels. But remember, mip mapping’s goals are to reduce aliasing and rendering cost. It has technically achieved that, even if it’s at the expense of clarity.

The most obvious issue is how blurry the floor gets. This is because mipmaps are isotropic. In simple terms, each mipmap is scaled down uniformly in both the horizontal and vertical axes, and thus can only accurately reproduce a uniformly scaled surface. This works well enough for camera facing surfaces, like the rotating quad in the examples. But when viewing a surface that isn’t perfectly facing the camera, like the ground plane, one axis of the displayed texture is scaling down faster than the other in screen space. The ground plane has non-uniform, or anisotropic, scaling. Mipmaps alone don’t handle that case well. As I mentioned above, GPUs pick the mip level based on the worst case, as the alternative would cause aliasing.

Anisotropic Filtering

Anisotropic filtering exists to try to get around the blurry ground problem. Roughly speaking, it works by using the mip level of the smaller texel to pixel ratio, then sampling the texture multiple times along the non-uniform scale’s orientation. The mip level used is still limited by the number of samples allowed, so low anisotropic levels will still become blurry in the distance to prevent aliasing.
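A sketch of that decision, loosely following the math in the OpenGL EXT_texture_filter_anisotropic extension (the function name and simplifications are mine):

```python
import math

def anisotropic_lod_and_samples(ratio_x, ratio_y, max_aniso):
    """Simplified anisotropic LOD decision.

    ratio_x / ratio_y are the texel-to-pixel ratios along the two
    screen axes; max_aniso is the quality setting (2x, 4x, 8x, ...).
    Returns (mip_level, sample_count): a mip biased toward the
    *smaller* ratio, clamped so no more than max_aniso samples are
    taken along the stretched direction.
    """
    p_max = max(ratio_x, ratio_y)
    p_min = min(ratio_x, ratio_y)
    # How stretched the pixel footprint is, capped by the aniso limit.
    n = min(math.ceil(p_max / p_min), max_aniso)
    # Pick the mip as if the footprint were covered by n samples.
    lod = max(0.0, math.log2(p_max / n))
    return lod, n

print(anisotropic_lod_and_samples(8.0, 1.0, 4))  # (1.0, 4)
```

In the example output, an 8:1 stretched footprint at 4x anisotropic still has to fall back to mip 1 rather than mip 0, which is why low anisotropic levels still blur in the distance.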

Anisotropic Filtering quality level comparison (200% Pixel Scale)

However the overall result is much sharper ground textures even at lower settings. To my eyes, 4x or 8x are good options for improving the sharpness over a significant portion of the ground at this viewing angle and resolution.

Anisotropic Filtering 8x (200% Pixel Scale)

Looks pretty good, right? So I guess we’re done? Well, not so fast. Let's compare back to the “ground truth”.

“Ground Truth” Supersampling (200% Pixel Scale)

Notice how much sharper it is still? Especially the quad? No? Wait, I know it’s hard to compare those two when they’re not next to each other. How about this.

Anisotropic Filtering 8x vs “Ground Truth” (200% Pixel Scale)

Anisotropic filtering helps a lot with textures that are angled away from the camera, but directly facing is actually no different than Trilinear! Even the near ground plane isn’t quite right. Look closely at that center line and you’ll see there’s a little bit of blurring there still with Anisotropic filtering.²

So now what?