Stepping Down

Up until now, we've been exploring the X11 Window System from a somewhat high level. We've talked about Windows and Drawables and Pixmaps, but always in terms of the protocol. "You draw on a Drawable with the protocol extension RENDER," or "you can configure a window with the protocol request ConfigureWindow," but I haven't really gone into any detail about how that's implemented. I think it's about time we dropped from the abstract to the concrete. Today's article won't be about X11, or even the data structures inside of it. Today, we're going to start to unravel 2D graphics. You can explore the things I'm going to talk about on any operating system, in any programming language, and even on paper.

I often see comments like "oh, if only SVG were implemented with WebGL, then it would be fast", and realize that people do not quite understand the challenges and complications of 2D graphics rendering. Although it may go against your intuition, fast, good-looking 2D graphics are actually harder and more computationally expensive to accomplish than 3D graphics, at least on traditional consumer GPUs. I hope to eventually explain why, but unfortunately that won't come today. Even though fast, good-looking 2D graphics rendering is an extremely difficult problem, the basics and principles behind it aren't. Today, we will explore those basics by writing a software rasterizer of our very own, from the ground up. You should be able to write one yourself by the end of this article! All of the code you see here was written for this article, from nothing more than basic principles and ideas. It's open source on GitHub, and I've tried to document it well as a source to learn from. I hope you're as excited as I am to get started! So, without further ado...

The Pixel Grid

The X11 articles take place in a hand-written X11 server. Regional Geometry took place, well, in the land of geometry. Today's article takes place... ... on the pixel grid!

If you can peel your eyes away from the absolute graphical marvel I've created above, we're going to dig into exactly what's going on inside those little squares, and how they interact. This series will go on for multiple parts. Today, we're going to start with the basics of 2D graphics: the graphics buffer and its layout, abstract shapes, a basic introduction to sampling theory, lerping and blending colors, transparency, and we'll end it off by adding a bit of antialiasing!

A note about the interactive demos on display here today. First, I'm presenting a "zoomed in" buffer so you can see the pixels more easily. Specifically, each "demo pixel" here is 16 "real pixels" wide, and the same tall. The grid itself is composed of 46 of these "demo pixels" horizontally, and 10 of them vertically, which are basically a few arbitrary constants I chose to make it fit nicely inside these margins, which are 800 "real pixels" wide.

It might hurt your head to think about "demo pixels" and "real pixels". Graphics programming often takes place in lots of different coordinate spaces like this. Even after quite a long time of doing it, I still get confused; it's just part of the job description. If you are having trouble, what helps me is to turn off the monitor, grab a physical pen and paper, and just draw it out. Often, just drawing and labelling the parts shows me my confusion. What doesn't help is blind trial and error. If you are serious about graphics programming, you will eventually reach a point where things go wrong, and you'll start peppering x*8 and (y+15)/8 into your code in a vain attempt to just get everything to match up correctly. You will start fiddling with your plus and minus signs, wildly reversing your translations and rotations at random in desperation.
You might even find yourself getting close, only to find later on that you've broken something else. It's OK, and we've all been there. I did it multiple times writing this article. Just take a break, come back, and try to figure out what's really going on. ... To prevent any confusion like this, let's start at the beginning.
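To make the two coordinate spaces above concrete, here's a small sketch of the demo-pixel/real-pixel conversion. The helper names are my own, purely for illustration; the constant comes from the 16-real-pixels-per-demo-pixel figure described earlier.

```javascript
// Each "demo pixel" is 16 "real pixels" wide and tall.
var DEMO_PIXEL_SIZE = 16;

// Convert a "demo pixel" coordinate to the "real pixel" at its top left.
function demoToReal(demoX, demoY) {
    return { x: demoX * DEMO_PIXEL_SIZE, y: demoY * DEMO_PIXEL_SIZE };
}

// ... and back again. Math.floor snaps any real pixel inside a demo
// pixel's square back to that demo pixel.
function realToDemo(realX, realY) {
    return { x: Math.floor(realX / DEMO_PIXEL_SIZE),
             y: Math.floor(realY / DEMO_PIXEL_SIZE) };
}
```

Writing tiny converters like this, and giving the spaces distinct names, is exactly the kind of thing that saves you from the x*8 fiddling described above.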

Coordinates

While I'm sure everyone here doesn't need this refresher, let's go over some notation and basic math for how to address pixels in this grid. We often treat pixel buffers as large arrays in memory: we usually leave them one-dimensional, and do the math ourselves to find the index for a given position. Buffer layouts in the real world have a lot of subtleties: stride alignment, endianness, pixel formats. If you don't know what any of those things are, don't worry, I'll explain them later. In this article series, we'll be using the convention established by HTML5's ImageData API. It sees the pixel grid as a giant, one-dimensional array of bytes, with each pixel taking four bytes: red, green, blue, alpha, in that order. We can find the index of the first byte of a pixel, and vice versa, with some very simple math:

```javascript
var BYTES_PER_PIXEL = 4;

function indexForPixelLocation(imageData, x, y) {
    return (y * imageData.width + x) * BYTES_PER_PIXEL;
}

function pixelLocationForIndex(imageData, idx) {
    var pixelIdx = Math.floor(idx / BYTES_PER_PIXEL);
    var y = Math.floor(pixelIdx / imageData.width);
    var x = pixelIdx % imageData.width;
    return { x: x, y: y };
}
```

Familiarize yourselves with what these functions do: they convert from the x and y coordinates of pixels on the grid to their index into the array of pixels, and back. Pixels go from top left to bottom right, first in the X direction, then in the Y direction. The top left of the pixel grid is at 0, 0, at index 0. The next pixel in the array, which starts at index 4, is located directly to its right.
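As a quick worked example of the round trip (using a 46-pixel-wide buffer, matching the demo grid described earlier):

```javascript
var BYTES_PER_PIXEL = 4;

function indexForPixelLocation(imageData, x, y) {
    return (y * imageData.width + x) * BYTES_PER_PIXEL;
}

function pixelLocationForIndex(imageData, idx) {
    var pixelIdx = Math.floor(idx / BYTES_PER_PIXEL);
    var y = Math.floor(pixelIdx / imageData.width);
    var x = pixelIdx % imageData.width;
    return { x: x, y: y };
}

// Only the width matters for the index math.
var imageData = { width: 46 };

// Pixel (2, 1) sits one full row (46 pixels) plus two pixels into the
// array. That's 48 pixels, at 4 bytes each: byte 192.
indexForPixelLocation(imageData, 2, 1);   // 192

// Any of that pixel's four bytes maps back to the same location;
// byte 193 is its green channel.
pixelLocationForIndex(imageData, 193);    // { x: 2, y: 1 }
```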

Let's Draw a Rectangle

Now that we've acquainted ourselves with the format of the pixel grid, let's try drawing a rectangle. For simplicity's sake, let's just fill it with black for now.

```javascript
function fillPixel(imageData, x, y) {
    var idx = indexForPixelLocation(imageData, x, y);
    imageData.data[idx + 0] = 0;   // Red
    imageData.data[idx + 1] = 0;   // Green
    imageData.data[idx + 2] = 0;   // Blue
    imageData.data[idx + 3] = 255; // Alpha
}

function fillRectangle(imageData, x1, y1, width, height) {
    for (var y = y1; y < y1 + height; y++)
        for (var x = x1; x < x1 + width; x++)
            fillPixel(imageData, x, y);
}
```

And let's try it out! This should be pretty straightforward, but there are a few peculiarities I do want to go over. First, it may strike some of you as odd to iterate over the "y" first. Graphics programmers often think about things in rows. This is a holdover from early computer graphics, and it's done for performance reasons: if you look at the memory layout of our pixel grid, you'll notice that this loop order touches the indexes in increasing order. While RAM does stand for "random access memory", CPUs cheat and have things called "caches", which make it cheaper to access memory sequentially than truly at random. You will often see this pattern of iterating over the rows, rather than the columns, come up in graphics algorithms, even for things that aren't pixel grids. This will become more apparent when we start going over more complex topics. It should also hopefully be pretty clear how to replace this "black" with another color, so I won't bother explaining that. I will, however, up the ante. Let's try filling this rectangle with something a bit fancier. ... Let's try a gradient.
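(Before we do, a quick aside: if you want to poke at this code outside the browser, a plain object can stand in for ImageData. This is just a sketch; a real ImageData comes from a canvas context and uses a Uint8ClampedArray rather than the Uint8Array assumed here.)

```javascript
var BYTES_PER_PIXEL = 4;

function indexForPixelLocation(imageData, x, y) {
    return (y * imageData.width + x) * BYTES_PER_PIXEL;
}

function fillPixel(imageData, x, y) {
    var idx = indexForPixelLocation(imageData, x, y);
    imageData.data[idx + 0] = 0;   // Red
    imageData.data[idx + 1] = 0;   // Green
    imageData.data[idx + 2] = 0;   // Blue
    imageData.data[idx + 3] = 255; // Alpha
}

function fillRectangle(imageData, x1, y1, width, height) {
    for (var y = y1; y < y1 + height; y++)
        for (var x = x1; x < x1 + width; x++)
            fillPixel(imageData, x, y);
}

// A stand-in for ImageData: all bytes start out at 0.
function newImageData(width, height) {
    return { width: width, height: height,
             data: new Uint8Array(width * height * BYTES_PER_PIXEL) };
}

var img = newImageData(8, 4);
fillRectangle(img, 1, 1, 3, 2);

// The alpha byte of a pixel inside the rectangle is now 255...
img.data[indexForPixelLocation(img, 1, 1) + 3];  // 255
// ... while a pixel outside the rectangle is untouched.
img.data[indexForPixelLocation(img, 0, 0) + 3];  // 0
```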

Space and Time

Gradients aren't actually that tricky, but we do need some basic grounding in one of the most fundamental concepts of computer graphics: linear interpolation, or "lerp" for short. Yes, lerping is so important that we give it a special abbreviation, one that can even be used as a verb. It's actually a simple concept. A lerp takes two values and a position parameter, often called time, and returns something in between. A time of 0 gives you the first value. 1 gives you the second value. 0.5 gives you the value halfway between both.

```javascript
function lerp(a, b, t) {
    return (a * (1.0 - t)) + (b * t);
    // It's also sometimes written as:
    //   return a + ((b - a) * t);
    // ... which might be easier to read for some people.
    // The two are mathematically equivalent.
}

function draw(imageData, secs) {
    var startX = 1;
    var endX = 38;
    var x = Math.floor(lerp(startX, endX, secs));
    var y = 1;
    fillRectangle(imageData, x, y, 8, 8);
}
```

A few more notes. This time variable is often called "t", but I've also seen "position" or "pos", and "alpha". I don't like "position" because I use that to mean a point on our pixel grid, and I don't like "alpha" since we already have an "alpha channel", which is completely unrelated to the lerp here. The time parameter, "t", is between 0 and 1. In this case, we derive it from the number of seconds that have passed in the animation, so it is quite literally "time". One obvious thing we can do is to warp time. We can multiply it to speed it up, divide it to slow it down, but we can also warp it in fancier ways.
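As a tiny taste of time-warping, squaring "t" is about the simplest warp there is. The output still runs from 0 to 1, but it lingers near the start, giving an "ease-in". A sketch:

```javascript
function lerp(a, b, t) {
    return (a * (1.0 - t)) + (b * t);
}

// Squaring t keeps it in [0, 1], but makes it grow slowly at first:
// the animation starts gently and speeds up toward the end.
function easeIn(t) {
    return t * t;
}

// Halfway through, a plain lerp is at the midpoint...
lerp(0, 100, 0.5);          // 50
// ... but the eased lerp has only covered a quarter of the distance.
lerp(0, 100, easeIn(0.5));  // 25
```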
For instance, to make it slow down near the ends, we can warp "t" by passing it through a famous easing curve known as "smoothstep":

```javascript
function lerp(a, b, t) {
    return (a * (1.0 - t)) + (b * t);
}

function smoothstep(t) {
    return t*t*(3 - t*2);
}

function draw(imageData, secs) {
    var startX = 1;
    var endX = 38;
    var smoothSecs = smoothstep(secs);
    var x = Math.floor(lerp(startX, endX, smoothSecs));
    var y = 1;
    fillRectangle(imageData, x, y, 8, 8);
}
```

... which looks a bit easier on the eyes. The name of the time parameter, "t", is derived from the math-y "f(t)" sense of the word "time". We don't have to use an input that corresponds to time passing in the real world: we can use any value between 0 and 1 as the input. We can also choose to lerp things other than position. For instance, we can lerp between two colors with just a bit more code.

```javascript
function newRGB(r, g, b) {
    return { r: r, g: g, b: b };
}

// Lerp between colors "color1" and "color2".
function lerpRGB(color1, color2, t) {
    var newR = lerp(color1.r, color2.r, t);
    var newG = lerp(color1.g, color2.g, t);
    var newB = lerp(color1.b, color2.b, t);
    return newRGB(newR, newG, newB);
}
```

This isn't doing anything fancier than a lerp across all three components of a color, and if we draw a ton of 1px-wide rectangles that all use it:

```javascript
function fillPixel(imageData, x, y, rgb) {
    var idx = indexForPixelLocation(imageData, x, y);
    imageData.data[idx + 0] = rgb.r;
    imageData.data[idx + 1] = rgb.g;
    imageData.data[idx + 2] = rgb.b;
    imageData.data[idx + 3] = 255; // Alpha
}

function fillRectangle(imageData, rgb, x1, y1, width, height) {
    for (var y = y1; y < y1 + height; y++)
        for (var x = x1; x < x1 + width; x++)
            fillPixel(imageData, x, y, rgb);
}

function draw(imageData) {
    var startX = 1;
    var endX = 45;
    var y = 1, width = 1, height = 8;
    var red = newRGB(255, 0, 0);
    var blue = newRGB(0, 0, 255);
    for (var x = startX; x < endX; x++) {
        var t = (x - startX) / (endX - startX);
        var rgb = lerpRGB(red, blue, t);
        fillRectangle(imageData, rgb, x, y, width, height);
    }
}
```

... we end up with a smooth transition between the colors, also known as a "linear gradient".
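You can check the color lerp by hand: the halfway point between pure red and pure blue splits each channel evenly. (The fractional channel values get rounded when they're eventually stored into the 8-bit ImageData array.)

```javascript
function lerp(a, b, t) {
    return (a * (1.0 - t)) + (b * t);
}

function newRGB(r, g, b) {
    return { r: r, g: g, b: b };
}

function lerpRGB(color1, color2, t) {
    return newRGB(lerp(color1.r, color2.r, t),
                  lerp(color1.g, color2.g, t),
                  lerp(color1.b, color2.b, t));
}

var red = newRGB(255, 0, 0);
var blue = newRGB(0, 0, 255);

// Halfway between red and blue: half of each channel.
lerpRGB(red, blue, 0.5);   // { r: 127.5, g: 0, b: 127.5 }
```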

Styling our Rectangle

Let's try applying this knowledge to our rectangle drawing code. The biggest change is that we'll need to vary the colors we draw based on the position in the image. To help us out, let's introduce a new concept, known as the "fill style". This is a function that takes a position, and returns a color for that position. The simplest possible fill style is a solid fill, which returns the same color for every position. A fancier one is the radial gradient, whose color depends on the distance from a center point.

```javascript
// Basic fill style.
function newSolidFill(rgb) {
    return function(x, y) {
        // A solid fill returns the same color, no matter the position.
        return rgb;
    };
}

function newRadialGradient(centerX, centerY, radius, centerRGB, edgeRGB) {
    return function(x, y) {
        // Calculate distance from the center point. Basic Pythagoras.
        var distX = x - centerX, distY = y - centerY;
        var distance = Math.sqrt(distX*distX + distY*distY);

        // If we're outside the circle, then just return the color at the edge.
        // This is a choice -- we could instead choose to repeat or ping-pong
        // between the colors.
        if (distance >= radius)
            return edgeRGB;

        // Translate the [0, radius] ranged value to a [0, 1] ranged value
        // so we can lerp the colors.
        var t = distance / radius;
        return lerpRGB(centerRGB, edgeRGB, t);
    };
}

// The same code as above, but slightly adapted to handle fill styles
// and custom colors.
function fillRectangle(imageData, fillStyle, x1, y1, width, height) {
    for (var y = y1; y < y1 + height; y++) {
        for (var x = x1; x < x1 + width; x++) {
            var rgb = fillStyle(x, y);
            fillPixel(imageData, x, y, rgb);
        }
    }
}

function draw(imageData) {
    var x = 19;
    var y = 1;
    var red = newRGB(255, 0, 0), blue = newRGB(0, 0, 255);
    var gradient = newRadialGradient(x + 4, y + 4, 6, red, blue);
    fillRectangle(imageData, gradient, x, y, 8, 8);
}
```

The biggest complexity here is figuring out which "t" to use. In the case of a radial gradient, it's just the distance from the center point, normalized against the radius.
As an exercise, try working out, on top of this base, how to support arbitrary gradient stops at different time values, rather than just two colors at the start and end.
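If you want to check your answer against one possible shape of a solution, here's a sketch. The stop representation, a sorted list of { t, rgb } pairs, is my own invention for illustration, not anything from a real API: for a given "t", find the two surrounding stops and lerp between them, re-normalizing "t" to that segment.

```javascript
function lerp(a, b, t) { return (a * (1.0 - t)) + (b * t); }
function newRGB(r, g, b) { return { r: r, g: g, b: b }; }

function lerpRGB(color1, color2, t) {
    return newRGB(lerp(color1.r, color2.r, t),
                  lerp(color1.g, color2.g, t),
                  lerp(color1.b, color2.b, t));
}

// "stops" is a list of { t, rgb } pairs, sorted by t, with the first
// at t = 0 and the last at t = 1.
function colorAtTime(stops, t) {
    if (t <= stops[0].t)
        return stops[0].rgb;
    for (var i = 0; i < stops.length - 1; i++) {
        var s0 = stops[i], s1 = stops[i + 1];
        if (t <= s1.t) {
            // Re-normalize t from [s0.t, s1.t] to [0, 1] for this segment.
            var segT = (t - s0.t) / (s1.t - s0.t);
            return lerpRGB(s0.rgb, s1.rgb, segT);
        }
    }
    return stops[stops.length - 1].rgb;
}

var stops = [
    { t: 0.0, rgb: newRGB(255, 0, 0) },  // red
    { t: 0.5, rgb: newRGB(0, 255, 0) },  // green
    { t: 1.0, rgb: newRGB(0, 0, 255) },  // blue
];

// t = 0.25 is halfway through the red-to-green segment.
colorAtTime(stops, 0.25);  // { r: 127.5, g: 127.5, b: 0 }
```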

Drawing Other Shapes

Now that we've familiarized ourselves with the concepts of rendering boxes and fills, let's try our hand at something a tad more fancy: rendering other shapes. To start with, let's try a circle. We just learned how to calculate the distance of a point x, y from the center of a circle. Once we have that, we can run over our pixel grid, test whether each pixel is "inside" the circle, and if so, fill it in. The only complication is that we have to pick some start and end bounds for the iteration. We could use our entire pixel grid, but we know a bunch of those pixels will never be filled in; we want a tight set of pixels. Thankfully, for a circle, it's quite easy to compute. By definition, the left edge ("x1", since it is the left X coordinate) is the X coordinate of the circle's center minus the radius, and the right edge ("x2") is the center plus the radius. The same goes for the top and bottom edges.

```javascript
function fillCircle(imageData, fillStyle, centerX, centerY, radius) {
    var x1 = centerX - radius, y1 = centerY - radius;
    var x2 = centerX + radius, y2 = centerY + radius;
    for (var y = y1; y < y2; y++) {
        for (var x = x1; x < x2; x++) {
            var distX = (x - centerX), distY = (y - centerY);
            var distance = Math.sqrt(distX*distX + distY*distY);
            if (distance <= radius) {
                var rgb = fillStyle(x, y);
                fillPixel(imageData, x, y, rgb);
            }
        }
    }
}
```

Oof. What happened here? This doesn't look very... circular. Is it an off-by-one error? It looks like it might be. It's tempting to brute force your way to a possible fix: changing the <= above to a <, peppering various +1s and -1s throughout. But trust me on this: that approach won't net you anything. It's actually not an off-by-one error; it's an off-by-0.5 error. "Wha?" I hear some of you cry. There is actually nothing wrong with the above code in the abstract.
Rather than a simple implementation-detail change, the issue with this algorithm is a more fundamental, conceptual one, and it forces us to rethink how we've been imagining the pixel grid. You might also have spotted it above in the radial gradient example: the gradient isn't centered; it's shifted down and to the right. These are both related problems.
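We can see the lopsidedness numerically with a stripped-down version of the test. Consider just the row of pixels passing through the center of a circle at (4, 4) with radius 3, using the same loop bounds and distance test as the fillCircle above:

```javascript
var centerX = 4, centerY = 4, radius = 3;

// Walk the row of pixels through the circle's vertical center,
// testing each pixel exactly as the naive fillCircle does.
var filled = [];
for (var x = centerX - radius; x < centerX + radius; x++) {
    var distX = x - centerX, distY = 0;
    var distance = Math.sqrt(distX*distX + distY*distY);
    if (distance <= radius)
        filled.push(x);
}
// filled is [1, 2, 3, 4, 5, 6]: three filled pixels to the left of
// column 4, but only two to its right.
```

Six filled pixels can never sit symmetrically around a single center column, which is exactly the lopsidedness visible in the demo.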

Sample Location

When we do something like the algorithm above, iterating over a bunch of pixels and testing whether each pixel is in or out, we're making use of a concept known as sampling. Basically, we have some functional description of a shape, like a circle, and we can give it different x and y points and it tells us whether they are inside or outside the shape. But what do these points actually mean? Well, we know that in our pixel grid, each pair of these numbers corresponds to a pixel. But have we thought about the abstract space where these circle descriptions live? We're talking about a circle centered at, let's say, 5, 5, with a radius of 10. We have to have some concept of mapping this abstract space to the pixel grid. Up until now, we haven't thought about this mapping, and have been hacking it together based on what makes sense. But to get this right, we need to think more closely about the relationship between the two. I'm going to cheat, and for my next figure, show you what the abstract-space circle we've been using so far looks like, laid on top of the pixel grid. Maybe the revelation about what's actually going on is clear now. If not, don't worry. Right now, when we test each pixel against the abstract-space circle, we're testing whether the top left corner of each pixel's square is inside the abstract circle. When you think about it, though, that doesn't quite make sense. Really, what we're trying to ask is "is more than 50% of the pixel square inside the circle?" Using something like the pixel's center would more accurately answer that question.
```javascript
function fillCircle(imageData, fillStyle, centerX, centerY, radius) {
    var x1 = centerX - radius, y1 = centerY - radius;
    var x2 = centerX + radius, y2 = centerY + radius;
    for (var y = y1; y < y2; y++) {
        for (var x = x1; x < x2; x++) {
            var distX = (x - centerX + 0.5), distY = (y - centerY + 0.5);
            var distance = Math.sqrt(distX*distX + distY*distY);
            if (distance <= radius) {
                var rgb = fillStyle(x, y);
                fillPixel(imageData, x, y, rgb);
            }
        }
    }
}
```

And indeed it does! By adjusting our fillCircle function to test the distance against the pixel's center in the abstract space, we get something that looks a lot more circular. Since pixel centers are halfway between pixel corners, all we need to do is add 0.5 to both dimensions before calculating the distance. This also explains the bizarre gradient bug we saw earlier: when we were comparing distances in the gradients, we were also measuring from the top left of each pixel, rather than its center. I'll leave it as an exercise to the reader to add the necessary +0.5s to fix that one. If you've worked with graphics APIs like HTML5 <canvas> before, you might have had to add these 0.5 increments yourself, e.g. to lines, to make the resulting line look sharp. This is because a line is basically a "thin" rectangle which is lineWidth wide, centered on the position you give it, and HTML5 <canvas> also uses this "pixel center" sampling strategy. Give it a bit of thought (imagine a tall, skinny rectangle "growing" in width from a pixel's center until it becomes two pixels wide), and it should be obvious why you need to add the 0.5 offsets yourself.
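To see the symmetry the 0.5 offset buys us, measure the distance from each pixel's center along the row through the middle of a circle at (4, 4) with radius 3, using the same loop bounds as fillCircle:

```javascript
var centerX = 4, centerY = 4, radius = 3;

// Horizontal distances from the circle's center to each pixel's center.
var distances = [];
for (var x = centerX - radius; x < centerX + radius; x++) {
    var distX = x - centerX + 0.5;
    distances.push(Math.abs(distX));
}
// distances is [2.5, 1.5, 0.5, 0.5, 1.5, 2.5]: a palindrome, so the
// filled pixels now sit symmetrically around the circle's center.
```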

Transparency and Blending

OK, so now we have basic shapes and basic fills. What would be really cool is to add transparency, and blend multiple shapes together. This is actually easier than you might think, at least for a toy implementation. But first, the theory. Blending multiple transparent shapes together is formally called "alpha compositing", a term borrowed from the visual effects industry, though I don't like that term very much, for reasons that will become clearer as the rest of the article goes on. I prefer "blending". For our toy implementation, we'll make some simplifications. Our resulting pixel grid is designed to be displayed directly on my monitor. Since my monitor isn't transparent (at least not yet!), we won't bother changing its storage, and will just assume it's fully opaque. However, we'll add a new parameter to our fill color: alpha, or the "A" in "RGBA". Alpha is commonly taken to be identical to "transparency", though this isn't fully accurate, as we'll see later (and that conflation is one of my personal pet peeves)! If we wanted to take an image we rendered and then reuse it, transparency and all, it would definitely be appropriate to retain the alpha channel. That makes the math a bit too complicated for this basic introduction, though, so I'll cheat and omit it for now. We'll also be a bit more formal about our fill parameters. We'll call the fill pattern the "source image", and we'll call the pixel grid being filled the "destination image". To blend an RGBA "source image" into the opaque "destination image", instead of just setting the values like the old fillPixel did... we lerp them!

```javascript
function newRGBA(r, g, b, a) {
    return { r: r, g: g, b: b, a: a };
}

function getPixel(imageData, x, y) {
    var idx = indexForPixelLocation(imageData, x, y);
    // Reminder: ImageData stores things as a giant array with bytes
    // in "RGBA" order. So index 0 = R, index 1 = G, and index 2 = B.
    return newRGB(imageData.data[idx + 0],
                  imageData.data[idx + 1],
                  imageData.data[idx + 2]);
}

function blendPixel(imageData, x, y, src) {
    // Retrieve the existing pixel in the destination image.
    var dst = getPixel(imageData, x, y);
    // Lerp using the src's alpha to blend.
    var blended = lerpRGB(dst, src, src.a);
    fillPixel(imageData, x, y, blended);
}
```

It's a bit clunky, but it works fine. It makes sense, too, if you think about it: at 0% alpha, you want the untouched pixel grid (the destination); at 100% alpha, you want the source fully filled in; and at 50%, you want half of one, half of the other.
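We can check that intuition with concrete numbers: blending a 50%-alpha red over an opaque white destination should split the difference on each channel. A small sketch, doing the same per-channel lerp that blendPixel does:

```javascript
function lerp(a, b, t) { return (a * (1.0 - t)) + (b * t); }
function newRGB(r, g, b) { return { r: r, g: g, b: b }; }
function newRGBA(r, g, b, a) { return { r: r, g: g, b: b, a: a }; }

function lerpRGB(color1, color2, t) {
    return newRGB(lerp(color1.r, color2.r, t),
                  lerp(color1.g, color2.g, t),
                  lerp(color1.b, color2.b, t));
}

var dst = newRGB(255, 255, 255);    // opaque white destination
var src = newRGBA(255, 0, 0, 0.5);  // 50%-alpha red source

// Blend exactly as blendPixel does: lerp by the source's alpha.
lerpRGB(dst, src, src.a);  // { r: 255, g: 127.5, b: 127.5 } -- a light pink
```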

Anti-aliasing

So, this is looking a lot better. We have shapes, we have colors and gradients. But it still doesn't look great. The edges are all "jaggies", like you might see out of something made entirely in MS Paint. Looking at the zoomed-in pixel grid vs. the abstract one, it should be obvious what the problem is: the pixel grid is much, much coarser than the abstract grid! In signal processing terms, we're aliased: to construct our pixel grid, we're sampling from a much higher frequency space, the abstract grid, and that results in aliasing artifacts. ... OK. If that was a bit too technical, here's a quick signal processing intro: the word "frequency" just means "how fast things change". In our case, these "changes" are "how fast" our circle changes, in the abstract grid, versus how fast our pixels are allowed to change, in the concrete grid. Our abstract circle changes from "being inside" to "being outside" much faster than the grid can change, or, at a "higher frequency". ... Erm, that might also be too technical. Imagine a striped rectangle in our abstract grid that changes rapidly from black to white over and over again. Sampling this with our extremely coarse pixel grid will produce a seemingly random pattern of sometimes-white, sometimes-black pixels. This is known as a moiré pattern, and it happens when we can't sample fast enough or finely enough for the source. Signal processing theory tells us that in order to prevent artifacts like this, we have two options: sample faster, or remove the high frequencies altogether. Computer graphics implementations, conceptually, do the former, and then downsample using a special filter that blends the pixels to avoid artifacts. If that was a bit too abstract for you, let's try writing this in code. The easiest way to sample at a high frequency is to literally just sample more.
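Here's the striped-rectangle thought experiment as a small sketch (the stripe function and the specific widths are my own, just to make the effect reproducible): an abstract-grid pattern sampled once per pixel. When the stripes are at least a pixel wide, the samples reproduce them faithfully; when they're narrower, the samples come out in a seemingly random pattern.

```javascript
// 0 = black band, 1 = white band; each band is (period / 2) units wide.
function stripes(x, period) {
    return Math.floor(x / (period / 2)) % 2;
}

// Stripes one pixel wide, sampled once per pixel: a faithful pattern.
var wide = [];
for (var x = 0; x < 6; x++) wide.push(stripes(x, 2));
// wide is [0, 1, 0, 1, 0, 1]

// Stripes 0.75 pixels wide change faster than the grid can represent,
// and the samples come out garbled.
var narrow = [];
for (var x = 0; x < 6; x++) narrow.push(stripes(x, 1.5));
// narrow is [0, 1, 0, 0, 1, 0]
```

And the way out of this mess, as just described, is to sample more.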
This is an approach known as supersampling, since we're sampling at a higher, or "more super", frequency. In order to do that, we're going to need to refactor our code a bit and introduce some infrastructure so we can sample at arbitrary positions. The first thing we'll do is take the if statement testing whether the point is inside the circle, and move it out into a new function.

```javascript
function insideCircle(centerX, centerY, radius, sampleX, sampleY) {
    var distX = (sampleX - centerX), distY = (sampleY - centerY);
    var distance = Math.sqrt(distX*distX + distY*distY);
    return (distance <= radius);
}
```

We also remove the "center pixel" 0.5 bias, because it will now be part of the sample point we pass in. This gives us a nice functional sampling test. Now, for each pixel, we're going to sample it 16 different times, and collect the results. Watch closely. The code below might be a bit dense, but read it line-by-line and it should make sense.

```javascript
function fillCircle(imageData, fillStyle, centerX, centerY, radius) {
    // Sample the shape 16 times in a 4x4 grid.
    var nSubpixelsX = 4;
    var nSubpixelsY = 4;

    var x1 = Math.floor(centerX - radius), y1 = Math.floor(centerY - radius);
    var x2 = Math.ceil(centerX + radius), y2 = Math.ceil(centerY + radius);
    for (var y = y1; y < y2; y++) {
        for (var x = x1; x < x2; x++) {
            // Compute the coverage by sampling the circle at "subpixel"
            // locations and counting the number of subpixels turned on.
            var coverage = 0;
            for (var subpixelY = 0; subpixelY < nSubpixelsY; subpixelY++) {
                for (var subpixelX = 0; subpixelX < nSubpixelsX; subpixelX++) {
                    // Sample the center of the subpixel.
                    var sampX = x + ((subpixelX + 0.5) / nSubpixelsX);
                    var sampY = y + ((subpixelY + 0.5) / nSubpixelsY);
                    if (insideCircle(centerX, centerY, radius, sampX, sampY))
                        coverage += 1;
                }
            }

            // Take the average of all subpixels.
            coverage /= nSubpixelsX * nSubpixelsY;

            // Quick optimization: if we're fully outside the circle,
            // we don't need to compute the fill.
            if (coverage === 0)
                continue;

            var rgba = fillStyle(x, y);
            // Apply coverage to the alpha.
            rgba = newRGBA(rgba.r, rgba.g, rgba.b, rgba.a * coverage);
            blendPixel(imageData, x, y, rgba);
        }
    }
}
```

There's quite a lot to go through here, but the core idea is that for each pixel, we sample the abstract grid 16 times, and figure out how much of the abstract circle this square pixel contains. Once we know that, we compensate by making the pixel more transparent. This makes some intuitive sense: if a circle only covers half of a pixel, the other half should be occupied by whatever is underneath it, so the two colors get merged into one. This, unfortunately, has a cost: we still only have 8 bits per channel to work with. We are trading color depth for the appearance of more spatial resolution. This is why it's hard to "un-anti-alias" pictures in Photoshop if you've ever tried to remove a background or similar: some of the original color was lost! Secondly, note the use of the word coverage, which tells you how much of the concrete pixel was covered by the abstract shape. Note that we put this coverage value into the alpha of our source, even though it's conceptually not a transparent image. This is important: the alpha channel of an image isn't just for transparency; it's also used for pixel coverage. The two end up blending the same way, so we tend to combine them into one channel. One other thing: you might notice that I'm not computing the fill style at every subsample, only the shape's coverage. This is an optimization known in the 3D graphics world as multisampling. The hope is that the fill color at the pixel's center doesn't differ much from an average over all possible sample points, but the coverage does. When you go into a game's advanced settings and turn on "MSAAx16", this is exactly the algorithm that runs. You might also notice that the animation still seems a bit "jerky".
This is because I'm drawing the circle locked to the pixel grid with a Math.floor. But now that we have proper antialiasing, we can represent arbitrary sample points. Taking the Math.floor out lets us draw something like drawCircle(3.5, 2.32, 6.7), to pull some numbers out of a hat. Our abstract grid can represent fractions of pixels perfectly well, and, with us taking multiple samples, our circle should look a lot more... well... circular, even at fractional positions. I've also slowed down time and given you a visualizer tool to see the subpixel sampling in action. Just hover over a pixel to see its details. Also, now that we can draw shapes starting at any point, we have to be a bit more careful computing the bounding box. Note the additions of Math.floor and Math.ceil above; I sneakily put those in there, but they are required for this to work correctly. Otherwise, the X and Y values we pass to blendPixel would be fractional, and our attempts at indexing the ImageData array would fail on fractional indexes. And last, I should mention that super- and multisampling are not the only approaches to antialiasing. Another approach is to use analytic methods to figure out exactly how much area the shape covers. The best-known algorithm for this is commonly attributed to Raph Levien and the libart rendering library. I might do an interactive article on it some other day, but for now, I'll link to Sean Barrett, who has an excellent whiteboard explanation up on his website. The sampling approaches are easier to understand, much easier to visualize and connect back to the theory, and give us pretty great results.
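As a parting sketch, here's the coverage loop from fillCircle pulled out into a standalone helper (my own refactoring, for illustration) so you can check it on hand-verifiable pixels:

```javascript
function insideCircle(centerX, centerY, radius, sampleX, sampleY) {
    var distX = sampleX - centerX, distY = sampleY - centerY;
    return Math.sqrt(distX*distX + distY*distY) <= radius;
}

// Compute the 4x4 supersampled coverage of one pixel, exactly as the
// inner loops of fillCircle do.
function pixelCoverage(centerX, centerY, radius, x, y) {
    var nSubpixelsX = 4, nSubpixelsY = 4;
    var coverage = 0;
    for (var subY = 0; subY < nSubpixelsY; subY++) {
        for (var subX = 0; subX < nSubpixelsX; subX++) {
            var sampX = x + ((subX + 0.5) / nSubpixelsX);
            var sampY = y + ((subY + 0.5) / nSubpixelsY);
            if (insideCircle(centerX, centerY, radius, sampX, sampY))
                coverage += 1;
        }
    }
    return coverage / (nSubpixelsX * nSubpixelsY);
}

// For a circle at (10, 10) with radius 8: a pixel deep inside is
// fully covered...
pixelCoverage(10, 10, 8, 10, 10);  // 1
// ... a pixel far outside isn't covered at all...
pixelCoverage(10, 10, 8, 30, 10);  // 0
// ... and a pixel straddling the edge gets a fractional coverage,
// which becomes its partial transparency.
var edge = pixelCoverage(10, 10, 8, 15, 16);  // 0 < edge < 1
```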