I decided to write a bit about the programming aspect and how components talk to each other. Maybe it'll shed some light on certain areas.

The Presentation

What does it take to even have that single image, that you posted in your question, drawn on the screen?

There are many ways to draw a triangle on the screen. For simplicity, let's assume no vertex buffers were used. (A vertex buffer is an area of memory where you store coordinates.) Let's assume the program simply told the graphics processing pipeline about every single vertex (a vertex is just a coordinate in space) in a row.

But, before we can draw anything, we first have to run some scaffolding. We'll see why later:

    // Clear The Screen And The Depth Buffer
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    // Reset The Current Modelview Matrix
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    // Drawing Using Triangles
    glBegin(GL_TRIANGLES);
        // Red
        glColor3f(1.0f, 0.0f, 0.0f);
        // Top Of Triangle (Front)
        glVertex3f( 0.0f,  1.0f, 0.0f);
        // Green
        glColor3f(0.0f, 1.0f, 0.0f);
        // Left Of Triangle (Front)
        glVertex3f(-1.0f, -1.0f, 1.0f);
        // Blue
        glColor3f(0.0f, 0.0f, 1.0f);
        // Right Of Triangle (Front)
        glVertex3f( 1.0f, -1.0f, 1.0f);
    // Done Drawing
    glEnd();

So what did that do?

When you write a program that wants to use the graphics card, you'll usually pick some kind of interface to the driver. Some well-known interfaces to the driver are:

OpenGL

Direct3D

CUDA

For this example we'll stick with OpenGL. Now, your interface to the driver is what gives you all the tools you need to make your program talk to the graphics card (or the driver, which then talks to the card).

This interface is bound to give you certain tools. These tools take the shape of an API which you can call from your program.

That API is what we see being used in the example above. Let's take a closer look.

The Scaffolding

Before you can do any actual drawing, you'll have to perform some setup. You have to define your viewport (the area that will actually be rendered), your perspective (the camera into your world), what anti-aliasing you will be using (to smooth out the edges of your triangle)...

But we won't look at any of that. We'll just take a peek at the stuff you'll have to do every frame. Like:

Clearing the screen

The graphics pipeline is not going to clear the screen for you every frame. You'll have to tell it to. Why?

If you don't clear the screen, you'll simply draw over the previous frame every time. That's why we call glClear with the GL_COLOR_BUFFER_BIT set. The other bit (GL_DEPTH_BUFFER_BIT) tells OpenGL to clear the depth buffer. This buffer is used to determine which pixels are in front of (or behind) other pixels.
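To make the two buffers a bit more concrete, here is a minimal CPU-side sketch of what clearing and depth testing amount to. The Framebuffer struct and the fb_clear and fb_plot helpers are hypothetical names invented for this illustration, not part of OpenGL. The sketch assumes depth testing is enabled with OpenGL's default GL_LESS comparison and the default clear depth of 1.0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical software framebuffer (illustration only, not an OpenGL type). */
typedef struct {
    size_t width, height;
    unsigned int *color; /* one packed color value per pixel */
    float *depth;        /* one depth value per pixel, smaller means closer */
} Framebuffer;

/* Roughly the effect of glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT):
   every pixel gets the clear color and the farthest possible depth. */
void fb_clear(Framebuffer *fb, unsigned int clear_color)
{
    assert(fb && fb->color && fb->depth);
    for (size_t i = 0; i < fb->width * fb->height; ++i) {
        fb->color[i] = clear_color;
        fb->depth[i] = 1.0f; /* OpenGL's default clear depth */
    }
}

/* Write a pixel only if it is closer than what is already stored there.
   Returns 1 if the pixel was written, 0 if it was hidden. */
int fb_plot(Framebuffer *fb, size_t x, size_t y, float z, unsigned int color)
{
    assert(x < fb->width && y < fb->height);
    size_t i = y * fb->width + x;
    if (z >= fb->depth[i])
        return 0; /* something nearer was already drawn here */
    fb->depth[i] = z;
    fb->color[i] = color;
    return 1;
}
```

If the depth buffer were not reset each frame, the values left over from the previous frame would hide geometry that should now be visible.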

Transformation




Transformation is the part where we take all the input coordinates (the vertices of our triangle) and apply our ModelView matrix. This is the matrix that describes how our model (the vertices) is rotated, scaled, and translated (moved).

Next, we apply our Projection matrix. This maps the coordinates into the view of our camera, so that, with a perspective projection, distant objects end up smaller than near ones.

Now we transform once more, with our Viewport matrix. We do this to scale our model to the size of our viewport (the part of the window, in pixels, that we render into). Now we have a set of vertices that are ready to be rendered!
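The chain of transformations can be sketched on the CPU for a single vertex. This is only an illustration under some assumptions: the Vec4 type and the function names are made up for this answer, matrices are stored row-major for readability (OpenGL itself stores them column-major), and the viewport step is written as a plain scale rather than a matrix.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;

/* Multiply a 4x4 matrix (16 floats, row-major in this sketch) with a column vector. */
Vec4 mat4_mul_vec4(const float m[16], Vec4 v)
{
    Vec4 r;
    r.x = m[0]  * v.x + m[1]  * v.y + m[2]  * v.z + m[3]  * v.w;
    r.y = m[4]  * v.x + m[5]  * v.y + m[6]  * v.z + m[7]  * v.w;
    r.z = m[8]  * v.x + m[9]  * v.y + m[10] * v.z + m[11] * v.w;
    r.w = m[12] * v.x + m[13] * v.y + m[14] * v.z + m[15] * v.w;
    return r;
}

/* Perspective divide, then scale from the [-1, 1] range to viewport pixels.
   The origin is the lower-left corner, as in OpenGL window coordinates. */
void clip_to_window(Vec4 clip, int vp_width, int vp_height, float *px, float *py)
{
    assert(clip.w != 0.0f);
    float ndc_x = clip.x / clip.w;
    float ndc_y = clip.y / clip.w;
    *px = (ndc_x + 1.0f) * 0.5f * (float)vp_width;
    *py = (ndc_y + 1.0f) * 0.5f * (float)vp_height;
}

/* Full chain for one vertex: ModelView, then Projection, then viewport. */
void transform_vertex(const float modelview[16], const float projection[16],
                      Vec4 vertex, int vp_width, int vp_height,
                      float *px, float *py)
{
    Vec4 eye  = mat4_mul_vec4(modelview, vertex);
    Vec4 clip = mat4_mul_vec4(projection, eye);
    clip_to_window(clip, vp_width, vp_height, px, py);
}
```

With identity matrices for both ModelView and Projection and an 800 by 600 viewport, the top vertex of our triangle (0, 1, 0) lands at pixel (400, 600), the middle of the top edge.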

We'll come back to transformation a bit later.

Drawing

To draw a triangle, we can simply tell OpenGL to start a new list of triangles by calling glBegin with the GL_TRIANGLES constant.

There are also other forms you can draw, like a triangle strip or a triangle fan. These are primarily optimizations, as they require less communication between the CPU and the GPU to draw the same number of triangles.
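The saving is easy to put into numbers. In a strip or fan, each triangle after the first reuses two vertices that were already sent. The two helper functions below are hypothetical and only count vertices.

```c
#include <assert.h>

/* Vertices submitted for n separate triangles (GL_TRIANGLES). */
unsigned long vertices_for_triangle_list(unsigned long n)
{
    return 3UL * n;
}

/* Vertices submitted for n connected triangles in a strip or fan:
   the first triangle costs three vertices, every further one only one. */
unsigned long vertices_for_triangle_strip(unsigned long n)
{
    assert(n > 0);
    return n + 2UL;
}
```

For 100 connected triangles that is 102 vertices instead of 300.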

After that, we can provide a list of sets of 3 vertices which should make up each triangle. Every vertex uses 3 coordinates (as we're in 3D space). Additionally, I also provide a color for each vertex, by calling glColor3f before calling glVertex3f.

The shade between the 3 vertices (the 3 corners of the triangle) is calculated by OpenGL automatically. It will interpolate the color over the whole face of the polygon.
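As a rough idea of what that interpolation looks like, here is a sketch that blends the three vertex colors with barycentric weights (three weights that sum to 1 and say how close a pixel is to each corner). The Color type and interpolate_color are names made up for this illustration. Real hardware does this per pixel during rasterization and also corrects for perspective, which this sketch ignores.

```c
#include <assert.h>

typedef struct { float r, g, b; } Color;

/* Blend three vertex colors using barycentric weights w0, w1, w2.
   The weights are expected to sum to 1. */
Color interpolate_color(Color c0, Color c1, Color c2,
                        float w0, float w1, float w2)
{
    float sum = w0 + w1 + w2;
    assert(sum > 0.999f && sum < 1.001f);

    Color out;
    out.r = w0 * c0.r + w1 * c1.r + w2 * c2.r;
    out.g = w0 * c0.g + w1 * c1.g + w2 * c2.g;
    out.b = w0 * c0.b + w1 * c1.b + w2 * c2.b;
    return out;
}
```

A pixel exactly on the red corner gets weights (1, 0, 0) and is pure red. A pixel halfway between the red and the green corner gets weights (0.5, 0.5, 0) and comes out as an even mix of the two.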

Interaction

Now, when you click the window, the application only has to capture the window message that signals the click. Then you can run any action you want in your program.

This gets a lot more difficult once you want to start interacting with your 3D scene.

You first have to know exactly which pixel the user clicked in the window. Then, taking your perspective into account, you can calculate the direction of a ray, from the point of the mouse click into your scene. You can then calculate if any object in your scene intersects with that ray. Now you know if the user clicked an object.
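The intersection step can be sketched with a standard ray/triangle test (the Möller-Trumbore algorithm, which I'm swapping in here as one common way to do it). The Vec3 type and all function names are made up for this example, and the sketch assumes you have already computed the ray's origin and direction from the clicked pixel.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

static Vec3 sub(Vec3 a, Vec3 b)
{
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static Vec3 cross(Vec3 a, Vec3 b)
{
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

static float dot(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Returns 1 if the ray (origin + t * dir, t > 0) hits triangle v0 v1 v2,
   and stores the distance t in *t_out. Returns 0 otherwise. */
int ray_hits_triangle(Vec3 origin, Vec3 dir,
                      Vec3 v0, Vec3 v1, Vec3 v2, float *t_out)
{
    assert(t_out);
    const float eps = 1e-6f;

    Vec3 e1 = sub(v1, v0);
    Vec3 e2 = sub(v2, v0);
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (det > -eps && det < eps)
        return 0; /* ray is parallel to the triangle */

    float inv = 1.0f / det;
    Vec3 s = sub(origin, v0);
    float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f)
        return 0; /* outside the triangle */

    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f)
        return 0; /* outside the triangle */

    float t = dot(e2, q) * inv;
    if (t < eps)
        return 0; /* triangle is behind the ray origin */

    *t_out = t;
    return 1;
}
```

For the triangle from the drawing code, a ray starting at (0, 0, 5) and pointing along (0, 0, -1) hits it at a distance of 4.5. The same ray started at (5, 0, 5) misses. In a real scene you would run this test against every candidate triangle and keep the nearest hit.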

So, how do you make it rotate?

Transformation

I am aware of two types of transformations that are generally applied:

Matrix-based transformation

Bone-based transformation

The difference is that bones affect single vertices. Matrices always affect all drawn vertices in the same way. Let's look at an example.

Example

Earlier, we loaded our identity matrix before drawing our triangle. The identity matrix is one that simply provides no transformation at all. So, whatever I draw, is only affected by my perspective. So, the triangle will not be rotated at all.

If I want to rotate it now, I could either do the math myself (on the CPU) and simply call glVertex3f with other coordinates (that are rotated). Or I could let the GPU do all the work, by calling glRotatef before drawing:

    // Rotate The Triangle On The Y axis
    glRotatef(amount, 0.0f, 1.0f, 0.0f);

amount is, of course, just a fixed value. If you want to animate, you'll have to keep track of amount and increase it every frame.
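Keeping track of amount could look something like the sketch below. advance_angle is a hypothetical helper. As a design choice of mine (not something the drawing code above requires), it scales the step by the time that passed since the last frame, so the triangle spins at the same speed regardless of frame rate, and it wraps the angle so it stays between 0 and 360 degrees.

```c
#include <assert.h>

/* Advance the rotation angle (in degrees) by degrees_per_second, scaled by
   the time elapsed since the previous frame, and wrap it into [0, 360). */
float advance_angle(float amount, float degrees_per_second, float seconds_elapsed)
{
    assert(seconds_elapsed >= 0.0f);
    amount += degrees_per_second * seconds_elapsed;
    while (amount >= 360.0f)
        amount -= 360.0f;
    while (amount < 0.0f)
        amount += 360.0f;
    return amount;
}
```

You would call this once per frame and pass the result to glRotatef before drawing.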

So, wait, what happened to all the matrix talk earlier?

In this simple example, we don't have to care about matrices. We simply call glRotatef and it takes care of all that for us.

glRotate produces a rotation of angle degrees around the vector (x, y, z). The current matrix (see glMatrixMode) is multiplied by a rotation matrix with the product replacing the current matrix, as if glMultMatrix were called with the following matrix as its argument:

$$
\begin{pmatrix}
x^2(1-c)+c & xy(1-c)-zs & xz(1-c)+ys & 0 \\
yx(1-c)+zs & y^2(1-c)+c & yz(1-c)-xs & 0 \\
xz(1-c)-ys & yz(1-c)+xs & z^2(1-c)+c & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
$$

where c = cos(angle), s = sin(angle), and (x, y, z) is normalized to length 1.

Well, thanks for that!

Conclusion

What becomes obvious is, there's a lot of talk to OpenGL. But it's not telling us anything. Where is the communication?

The only thing that OpenGL is telling us in this example is when it's done. Every operation will take a certain amount of time. Some operations take incredibly long, others are incredibly quick.

Sending a vertex to the GPU will be so fast, I wouldn't even know how to express it. Sending thousands of vertices from the CPU to the GPU, every single frame, is, most likely, no issue at all.

Clearing the screen can take a millisecond or more (keep in mind, you usually only have about 16 milliseconds to draw each frame), depending on how large your viewport is. To clear it, OpenGL has to draw every single pixel in the color you want to clear to, and that could be millions of pixels.
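Both numbers are simple arithmetic. The two helper functions below are hypothetical and only there to show the calculation. The 60 frames per second and the 1920 by 1080 viewport are assumptions picked as typical values.

```c
#include <assert.h>

/* Time available per frame, in milliseconds, at a given frame rate. */
double frame_budget_ms(double frames_per_second)
{
    assert(frames_per_second > 0.0);
    return 1000.0 / frames_per_second;
}

/* Number of pixels a clear has to touch for a given viewport size. */
unsigned long pixels_to_clear(unsigned long width, unsigned long height)
{
    return width * height;
}
```

At 60 frames per second the budget is 1000 / 60, so a little under 16.7 milliseconds per frame. A 1920 by 1080 viewport has 2,073,600 pixels to clear.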

Other than that, we can pretty much only ask OpenGL about the capabilities of our graphics adapter (max resolution, max anti-aliasing, max color depth, ...).

But we can also fill a texture with pixels that each have a specific color. Each pixel thus holds a value and the texture is a giant "file" filled with data. We can load that into the graphics card (by creating a texture buffer), then load a shader, tell that shader to use our texture as an input and run some extremely heavy calculations on our "file".

We can then "render" the result of our computation (in the form of new colors) into a new texture.

That's how you can make the GPU work for you in other ways. I assume CUDA works similarly in that respect, but I never had the opportunity to work with it.

We really only slightly touched the whole subject. 3D graphics programming is a hell of a beast.


