This project was more like a proof of concept than actually making a fully functional 3D engine on the ATmega328, I was curious to see if the AVR could handle something like this and display lowpoly 3D models.

After meeting and talking with Xark (https://github.com/XarkLabs) on IRC, I discovered his fork of the Adafruit GFX library and was surprised with how great it was performing, it was another good reason to start this project.

From the very beginning, Xark told me that the bottleneck of this project would be the SPI communication and even with his fork performing between 3.5x and 12x faster than the Adafruit libraries it would still not perform fast enough to get a smooth colored mesh without flickering. He was absolutely right but I have to admit I had a lot of fun with this project and wireframe models are still looking nice.

The cube

I started with a really basic 3D rotating cube, using floats and cos/sin math functions. Clearing the screen completely was causing flicker so I decided to make another array to store a copy of the projected nodes (vertices). Before drawing a new frame with the updated cube transformation I would draw the old nodes position with the background color giving the illusion I was completely clearing the screen. The flickering was completely gone. Another tweak for better performance was to actually make a memory comparison of both old and new projected nodes to decide if I had to update the frame or not.

The main loop at this stage:

// ---------------------------------------------- // main loop // ---------------------------------------------- void loop() { loops = 0; while( millis() > next_tick && loops < MAX_FRAMESKIP) { rotate_z(0); rotate_y(4); rotate_x(1); // ... next_tick += SKIP_TICKS; loops++; } // =============== // draw // =============== // only redraw if nodes position have changed (less redraw - less flicker) if (memcmp(old_nodes, nodes, sizeof(nodes))) { // erase old projected nodes draw_wireframe(proj_nodes, ST7735_WHITE); // make new projection with new nodes position project(nodes); // draw new projection draw_wireframe(proj_nodes, ST7735_BLACK); // copy new nodes location to erase cube memcpy(old_nodes, nodes, sizeof(nodes)); } }

While not apparent yet because of the small number of nodes, it was a problem that I was using floats and sin/cos for the rotation. There would be issues later with complicated meshes containing more vertices. Storing a couple values for sin and cos in an array was a good start to avoid using these functions:

static const float sin_lut[LUTSIZE] = { // to avoid calling multiple times sin() store a couple values in a lut 0.010000, 0.019999, 0.029995, 0.039989, 0.049979, }; static const float cos_lut[LUTSIZE] = { // to avoid calling multiple times cos() store a couple values in a lut 0.999950, 0.999800, 0.999550, 0.999200, 0.998750, }; // ---------------------------------------------- // rotation on axis functions (0 - 4) // ---------------------------------------------- void rotate_z(char theta) { i = NODECOUNT-1; do { tmp_axis = nodes[i][0]; nodes[i][0] = (tmp_axis * cos_lut[theta]) - (nodes[i][1] * sin_lut[theta]); nodes[i][1] = (nodes[i][1] * cos_lut[theta]) + (tmp_axis * sin_lut[theta]); } while(i--); } void rotate_y(char theta) { i = NODECOUNT-1; do { tmp_axis = nodes[i][0]; nodes[i][0] = (tmp_axis * cos_lut[theta]) - (nodes[i][2] * sin_lut[theta]); nodes[i][2] = (nodes[i][2] * cos_lut[theta]) + (tmp_axis * sin_lut[theta]); } while(i--); } void rotate_x(char theta) { i = NODECOUNT-1; do { tmp_axis = nodes[i][1]; nodes[i][1] = (tmp_axis * cos_lut[theta]) - (nodes[i][2] * sin_lut[theta]); nodes[i][2] = (nodes[i][2] * cos_lut[theta]) + (tmp_axis * sin_lut[theta]); } while(i--); }

It was of course temporary since the rotation was based on the earlier transformation of the mesh and wouldn't cover 360 degrees. But it was the start of the 'optimization' process of the project.

Can we go faster?

Another attempt at making things going faster was to actually store the screen in a buffer. It's not possible on the ATmega328 to store the full screen resolution (128x160) since there is not enough memory, but storing 1/4 of it was. The idea was to store a 32x40 array of char where I would specify which color index the 'render' function should draw and each represented pixel would be a 4x4 tile. This array by itself was already taking 1280 bytes which represented about 64% of the available memory on the ATmega328 and the compiler started to warn me that I might encounter stability issue because of the limited amount of memory left.

Beside the memory limitation, I was also losing a lot of details by drawing with this new resolution. But now I was able to completely clear the screen without flickering:

Your browser doesn't support HTML5 video tag. Cube using a 32x40 buffer and drawing 4x4 tiles for each 'pixel'

Switching to 3x3 tiles made the 'rendering' much faster, but the resolution was pretty rudimentary and would be an issue for complex meshes later. I was also limited to 256 colors because of the indexing in the buffer array since I was using chars to limit memory usage, and the render function would turn into a huge switch/case to return the appropriate color depending on said index:

// ---------------------------------------------- // render buffer to screen // ---------------------------------------------- void render() { uint16_t col; TFT.spi_begin(); for (char y = 0; y < BUFFH; y++) { for (char x = 0; x < BUFFW; x++) { switch(buff[y][x]) { case 0: col = COLOR0; break; case 1: col = COLOR1; break; case 2: col = COLOR2; break; case 3: col = COLOR3; break; case 4: col = COLOR4; break; case 5: col = COLOR5; break; } //TFT.drawPixel(x*4, y*4, col); // 4x to fill screen since buffer is 1/4 of resolution TFT.setAddrWindow_(x<<2, y<<2, (x<<2)+PIXELOFFSET-1, (y<<2)+PIXELOFFSET-1); i = PIXELOFFSET-1; do { TFT.spiWrite16(col, PIXELOFFSET); } while(i--); } } TFT.spi_end(); } ... // ---------------------------------------------- // main loop // ---------------------------------------------- void loop() { loops = 0; while( millis() > next_tick && loops < MAX_FRAMESKIP) { rotate_z(0); rotate_y(4); rotate_x(1); // ... next_tick += SKIP_TICKS; loops++; } // =============== // draw // =============== // only redraw if nodes position have changed (less redraw - less flicker) if (memcmp(old_nodes, nodes, sizeof(nodes))) { // make new projection with new nodes position project(); // clear buffer memset(buff, 0, sizeof(buff)); // draw new projection with COLOR1 draw_wireframe(1); render(); // copy new nodes location to erase cube memcpy(old_nodes, nodes, sizeof(nodes)); } }

At this stage the 'engine' was still using floats which are extremly slow on the AVR and after implementing transformation matrix (ref: codinglabs.net) it was time to switch to fixed point. Depending on the project you are working on this task can be really frustrating, you have to make sure you are between valid ranges to avoid overflowing and ending with values much bigger than what your variable type can handle.

There is a nice thread on TIGForums by J-Snake explains the process: Fixed Point Arithmetic - A Comprehensive Introduction

Your browser doesn't support HTML5 video tag. 32x40 buffer, 3x3 tiles and tranformation matrix

Because of the limited amount of memory, you really need to store as much as possible in PROGMEM as long as you do not plan to change these variables during runtime. But 32KB is still somehow limited if you are not careful, and trying to limit and optimize memory and storage usage is a good idea if you know that you might end up using much more than the current state of your project.

I had enough storage memory to store a full look up table for my my sin/cos calls, but if you observe a graph with sine and cosine you would notice that you actually don't need 360 degrees for each one of these. (you can also check values here: mathwarehouse.com)

You can actually just use 90 degrees values of either sine or cosine, and 'mirror' the returned value depending on the angle requested, here is a snippet of the fixed point look up table I ended up making:

#define LUT(a) (long)(pgm_read_word(&lut[a])) ... const unsigned int lut[] PROGMEM = { // 0 to 90 degrees fixed point COSINE look up table 16384, 16381, 16374, 16361, 16344, 16321, 16294, 16261, 16224, 16182, 16135, 16082, 16025, 15964, 15897, 15825, 15749, 15668, 15582, 15491, 15395, 15295, 15190, 15081, 14967, 14848, 14725, 14598, 14466, 14329, 14188, 14043, 13894, 13740, 13582, 13420, 13254, 13084, 12910, 12732, 12550, 12365, 12175, 11982, 11785, 11585, 11381, 11173, 10963, 10748, 10531, 10310, 10086, 9860, 9630, 9397, 9161, 8923, 8682, 8438, 8191, 7943, 7691, 7438, 7182, 6924, 6663, 6401, 6137, 5871, 5603, 5334, 5062, 4790, 4516, 4240, 3963, 3685, 3406, 3126, 2845, 2563, 2280, 1996, 1712, 1427, 1142, 857, 571, 285, 0 }; ... // ---------------------------------------------- // SIN/COS from 90 degrees LUT // ---------------------------------------------- long SIN(unsigned int angle) { angle += 90; if (angle > 450) return LUT(0); if (angle > 360 && angle < 451) return -LUT(angle-360); if (angle > 270 && angle < 361) return -LUT(360-angle); if (angle > 180 && angle < 271) return LUT(angle-180); return LUT(180-angle); } long COS(unsigned int angle) { if (angle > 360) return LUT(0); if (angle > 270 && angle < 361) return LUT(360-angle); if (angle > 180 && angle < 271) return -LUT(angle-180); if (angle > 90 && angle < 181) return -LUT(180-angle); return LUT(angle); }

At this point I was completely avoiding the use of floats, cos/sin functions and was able to clear the screen without any apparent flickering:

Your browser doesn't support HTML5 video tag. 32x40 buffer, 3x3 tiles, matrices, fixed point, lut

It was time to implement a basic backface culling, it would save us from calculating and drawing edges that shouldn't be visible. Using the shoelace algorithm you can determine the surface of a triangle in your mesh, if the return value is negative it means the triangle is facing away from us. (ref: mathopenref.com)

Along with the nodes (vertices) position of the cube, I also had a multidimensional array of triangles specifying which nodes represent each one of them. In order to see if my current triangle was hidden or not I would just send its index and depending on the return value would decide if it needs to be drawn on screen. The following snippet might be a bit confusing without the full source, EDGE is a macro that returns a node from a triangle stored in the triangles array mentioned earlier:

// ---------------------------------------------- // Shoelace algorithm to get the surface // ---------------------------------------------- int shoelace(unsigned char (*n)[2], unsigned char index) { unsigned char t = 0; int surface = 0; for (; t<3; t++) { // (x1y2 - y1x2) + (x2y3 - y2x3) ... surface += (n[EDGE(index,t)][0] * n[EDGE(index,(t<2?t+1:0))][1]) - (n[EDGE(index,(t<2?t+1:0))][0] * n[EDGE(index,t)][1]); } return surface * 0.5; } // ---------------------------------------------- // Shoelace algorithm for triangle visibility // ---------------------------------------------- bool is_hidden(unsigned char index) { // (x1y2 - y1x2) + (x2y3 - y2x3) ... return ( ( (proj_nodes[EDGE(index,0)][0] * proj_nodes[EDGE(index,1)][1]) - (proj_nodes[EDGE(index,1)][0] * proj_nodes[EDGE(index,0)][1]) ) + ( (proj_nodes[EDGE(index,1)][0] * proj_nodes[EDGE(index,2)][1]) - (proj_nodes[EDGE(index,2)][0] * proj_nodes[EDGE(index,1)][1]) ) + ( (proj_nodes[EDGE(index,2)][0] * proj_nodes[EDGE(index,0)][1]) - (proj_nodes[EDGE(index,0)][0] * proj_nodes[EDGE(index,2)][1]) ) ) < 0 ? true : false; }

Triangles facing away from us were successfully hidden now:

Your browser doesn't support HTML5 video tag. 32x40 buffer, 3x3 tiles, matrices, fixed point, lut, backface culling

At this point I knew I was able to color the cube faces, my initial plan (which I eventually aborted) was to render a flat shaded version of a rotating mesh. I made various test with triangle filling and it seemed possible until I realized something:

Your browser doesn't support HTML5 video tag. 32x40 buffer, 3x3 tiles, matrices, fixed point, lut, backface culling, flat colors

Your browser doesn't support HTML5 video tag. 32x40 buffer, 3x3 tiles, matrices, fixed point, lut, backface culling, flat colors

It's just an ugly cube..

Now let's be honest, it works fine, I can put colors, rotate the mesh and I don't have any flickering going on. But wow it is ugly..

It was now certain that if I had a complex mesh to render I would be witnessing the battle of a bunch of pixels moving around in the middle of the screen, I had to weight pros and cons of the current state and eventually decide how I should eventually proceed and which goal (flat shading) I would need to forget about.

Pros of the buffered version:

fast

clear screen without flickering

flat shading is possible (initial goal)

Cons:

use a lot of memory

aesthetically unpleasant (it's amazingly ugly!)

resolution too small to display complex meshes

256 colors maximum

huge switch/case as more colors are defined

In short, I'm stuck with an ugly cube and I thought it would be a shame to stay at this state and start working on the flat shading.

Back to full res

Switching back to full resolution wasn't a problem, I just had to remove the buffer eating most of the AVR memory and use XarkLabs and Adafruit drawing functions to directly draw on screen. It was also really nice to see how fast the wireframe version of the cube was rendering, and I could compare my first experiments before implementing fixed point and lut with the current stage of the project.

Out of curiosity I tried a colored version of the cube but clearing the screen, or even just the area using a bounding box dirt mask was causing a lot of flicker:

Your browser doesn't support HTML5 video tag. Full resolution (128x160), matrices, fixed point, lut, backface culling, render types

This is the moment I decided to forget about flat shading (and colors) to focus on the wireframe rendering. The way the engine was loading meshes allowed me to try something different than a cube for a change, it was time to find how to add models following the proper format I was using.

More 3D models!

At first I was planning on making a script to convert OBJ files, but after having a closer look at Blender's different exporting format options I noticed that STL (ASCII) files are much simpler to use. It's already using a set of vertices to declare each triangle, the STL (ASCII) file for the cube mesh is the following:

solid Exported from Blender-2.74 (sub 0) facet normal -0.000000 0.000000 -1.000000 outer loop vertex 1.000000 1.000000 -1.000000 vertex 1.000000 -1.000000 -1.000000 vertex -1.000000 -1.000000 -1.000000 endloop endfacet ... more facets ... facet normal 0.000000 1.000000 0.000000 outer loop vertex -1.000000 1.000000 -1.000000 vertex -1.000000 1.000000 1.000000 vertex 1.000000 0.999999 1.000000 endloop endfacet endsolid Exported from Blender-2.74 (sub 0)

Each facet (triangle) is first defined by its normal value and after that, each node (vertex) representing the facet is declared between outer loop and endloop as a set of three vertices. You just need to gather all vertices, keep only unique vertex by position, and then gather all facets pointing to these vertices into an array by looking for these values.

The following python script is doing this exact procedure and output a header file with the proper data format:

#!/usr/bin/env python # ----------------------------------------------------------------------------- # STL2H by Themistokle "mrt-prodz" Benetatos # ------------------------------------------- # Convert STL 3D models (ASCII) to header for Tiny 3D Engine # # ------------------------ # http://www.mrt-prodz.com # http://github.com/mrt-prodz/ATmega328-Tiny-3D-Engine # ----------------------------------------------------------------------------- import sys, getopt, os # global parameters param_verbose = False param_normals = False param_yes = False param_scale = 1.0 def checkFile(outfile): # keep asking user until overwrite is chosen or a non-existing file name is entered while (os.path.isfile(outfile) is True): overwrite = raw_input('[!] Output data file "%s" already exists, overwrite? [y/n] ' % outfile) if overwrite in ('y', 'Y'): return outfile elif overwrite in ('n', 'N'): outfile = raw_input('[?] Enter new output data file name: ') if (outfile == ''): outfile = 'temp.h' return outfile def printVerbose(str): global param_verbose if param_verbose is True: print str, def saveDAT(nodes, triangles, outfile, normals = None): print '[+] Saving output file:', outfile data = '// exported with stl2h

' data += '// ' data += ' '.join(sys.argv[:]) + '

' data += '#ifndef MESH_H

' data += '#define MESH_H

' data += '

' data += '#define NODECOUNT ' + str(len(nodes)) + '

' data += '#define TRICOUNT ' + str(len(triangles)) + '

' data += '

' data += '#define NODE(a, b) (long)(pgm_read_dword(&nodes[a][b]))

' data += '#define EDGE(a, b) pgm_read_byte(&faces[a][b])

' data += '#define NORMAL(a, b) (long)(pgm_read_dword(&normals[a][b]))

' data += '

' data += 'const long nodes[NODECOUNT][3] PROGMEM = {

' for index, node in enumerate(nodes): data += ' {(long)(' + str(round(float(node[0]), 5)*param_scale) + '*PRES), '\ + '(long)(' + str(round(float(node[1]), 5)*param_scale) + '*PRES), '\ + '(long)(' + str(round(float(node[2]), 5)*param_scale) + '*PRES)},

' data += '};



' data += 'const unsigned char faces[TRICOUNT][3] PROGMEM = {

' for index, face in enumerate(triangles): data += ' {' + str(face[0]) + ', ' + str(face[1]) + ', ' + str(face[2]) + '},

' data += '};



' data += 'const long normals[TRICOUNT][3] PROGMEM = {

' for index, normal in enumerate(normals): data += ' {(long)(' + str(round(float(normal[0]), 5)) + '*PRES), '\ + '(long)(' + str(round(float(normal[1]), 5)) + '*PRES), '\ + '(long)(' + str(round(float(normal[2]), 5)) + '*PRES)},

' data += '};



' data += '#endif // MESH_H

' dat = open(outfile, 'w') dat.write(data) dat.close() printVerbose(data) def parseSTL(infile, outfile): global param_verbose if param_verbose is True: print '[+] Parsing STL file with verbose output' # if -y is not set check for file being overwritten global param_yes if param_yes is False: outfile = checkFile(outfile) print '[+] Input STL file:', infile print '[+] Output header file:', outfile stl = open(infile, 'r') nodes = [] unique_nodes = [] triangles = [[]] normals = [] global param_normals if param_normals is True: print '[+] Saving facet normals information' # store vertex into list first print('[+] Gathering vertices') for index, line in enumerate(stl): # split line into tokens and keep only vertices token = line.split() if (len(token) == 4) and (token[0] == 'vertex'): # store x y z into list nodes.append([token[1], token[2], token[3]]) printVerbose(' ' + line) # quickly check that we have sets of 3 vertices for each triangle check = len(nodes) % 3 print '[+] STL file check:', ('good' if check == 0 else 'bad') if check != 0: print '[!] Error with STL file, each triangle should be made of 3 vertices (missing %d vertex)' % (3 - check) sys.exit(2) # keep only unique nodes print('[+] Keeping unique vertices') [unique_nodes.append(item) for item in nodes if item not in unique_nodes] # output stored vertices if param_verbose is True: for node in unique_nodes: # print previous triangle created printVerbose(' ' + str(node[0]) + ', ' + str(node[1]) + ', ' + str(node[2]) + '

') # seek start of file to get triangles stl.seek(0, 0) # assign vertex index to triangle list print('[+] Gathering triangles') index = 0 for line in stl: # split line into tokens token = line.split() if (len(token) == 4) and (token[0] == 'vertex'): # compare current array to unique_nodes and set index in triangle for i, node in enumerate(unique_nodes): # if xyz match if [token[1], token[2], token[3]] == node: # add to current triangle list if array at index has less than 3 items if len(triangles[index]) < 3: triangles[index].append(i) if len(triangles[index]) == 3: # print previous triangle created printVerbose(' ' + str(triangles[index][0]) + ', ' + str(triangles[index][1]) + ', ' + str(triangles[index][2]) + '

') break # else create new array else: # increase index index += 1 # append array with new item triangles.append([i]) break # gather normals if parameter is true if param_normals is True: print('[+] Gathering normals') # seek start of file to get normals stl.seek(0, 0) # gather normals for line in stl: # split line into tokens token = line.split() if (len(token) == 5) and (token[0] == 'facet') and (token[1] == 'normal'): # store vector normal normal = [token[2], token[3], token[4]] # print normal if verbose mode printVerbose(' ' + str(normal[0]) + ', ' + str(normal[1]) + ', ' + str(normal[2]) + '

') normals.append(normal) # print stats print '[+] Vertices: ', len(unique_nodes) print '[+] Triangles:', len(triangles) if param_normals is True: print '[+] Normals:', len(normals) saveDAT(unique_nodes, triangles, outfile, normals) else: saveDAT(unique_nodes, triangles, outfile, normals) print '[+] Done

' def usage(): print 'Usage: %s -i -o ' % sys.argv[0] print 'Convert a 3D mesh saved as STL format (ASCII) to header for Tiny 3D Engine.

' print ' -i, --inputfile \t3D mesh in STL file format' print ' -o, --outputfile\toutput filename of converted data' print ' -s, --scale \tscale ratio (default 1.0)' print ' -n, --normals \tsave face normals' print ' -y, --yes \tanswer yes to all requests' print ' -v, --verbose \tverbose output' def main(argv): infile = '' outfile = '' # parse command line arguments try: opts, args = getopt.getopt(sys.argv[1:],'i:o:s:nhyv',['help', 'yes', 'normals', 'scale=', 'input=', 'output=', 'verbose=']) except getopt.GetoptError as error: print 'Error:', str(error) usage() sys.exit(2) if len(opts) < 2: usage() sys.exit(2) for opt, arg in opts: if opt in ('-h', '--help'): usage() sys.exit() elif opt in ('-i', '--input'): infile = arg elif opt in ('-o', '--output'): outfile = arg elif opt in ('-s', '--scale'): global param_scale try: param_scale = float(arg) except Exception as converror: print 'Error:', str(converror) sys.exit(2) elif opt in ('-v', '--verbose'): global param_verbose param_verbose = True elif opt in ('-y', '--yes'): global param_yes param_yes = True elif opt in ('-n', '--normals'): global param_normals param_normals = True if (not infile) or (not outfile): print 'Error: You need to specify both input and output files' usage() sys.exit(2) if infile == outfile: print 'Error: Input and output files are the same' usage() sys.exit(2) # parse STL file and convert to data parseSTL(infile, outfile) if __name__ == '__main__': main(sys.argv[1:])

Running the script with cube.stl as input and cube.h as output with a scale of 14 would save the following header file:

#ifndef MESH_H #define MESH_H #define NODECOUNT 8 #define TRICOUNT 12 #define NODE(a, b) (long)(pgm_read_dword(&nodes[a][b])) #define EDGE(a, b) pgm_read_byte(&faces[a][b]) const long nodes[NODECOUNT][3] PROGMEM = { {(long)(14.0*PRES), (long)(14.0*PRES), (long)(-14.0*PRES)}, {(long)(14.0*PRES), (long)(-14.0*PRES), (long)(-14.0*PRES)}, {(long)(-14.0*PRES), (long)(-14.0*PRES), (long)(-14.0*PRES)}, {(long)(-14.0*PRES), (long)(14.0*PRES), (long)(-14.0*PRES)}, {(long)(14.0*PRES), (long)(14.0*PRES), (long)(14.0*PRES)}, {(long)(-14.0*PRES), (long)(14.0*PRES), (long)(14.0*PRES)}, {(long)(-14.0*PRES), (long)(-14.0*PRES), (long)(14.0*PRES)}, {(long)(14.0*PRES), (long)(-14.0*PRES), (long)(14.0*PRES)}, }; const unsigned char faces[TRICOUNT][3] PROGMEM = { {0, 1, 2}, {2, 3, 0}, {4, 5, 6}, {6, 7, 4}, {0, 4, 7}, {7, 1, 0}, {1, 7, 6}, {6, 2, 1}, {2, 6, 5}, {5, 3, 2}, {4, 0, 3}, {3, 5, 4}, }; #endif // MESH_H

Including this file in the source would make the engine automatically load and render the mesh after compiling. Converting and loading new meshes after that was really easy:

Your browser doesn't support HTML5 video tag.

Interaction with the mesh

I implemented a really basic controller for the mesh rotation using a 3 axis accelerometer (ADXL335) and a joystick thumb. The way the mesh behaves based on movement of the controllers could be much better but it gives a basic idea of what is possible and I found the idea pretty amusing. You can see these in action in the video preview below.

Source and video preview

Here is a small video showcasing the engine running on the ATmega328 and ST7735.

(Arduino UNO and Sainsmart 1.8" TFT screen)

You can find the project source on github: github.com/mrt-prodz/ATmega328-Tiny-3D-Engine