This is a three-part series on a somewhat broad technical 3d topic. The first part is a theoretical overview of the technique, the second part edits an existing demo to apply instancing, and the last part explores optimizations.

Intro

Imagine we have a 3d world with lots of trees, or lamp posts. When we render such a world we issue a lot of draw calls. Draw calls have overhead and are expensive. For the sake of interactive frame rates, we want to remove them where possible.

If we have something like:

const myGeom = new THREE.BoxGeometry()

const myMaterial = new THREE.MeshBasicMaterial()

const myGroup = new THREE.Group()

for ( let i = 0 ; i < 25 ; i ++ ) {

const myMesh = new THREE.Mesh(myGeom, myMaterial)

myGroup.add(myMesh)

myMesh.frustumCulled = false

myMesh.position.set(random(),random(),random())

}

And add myGroup to a scene and render it, without any optimization, we will cause 25 different draw calls to happen (in addition to anything else that may be in the scene, including a clear call).

.frustumCulled = false turns off an optimization that aims to reduce these draw calls. When enabled, it intersects the mesh’s bounding sphere with the camera’s frustum; if the sphere is entirely outside of it, no draw call is issued for that mesh. With culling left on, we would potentially see fewer than 25 draw calls, depending on our space configuration and where the camera is in it.
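The culling test just described can be sketched in plain JavaScript. This is a simplification, not the three.js implementation: it checks the bounding sphere against a single frustum plane instead of all six.

```javascript
// A plane is (normal, constant), a bounding sphere is (center, radius).
function sphereOutsidePlane(center, radius, normal, constant) {
  // signed distance from the sphere center to the plane
  const d =
    center[0] * normal[0] +
    center[1] * normal[1] +
    center[2] * normal[2] +
    constant
  // entirely on the negative side of the plane: cull, issue no draw call
  return d < -radius
}

// near plane z = -1, normal pointing into the frustum (down -z):
sphereOutsidePlane([0, 0, 3], 1, [0, 0, -1], -1)   // true: culled
sphereOutsidePlane([0, 0, -10], 1, [0, 0, -1], -1) // false: drawn
```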

This, with some trade-offs, could be one draw call.

The “brute force” way

One approach we can take to optimize the draw calls is to merge these meshes (geometries) into one.

If we consider these meshes to be static (a lamp post doesn’t have to change its position or scale during the life of the app), we only partially care about what describes the “group”.

In the snippet above it’s a scene graph node Group that holds instances of Mesh nodes. Each node is part of that cluster: we can translate them individually, but we can also move them all together in sync by translating the parent.

What if instead this were:

const geom = new THREE.BoxGeometry()

const material = new THREE.MeshBasicMaterial()

const mergedGeometry = new THREE.BufferGeometry()

for ( let i = 0 ; i < 25 ; i ++ ) {

const nodeGeometry = geom.clone()

nodeGeometry.translate(random(),random(),random())

mergedGeometry.merge(nodeGeometry)

}

const myCluster = new THREE.Mesh( mergedGeometry, material )

We merge all the individual lamp posts into one cluster. We still have access to a scene graph node myCluster and we can move the entire group, but, for example, we lose the ability to easily adjust the spacing between the lamp posts (there is no individual lamp node any more).

We can however render all the lamp posts in the world, or a tile, with one draw call.

Drawbacks

This approach is a memory hog.

Since we kinda “unroll” this geometry, the GPU now has to store much more data. In the first example, it only stores one instance of geometry and references it with each draw call. In the second, it’s still one geometry, but it’s 25 times larger since that’s how many times we duplicated it.
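To put rough numbers on the difference, here is a back-of-the-envelope calculation (the 1000-vertex lamp post is a made-up figure):

```javascript
// Hypothetical asset: a lamp post with 1000 vertices and only a vec3
// position attribute, instanced 25 times.
const vertices = 1000
const instances = 25
const bytesPerVertex = 3 * 4 // vec3 of 32-bit floats

// reusing one geometry: the GPU stores it once
const sharedBytes = vertices * bytesPerVertex
// merging: the same geometry duplicated once per instance
const mergedBytes = vertices * bytesPerVertex * instances

console.log(sharedBytes, mergedBytes) // 12000 vs 300000
```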

The merge operation itself may be slow and cause a lot of GC activity.

It is complicated to update individual instances within the cluster.

On the flip side, we could remove a matrix multiplication from the shader.

The clever way

GPUs and WebGL are all about managing memory and issuing commands. We have a feature called “instancing” that would allow us to perform the optimization we just did with merging, but in a much more efficient way.

It is possible to use a much smaller set of data to describe what we want and render it the same. First let’s refresh a bit on the scene graph and what it does in GLSL.

When we make some geometry:

const geometry = new THREE.PlaneGeometry()

We can always expect three to produce some GLSL as such:

attribute vec3 position;

Setting aside normals for lighting and uvs for mapping, this is the variable that the shader is going to access to get the position of a vertex in model space. This is the value from lamp_post.obj or some corner of a plane.

The GLSL that Material produces is of no interest yet, so let’s move onto the scene graph:

const mesh = new THREE.Mesh(geometry)

We usually manipulate mesh.rotation, mesh.position, mesh.scale , but these all get baked into a single 4x4 matrix on their way to GLSL, yielding:

uniform mat4 modelMatrix;

Whenever we change the position for example, the engine will recompute an appropriate THREE.Matrix4 and the shader will have a fresh modelMatrix variable.
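A minimal sketch of that recompute in plain JavaScript: translation and scale only (rotation omitted for brevity), laid out column-major as WebGL expects.

```javascript
// Bake position and scale into one 4x4; this is the shape of the value
// the shader then sees as the modelMatrix uniform.
function composeModelMatrix([tx, ty, tz], [sx, sy, sz]) {
  return [
    sx, 0,  0,  0,
    0,  sy, 0,  0,
    0,  0,  sz, 0,
    tx, ty, tz, 1, // translation sits in the last column (column-major)
  ]
}

composeModelMatrix([1, 2, 3], [2, 2, 2])
```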

While not directly related to instancing, let’s note how the camera maps:

const camera = new THREE.PerspectiveCamera()

GLSL:

uniform mat4 projectionMatrix;

uniform mat4 viewMatrix;

uniform mat4 modelViewMatrix; //camera + mesh/line/point node

modelViewMatrix here actually belongs to both the camera and the mesh; more on that in a bit.

A simple shader

Let’s transform the mesh with a very simple vertex shader. THREE.ShaderMaterial actually injects all these uniforms for us so we don’t have to:

void main(){

gl_Position =

projectionMatrix * viewMatrix * modelMatrix * vec4(position,1.);

}

Going right to left:

we cast the attribute from BufferGeometry to a vec4 since it comes in as vec3

we apply the world transformation derived from position , scale and rotation

we project this into the camera

As mentioned, you’d have to use THREE.RawShaderMaterial in order to declare all these uniforms yourself. THREE.ShaderMaterial receives them from wrappers and abstractions.
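The same right-to-left chain can be replayed on the CPU with a small helper (hypothetical, not a three.js API) that multiplies a column-major 4x4 with a vec3 cast to a vec4:

```javascript
// Multiply a column-major 4x4 matrix with (x, y, z, 1), as the vertex
// shader does for each position.
function transformVec3(m, [x, y, z]) {
  const w = 1 // position, not a direction
  return [
    m[0] * x + m[4] * y + m[8]  * z + m[12] * w,
    m[1] * x + m[5] * y + m[9]  * z + m[13] * w,
    m[2] * x + m[6] * y + m[10] * z + m[14] * w,
  ]
}

// identity with a translation of (5, 0, 0) playing the modelMatrix role
const modelMatrix = [1,0,0,0, 0,1,0,0, 0,0,1,0, 5,0,0,1]
transformVec3(modelMatrix, [1, 1, 1]) // → [6, 1, 1]
```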

Compare the two

In the first example, where the scene graph holds a parent with 25 children, the engine will compute 25 different modelMatrix values. If you move the parent, three will do something along the lines of:

parentMatrix.multiply(childMatrix)

because the GLSL shader needs it:

vec4 worldPosition =

modelMatrix * //moves the instance into world space (parent+child)

vec4(position,1.); //model space

When we merge, we remove the need to do the 25 matrix updates because we remove the child-parent relationship.

cluster.position.set(1,1,1)

cluster.rotation.set(Math.PI,0,0)

cluster.scale.set(2,1,2)

Still affects the modelMatrix , but three has to only ever compute one.

In the first example, the matrix from Group is never directly encountered in GLSL, since no draw call is issued for such a node. It is present in all draw calls though: it has to be multiplied on the CPU into the matrices that are actually used in the shader (each mesh’s modelMatrix ).
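The cost difference can be sketched with translations alone (a simplification of the full matrix math): moving the parent forces one recompute per child, while a merged cluster needs a single one.

```javascript
// Each child's world position depends on the parent, so a parent move
// dirties every child; here transforms are reduced to plain translations.
function worldPositions(parent, children) {
  return children.map(child => [
    parent[0] + child[0],
    parent[1] + child[1],
    parent[2] + child[2],
  ])
}

worldPositions([10, 0, 0], [[1, 0, 0], [2, 0, 0]]) // → [[11,0,0],[12,0,0]]
```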

In the second example, we actually conflate this with attribute vec3 position; :

const mergedGeometry = new THREE.BufferGeometry()

for ( let i = 0 ; i < 25 ; i ++ ) {

const nodeGeometry = geom.clone()

nodeGeometry.applyMatrix( myMatrix[i] )

mergedGeometry.merge( nodeGeometry )

}

We do a one time cpu operation, where we apply the matrix directly on the vertex:

vec4 worldSpace =

modelMatrix * //moves the entire cluster (parent)

vec4( position, 1.); //not really model space any more, since it has the transformation "baked in" from outside (child)

attribute vec3 position; no longer maps to lamp_post.obj . We’ve burnt in part of the scene graph, and lost the uniqueness of model space.
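What that “baking in” does to the data can be sketched on a raw position array; this is the CPU-side equivalent of nodeGeometry.translate(x, y, z) :

```javascript
// Add the offset to every vertex of the position attribute, permanently.
// After this, the array no longer describes the asset in model space.
function bakeTranslation(positions, [tx, ty, tz]) {
  const out = new Float32Array(positions.length)
  for (let i = 0; i < positions.length; i += 3) {
    out[i]     = positions[i]     + tx
    out[i + 1] = positions[i + 1] + ty
    out[i + 2] = positions[i + 2] + tz
  }
  return out
}

bakeTranslation(new Float32Array([0, 0, 0, 1, 0, 0]), [5, 0, 0]) // → [5,0,0, 6,0,0]
```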

Instancing

Let’s take a step back and consider some of the elements we have after the lengthy overview so far:

some 3d world ( THREE.Scene )

some spatial entity, like a neighborhood, a village or a tile ( THREE.Group )

some asset, like a tree or a lamp post ( THREE.BufferGeometry )

some intent on how the asset fits the world, ie. 25 lamp posts scattered in the world in some pattern ( THREE.Mesh )

The basic idea is :

const asset = OBJLoader.load('lamp_post.obj') //load a small asset once

//scatter the asset
const tile = new THREE.Mesh(new THREE.PlaneGeometry())

myPositions.forEach( pos => {

const mesh = new THREE.Mesh(asset, myMaterial)

mesh.position.copy(pos)

tile.add(mesh)

})

When we call tile.position.set() we move all the instances of the asset with it. We want to retain that convenience.

When we call tile.children[0].position.set() we can move a single asset relative to the others (we lose this with merging), but at the cost of extra draw calls and the expensive CPU-side matrix computation that a child-parent scene graph relationship brings.

We want the convenience, but not the side effects. We can address all of these, with a gotcha.

Low level

The only instancing referenced in the docs comes down to the classes InstancedBufferAttribute and InstancedBufferGeometry .

There are a few examples but they are all low level, including this one.

You’ll notice that the convenience is gone:

myLampPost.clone().position.copy(myPosition)

And in its place is something like this:

var offsets = new Float32Array( INSTANCES * 3 ); // xyz

var colors = new Float32Array( INSTANCES * 3 ); // rgb

var scales = new Float32Array( INSTANCES * 1 ); // s

for ( var i = 0, l = INSTANCES; i < l; i ++ ) {

var index = 3 * i;

// per-instance position offset

offsets[ index ] = positions[ i ].x;

offsets[ index + 1 ] = positions[ i ].y;

offsets[ index + 2 ] = positions[ i ].z;

// per-instance color tint - optional

colors[ index ] = 1;

colors[ index + 1 ] = 1;

colors[ index + 2 ] = 1;

// per-instance scale variation

scales[ i ] = 1 + 0.5 * Math.sin( 32 * Math.PI * i / INSTANCES );

}

geometry.addAttribute( 'instanceOffset', new THREE.InstancedBufferAttribute( offsets, 3 ) );

geometry.addAttribute( 'instanceColor', new THREE.InstancedBufferAttribute( colors, 3 ) );

geometry.addAttribute( 'instanceScale', new THREE.InstancedBufferAttribute( scales, 1 ) );

Looks pretty gnarly, and that’s only a small portion of the code needed to get (partial) instancing running on what’s trivially done through the scene graph.
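The packing loop above can be factored into a small helper (hypothetical, not a three.js API): flatten an array of {x, y, z} objects into the flat Float32Array an InstancedBufferAttribute expects.

```javascript
// Three floats per instance, laid out back to back: [x0,y0,z0, x1,y1,z1, ...]
function packVec3(items) {
  const out = new Float32Array(items.length * 3)
  items.forEach((v, i) => {
    out[3 * i]     = v.x
    out[3 * i + 1] = v.y
    out[3 * i + 2] = v.z
  })
  return out
}

packVec3([{ x: 1, y: 2, z: 3 }, { x: 4, y: 5, z: 6 }]) // → [1,2,3,4,5,6]
```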

Still this snippet is useful to tell what’s going on. But let’s also include a portion of the shader.

The relevant part from the example is:

#ifdef INSTANCED

attribute vec3 instanceOffset;

attribute float instanceScale;

#endif

This is a little bit different than the format we used in the previous two examples, since it does not use a matrix, but individual components (and it’s missing the rotation).

Unfortunately, setting up a mat4 with instancing is a bit involved, but let’s pretend for a moment that we can:

attribute mat4 instanceMatrix; //instance attribute

attribute vec3 position; //regular attribute

void main(){

gl_Position =

projectionMatrix * viewMatrix * //from THREE.Camera

modelMatrix * //from THREE.Mesh

instanceMatrix * //we add this to the chain,

vec4(position,1.) //from THREE.BufferGeometry

;

}

We add another transformation step to the shader. Unlike projectionMatrix , viewMatrix and modelMatrix , instanceMatrix is not a uniform but an attribute. It’s a special kind of instanced attribute, which is part of the memory management magic that WebGL is supposed to do.

We can’t declare a mat4 attribute through three’s instancing helpers, so we have to compose it out of several vec4 attributes. (This statement is not 100% correct — GLSL itself does allow mat4 attributes, occupying four attribute locations — but composing from vec4 s is a straightforward way of going about the problem.)
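The CPU side of that trick can be sketched like this (plain JavaScript, no three.js): lay each instance’s 16 column-major floats out back to back, so the shader can declare four vec4 attributes and rebuild the mat4 from them.

```javascript
// One 16-float column-major matrix per instance, concatenated into a
// single buffer; each group of 4 floats becomes one vec4 column attribute.
function packInstanceMatrices(matrices) {
  const out = new Float32Array(matrices.length * 16)
  matrices.forEach((m, i) => out.set(m, i * 16))
  return out
}

const identity = [1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1]
packInstanceMatrices([identity, identity]) // 32 floats for 2 instances
```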

Compared to the previous two examples, this shader still runs once for every vertex of every instance, i.e. 25 times the vertex count of a single lamp post.

It will do an extra matrix multiplication, but in the case of 25 unique nodes, it will be done on the GPU not the CPU. The way GPUs work, they might be idle waiting for the draw call to be issued, so the gain can be made here by keeping them busier at the same overhead cost (draw call).

Memory wise, attribute vec3 position; still references the vertices of a BufferGeometry or lamp_post.obj only once, the same way it does if we were to reuse BufferGeometry with 25 unique nodes. This saves a lot of memory over merging.

The WebGL API prevents us from using uniform mat4 instanceMatrix; the way we use the other transformation matrices: a uniform holds a single value for the entire draw call, so per-instance data has to come in as an attribute, hence:

geometry.addAttribute( 'instanceOffset', new THREE.InstancedBufferAttribute( offsets, 3 ) );

Three.js under the hood sets up the other uniforms before each draw call. Since we compress many draw calls into one, we need to have all of this information available for that single call. This is what the InstancedBufferAttribute does: it’s an array of numbers formatted in such a way that they correspond to the data you would otherwise provide to each unique draw call.

In this snippet, we don’t use a matrix, but a simple 3d vector, to move instances individually. For our 25 lamp posts, this would mean 75 numbers to represent 25 different 3d vectors.

In a simple shader, for each draw call we would have:

uniform vec3 offset;

Since we compress these into one draw call:

attribute vec3 offset; //this is actually 25 different values that will be referenced

The draw call then draws the same vertices over and over, once per instance; with each instance, the attribute advances to a different value:

void main(){

vec3 myPosition = position + offset; //offset will change value 25 times during the draw call

}
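What the GPU effectively computes during such a draw call can be emulated on the CPU (a sketch for intuition, obviously not how anything is rendered):

```javascript
// The outer loop advances the per-instance offset, the inner loop
// re-runs the "vertex shader" body for every vertex of the geometry.
function emulateInstancedDraw(positions, offsets) {
  const out = []
  for (let i = 0; i < offsets.length; i += 3) {     // one pass per instance
    for (let v = 0; v < positions.length; v += 3) { // every vertex again
      out.push(
        positions[v]     + offsets[i],
        positions[v + 1] + offsets[i + 1],
        positions[v + 2] + offsets[i + 2],
      )
    }
  }
  return out
}

// two vertices, two instances → four transformed vertices
emulateInstancedDraw([0, 0, 0, 1, 0, 0], [10, 0, 0, 20, 0, 0])
```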

This attribute can be much larger than the geometry, for example, drawing a simple plane a million times would require a much larger instance attribute than the geometry ones. Vice versa, rendering detailed geometry a few times would make it smaller.

With this kind of a setup, we retain some control over per instance positioning, being limited by the same limitations as we have when updating geometry.

I.e. modifying the offsets of the instances is the same as modifying the vertices of a mesh (it’s operating over an attribute, and the attribute needs to be updated); this can still be very performant. But the convenience is gone:

for ( var i = 0, l = INSTANCES; i < l; i ++ ) {

var index = 3 * i;

offsets[ index ] = positions[ i ].x;

offsets[ index + 1 ] = positions[ i ].y;

offsets[ index + 2 ] = positions[ i ].z;

}

Just like any other attribute, three.js leaves us to fill up an array with appropriate values.
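Updating a single instance later on is then just writing into that array (a sketch; in three.js you would additionally flag the attribute with needsUpdate = true so it gets re-uploaded to the GPU):

```javascript
// Overwrite the three floats belonging to one instance in the flat
// offsets array.
function setInstanceOffset(offsets, index, [x, y, z]) {
  offsets[3 * index]     = x
  offsets[3 * index + 1] = y
  offsets[3 * index + 2] = z
}

const offsets = new Float32Array(6) // two instances
setInstanceOffset(offsets, 1, [7, 8, 9]) // offsets is now [0,0,0, 7,8,9]
```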

Problems

As mentioned, this is all pretty low level, and it’s all three.js offers, but with good reason. A game engine could guess how assets are being used and optimize this under the hood. Three.js can be used to make a game engine, but it can also be used for something else entirely, where the need for instancing would be vastly different.

So in order to use instancing with three.js we need to know GLSL and how WebGL data structures work. And that’s just for generic instancing, in order to make it work with three.js’s entire system, it becomes quite a bit more involved.

The three.js example only tackles the lambert material, through one approach for material extensions (copying the shader). There would be more code involved for a more complex material such as MeshStandardMaterial .

The user has to format the attribute properly, which involves converting BufferGeometry to InstancedBufferGeometry .

The responsibility of the scene graph gets a bit conflated with Geometry , which is not part of it. The bare bones of the scene graph is a node that we can set position on and which holds a Matrix4 . Since our uniform turned into an attribute, we have to set position as if it were a vertex of a mesh (on Geometry ).

InstancedBufferGeometry ends up doing the job of both Geometry and Object3D ( Group ).

In the next part we are going to actually write some code and apply instancing to a demo.