Shading Languages

So for better or worse, I’ve designed my own shading language and written a compiler for it. The compiler parses the source, reports syntax and semantic errors, and if there are no errors, outputs the program in several other shading languages. This way, I can write a shader once and target multiple platforms. Plus, now I own the Dragon Book!

In line with GLSL, HLSL, PSSL, and the others, I’m calling my language SRSL for Shining Rock Shading Language. I’m seriously debating calling it SRSLY, because that’s funny, and owls are cool. I just need to decide what the Y stands for. Or not.

Defining the entire grammar and all the extensive language rules is probably beyond the scope of this post, so I’ll just give an overview here, with lots of examples.

First some basics:

The body text of the shaders is very C-like and should be easy to pick up coming from other shading languages. SRSL has all the intrinsic functions you’d expect from HLSL or GLSL: dot, cross, reflect, clamp, sin, cos, step, lerp, saturate, etc.

You’ll notice that all types in the examples are HLSL style – float, float2, float3, float4, int, int2, uint, uint2, float4x4, etc. I prefer this over the GL style vec, ivec, uvec, and mat4. I think the HLSL style conveys more and better information about the built-in types.
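For comparison, here are a few declarations in SRSL’s HLSL-style types, with the GL-style equivalents noted in comments (an illustrative sketch – the variable names are made up):

```
float3 normal;       // GL style: vec3 normal;
int2 offset;         // GL style: ivec2 offset;
uint4 mask;          // GL style: uvec4 mask;
float4x4 transform;  // GL style: mat4 transform;
```

The HLSL names carry both the element type and the dimensions, which is the extra information being referred to.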

While none of these examples show it, while loops, do loops, for loops, and conditionals are all available in the language. Switch statements, however, are not implemented.
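Since the body text is C-like, control flow looks as you’d expect. A quick hypothetical sketch (the variables here are made up for illustration, assuming the C-style syntax described above):

```
float3 sum = float3(0.0, 0.0, 0.0);
for (int i = 0; i < 4; ++i)
{
    // accumulate only the samples that contribute
    if (weights[i] > 0.0)
    {
        sum += samples[i] * weights[i];
    }
}
```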

One difference from C-like languages is how variables are cast between types. First, there is no automatic casting between int types and float types, which HLSL allows. Also, if you had an int4 and wanted to cast it to float4, the C-style cast would be:

(float4)variable;

But SRSL uses a postfix operator that reads left to right:

variable.cast(float4);

Another difference is the declaration of arrays. In SRSL, arrays are declared with the array size as part of the type, like so:

// two arrays of matrix
float4x4[64] transforms, bones;

Whereas the C style declaration would be:

float4x4 transforms[64], bones[64];

The language has no global variables. Everything is passed to the shader entry point, and if a function needs something, it has to be passed as a parameter. I’m pretty sure I’ve rarely if ever used globals in shader programming, so I didn’t see an immediate need for the language to have them. (Obviously in other languages vertex attributes, uniforms, and textures can be globals, but that’s not what I’m talking about here.)

The only use case I can think of for globals is constants you want to define once, such as pi, or magic numbers used in more than one place. But rather than add another language feature, functions are used to return constant values. Thanks to shader compilers’ aggressive inlining, the call always goes away and becomes just a constant in the assembly.

For example:

float pi() { return 3.14159265359; }

Instead of something like:

const float pi = 3.14159265359;

So now I’ll get right to it – let’s take a look at a shader. Here’s the body of the simplest shader used in Banished.

program Debug
{
    stream(DebugVertex)
    uniforms(VSConstants)
    vertexshader vs
    {
        float3 worldpos = tc.transforms[0] * float4(input.position, 1.0);
        output.position = gc.worldToProjection * float4(worldpos, 1.0);
        output.color = input.color;
    }
    stream(Interpolant)
    pixelshader ps
    {
        output.color = input.color;
    }
    stream(PixelOutput)
}

My first goal for the language was to treat the vertex shader, geometry shader, pixel shader, etc., as a single program – they’re usually written as pairs and triplets (or however many stages are in use), so the language should treat them as such. By default it should be hard to write a valid program where the stages of the pipeline (vertex, geometry, pixel) aren’t in sync, though writing a lone pipeline stage should still be possible if needed.

The way the language sees the graphics pipeline is as a stream of data, followed by some code, which outputs a stream of data; then more code runs, and another stream of data is output.

If I ever add compute style shaders to the language, it probably won’t follow this paradigm – as compute shaders aren’t necessarily going to be used to put pixels on the screen. But since it’s my own language, I can always add specific syntax for whatever I need.

Anyhow, streams of data are defined by a struct. Each member of the struct can have an attribute that binds it to a specific hardware resource. Attributes aren’t required except in a few cases – such as screen-space position, instance index, vertex index, depth output, etc. All the : 0, : 1, : 2 assignments below are optional, and the compiler will assign them if they aren’t specified.

// debug vertex describes the input data from the host program
struct DebugVertex
{
    float3 position : 0;    // bound to vertex attribute 0
    float4 color : 1;       // bound to vertex attribute 1
}

// The interpolant passes data from stage to stage
struct Interpolant
{
    float4 position : clipposition;  // special attribute for vs outputs
    float4 color : 0;                // interpolated attribute
}

// output color to render target. For multiple render targets,
// multiple outputs can be defined, as well as depth output.
struct PixelOutput
{
    float4 color : 0;   // single color output
}

program Debug
{
    stream(DebugVertex)     // stream definition (vertex attributes)
    vertexshader vs { ... }
    stream(Interpolant)     // stream definition (passed from vs to ps)
    pixelshader ps { ... }
    stream(PixelOutput)     // stream definition (output pixel colors)
}

For any shader, the stream defined above it is assigned to an automatically defined variable named ‘input’, and the stream after it is assigned to ‘output’.

Oftentimes you want multiple pixel shaders per vertex shader, or vice versa. In that case you can define multiple shaders between streams. I’d use this for alpha test, different texture blending, skinning or morphing vertices, etc. As long as the streams are the same for all the shaders, you can write something like this:

program Debug
{
    stream(Interpolant)
    pixelshader ps1 { ... }
    pixelshader ps2 { ... }
    pixelshader ps3 { ... }
    stream(PixelOutput)
}

This also shows how you can write just a pixel shader, without the vertex shader preceding it.

Another design goal I had was to remove repetitive code in the shaders. This tends to happen quite a bit when you have a lot of similar shaders with small differences – an extra rim lighting highlight, or skinning a model. These are the same as a base shader with only a few lines added.

So the language allows you to insert shader code into a previously defined shader. In the next example, the shader ‘psalpha’ derives from ‘ps’ – all the code from the body is used, and then the clip instruction is appended at the bottom. This is a very common operation when a version of a shader is needed that discards pixels based on the alpha channel.

program Normal
{
    stream(Interpolant)
    uniforms(PSConstants)
    textures(OpaqueTexture)
    pixelshader ps
    {
        float shadow = GetShadowValue(shadowMap, input.shadowProjection, pc.texelSize.x);
        float3 ao = sample(aoMap, input.texcoord.zw).xxx;
        float4 color = sample(diffuseMap, input.texcoord.xy);
        output.color.xyz = ComputeLighting(input.lightfog, color.xyz, shadow,
            float3(1.0, 1.0, 1.0), ao, pc.lightColor, pc.ambientColor, pc.fogColor);
        output.color.w = 0.0;
    }
    pixelshader psalpha : ps
    {
        // discard pixels when diffuse alpha is less than threshold
        clip(color.w - pc.alphaRef);
    }
    stream(PixelOutput)
}

Not only can you append code to the end of a shader, you can also insert it somewhere in the middle using a label.

Below is a shader that computes the position of a vertex for use in shadow mapping. Note the label keyword. At that location, any vertex can be modified in local space if code is inserted there.

program Depth
{
    stream(ModelDepthVertex)
    uniforms(VSConstants)
    vertexshader vs
    {
        // get transform from list of instances
        float4x4 localToWorld = tc.transforms[input.instance];

        // decompress position from fixed point to float
        float3 position = input.position.xyz.cast(float3) * (1.0 / 512.0);

        label positionmod;  // insertion point for vertex modification

        // apply local scale
        position *= localToWorld.row3.xyz;

        // transform to world, then to screenspace
        float3 worldPosition = localToWorld.cast(float3x4) * float4(position, 1.0);
        output.position = gc.worldToProjection * float4(worldPosition, 1.0);
    }
    stream(DepthInterpolant)
}

When a skinned model needs to be rendered into the shadow map, just the skinning of the vertex can be inserted at the label positionmod. Note that the stream input for this shader is different, but as long as it contains all the inputs from the parent shader, it will compile just fine.

program DepthSkin
{
    stream(ModelDepthSkinVertex)
    uniforms(VSConstants)
    vertexshader vs : Depth.vs(positionmod)
    {
        position = SkinPosition(position, input.index, input.weight, bc);
    }
    stream(DepthInterpolant)
}

At this point, you may be wondering about this fancy code insertion language feature, and why I’m not just using macros or functions to do the same thing.

With functions, each shader would have a long list of function calls with many parameters, and many declarations for out parameters used by different parts of the shader. In my experience, shaders are very volatile during development – they change all the time as features get added and removed or new ideas are tested. Function signatures change frequently. If a function signature changes, I’d rather not spend the time updating the calling parameters in 50 or 100 shaders. It’s easier to just have all the code inline and allow variables from one shader to be accessed without issue in another.

At least, that’s the idea – It’s worked well for reducing code size for Banished, and hopefully will do so for future projects as well.

Macros are something I’m not interested in implementing in the language; however, there’s a simple preprocessor in the language’s tokenizer, with simple #if, #ifn, #define, #else, #end, and #include directives. It allows for different compilation based on target and features, and for sharing common functions and structs.

You might see something like this in the shader, to disable computation of shadow mapping coordinates at the lowest shader detail levels.

#ifn DETAIL0  // only include shadow computation when detail level isn't 0
    output.shadowProjection = lc.shadowProjection[0] * float4(worldPosition, 1.0);
#end

There is no requirement for preprocessor tokens to be the first item on a line, so you might also see something like this, with a conditional compile inline. DirectX 9 has no instance input as a shader variable, so it has to be faked somehow. In Banished, it’s currently done like this:

// get transform from list of instances
float4x4 localToWorld = tc.transforms[#if DX9 input.position.w #else input.instance #end];

Functions can be defined outside of a program block for shared functionality, and look like typical C style functions:

float3 SkinPosition(float3 position, int4 index, float4 weight, BoneConstants bc)
{
    return ((bc.transforms[index.x] * float4(position, 1.0) * weight.x) +
            (bc.transforms[index.y] * float4(position, 1.0) * weight.y) +
            (bc.transforms[index.z] * float4(position, 1.0) * weight.z) +
            (bc.transforms[index.w] * float4(position, 1.0) * weight.w)).xyz;
}

Shaders use more than just vertex inputs – there are also uniform constants and textures that need to be passed to the shader. In designing the language, I wanted the use of constants and textures, and their bindings to registers, to look exactly like binding stream variables to hardware registers.

If you look back at the first example I presented, the vertex shader uses several constants, namely tc and gc. These are defined like this:

// vertex constants that can be accessed anytime and don't change per render
struct GlobalConstants
{
    float4x4 worldToProjection;  // used to transform to screenspace
    float4x4 worldToCamera;      // used to transform to cameraspace
    float4 cameraPosition;       // camera location
    float4 time;                 // current time in seconds
    float4 fog;                  // values for computing linear fog
    float4 fogColor;             // fog color
}

// list of instance transforms, changes per draw
struct TransformConstants
{
    float4x4[128] transforms;
}

// VSConstants is a list of all constant buffers available to the shader.
// If used as constants input, this struct can only contain fields of other
// user defined structs
struct VSConstants
{
    GlobalConstants gc : 0;     // bound to constant buffer 0
    TransformConstants tc : 3;  // bound to constant buffer 3
}

When you want to use a set of vertex constants in a shader program, it’s referenced like this:

stream(Vertex)
uniforms(VSConstants)
vertexshader vs { ... }
stream(Interpolant)
uniforms(PSConstants)
pixelshader ps { ... }

The idea here is that there’s no need for a lot of loose global uniform constants (or constant buffers) like in HLSL and GLSL. The host program only provides certain constants, and they are generally known to the shader program and available all the time. This way they are explicitly defined, and once set up, it’s hard to make a mistake, such as using a uniform constant meant for a pixel shader in a vertex shader.

For instances where the constants are different – say, for drawing a specific type of geometry – a different set of constants can be specified, ensuring that only the available constants are actually used by the shader.
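For example, a shader that draws a particular kind of geometry might reference its own constants struct, following the same pattern as VSConstants above (a hypothetical sketch – TerrainConstants and HeightConstants are made up for illustration):

```
// hypothetical constants for one specific geometry type
struct TerrainConstants
{
    GlobalConstants gc : 0;   // same global data, same binding as VSConstants
    HeightConstants hc : 2;   // terrain-specific data (made-up struct)
}

stream(TerrainVertex)
uniforms(TerrainConstants)
vertexshader vs { ... }
```

A shader declared with TerrainConstants can only touch gc and hc, so it can’t accidentally reference constants the host program never bound for this draw.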

Textures are defined in a similar manner. The texture struct can only contain texture types.

struct PSTextures
{
    texture2d diffuse : 0;  // here the attribute defines which index
    texture2d snow : 1;     // the texture / sampler is bound to.
    texture2d ao : 3;
    shadow2d shadow : 5;
}

stream(Interpolant)
uniforms(PSConstants)
textures(PSTextures)
pixelshader ps
{
    ...
    float4 color = sample(diffuse, input.texcoord);
    color *= sampleproj(shadow, input.shadowProjection);
    ...
}

I’m going to digress a little bit here, and you’ll see some of my thought process in designing this language. The language is still new and may need some tweaking – this is one of those places.

If you’ve been paying close attention to any of the examples, you’ll notice a glaring inconsistency between uniforms and textures versus the shader inputs and outputs. Shader inputs and outputs are automatically defined variables of the input and output type – input.position, input.texcoord, output.position, output.color, etc.

Textures and uniforms are currently used without a name, and the variables inside the struct are simply declared as locals to the shader. This is okay, but I’ve been trying to decide if I should make it consistent with the other shader inputs.

Currently, uniforms and textures are accessed like this:

float4 position = gc.worldToProjection * input.position;
float4 color = sample(diffuse, input.texcoord);

But I’ve been thinking about changing it to:

float4 position = uniform.gc.worldToProjection * input.position;
float4 color = sample(textures.diffuse, input.texcoord);

I like this change for a few reasons. First, it’s consistent with the way streams are handled, and second, it stops you from inadvertently polluting the local variable namespace with unintended names that you might otherwise use. One day I might add a new texture to a struct, and its name might clash with an existing local in a shader – requiring a name change to one or the other.

On the flip side, streams could have the input. and output. prefixes dropped as well, but too often I want to put the same names in both structs (position, texcoord, color, etc.), so prefixing them with input. and output. is better in my opinion.

In the case of textures, I might want a texture named diffuse and a variable named diffuse to represent the resulting color when the texture is sampled.

float4 diffuse = sample(textures.diffuse, input.texcoord);

That’s nice and fairly clear as to what the variable holds.

The real downside here is for uniforms. Having to write something like ‘uniform.gc.worldToProjection‘ all over the place may be overly verbose; however, it’s absolutely clear what’s going on. I can think of a few ways to reduce the length, such as allowing a user-specified name when declaring uniforms and textures:

stream(Interpolant)
uniforms(PSConstants, u)
textures(ModelTextures, t)
...
float4 position = u.gc.worldToProjection * input.position;
float4 color = sample(t.diffuse, input.texcoord);

On the other hand, I could scope the textures with a variable and leave uniforms alone. Really, this is just sugar on the language. It works fine as is, and I’ll probably make a decision one way or the other the more I use it.

Changing Banished to use the new language (once the compiler was written and debugged) has been fairly painless, and the reduction in code redundancy is very good. (I’ve actually found several bugs in the original shader code by doing the conversion. Whoops!)

Banished has also been a good test bed for a variety of shaders – I think it would be hard to design something like this without a real world test case.

Everything is pretty much done, but I’m sure the compiler still has bugs that I’ll find as I write more shaders. There are also missing features I’d like to add at some point. Depending on what else I’m working on, I may not add them until I need them.

There’s currently no texture array type, and there aren’t sampling functions to specify which mip to use (but a new texture type and sampling function are fairly easy to add). There are no multidimensional arrays, but I can’t remember the last time I even used one in C++. Geometry shader support isn’t finished. And there’s no tessellation shader as of yet.

Phew. Don’t fool yourself, compilers and languages are big projects.

So that’s SRSL (or SRSLY…) in a nutshell. It works, I can draw stuff with it, and it’s cross-platform ready. Now I can finally finish the OpenGL graphics renderer. Woot.