Irradiance Map

For the convolution of the diffuse irradiance map only one change is required to get the best performance: avoid using a sampler, i.e. samplerCube (which may also not be available on certain platforms), and use images instead, as they have better memory coherency for compute shader execution:

```glsl
layout(binding = 0, rgba8) uniform readonly imageCube environmentMap;
layout(binding = 1, rgba16f) writeonly uniform imageCube irradianceMap;
```

In Vulkan (and OpenGL) a cube texture is treated as an array of six 2D textures. Because of this we have to convert from the compute invocation ID (i.e. gl_GlobalInvocationID) to cube coordinates, which is straightforward (note that we had to flip some of the UV coordinates, for more details see Table 3.11 from link):

```glsl
vec3 cubeCoordToWorld(ivec3 cubeCoord, vec2 cubemapSize)
{
    vec2 texCoord = vec2(cubeCoord.xy) / cubemapSize;
    texCoord = texCoord * 2.0 - 1.0; // -1..1
    switch (cubeCoord.z)
    {
        case 0: return vec3(1.0, -texCoord.yx);             // posx
        case 1: return vec3(-1.0, -texCoord.y, texCoord.x); // negx
        case 2: return vec3(texCoord.x, 1.0, texCoord.y);   // posy
        case 3: return vec3(texCoord.x, -1.0, -texCoord.y); // negy
        case 4: return vec3(texCoord.x, -texCoord.y, 1.0);  // posz
        case 5: return vec3(-texCoord.xy, -1.0);            // negz
    }
    return vec3(0.0);
}
```

We must also convert from the 3D direction (from which to sample) back to 2D texel coordinates, clamped to the [0, size - 1] range, plus a texture index into the array:

```glsl
ivec3 texCoordToCube(vec3 texCoord, vec2 cubemapSize)
{
    vec3 abst = abs(texCoord);
    texCoord /= max3(abst);

    float cubeFace;
    vec2 uvCoord;
    if (abst.x > abst.y && abst.x > abst.z)
    {
        // x major
        float negx = step(texCoord.x, 0.0);
        uvCoord = mix(-texCoord.zy, vec2(texCoord.z, -texCoord.y), negx);
        cubeFace = negx;
    }
    else if (abst.y > abst.z)
    {
        // y major
        float negy = step(texCoord.y, 0.0);
        uvCoord = mix(texCoord.xz, vec2(texCoord.x, -texCoord.z), negy);
        cubeFace = 2.0 + negy;
    }
    else
    {
        // z major
        float negz = step(texCoord.z, 0.0);
        uvCoord = mix(vec2(texCoord.x, -texCoord.y), -texCoord.xy, negz);
        cubeFace = 4.0 + negz;
    }

    uvCoord = (uvCoord + 1.0) * 0.5; // 0..1
    uvCoord = uvCoord * cubemapSize;
    uvCoord = clamp(uvCoord, vec2(0.0), cubemapSize - vec2(1.0));

    return ivec3(ivec2(uvCoord), int(cubeFace));
}
```

Using cubeCoordToWorld we get the world position and build a tangent-space basis from it:

```glsl
ivec3 cubeCoord = ivec3(gl_GlobalInvocationID);
vec3 worldPos = cubeCoordToWorld(cubeCoord, cubemapSize);
// tangent space from origin point
vec3 normal = normalize(worldPos);
vec3 up = vec3(0.0, 1.0, 0.0);
vec3 right = normalize(cross(up, normal));
up = cross(normal, right);
```

Afterwards we can fetch samples via imageLoad using the texture-array coordinates:

```glsl
// spherical to cartesian, in tangent space
vec3 sphereCoord = vec3(sinTheta * cosPhi, sinTheta * sinPhi, cosTheta);
// tangent space to world
vec3 sampleVec = sphereCoord.x * right + sphereCoord.y * up + sphereCoord.z * normal;
// world to cube coord
ivec3 sampleCoord = texCoordToCube(sampleVec, cubemapSize);
irradiance += imageLoad(environmentMap, sampleCoord).rgb * cosTheta * sinTheta;
```

And finally we simply write via imageStore:

```glsl
irradiance *= PI * invTotalSamples;
imageStore(irradianceMap, cubeCoord, vec4(irradiance, 1.0));
```

Pre-filtered Map

Compared to the irradiance map, the pre-filtered map uses mipmaps based on the number of roughness bins you want. Because of this we have to set the target mip level for the compute shader; in OpenGL we use glBindImageTexture before dispatching:

```cpp
const int textureSize = 1024;
const int maxMipLevels = 5;
for (int mip = 0; mip < maxMipLevels; ++mip)
{
    const int mipmapSize = textureSize >> mip;
    const float roughness = mip / (float)(maxMipLevels - 1);
    // note: image load/store requires a sized RGBA format (GL_RGBA16F, not GL_RGB16F)
    glBindImageTexture(1, prefilteredMapTexID, mip, GL_TRUE, 0, GL_READ_WRITE, GL_RGBA16F);
    // afterwards set the uniforms (i.e. roughness and mipmapSize) and dispatch
}
```

In Vulkan, on the other hand, you must create an image view for the current mip level (the equivalent of glBindImageTexture) and write it into the descriptor set before dispatching:

```cpp
VkImageViewCreateInfo viewInfo = {};
viewInfo.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
// set the other fields as usual
viewInfo.subresourceRange.baseMipLevel = baseMipLevel; // the current mip
viewInfo.subresourceRange.levelCount = 1;              // view a single mip level
viewInfo.subresourceRange.baseArrayLayer = 0;
viewInfo.subresourceRange.layerCount = 6;
vkCreateImageView(m_device, &viewInfo, nullptr, &mipViewHandle);

// afterwards write it into a descriptor set
VkDescriptorImageInfo imageInfo = {};
imageInfo.sampler = VK_NULL_HANDLE;
imageInfo.imageView = mipViewHandle;
imageInfo.imageLayout = cubemapImageLayout;

VkWriteDescriptorSet descriptorWrite = {};
descriptorWrite.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrite.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE;
descriptorWrite.pImageInfo = &imageInfo;

// finally set the uniforms (i.e. roughness and mipmapSize), bind the descriptor set and dispatch
```

As in the previous case we use cubeCoordToWorld to get the world coordinates:

```glsl
ivec3 cubeCoord = ivec3(gl_GlobalInvocationID);
vec3 worldPos = cubeCoordToWorld(cubeCoord, u_mipmapSize);
// tangent space from origin point
vec3 N = normalize(worldPos);
// assume the view direction is always equal to the outgoing direction
vec3 R = N;
vec3 V = N;
```

However, if we want to use the PDF-based filtering to get rid of the aliasing artifacts, as described by Chetan Jaggi1, we have to use a samplerCube and sample via textureLod:

```glsl
float totalWeight = 0.0;
vec3 prefilteredColor = vec3(0.0);
for (uint i = 0u; i < totalSamples; ++i)
{
    // generate a sample vector biased towards the specular lobe
    vec2 Xi = hammersleySequence(i, totalSamples);
    vec3 H = importanceSampleGGX(Xi, N, u_roughness);
    float dotHV = dot(H, V);
    vec3 L = normalize(2.0 * dotHV * H - V);

    float dotNL = max(dot(N, L), 0.0);
    if (dotNL > 0.0)
    {
        float dotNH = max(dot(N, H), 0.0);
        dotHV = max(dotHV, 0.0);
        // sample from the environment's mip level based on roughness/pdf
        float D = d_ggx(dotNH, u_roughness);
        float pdf = D * dotNH / (4.0 * dotHV) + 0.0001;
        float saTexel = 4.0 * PI / (6.0 * originalSamples);
        float saSample = 1.0 / (totalSamples * pdf + 0.0001);
        float mipLevel = u_roughness == 0.0 ? 0.0 : 0.5 * log2(saSample / saTexel);
        prefilteredColor += textureLod(environmentMap, L, mipLevel).rgb * dotNL;
        totalWeight += dotNL;
    }
}
prefilteredColor = prefilteredColor / totalWeight;
```

If the quality is acceptable without the PDF-based filtering, or you want a simpler filtering method, you can instead use an imageCube and texCoordToCube for sampling:

```glsl
if (dotNL > 0.0)
{
    ivec3 sampleCoord = texCoordToCube(L, mipmapSize);
    prefilteredColor += imageLoad(environmentMap, sampleCoord).rgb * dotNL;
    totalWeight += dotNL;
}
```

BRDF LUT Map

Using a compute shader for the integrated BRDF map is straightforward, especially as it is also computed only once. The only small note, as general good practice (if performance matters most), is to use a constant for the texture size instead of calling textureSize (which on most platforms is implemented as a uniform behind the scenes), though the performance impact is negligible.

```glsl
const vec2 cubemapSize = vec2(1024.0, 1024.0);
```

Shader Dispatch

To dispatch, use the texture size and the number of faces. Note however that the local size (which defaults to 1) affects performance and is hardware dependent; in my case, for a 1024 texture size, using 32 gave slightly better performance:

```cpp
// must match local_size in GLSL
int workGroupSize = 32;
// in Vulkan
vkCmdDispatch(cmd, cubeMapSize / workGroupSize, cubeMapSize / workGroupSize, 6);
// in OpenGL
glDispatchCompute(cubeMapSize / workGroupSize, cubeMapSize / workGroupSize, 6);
```

For a better understanding of why the local size impacts performance I recommend reading DirectCompute Optimizations and Best Practices4. However, the limitation is that the local size is a compile-time constant, so you will have to precompile the shader for multiple sizes and then pick the optimal one based on the hardware in use.