The AMD blog features an in-depth post on their work on Blender Cycles. In particular, they split the Cycles kernel into 10 smaller kernels and moved the pre-processing step from the CPU to a much, much faster GPU implementation. Interesting stuff to read!

This article is part of an occasional series about what developers can do when they collaborate. AMD is a real believer in open source projects. Our developers actively contribute to and maintain a variety of open source projects, from highly optimized math libraries to… well, let’s talk about Blender Cycles.

AMD set out to improve the support for GPU compute inside Blender Cycles. Prior to this effort, the GPU kernel used for rendering was monolithic and huge. Because of the kernel’s size, the generated code needed more registers than the hardware provides, so it had to spill register contents to memory and reload (unspill) them later. These spill/unspill operations slow performance and reduce occupancy. (Occupancy represents the actual number of waves running on the GPU simultaneously. More is better.)
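To make the link between register use and occupancy concrete, here is a minimal sketch of a simplified model. It is not taken from the AMD post or from Cycles: the function name `estimate_occupancy` is hypothetical, and the defaults of 256 vector registers per work-item and 10 waves per SIMD are assumptions based on GCN-class hardware. Real GPUs also apply allocation granularity and other limits (scalar registers, local memory) that this sketch ignores.

```python
def estimate_occupancy(vgprs_needed, vgpr_budget=256, max_waves=10):
    """Estimate waves per SIMD and spilled registers for one kernel.

    Simplified model (assumed, not from the article): every wave gets the
    same register allocation, and the only limit on occupancy is the
    vector register file.
    """
    if vgprs_needed <= 0:
        raise ValueError("vgprs_needed must be positive")

    # Anything beyond the per-work-item budget cannot live in registers
    # and has to be spilled to memory.
    spilled = max(0, vgprs_needed - vgpr_budget)
    allocated = min(vgprs_needed, vgpr_budget)

    # The register file is shared: the more registers each wave holds,
    # the fewer waves fit at the same time.
    waves = min(max_waves, vgpr_budget // allocated)
    return waves, spilled


# A small kernel, two mid-sized kernels, and a huge monolithic kernel.
for needed in (24, 64, 128, 400):
    waves, spilled = estimate_occupancy(needed)
    print(f"{needed:>3} registers -> {waves} wave(s) per SIMD, {spilled} spilled")
```

Under these assumptions, a kernel that stays within a small register budget keeps the SIMD full, while a huge kernel both drops to a single wave and spills. That is the motivation for breaking one monolithic kernel into smaller ones.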

In addition to producing inefficient code, the compiler would sometimes fail to complete the build, or would generate incorrect code that could lead to black screens or a kernel hang. These are certifiable “bad things.”