AnyDSL - A Partial Evaluation Framework for Programming High-Performance Libraries

AnyDSL is a framework for developing domain-specific libraries (DSLs). These libraries are implemented in our language Impala. To achieve high performance, Impala partially evaluates any abstractions these libraries might impose. Partial evaluation and other optimizations are performed on Thorin, the intermediate representation of AnyDSL.
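To make the idea of partial evaluation more concrete, here is a minimal sketch. It is written in Rust rather than Impala, purely as an analogy, and the residual function pow5 is written by hand; the names pow and pow5 are illustrative and not part of AnyDSL. The point is only the shape of the result: once the exponent is known ahead of time, the recursion and all branching on it can be evaluated away, leaving straight-line code.

```rust
/// Generic power function: both `x` and `n` are only known at run time.
fn pow(x: i32, n: u32) -> i32 {
    if n == 0 {
        1
    } else if n % 2 == 0 {
        let half = pow(x, n / 2);
        half * half
    } else {
        x * pow(x, n - 1)
    }
}

/// Hand-written residual program for the statically known exponent n = 5.
/// A partial evaluator would be expected to derive code of this shape
/// from `pow` automatically; here it is spelled out for illustration.
fn pow5(x: i32) -> i32 {
    let x2 = x * x;   // corresponds to pow(x, 2)
    let x4 = x2 * x2; // corresponds to pow(x, 4)
    x * x4            // corresponds to pow(x, 5)
}
```

Both functions compute the same values, but pow5 contains no recursion and no tests on the exponent. That is the kind of abstraction overhead that specialization is meant to remove.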

Support

You can ask for support on Discord.

AnyDSL Architecture

Embedding of DSLs in Impala

When developing a DSL, people from different areas come together:

the application developer who just wants to use the DSL,

the DSL designer who develops domain-specific abstractions, and

the machine expert who knows the target machine very well and how to massage the code in order to achieve good performance.

AnyDSL allows a separation of these concerns using

higher-order functions,

partial evaluation, and

triggered code generation.

Application Developer

fn main() {
    let img = load("dragon.png");
    let blurred = gaussian_blur(img);
}

DSL Designer

fn gaussian_blur(field: Field) -> Field {
    let stencil: Stencil = { /* ... */ };
    let mut out: Field = { /* ... */ };

    for x, y in @iterate(out) {
        out.data(x, y) = apply_stencil(x, y, field, stencil);
    }

    out
}

Machine Expert

fn iterate(field: Field, body: fn(int, int) -> ()) -> () {
    let grid = (field.cols, field.rows, 1);
    let block = (128, 1, 1);

    with nvvm(grid, block) {
        let x = nvvm_tid_x() + nvvm_ntid_x() * nvvm_ctaid_x();
        let y = nvvm_tid_y() + nvvm_ntid_y() * nvvm_ctaid_y();
        body(x, y);
    }
}
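The same division of labor can be mimicked in other languages with higher-order functions. The following is a rough Rust analogy, not Impala and not AnyDSL code. It assumes a hypothetical row-major Field layout, a 3x3 binomial stencil, clamp-to-edge boundary handling, and a sequential CPU loop nest in place of the GPU launch. Because the DSL designer's code only talks to iterate, the machine expert can replace its body with a different mapping without touching gaussian_blur.

```rust
/// Minimal stand-in for the `Field` type (hypothetical: row-major f32 data).
#[derive(Clone, Debug, PartialEq)]
struct Field {
    cols: usize,
    rows: usize,
    data: Vec<f32>,
}

impl Field {
    fn new(cols: usize, rows: usize) -> Self {
        Field { cols, rows, data: vec![0.0; cols * rows] }
    }

    /// Read with clamp-to-edge boundary handling (an assumption of this sketch).
    fn get_clamped(&self, x: isize, y: isize) -> f32 {
        let cx = x.clamp(0, self.cols as isize - 1) as usize;
        let cy = y.clamp(0, self.rows as isize - 1) as usize;
        self.data[cy * self.cols + cx]
    }
}

/// Machine expert: decides how the iteration space is traversed.
/// Here it is a plain sequential CPU loop nest instead of a GPU launch.
fn iterate(cols: usize, rows: usize, mut body: impl FnMut(usize, usize)) {
    for y in 0..rows {
        for x in 0..cols {
            body(x, y);
        }
    }
}

/// DSL designer: what is computed per point (a 3x3 weighted sum).
fn apply_stencil(x: usize, y: usize, field: &Field, stencil: &[[f32; 3]; 3]) -> f32 {
    let mut sum = 0.0;
    for (j, row) in stencil.iter().enumerate() {
        for (i, w) in row.iter().enumerate() {
            let sx = x as isize + i as isize - 1;
            let sy = y as isize + j as isize - 1;
            sum += w * field.get_clamped(sx, sy);
        }
    }
    sum
}

/// DSL designer: the blur is written once against `iterate`.
fn gaussian_blur(field: &Field) -> Field {
    // 3x3 binomial approximation of a Gaussian; the weights sum to 1.
    let stencil = [
        [1.0 / 16.0, 2.0 / 16.0, 1.0 / 16.0],
        [2.0 / 16.0, 4.0 / 16.0, 2.0 / 16.0],
        [1.0 / 16.0, 2.0 / 16.0, 1.0 / 16.0],
    ];
    let mut out = Field::new(field.cols, field.rows);
    let cols = out.cols;
    iterate(field.cols, field.rows, |x, y| {
        out.data[y * cols + x] = apply_stencil(x, y, field, &stencil);
    });
    out
}
```

An application developer would only call gaussian_blur on a Field. Unlike Impala, Rust gives no guarantee that the closure and the stencil loops are specialized away; this sketch shows only the layering, not the partial evaluation.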

Selected Results

Rodent is a BVH traversal library and renderer implemented using the AnyDSL compiler framework. It is a renderer-generating library that converts 3D scenes into optimized, specialized code for rendering the scene on CPUs and GPUs. Compared with state-of-the-art renderers, we obtain the following speedups:

Embree (Intel): up to 23% faster

OptiX (NVIDIA): up to 31% faster (megakernel)

OptiX (NVIDIA): up to 42% faster (wavefront)

Rodent also supports ARM CPUs and AMD GPUs.

Stincilla is a DSL for stencil codes. Using the Gaussian blur filter as an example, we compared against the reference implementations in OpenCV 3.0 and achieved the following results:

Intel CPU: 40% faster

Intel GPU: 25% faster

AMD GPU: 50% faster

NVIDIA GPU: 45% faster

Up to 10x shorter code

RaTrace is a DSL for ray traversal.