The ability to build robust array type abstractions without overhead is one of Julia's most compelling features. It can be surprising how far the simple idea of an array can take you toward better code. In this blog post I want to show how you can use Julia's interface tools and @generated functions to build these kinds of zero-cost abstractions.

What is a zero-cost abstraction?

First let's define a zero-cost abstraction. A zero-cost abstraction is a coding tool that has zero run time cost associated with using it. Essentially, it transforms into the fastest possible code during compilation, so that its use has no effect on the run time of your code.

Example: Diagonal Matrices

A quick example is simple type wrappers in Julia. An example of this from the standard library is the Diagonal type. A Diagonal matrix is a matrix which is zero off of its diagonal, so it makes sense to represent a diagonal matrix by an array which is simply that diagonal. We can thus create a type which holds an array:

```julia
struct MyDiagonal{S,T<:AbstractVector{S}} <: AbstractMatrix{S}
    diag::T
end
```

Notice that the parametric typing makes this a concretely typed struct which the compiler will specialize on when it knows the input. For this to be useful, we want it to act like a diagonal matrix, so we define a few overloads for arithmetic. For example, multiplying a vector by a diagonal matrix is the same as element-wise multiplication between the diagonal and that vector. Thus we can define the overload:

```julia
Base.:*(D::MyDiagonal, v::AbstractVector) = D.diag .* v
```

and an in-place non-allocating version:

```julia
using LinearAlgebra
LinearAlgebra.mul!(w::AbstractVector, D::MyDiagonal, v::AbstractVector) = w .= D.diag .* v
```

Now we can use this "matrix" in any code which accepts an AbstractMatrix type and only uses its multiplication. We can continue along and define its matrix addition, its multiplication against other matrices, etc. For full compatibility, it is useful to implement the AbstractArray interface defined in Base. Once we define enough overloads we are free to use it in anyone's code which accepts an AbstractMatrix and uses the standard arithmetic functions, even code in packages which have never heard of MyDiagonal. Want to use this matrix as an operator in DifferentialEquations.jl? Want to use it as an operator for a GMRES routine from IterativeSolvers.jl? It'll work!
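To make the genericity concrete, here is a minimal, self-contained sketch. The function name apply_twice is purely illustrative (it is not from any package): it is written only against the `*` operator, so it picks up our specialized method without knowing MyDiagonal exists.

```julia
# Self-contained sketch: generic code written against `*` picks up the
# specialized diagonal method. `apply_twice` is an illustrative name only.
struct MyDiagonal{S,T<:AbstractVector{S}} <: AbstractMatrix{S}
    diag::T
end

# The overload from above: diagonal-times-vector is element-wise multiplication.
Base.:*(D::MyDiagonal, v::AbstractVector) = D.diag .* v

# Generic code that knows nothing about MyDiagonal, only that `*` exists:
apply_twice(A, v) = A * (A * v)

D = MyDiagonal([2.0, 3.0])
apply_twice(D, [1.0, 1.0])  # == [4.0, 9.0]
```

Any routine with this shape gets the fast element-wise path automatically once the overload exists.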

But here's a question: is this efficient? If I wanted to make an optimized version of an algorithm for diagonal matrices, I would replace all matrix multiplications, additions, etc. with the fast operation on an array. If I take this type and throw it into someone's existing package, will it do something similar to the code I would have wanted to write? The answer is... YES! At compile time any use of D*v is translated into our element-wise multiplication of the diagonal against the vector, and so the generated code is the same as the code you would have hand-rolled to optimize this case. To see this in action, let's write a function that computes D*v+x for any matrix D and check the resulting compiled code. For simplicity, we will use a version of mul! which is a plain loop and avoids broadcast, simply because broadcast is a complicated compilation beast.

```julia
@inline function LinearAlgebra.mul!(w::AbstractVector, D::MyDiagonal, v::AbstractVector)
    d = D.diag
    for i in eachindex(w)
        w[i] = d[i] * v[i]
    end
end

function muladd1!(w::AbstractVector, D::AbstractMatrix, v::AbstractVector, x::AbstractVector)
    mul!(w, D, v)
    w .+= x
end

function muladd2!(w::AbstractVector, D::MyDiagonal, v::AbstractVector, x::AbstractVector)
    d = D.diag
    for i in eachindex(w)
        w[i] = d[i] * v[i]
    end
    w .+= x
end

v = rand(10)
dmat = MyDiagonal(v)
w = similar(v)
x = rand(10)
@code_llvm muladd1!(w, dmat, v, x)
@code_llvm muladd2!(w, dmat, v, x)
```

You can compare the two LLVM Intermediate Representations (IR, the platform-independent assembly used as input to the LLVM compiler) by looking at the result here and the result here. Notice that the only difference between the two is that the muladd1! code notes that part of it comes from the mul! function. Other than that debug-info reference, the output IRs are exactly the same. Thus the muladd1! version, which is generic to the input matrix type, is a fine substitute for our muladd2! specialized implementation.

Zero-Cost Abstraction for Convenience: LabelledArrays.jl

Now let's look at a more in-depth example. One thing that shows up when solving differential equations is that we want to think about the components of an array by name and not by number. For example, the Lorenz equations are written as:

$$\begin{aligned}
\frac{dx}{dt} &= \sigma (y - x),\\
\frac{dy}{dt} &= x (\rho - z) - y,\\
\frac{dz}{dt} &= x y - \beta z.
\end{aligned}$$

The input into the numerical differential equation solver translates (x,y,z) to (1,2,3) and uses array indexing, which looks like:

```julia
function lorenz_f(du, u, p, t)
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end
```

One can guess how confusing it can be to check a code against its mathematical description when there are hundreds of variables. The purpose of LabelledArrays.jl is to allow indexing an array by name with zero cost and without requiring a macro-based DSL. Here's what it looks like:

```julia
function lorenz_f(du, u, p, t)
    du.x = p.σ * (u.y - u.x)
    du.y = u.x * (p.ρ - u.z) - u.y
    du.z = u.x * u.y - p.β * u.z
end
```

Here's how this is done

An LVector is an AbstractVector which carries, as type information, a tuple of symbols for the names, and holds an array as its run time data. This looks like:

```julia
struct LVector{T,A<:AbstractVector{T},Syms} <: AbstractVector{T}
    __x::A
    LVector{Syms}(__x) where {Syms} = new{eltype(__x),typeof(__x),Syms}(__x)
end
```

Let's build an LVector for the Lorenz equation. What we want is for u.x or u[:x] to act as a replacement for u[1]. To do this, we first have to come up with a function for translating between the symbol and the numeric index. This is done by using findfirst against the tuple of symbols stored in the type parameter. It looks like:

```julia
symnames(::Type{LVector{T,A,Syms}}) where {T,A,Syms} = Syms

function Base.getindex(x::LVector, ::Val{s}) where s
    idx = findfirst(y -> y == s, symnames(typeof(x)))
    x.__x[idx]
end
```
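As a sanity check, here is a self-contained version of the pieces so far (repeating the struct, and using getfield(x, :__x) for the internal field access so the sketch stays safe even once getproperty is later overloaded):

```julia
# Self-contained check of the symbol-to-index translation.
struct LVector{T,A<:AbstractVector{T},Syms} <: AbstractVector{T}
    __x::A
    LVector{Syms}(__x) where {Syms} = new{eltype(__x),typeof(__x),Syms}(__x)
end

symnames(::Type{LVector{T,A,Syms}}) where {T,A,Syms} = Syms

# Non-generated version: the findfirst lookup happens at run time here.
function Base.getindex(x::LVector, ::Val{s}) where s
    idx = findfirst(isequal(s), symnames(typeof(x)))
    getfield(x, :__x)[idx]
end

u = LVector{(:x, :y, :z)}([10.0, 20.0, 30.0])
u[Val(:y)]  # == 20.0
```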

Now let's use this to build our notational transform. u.x is simply a call to the getproperty function, and u[:x] is a call to the getindex function. To make these efficient, we need to make them perform the symbol-to-index translation only once, at compile time. We can do this with an @generated function. In Julia, a generated function is a function where you are given the argument types and you use Julia's metaprogramming to build the function body from those types. In this case, our types are our labelled array and a value-type for the name, and from that we can generate code which simply returns the value at the correct index:

```julia
@inline @generated function Base.getindex(x::LVector, ::Val{s}) where s
    idx = findfirst(y -> y == s, symnames(x))
    :(x.__x[$idx])
end
```

As a last step, we need to bridge from the getproperty and getindex calls which take in a Symbol to the same calls with the value-type:

```julia
@inline function Base.getproperty(x::LVector, s::Symbol)
    getindex(x, Val(s))
end
```
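The post only shows the read path, but the Lorenz example above also writes du.x = .... A write path can be sketched the same way; this mirror of getindex/getproperty is my own sketch under the same pattern, not code from the package:

```julia
# Hypothetical write-side mirror of the read path: translate the symbol to
# an index at compile time, then store at that index.
struct LVector{T,A<:AbstractVector{T},Syms} <: AbstractVector{T}
    __x::A
    LVector{Syms}(__x) where {Syms} = new{eltype(__x),typeof(__x),Syms}(__x)
end

symnames(::Type{LVector{T,A,Syms}}) where {T,A,Syms} = Syms

@inline @generated function Base.setindex!(x::LVector, v, ::Val{s}) where s
    # Inside a generated function, `x` is the type, so the lookup runs once
    # at compile time and only the indexed store remains.
    idx = findfirst(isequal(s), symnames(x))
    :(getfield(x, :__x)[$idx] = v)
end

@inline function Base.setproperty!(x::LVector, s::Symbol, v)
    setindex!(x, v, Val(s))
end

u = LVector{(:x, :y, :z)}(zeros(3))
u.y = 4.5
getfield(u, :__x)  # == [0.0, 4.5, 0.0]
```

With both directions defined, the du.x = ... lines in lorenz_f become plain indexed stores.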

You may ask whether the compiler can really handle this translation of a run time symbol to a value-type at compile time. It cannot in all cases, and you can come up with many examples where this in general requires run time information. However, Julia's compilation process does constant propagation on literal values. A literal is a value that is written directly into the code. For example, in w=v*1, 1 is a literal, and the compiler will make use of its properties and push its constantness into the multiplication code to produce a faster output. In our case, the symbol in u.x is always a literal, so the compiler specializes on the fact that the input to getproperty is this constant symbol, which it knows at compile time, infers the Val{symbol} type, and from there calls the already-compiled translation to indexing, making u.x give the same compiled code as u[1]. Similarly, if you have written u[:x] in your code, constant propagation will kick in and the result will be the same compiled code as u[1].

We can double-check all of this by once again inspecting the compiled code, this time using Julia's @code_warntype introspection tooling to see the typed code the function generates. We can easily check that the compiled code for u.x and u[:x] matches that of u[1].

```julia
using LabelledArrays
x = @LArray [1.0,2.0,3.0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26] (:a,:b,:c,:d,:e,:f,:g,:h,:i,:j,:k,:l,:m,:n,:o,:p,:q,:r,:s,:t,:u,:v,:w,:x,:y,:z)
f(x) = x[26]
g(x) = x.z

@code_warntype f(x)
# Body::Float64
# 15 1 ─ %1 = (LabelledArrays.getfield)(x, :__x)::Array{Float64,1}
#    │   %2 = (Base.arrayref)(true, %1, 26)::Float64
#    └──      return %2

@code_warntype g(x)
# Body::Float64
# 14 1 ─ %1 = (LabelledArrays.getfield)(x, :__x)::Array{Float64,1}
#    │   %2 = (Base.arrayref)(true, %1, 26)::Float64
#    └──      return %2

using BenchmarkTools
@btime f($x)  # 1.463 ns (0 allocations: 0 bytes)
@btime g($x)  # 1.463 ns (0 allocations: 0 bytes)
```

Notice that the two are the same, with (Base.arrayref)(true, %1, 26) meaning a load of the element at index 26. No findfirst call appears in the generated code at all. Tada! At zero cost we can now index values out of an array by name. This is by definition a zero-cost abstraction. Note that, with enough overloads, this can be used in any code which takes an AbstractVector.
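For completeness, here is a minimal sketch of those AbstractArray interface overloads: with just size and scalar getindex/setindex!, much of the generic vector machinery (iteration, reductions, broadcasting) starts working. The real package implements considerably more of the interface than this.

```julia
# Minimal AbstractArray interface hookup for LVector (a sketch; the real
# package implements more of the interface).
struct LVector{T,A<:AbstractVector{T},Syms} <: AbstractVector{T}
    __x::A
    LVector{Syms}(__x) where {Syms} = new{eltype(__x),typeof(__x),Syms}(__x)
end

Base.size(x::LVector) = size(getfield(x, :__x))
Base.getindex(x::LVector, i::Int) = getfield(x, :__x)[i]
Base.setindex!(x::LVector, v, i::Int) = (getfield(x, :__x)[i] = v)

u = LVector{(:x, :y, :z)}([1.0, 2.0, 3.0])
sum(u)   # == 6.0, via the generic AbstractArray reduction machinery
u .+ u   # broadcasting also works through the generic fallbacks
```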

Conclusion

You don't want to write algorithms which only work on plain arrays of numbers and hand-optimize every version. You want to write simple code that works on any possible matrix or vector, and have it optimize itself given how the type's operations are defined. MyDiagonal showed how zero-cost abstractions can give you performance by letting you take control of how operations work inside generic code. But zero-cost abstractions can also just be a convenience tool: LabelledArrays and its LVector showed how to build a naming scheme that keeps code clean but has zero run time cost. Using these kinds of tools together is how Julia gets both clean and fast code. The future of technical computing isn't about writing sneakily indexed loops and hand-optimizing code to your domain: the future is optimizing how a type works, and then writing down a high-level mathematical description of an algorithm and having it be easy to read and give optimal performance. Julia is already there.