In this post I am going to try to explain in detail the type-dispatch design which is used in Julian software architectures. It's modeled after the design of many different packages and Julia Base, and has been discussed in parts elsewhere. This is actually just a blog post translation from my "A Deep Introduction to Julia for Data Science and Scientific Computing" workshop notes. I think it's an important enough topic to share more broadly. The tl;dr: Julia is built around types. Software architectures in Julia are built around good use of the type system. This makes it easy to build generic code which works over a large range of types and gets good performance. The result is high-performance code that has many features. In fact, with generic typing, your code may have more features than you know of! The purpose of this tutorial is to introduce the multiple dispatch designs that allow this to happen. Now let's discuss the main components of this design! Duck Typing If it quacks like a duck, it might as well be a duck. This is the idea of defining an object by the way that it acts. This idea is central to type-based designs: abstract types are defined by how they act. For example, a `Number` is some type that can do things like +,-,*, and /. In this category we have things like Float64 and Int32. An AbstractFloat is some floating point number, and so it should have a dispatch of eps(T) that gives its machine epsilon. An AbstractArray is a type that can be indexed like `A[i]`. An AbstractArray may be mutable, meaning it can be "set": A[i]=v. These abstract types then have actions which abstract from their underlying implmentation. A.*B does element-wise multiplication, and in many cases it does not matter what kind of array this is done on. The default is Array which is a contiguous array on the CPU, but this action is common amongst AbstractArray types. If a user has a DistributedArray (DArray), then A.*B will work on multiple nodes of a cluster. If the user uses a `GPUArray`, then A.*B will be performed on the GPU. Thus, if you don't restrict the usage of your algorithm to Array, then your algorithm actually "just works" as many different algorithms. This is all well and good, but this would not be worthwhile if it were not performant. Thankfully, Julia has an answer to this. Every function auto-specializes on the types which it is given. Thus if you look at something like: my_square ( x ) = x^ 2 then we see that this function will be efficient for the types that we give it. Looking at the generated code: julia > @code_llvm my_square ( 1 ) ; Function my_square ; Location: REPL [ 1 ] : 1 ; Function Attrs: uwtable define i64 @julia_my_square_34543 ( i64 ) #0 { top: ; Function literal_pow ; { ; Location: intfuncs. jl : 243 ; Function *; { ; Location: int . jl : 54 % 1 = mul i64 % 0, % 0 ; } } ret i64 % 1 } julia > @code_llvm my_square ( 1.0 ) ; Function my_square ; Location: REPL [ 1 ] : 1 ; Function Attrs: uwtable define double @julia_my_square_34579 ( double ) #0 { top: ; Function literal_pow ; { ; Location: intfuncs. jl : 243 ; Function *; { ; Location: float . jl : 399 % 1 = fmul double % 0, % 0 ; } } ret double % 1 } See that the function which is generated by the compiler is different in each case. The first specifically is an integer multiplication x*x of the input x. The other is a floating point multiplication x*x of the input x. But this means that it does not matter what kind of Number we put in here: this function will work as long as * is defined, and it will be efficient by Julia's multiple dispatch design. Thus we don't need to restrict the types we allow in functions in order to get performance. That means that my_restricted_square ( x:: Int ) = x^ 2 is no more efficient than the version above, and actually generates the same exact compiled code: julia > @code_llvm my_restricted_square ( 1 ) ; Function my_restricted_square ; Location: REPL [ 4 ] : 1 ; Function Attrs: uwtable define i64 @julia_my_restricted_square_34585 ( i64 ) #0 { top: ; Function literal_pow ; { ; Location: intfuncs. jl : 243 ; Function *; { ; Location: int . jl : 54 % 1 = mul i64 % 0, % 0 ; } } ret i64 % 1 } Thus we can write generic and efficient code by leaving our functions unrestricted. This is the practice of duck-typing functions. We just let them work on any input types. If the type has the correct actions, the function will "just work". If it does not have the correct actions, for our example above say * is undefined, then a MethodError saying the action is not defined will be thrown. We can be slightly more conservative by restricting to abstract types. For example: my_number_restricted_square ( x:: Number ) = x^ 2 will allow any Number. There are things which can square which aren't Numbers for which this will now throw an error (a matrix is a simple example). But, this can let us clearly define the interface for our package/script/code. Using these assertions, we can then dispatch differently for different type classes. For example: my_number_restricted_square ( x:: AbstractArray ) = ( println ( x ) ; x.^ 2 ) Now, my_number_restricted_square calculates x^2 on a Number, and for an array it will print the array and calculate x^2 element-wise. Thus we are controlling behavior with broad strokes using classes of types and their associated actions. Type Hierarchies This idea of control leads to type hierarchies. In object-oriented programming languages, you sort objects by their implementation. Fields, the pieces of data that an object holds, are what is inherited. There is an inherent limitation to that kind of thinking when looking to achieve good performance. In many cases, you don't need as much data to do an action. A good example of this is the range type, for example 1:10. a = 1 : 10 This type is an abstract array: typeof ( a ) < : AbstractArray #true It has actions like an Array fieldnames ( typeof ( a ) ) # Output ( : start , :stop )

It is an immutable type which just holds the start and stop values. This means that its indexing, A[i], is just a function. What's nice about this is that means that no array is ever created. Creating large arrays can be a costly action:

@ time collect ( 1 : 10000000 ) 0.038615 seconds ( 308 allocations: 76.312 MB, 45.16 % gc time )

But creating an immutable type of two numbers is essentially free, no matter what those two numbers are:

@ time 1 : 10000000 0.000001 seconds ( 5 allocations: 192 bytes )

The array takes memory to store its values while this type is , using a constant 192 bytes (if the start and stop are Int64). Yet, in cases where we just want to index values, they act exactly the same.

Another nice example is the UniformScaling operator, which acts like an identity matrix without forming an identity matrix.

using LinearAlgebra println ( I [ 10 , 10 ] ) # prints 1 println ( I [ 10 , 2 ] ) # prints 0

This can calculate expressions like A-b*I without ever forming the matrix (eye(n)) which would take memory.

This means that a lot of efficiency can be gained by generalizing our algorithms to allow for generic typing and organization around actions. This means that, while in an object-oriented programming language you group by implementation details, in typed-dispatch programming you group by actions. Number is an abstract type for "things which act like numbers, i.e. do things like *", while AbstractArray is for "things which index and sometimes set".

This is the key idea to keep in mind when building type hierarchies: things which subtype are inheriting behavior. You should setup your abstract types to mean the existence or non-existence of some behavior. For example:

abstract type AbstractPerson end abstract type AbstractStudent < : AbstractPerson end abstract type AbstractTeacher < : AbstractPerson end struct Person < : AbstractPerson name:: String end struct Student < : AbstractStudent name:: String grade:: Int hobby:: String end struct MusicStudent < : AbstractStudent grade:: Int end struct Teacher < : AbstractTeacher name:: String grade:: Int end

This can be interpreted as follows. At the top we have AbstractPerson. Our interface here is "a Person is someone who has a name which can be gotten by get_name".

get_name ( x::AbstractPerson ) = x. name

Thus codes which are written for an AbstractPerson can "know" (by our informal declaration of the interface) that get_name will "just work" for its subtypes. However, notice that MusicStudent doesn't have a name field. This is because MusicStudents just want to be named whatever the trendiest band is, so we can just replace the usage of the field by the action:

get_name ( x::MusicStudent ) = "Justin Bieber"

In this way, we can use get_name to get the name, and how it was implemented (whether it's pulling something that had to be stored from memory, or if it's something magically known in advance) does not matter. We can keep refining this: an AbstractStudent has a get_hobby, but a MusicStudent's hobby is always Music, so there's not reason to store that data in the type and instead just have its actions implicitly "know" this. In non-trivial examples (like the range and UniformScaling above), this distinction by action and abstraction away from the actual implementation of the types allows for full optimization of generic codes.

Small Functions and Constant Propagation

The next question to ask is, does storing information in functions and actions affect performance? The answer is yes, and in favor of the function approach! To see this, let's see what happens when we use these functions. To make it simpler, let's use a boolean function. Teachers are old and don't like music, while students do like music. But generally people like music. This means that:

likes_music ( x::AbstractTeacher ) = false likes_music ( x::AbstractStudent ) = true likes_music ( x::AbstractPerson ) = true

Now how many records would these people buy at a record store? If they don't like music, they will buy zero records. If they like music, then they will pick up a random number between 1 and 10. If they are a student, they will then double that (impulsive Millennials!).

function number_of_records ( x::AbstractPerson ) if ! likes_music ( x ) return 0 end num_records = rand ( 1 : 10 ) if typeof ( x ) < : AbstractStudent return 2num_records else return num_records end end

Let's check the code that is created:

x = Teacher ( "Randy" , 11 ) println ( number_of_records ( x ) ) @code_llvm number_of_records ( x )

on v1.0, we get:

; Function number_of_records ; Location: REPL [ 33 ] : 2 ; Function Attrs: uwtable define i64 @julia_number_of_records_34671 ( % jl_value_t addrspace ( 10 ) * nonnull align 8 dereferenceable ( 16 ) ) #0 { top: ret i64 0 }

Notice that the entire function compiled away, and the resulting compiled code is "return 0"! Then for a music student:

x = MusicStudent ( 10 ) @code_typed number_of_records ( x ) # Output CodeInfo ( 2 1 ── goto #3 if not false │ 2 ── nothing::Nothing │ 5 3 ── ( Base. ifelse ) ( true , 10 , 0 ) :: Int64 Colon │ % 4 = Random. GLOBAL_RNG ::MersenneTwister rand │ % 5 = ( Base. slt_int ) ( 10 , 1 ) :: Bool ││╻╷╷╷╷╷╷ rand └─── goto #5 if not %5 │││┃│││ Type 4 ── % 7 = % new ( Core. ArgumentError , "range must be non-empty" ) ::ArgumentError │ ( Random. throw ) ( % 7 ) :: Union { } │││││┃│ Type └─── $ ( Expr ( :unreachable ) ) :: Union { } │││││┃ Type 5 ── % 10 = ( Base. sub_int ) ( 10 , 1 ) :: Int64 │││││││╻ - │ % 11 = ( Base. bitcast ) ( UInt64, % 10 ) ::UInt64│╻ rem │ % 12 = ( Base. ifelse ) ( true , 64 , 0 ) :: Int64 │││╻ << │ % 13 = ( Base. ctlz_int ) ( % 11 ) ::UInt64 │││││││╻ leading_zeros │ % 14 = ( Core. lshr_int ) ( % 13, 63 ) ::UInt64│││││╻╷╷ Type │ % 15 = ( Core. trunc_int ) ( Core. UInt8 , % 14 ) ::UInt8│ toInt64 │ % 16 = ( Core. eq_int ) ( % 15, 0x01 ) :: Bool │││││││││┃│ check_top_bit └─── goto #7 if not %16 │││││││││││ 6 ── invoke Core. throw_inexacterror ( :check_top_bit:: Symbol , UInt64:: Any , % 13::UInt64 ) :: Union { } └─── $ ( Expr ( :unreachable ) ) :: Union { } ││││││││││ 7 ── goto #8 │││││││││││ 8 ── % 21 = ( Core. bitcast ) ( Core. Int64 , % 13 ) :: Int64 │ └─── goto #9 ││││││││││ 9 ── goto #10 │││││││││ 10 ─ goto #11 ││││││││ 11 ─ % 25 = ( Base. sub_int ) ( % 12, % 21 ) :: Int64 │││││╻ - │ % 26 = ( Base. bitcast ) ( UInt64, % 25 ) ::UInt64│╻ rem │ % 27 = ( Base. shl_int ) ( 0x0000000000000001, % 26 ) ::UInt64 << │ % 28 = ( Base. sub_int ) ( % 27, 0x0000000000000001 ) ::UInt64 - │ % 29 = % new ( SamplerRangeFast { UInt64, Int64 } , 1 , % 26, % 11, % 28 ) ::SamplerRangeFast { UInt64, Int64 } └─── goto #12 │││││││ 12 ─ goto #13 ││││││ 13 ─ goto #14 │││││ 14 ─ goto #15 ││││ 15 ─ % 34 = invoke Random. rand ( % 4::Random. MersenneTwister , % 29::Random. SamplerRangeFast { UInt64, Int64 } ) :: Int64 └─── goto #16 │││ 16 ─ goto #17 ││ 6 17 ─ ( MusicStudent < : Main. AbstractStudent ) ::Const ( true , false ) 7 │ % 38 = ( Base. mul_int ) ( 2 , % 34 ) :: Int64 │╻ * └─── return % 38 │ 18 ─ $ ( Expr ( :unreachable ) ) :: Union { } ) = > Int64

we get a multiplication by 2 (this is seen in the line "(Base.mul_int)(2, %34)::Int64"), while for a regular person,

x = Person ( "Miguel" ) @code_typed number_of_records ( x ) # Output CodeInfo ( 2 1 ── goto #3 if not false │ 2 ── nothing::Nothing │ 5 3 ── ( Base. ifelse ) ( true , 10 , 0 ) :: Int64 Colon │ % 4 = Random. GLOBAL_RNG ::MersenneTwister rand │ % 5 = ( Base. slt_int ) ( 10 , 1 ) :: Bool ││╻╷╷╷╷╷╷ rand └─── goto #5 if not %5 │││┃│││ Type 4 ── % 7 = % new ( Core. ArgumentError , "range must be non-empty" ) ::ArgumentError │ ( Random. throw ) ( % 7 ) :: Union { } │││││┃│ Type └─── $ ( Expr ( :unreachable ) ) :: Union { } │││││┃ Type 5 ── % 10 = ( Base. sub_int ) ( 10 , 1 ) :: Int64 │││││││╻ - │ % 11 = ( Base. bitcast ) ( UInt64, % 10 ) ::UInt64│╻ rem │ % 12 = ( Base. ifelse ) ( true , 64 , 0 ) :: Int64 │││╻ << │ % 13 = ( Base. ctlz_int ) ( % 11 ) ::UInt64 │││││││╻ leading_zeros │ % 14 = ( Core. lshr_int ) ( % 13, 63 ) ::UInt64│││││╻╷╷ Type │ % 15 = ( Core. trunc_int ) ( Core. UInt8 , % 14 ) ::UInt8│ toInt64 │ % 16 = ( Core. eq_int ) ( % 15, 0x01 ) :: Bool │││││││││┃│ check_top_bit └─── goto #7 if not %16 │││││││││││ 6 ── invoke Core. throw_inexacterror ( :check_top_bit:: Symbol , UInt64:: Any , % 13::UInt64 ) :: Union { } └─── $ ( Expr ( :unreachable ) ) :: Union { } ││││││││││ 7 ── goto #8 │││││││││││ 8 ── % 21 = ( Core. bitcast ) ( Core. Int64 , % 13 ) :: Int64 │ └─── goto #9 ││││││││││ 9 ── goto #10 │││││││││ 10 ─ goto #11 ││││││││ 11 ─ % 25 = ( Base. sub_int ) ( % 12, % 21 ) :: Int64 │││││╻ - │ % 26 = ( Base. bitcast ) ( UInt64, % 25 ) ::UInt64│╻ rem │ % 27 = ( Base. shl_int ) ( 0x0000000000000001, % 26 ) ::UInt64 << │ % 28 = ( Base. sub_int ) ( % 27, 0x0000000000000001 ) ::UInt64 - │ % 29 = % new ( SamplerRangeFast { UInt64, Int64 } , 1 , % 26, % 11, % 28 ) ::SamplerRangeFast { UInt64, Int64 } └─── goto #12 │││││││ 12 ─ goto #13 ││││││ 13 ─ goto #14 │││││ 14 ─ goto #15 ││││ 15 ─ % 34 = invoke Random. rand ( % 4::Random. MersenneTwister , % 29::Random. SamplerRangeFast { UInt64, Int64 } ) :: Int64 └─── goto #16 │││ 16 ─ goto #17 ││ 6 17 ─ % 37 = ( Person < : Main. AbstractStudent ) ::Const ( false , false ) └─── goto #19 if not %37 │ 18 ─ nothing::Nothing │ 9 19 ─ return % 34 │ ) = > Int64

we do not get a multiplication by 2 (notice that the mentioned line is gone!). This is all in the compiled-code, so this means that in one case the *2 simply doesn't exist at runtime, not even a check for whether to do it.

The key thing to see from the typed code is that the "branches" (the if statements) all compiled away. Since types are known at compile time (remember, functions specialize on types), the dispatch of likes_music is known at compile-time. But this means, since the result is directly inferred from the dispatch, the boolean value true/false is known at compile time. This means that the compiler can directly infer the answer to all of these checks, and will use this information to skip them at runtime.

This is the distinction between compile-time information and runtime information. At compile-time, what is known is:

The types of the inputs Any types which can be inferred from the input types (via type-stability) The function dispatches that will be internally called (from types which have been inferred)

Note that what cannot be inferred by the compiler is the information in fields. Information in fields is strictly runtime information. This is easy to see since there is no way for the compiler to know that person's name was "Miguel": that's ephemeral and part of the type instance we just created.

Thus by putting our information into our functions and dispatches, we are actually giving the compiler more information to perform more optimizations. Therefore using this "action-based design", we are actually giving the compiler leeway to perform many extra optimizations on our code as long as we define our interfaces by the actions that are used. Of course, at the "very bottom" our algorithms have to use the fields of the types, but the full interface can then be built up using a simple set of functions which in many cases will replace runtime data with constants.

Traits and THTT

What we just saw is a "trait". Traits are compile-time designations about types which are distinct from their abstract hierarchy. likes_music is a trait which designates which people like music, and it could in cases not be defined using the abstract types. For example, we can, using dispatch, create a WeirdStudent which does not like music, and that will still be compile-time information which is fully optimized. This means that these small functions which have constant return values allow for compile-time inheritance of behavior, and these traits don't have to be tied to abstract types (all of our examples were on AbstractPerson, but we could've said a GPUArray likes music if we felt like it!). Traits are multiple-inheritance for type systems.

Traits can be more refined than just true/false. This can be done by having the return be a type itself. For example, we can create music genre types:

abstract MusicGenres abstract RockGenre < : MusicGenres immutable ClassicRock < : RockGenre end immutable AltRock < : RockGenre end immutable Classical < : MusicGenres end

These "simple types" are known as singleton types. This means that we can have traits like:

favorite_genre ( x::AbstractPerson ) = ClassicRock ( ) favorite_genre ( x::MusicStudent ) = Classical ( ) favorite_genre ( x::AbstractTeacher ) = AltRock ( )

This gives us all of the tools we need to compile the most efficient code, and structure our code around types/actions/dispatch to get high performance out. The last thing we need is syntactic sugar. Since traits are compile-time information, the compiler could in theory dispatch on them. While this is currently not part of Julia, it's scheduled to be part of a future version of Julia (2.0?). The design for this (since Julia is written in Julia!) is known as the Tim Holy Trait Trick (THTT), named after its inventor. It's described in detail on this page. But in the end, macros can make this easier. A package which implements trait-dispatch is SimpleTraits.jl, which allows you to dispatch on a trait IsNice like:

@traitfn ft ( x::::IsNice ) = "Very nice!" @traitfn ft ( x:::: ( ! IsNice ) ) = "Not so nice!"

Composition vs Inheritance

The last remark that is needed is a discussion of composition vs inheritance. While the previous discussions have all explained why "information not in fields" makes structural relations compile-time information and increases the efficiency. However, there are cases where we want to share runtime structure. Thus the great debate of composition vs inheritance, comes up.

Composition vs inheritance isn't a Julia issue, it's a long debate in object-oriented programming. The idea is that, inheritance is inherently (pun-intended) inflexible. It forces an "is a" relation: A inherits from B means A is a B, and adds a few things. It copies behavior from something defined elsewere. This is a recipe for havoc. Here's a few links which discuss this in more detail:

https://softwareengineering.stackexchange.com/questions/134097/why-should-i-prefer-composition-over-inheritance

https://en.wikipedia.org/wiki/Composition_over_inheritance

https://www.thoughtworks.com/insights/blog/composition-vs-inheritance-how-choose

So if possible, give composition a try. Say you have MyType, and it has some function f defined on it. This means that you can extend MyType by making it a field in another type:

struct MyType2 mt::MyType ... # Other stuff end f ( mt2::MyType2 ) = f ( mt2. mt )

The pro here is that it's explicit: you've made the choice for each extension. The con is that this can require some extra code, though this can be automated by metaprogramming.

What if you really really really want inheritance of fields? There are solutions via metaprogramming. One simple solution is the @def macro.

macro def ( name, definition ) return quote macro $ ( esc ( name ) ) ( ) esc ( $ ( Expr ( : quote , definition ) ) ) end end end

This macro is very simple. What it does is compile-time copy/paste. For example:

@def give_it_a_name begin a = 2 println ( a ) end

defines a macro @give_it_a_name that will paste in those two lines of code wherever it is used. As another example, the reused fields of Optim.jl's solvers could be put into an @def:

@def add_generic_fields begin method_string:: String n:: Int64 x:: Array { T } f_x::T f_calls:: Int64 g_calls:: Int64 h_calls:: Int64 end

and those fields can be copied around with

type LBFGSState { T } @add_generic_fields x_previous:: Array { T } g:: Array { T } g_previous:: Array { T } rho:: Array { T } # ... more fields ... end

Because @def works at compile-time, there is no cost associated with this. Similar metaprogramming can be used to build an "inheritance feature" for Julia. One package which does this is ConcreteAbstractions.jl which allows you to add fields to abstract types and make the child types inherit the fields (NOTE: This package is v0.6 only!):

# The abstract type @ base struct AbstractFoo { T } a b:: Int c::T d:: Vector { T } end # Inheritance @extend struct Foo < : AbstractFoo e::T end

where the @extend macro generates the type-definition:

struct Foo { T } < : AbstractFoo a b:: Int c::T d:: Vector { T } e::T end

But it's just a package? Well, that's the beauty of Julia. Most of Julia is written in Julia, and Julia code is first class and performant (here, this is all at compile-time, so again runtime is not affected at all). Honestly, if something ever gets added to Julia's Base library for this, it will likely look very similar, and the only real difference to the user will be that the compiler will directly recognize the keywords, meaning you would use base and extend instead of @base and @extend. So if you have something that really really really needs inheritance, go for it: there's no downsides to using a package + macro for this. But you should really try other means to reduce the runtime information and build a more performant and more Julian architecture first.

Conclusion

Programming for type systems has a different architecture than object-oriented systems. Instead of being oriented around the objects and their fields, type-dispatch systems are oriented around the actions of types. Using traits, multiple inheritance behavior can be given. Using this structure, the compiler can have maximal information, and use this to optimize the code. But also, this directly generalizes the vast majority of the code to not be "implementation-dependent", allowing for duck-typed code to be fully performant, with all of the details handled by dispatch/traits/abstract types. The end result is flexible, generic, and high performance code.

Edits 2/15/2018

Updated the type definitions to use "struct" instead of type, and updated the @ode_def macro to escape the name. Both of these are changes transitioning this post from Julia v0.5 to v0.6.

Edits 10/6/2018

Edited to be v1.0 syntax.

This post is open to read and review on The Winnower.