Thanks to its roots in Erlang, Elixir is known for robustness and reliability, which are often mentioned as its greatest advantages. The ultimate example of this is its ability to upgrade an application without having to restart it.

Hot code reloading, one of Erlang’s most amazing features, is sometimes compared to replacing the wheels on a moving car, and it’s majestically demonstrated with phone calls and hot bug fixes in Erlang: The Movie. But how does it work under the hood?

In this edition of Elixir Alchemy, we’ll dive into hot code reloading to see how Erlang’s code server handles seamless code upgrades in Elixir. To understand how all this works, we’ll start at the module level and work our way up. Let’s get started!

Upgrading Modules

The first part of the magic of hot code reloading is the ability of Erlang’s code server to run multiple versions of a module simultaneously. This allows existing processes to run to completion without being restarted or having their running code changed.

To illustrate this, let’s look at an example of a module named Counter. As the name implies, it counts up from 0 using a recursive function that sleeps for a second, prints the current number, and calls itself with the number incremented by 1.

```elixir
defmodule Counter do
  def count(n) do
    :timer.sleep(1000)
    IO.puts("- #{inspect(self())}: #{n}")
    count(n + 1)
  end
end
```

After starting IEx ($ iex -S mix), we spawn a separate process to run the counter loop. We pass the module (Counter), the function name (:count), and the arguments ([0]) to the spawn/3 function.

```
iex(1)> spawn(Counter, :count, [0])
#PID<0.107.0>
- #PID<0.107.0>: 0
- #PID<0.107.0>: 1
- #PID<0.107.0>: 2
- #PID<0.107.0>: 3
…
iex(2)>
```

While the counter keeps running, we change Counter in lib/counter.ex to increment the number by 2 instead of 1. Once the file is saved, we recompile the module in the IEx session with r/1.

```
…
- #PID<0.107.0>: 2
- #PID<0.107.0>: 3
iex(2)> r Counter
{:reloaded, Counter, [Counter]}
- #PID<0.107.0>: 4
- #PID<0.107.0>: 5
- #PID<0.107.0>: 6
…
iex(3)>
```

The module gets recompiled, but the existing counter still increments by 1, meaning the old version of the code is still running in this process. If we spawn a new counter process, it increments by 2, so it runs the new version.

```
…
- #PID<0.107.0>: 4
- #PID<0.107.0>: 5
- #PID<0.107.0>: 6
iex(3)> spawn(Counter, :count, [0])
#PID<0.114.0>
- #PID<0.107.0>: 7
- #PID<0.114.0>: 0
- #PID<0.107.0>: 8
- #PID<0.114.0>: 2
- #PID<0.107.0>: 9
- #PID<0.114.0>: 4
…
iex(5)>
```

This example shows Erlang’s code server in action. By keeping the old version of the module around, the first process (#PID<0.107.0>) continues running as it did before, while newly spawned processes (#PID<0.114.0>) automatically use the new version.

The Erlang Code Server

Erlang’s code server handles loading compiled code into a running system. At any one time, it can keep at most two versions of a module in memory: current and old. When a module is loaded, it becomes the current version of that module. If a previous version of that module was already present, that previous version is marked as old.
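To watch the code server keep both versions around, here’s a minimal sketch. It uses a hypothetical Greeter module compiled twice from strings with Code.compile_string/1, instead of editing a file and calling r/1, and then inspects the old version with :erlang.check_old_code/1, :erlang.check_process_code/2 and :code.soft_purge/1 while a process is still running it.

```elixir
# Hypothetical Greeter module, compiled from a string so the script is
# self-contained (the article edits a file and calls r/1 instead).
greeter = fn version ->
  """
  defmodule Greeter do
    def version, do: #{version}

    def loop do
      receive do
        {:ping, from} ->
          send(from, :pong)
          loop()
      end
    end
  end
  """
end

Code.compile_string(greeter.(1))
pid = spawn(Greeter, :loop, [])

# Wait for a reply, so we know the process is inside version 1's loop.
send(pid, {:ping, self()})

receive do
  :pong -> :ok
end

# Loading version 2 makes it current and marks version 1 as old.
Code.compile_string(greeter.(2))

current = Greeter.version()
has_old_code = :erlang.check_old_code(Greeter)
pid_on_old_code = :erlang.check_process_code(pid, Greeter)
# soft_purge/1 refuses to remove the old version while a process still
# runs it; :code.purge/1 would kill such processes instead.
soft_purged = :code.soft_purge(Greeter)

IO.inspect({current, has_old_code, pid_on_old_code, soft_purged})
```

Running it should print {2, true, true, false}: new callers get version 2, while the spawned process keeps the old version alive.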

Both the current and the old version of a module can run at the same time, but only the current version’s exported functions are reachable from the outside: once a new version is loaded, its exported functions replace those of the old one. This ensures that every external (fully qualified) function call, like Counter.count(0), always calls functions on the current version of the module.

If a process is already executing a module’s code when a new version of that module is loaded, it lingers on the old version, and all of its local function calls, like the recursive count(n + 1) call in Counter, are handled by the module’s old version.
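The difference between local and external calls can be shown in a single script. This sketch uses a hypothetical Pinger module, again compiled from strings in two versions; the process answers one message with a local call to version() and another with the fully qualified __MODULE__.version().

```elixir
# Hypothetical Pinger module, compiled from a string so both versions fit in one script.
pinger = fn version ->
  """
  defmodule Pinger do
    def version, do: #{version}

    def loop do
      receive do
        {:local, from} ->
          # Local call: stays on the version this process is running.
          send(from, {:reply, version()})
          loop()

        {:external, from} ->
          # External (fully qualified) call: goes to the current version.
          send(from, {:reply, __MODULE__.version()})
          loop()
      end
    end
  end
  """
end

ask = fn pid, kind ->
  send(pid, {kind, self()})

  receive do
    {:reply, version} -> version
  after
    1000 -> :timeout
  end
end

Code.compile_string(pinger.(1))
pid = spawn(Pinger, :loop, [])
# Also guarantees the process is running version 1 before we reload.
1 = ask.(pid, :local)

Code.compile_string(pinger.(2))
local = ask.(pid, :local)
external = ask.(pid, :external)

IO.puts("local call: #{local}, external call: #{external}")
```

After the reload, the same process answers 1 for the local call and 2 for the external one, because its loop keeps calling itself locally on the old version.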

Hot Reloading GenServers

Let’s take this a step further by turning our example into a GenServer. Like the Counter module in the previous example, the CountServer counts up by incrementing its state every second.

```elixir
defmodule CountServer do
  use GenServer

  def start_link do
    GenServer.start_link(__MODULE__, 0)
  end

  def init(state) do
    Process.send_after(self(), :increment, 1000)
    {:ok, state}
  end

  def handle_info(:increment, n) do
    incremented = n + 1
    IO.puts("- #{inspect(self())}: #{incremented}")
    Process.send_after(self(), :increment, 1000)
    {:noreply, incremented}
  end
end
```

Since CountServer is a GenServer, we don’t need to spawn a separate process for it manually. Instead, we call CountServer.start_link/0 in a new IEx session to start the counter.

```
iex(1)> {:ok, pid} = CountServer.start_link()
{:ok, #PID<0.130.0>}
- #PID<0.130.0>: 1
- #PID<0.130.0>: 2
- #PID<0.130.0>: 3
…
iex(2)>
```

Let’s try updating it like we did in the last example by changing CountServer to increment by 2 instead of 1. Then, in the running IEx session, we recompile the module.

```
…
- #PID<0.130.0>: 2
- #PID<0.130.0>: 3
iex(2)> r CountServer
{:reloaded, CountServer, [CountServer]}
- #PID<0.130.0>: 5
- #PID<0.130.0>: 7
- #PID<0.130.0>: 9
…
iex(3)>
```

This time, the running GenServer did update. After recompiling the module, the counter started incrementing by 2 instead of 1, without having to be restarted and without starting a new counter.
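The same effect can be checked without timers or an IEx session. In this sketch, a hypothetical StepServer is compiled from strings in two versions, :increment is sent by hand instead of with Process.send_after/3, and :sys.get_state/1 reads the state before and after the reload.

```elixir
# Hypothetical StepServer; the increment step is interpolated into the source.
step_server = fn step ->
  """
  defmodule StepServer do
    use GenServer

    def init(n), do: {:ok, n}

    def handle_info(:increment, n), do: {:noreply, n + #{step}}
  end
  """
end

Code.compile_string(step_server.(1))
{:ok, pid} = GenServer.start_link(StepServer, 0)

send(pid, :increment)
# :sys.get_state/1 is handled after the :increment already in the mailbox.
before_reload = :sys.get_state(pid)

Code.compile_string(step_server.(2))

send(pid, :increment)
after_reload = :sys.get_state(pid)

IO.inspect({before_reload, after_reload})
```

The script should print {1, 3}: the same GenServer process adds 1 before the reload and 2 after it.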

Local and External Function Calls

The first example ran a recursive function in a spawned process, while the second used a GenServer, which spawns a process to hold its state.

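As we learned while deconstructing GenServers, a GenServer’s receive loop doesn’t live in our own module: it’s part of Erlang’s :gen_server module, running in the GenServer’s process. Whenever a message arrives, that loop calls back into CountServer, for example CountServer.handle_info/2, to compute the new state. In the second example, the state kept in the GenServer process was therefore updated by calling out to the CountServer module.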

This difference is important for code reloading. Local function calls, where a module calls one of its own functions without naming the module, like count(n + 1) in the first example, are executed on the version the process is already running, which is the old one after a reload. External function calls, like the GenServer loop calling out to the CountServer module, always go to the current version of the module.
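To make the mechanism concrete, here’s a toy receive loop in the spirit of :gen_server. MiniServer and StepCallback are hypothetical names, and this is a simplification rather than OTP’s actual implementation. Because the callback module is held in a variable, mod.handle_info(msg, state) is always an external call, so reloading the callback module changes the behavior on the next message even though the loop itself never changes.

```elixir
defmodule MiniServer do
  # A toy receive loop in the spirit of :gen_server, not OTP's actual code.
  def start(mod, state), do: spawn(fn -> loop(mod, state) end)

  def get_state(pid) do
    send(pid, {:get_state, self()})

    receive do
      {:state, state} -> state
    end
  end

  defp loop(mod, state) do
    receive do
      {:get_state, from} ->
        send(from, {:state, state})
        loop(mod, state)

      msg ->
        # mod is only known at runtime, so this is an external call that
        # always runs the current version of the callback module.
        {:noreply, new_state} = mod.handle_info(msg, state)
        loop(mod, new_state)
    end
  end
end

# Hypothetical callback module, compiled from a string in two versions.
step_callback = fn step ->
  """
  defmodule StepCallback do
    def handle_info(:increment, n), do: {:noreply, n + #{step}}
  end
  """
end

Code.compile_string(step_callback.(1))
pid = MiniServer.start(StepCallback, 0)
send(pid, :increment)
before_reload = MiniServer.get_state(pid)

Code.compile_string(step_callback.(2))
send(pid, :increment)
after_reload = MiniServer.get_state(pid)

IO.inspect({before_reload, after_reload})
```

The loop process never reloads, yet it prints {1, 3}, since every state change goes through an external call into the current StepCallback.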

This explains why the running process in the first example kept using the old code, while the GenServer in the second example picked up the new module as soon as it was loaded.
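If you do want a plain spawned loop like Counter to pick up new code, a common Erlang idiom is to make the recursive call external. In Elixir, that means calling __MODULE__.count(n + 1) instead of count(n + 1). Here’s a sketch of the changed module; note that count/1 has to stay public (def, not defp), since external calls only reach exported functions.

```elixir
defmodule Counter do
  def count(n) do
    :timer.sleep(1000)
    IO.puts("- #{inspect(self())}: #{n}")

    # External (fully qualified) call instead of count(n + 1): every tick
    # jumps to the current version of Counter, so a reload takes effect on
    # the next iteration. It's still a tail call, so the stack doesn't grow.
    __MODULE__.count(n + 1)
  end
end
```

With this version, the existing counter process would switch to incrementing by 2 on its next tick after r Counter, just like the GenServer did.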

Transforming State

Although the state in the GenServer example was transformed correctly by the reloaded version of the CountServer module, there’s one more scenario to look at: what happens when the new version of the implementation requires a different state?

As an example, let’s say we need our CountServer to only produce even numbers from now onward. Our current implementation increments by 2 every second, so most of the work is already done.

However, if a running process still increments by 1, upgrading it at the wrong second, while its state happens to be odd, would make it produce odd numbers from then on. We need to make sure the state is updated when we upgrade the module to the new version.

Elixir’s GenServer behaviour has a callback named code_change/3, which is used to update the state in the event of a code change.

```elixir
defmodule CountServer do
  use GenServer

  # ...

  def code_change(_old_vsn, state, _extra) when rem(state, 2) == 1 do
    {:ok, state - 1}
  end

  def code_change(_old_vsn, state, _extra) do
    {:ok, state}
  end
end
```

This example implements the code_change/3 callback. If the state is an odd number, it will subtract 1 from the current state, making it an even number.

The _old_vsn argument holds the version of the module we’re upgrading from, so it can be used to handle upgrades from a specific version. The _extra argument carries any extra data passed along with the upgrade. For brevity, both are ignored here.

To invoke the code_change/3 callback, we have to explicitly change the code for the process. To do that, we temporarily suspend the process, reload the module, run the code change, and resume the process.

```
…
- #PID<0.130.0>: 7
- #PID<0.130.0>: 9
iex(3)> :sys.suspend(pid)
:ok
iex(4)> r CountServer
{:reloaded, CountServer, [CountServer]}
iex(5)> :sys.change_code(pid, CountServer, nil, [])
:ok
iex(6)> :sys.resume(pid)
:ok
- #PID<0.130.0>: 10
- #PID<0.130.0>: 12
- #PID<0.130.0>: 14
…
iex(3)>
```
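The suspend, reload, change_code and resume sequence from the IEx session can also be scripted, which is handy for experimenting. This sketch assumes a hypothetical EvenServer compiled from strings in two versions (adding 1, then 2), with increments sent by hand instead of timers so the result is deterministic.

```elixir
# Hypothetical EvenServer; the increment step is interpolated into the source.
even_server = fn step ->
  """
  defmodule EvenServer do
    use GenServer

    def init(n), do: {:ok, n}

    def handle_info(:increment, n), do: {:noreply, n + #{step}}

    # Makes an odd state even during a code change.
    def code_change(_old_vsn, n, _extra) when rem(n, 2) == 1, do: {:ok, n - 1}
    def code_change(_old_vsn, n, _extra), do: {:ok, n}
  end
  """
end

Code.compile_string(even_server.(1))
{:ok, pid} = GenServer.start_link(EvenServer, 0)
Enum.each(1..3, fn _ -> send(pid, :increment) end)
before_upgrade = :sys.get_state(pid)

# :sys.change_code/4 only works while the process is suspended.
:ok = :sys.suspend(pid)
Code.compile_string(even_server.(2))
:ok = :sys.change_code(pid, EvenServer, nil, [])
:ok = :sys.resume(pid)

send(pid, :increment)
after_upgrade = :sys.get_state(pid)

IO.inspect({before_upgrade, after_upgrade})
```

This should print {3, 4}: code_change/3 turns the odd state 3 into 2, and the new version then adds 2.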

NOTE: When upgrading a running system through a release, these steps are performed automatically for the affected processes, based on the release’s upgrade instructions. There’s no need to call :sys.change_code/4 explicitly outside of IEx.

Backward Compatibility

Having external function calls executed on the new version of a module allows the modules in your app to switch over to their new versions gradually. Because processes lingering on old versions of a module still reach the current version through external function calls, they don’t keep spawning more processes that run old code. However, mixing old and new code in a running system can cause problems when the new code isn’t backward compatible with the old modules.

Let’s improve our counter a bit. Instead of hard-coding the added value, we’ll pass it along in the :increment message.

```elixir
defmodule CountServer do
  use GenServer

  def start_link do
    GenServer.start_link(__MODULE__, 0)
  end

  def init(state) do
    send(self(), {:increment, 2})
    {:ok, state}
  end

  def handle_info({:increment, value}, state) do
    new_state = state + value
    IO.puts("- #{inspect(self())}: #{new_state}")
    Process.send_after(self(), {:increment, 2}, 1000)
    {:noreply, new_state}
  end

  # ...
end
```

This example allows passing a value to increment by. We’ve also updated the messages sent in both init/1 and handle_info/2 to use the new format.

```
- #PID<0.130.0>: 12
- #PID<0.130.0>: 14
iex(2)> r CountServer
{:reloaded, CountServer, [CountServer]}
iex(3)>
15:09:01.313 [error] GenServer #PID<0.130.0> terminating
** (FunctionClauseError) no function clause matching in CountServer.handle_info/2
    (odd) lib/count_server.ex:13: CountServer.handle_info(:increment, 14)
    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: :increment
State: 14
** (EXIT from #PID<0.128.0>) shell process exited with reason: an exception was raised:
    ** (FunctionClauseError) no function clause matching in CountServer.handle_info/2
        (odd) lib/count_server.ex:13: CountServer.handle_info(:increment, 14)
        (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
        (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
        (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

Interactive Elixir (1.7.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)>
```

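This example results in a FunctionClauseError because the old version of the code had already scheduled a plain :increment message. Since we removed the clause that accepted it, the CountServer crashes when that message arrives, as there’s no handle_info/2 clause to handle it. And because the GenServer was started with start_link/0 from IEx, the crash takes the linked shell process down with it.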

Instead, we should have kept a clause that accepts the plain :increment message, so the new version stays backward compatible with messages sent by the previous one and the upgrade goes through cleanly.

```elixir
defmodule CountServer do
  use GenServer

  # ...

  def handle_info(:increment, n) do
    handle_info({:increment, 2}, n)
  end

  def handle_info({:increment, value}, n) do
    incremented = n + value
    IO.puts("- #{inspect(self())}: #{incremented}")
    Process.send_after(self(), {:increment, 2}, 1000)
    {:noreply, incremented}
  end

  # ...
end
```
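To see the backward-compatible clause at work without waiting for timers, this sketch reloads a hypothetical CompatServer and then delivers both an old-style and a new-style message by hand. The timers from the article are left out so the result is deterministic.

```elixir
# Hypothetical CompatServer in two versions, without timers.
v1 = """
defmodule CompatServer do
  use GenServer

  def init(n), do: {:ok, n}

  def handle_info(:increment, n), do: {:noreply, n + 2}
end
"""

v2 = """
defmodule CompatServer do
  use GenServer

  def init(n), do: {:ok, n}

  # Kept for :increment messages queued by the old version.
  def handle_info(:increment, n), do: handle_info({:increment, 2}, n)

  def handle_info({:increment, value}, n), do: {:noreply, n + value}
end
"""

Code.compile_string(v1)
# start/2 instead of start_link/2, so an incompatible version would crash
# the server without taking this script down with it.
{:ok, pid} = GenServer.start(CompatServer, 0)
send(pid, :increment)

Code.compile_string(v2)
# An old-style message, as if scheduled before the reload, and a new-style one.
send(pid, :increment)
send(pid, {:increment, 2})

state = :sys.get_state(pid)
IO.inspect({Process.alive?(pid), state})
```

With the compatibility clause in place, the server survives both message formats and should print {true, 6}.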

A First Look at Code Reloading

The logic required for hot code reloading is often already present but abstracted away. For instance, use GenServer injects a default code_change/3 implementation that returns the state unchanged.
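A quick way to see that default is to define a GenServer without its own code_change/3 and call the injected function directly. DefaultServer is a hypothetical name used only for this check.

```elixir
defmodule DefaultServer do
  use GenServer

  def init(state), do: {:ok, state}
end

# use GenServer injected code_change/3, which returns the state unchanged.
IO.inspect(DefaultServer.code_change(:old_vsn, %{count: 42}, []))
```

This prints {:ok, %{count: 42}}. Overriding code_change/3, as we did for CountServer, replaces this default.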

In this episode, we made observations from the perspective of the module being upgraded. In a future episode, we’ll look at upgrading whole applications, production releases, and Phoenix applications. Subscribe to Elixir Alchemy to get the next episode delivered straight to your inbox.