Elixir in Action is a new release from Manning that aims to introduce readers to Elixir and the Erlang virtual machine while also discussing concurrent programming, fault tolerance, and high availability. InfoQ has interviewed Saša Jurić, the book’s author.

Elixir in Action was born out of Jurić’s experience with Erlang, which led him to create a blog devoted to showcasing Erlang’s advantages to programmers coming from an OO background. Jurić later moved to Elixir, a functional, concurrent programming language that aims to provide a friendlier syntax, in comparison to Prolog-inspired Erlang syntax, and advanced abstractions.

The book takes a gradual approach which goes from introducing Elixir’s syntax and fundamental features such as macros, pattern matching, modules, polymorphism, etc. all the way to building a fault-tolerant, highly available, concurrent, distributed system. A great deal of coverage is given to the Erlang platform, addressed in the central chapters of the book, covering topics such as working with processes, persistence, and several approaches to runtime error handling through supervision trees along with the “let it crash” philosophy.

Overall, Elixir in Action provides a solid foundation not only to start working with Elixir and the Erlang VM but also to enter the realm of high-availability systems. InfoQ has talked with Saša Jurić to learn more about his book.

InfoQ: What led you to write this book and how is it different from other books about Elixir?

Saša: The aim of Elixir in Action is to bring otherwise experienced programmers with no previous knowledge of Erlang, Elixir, or functional programming to the point where they can develop production-ready systems. While planning the book, I assumed most readers would come from an OO background. For such people, there’s a lot of new material: a new language, runtime, and ecosystem, plus functional programming and actor-like concurrency. That’s a lot of ground to cover and it can seem overwhelming. I believe that the bootstrap process can be simplified with a focused approach. So instead of writing a full reference book, I decided to focus on the core concepts which will be unusual to most of the target readers: functional programming, concurrency, and the core ideas in the OTP framework. I believe that once readers “grok” these topics, it will be easier for them to pick up the remaining pieces not treated in the book. For example, if you gain enough knowledge of concurrency in Erlang, and understand arguably the most important OTP parts (GenServer, Supervisor, Application), then it should be fairly straightforward to learn other abstractions such as Task, Agent, or GenEvent on your own. I think that currently this is the only Elixir book taking such an approach. Anyone who wants to leverage Elixir/Erlang to build scalable, fault-tolerant, distributed systems will have to learn the material explained in Elixir in Action. You can of course learn it from other resources, but I think this book is currently the only Elixir-oriented book that treats these topics in a single place.

InfoQ: Could you summarize your experience with Elixir? To what extent did it help you reach your goals?

Saša: To me, the most important aspect of Elixir is in fact the Erlang VM. This is the foundational stuff where the magic happens, and I’d say this was the main ingredient that helped me when I first reached for Erlang. Some five years ago, we had to implement a long-polling-based push server that constantly transmits frequently changing data to a couple of thousand connected users. After some evaluation, we opted for Erlang, and I had a great experience. It helped me approach the challenge in a structured way: building the solution was pretty straightforward, even though the development platform was alien to me. The system turned out to be scalable, efficient, and resilient. I had a feeling Erlang had my back, even when I made various mistakes. The system was able to cope with all kinds of unexpected situations, although I didn’t necessarily plan for them.

InfoQ: What kinds of projects is Elixir specifically suited to?

Saša: In my opinion Elixir/Erlang are suitable for any kind of server-side system: a piece of software that has to run continuously and provide its service as much of the time as possible. The obvious example of this is a web-based system that handles incoming HTTP requests, but also has to perform other activities, such as background and periodic jobs or cache management. In such systems, many activities are pending at any point in time. This is where Erlang’s approach to concurrency makes the life of developers easier. If each distinct activity is powered by at least one distinct Erlang process (not to be confused with an OS process), we get scalability, but also improved fault tolerance: failure of a single process will usually not disturb a significant part of the system. And yet, there are ways of detecting such failures and recovering from them. I find such an approach very intuitive and beneficial in any kind of back-end system. I’ve seen some opinions that Erlang is suitable only for large-scale systems or for special domains such as telecoms. I disagree with that. Erlang helps in making the system highly available, and this property is needed in any production system, regardless of the scale or domain, because a system that is not highly available is a system that fails frequently. You may not need to reach the mythical nine nines of availability, but you probably want your system to run with as little downtime as possible. This is a difficult challenge and Erlang can help you deal with it.

InfoQ: How would you describe the relationship between Elixir and Erlang?

Saša: I’d say that Elixir builds on top of the powerful foundations provided by Erlang and OTP, aiming to improve the productivity of developers. I’ve been doing a lot of full-time Erlang development, and as much as I love it, I find many tasks to be overly tedious, and find myself needlessly wrestling with some lower-level mechanical stuff. This is where Elixir brings a lot of useful features, both in the language (e.g. metaprogramming via macros, and polymorphism via protocols) and in the tooling department (e.g. the mix tool for building projects, and the hex package manager), allowing us to focus more on the actual problem we’re solving. My personal impression is that it’s much easier to work with Elixir than with pure Erlang. Elixir reduces, possibly even removes, the artificial trade-off between scalability and developer productivity. Just because a development platform allows us to build highly concurrent, scalable, fault-tolerant, distributed systems doesn’t mean it should be hard to use. At the same time, Elixir honors its runtime, and doesn’t depart significantly from the Erlang philosophy. The language is functional and semantically similar to Erlang. Integration with Erlang libraries is seamless, so Elixir developers have full access to the entire Erlang ecosystem.

InfoQ: Are there any drawbacks one should take into account when deciding whether to use Erlang directly or Elixir?

Saša: None that I can think of. Most of the benefits come from the VM itself, together with the battle-tested OTP framework, and you can get all those benefits in either language. So it really comes down to some other values. Elixir brings some additional features to the table, so it is in fact a more complex language than Erlang. The benefit is then that the code can be more concise, less burdened with boilerplate. In contrast, Erlang is a simpler language, so the code will have to be more involved, but it will also be more explicit. Personally, I think that Elixir strikes a nicer balance of reduced boilerplate vs explicitness. It’s definitely way less magical than e.g. Ruby, while still providing useful features, most notably metaprogramming and polymorphism.

InfoQ: Building a highly available, fault tolerant, concurrent system is a hard task, even more so in view of the CAP theorem. How important is it to use the proper language and runtime environment to reach this goal?

Saša: This really comes down to a personal opinion. People have been implementing large-scale systems in various languages, so it’s definitely possible to do it without Erlang. But for me, the question is not only whether it’s possible, but also how much a tool can help me in the process. After all, the purpose of a tool is to aid us. This is precisely why I value Erlang. I think it provides simple, yet very powerful building blocks to systematically approach the challenge of writing a highly available system. The main tool here is the Erlang process, which allows us to divide the work into thousands, or even millions, of independent pieces. By using many processes, we get scalability and fault tolerance. The mechanisms of crash propagation allow us to handle failures: if one piece crashes, someone else can be notified and do something about it. Finally, shared-nothing concurrency makes our system distributed, even if we’re running it on a single machine. Essentially, by dividing the total work over a large number of isolated, totally independent entities (processes), we’re already distributing the work. Of course, clustering the system over multiple machines will still be hard. Distributed systems are hard by their nature. But at least some lower-level mechanics are solved for us, and we always use the same primitives for collaboration: processes and message passing. This allows us to focus on inherent challenges, instead of spending energy on lower-level details. To summarize, I’d say that Erlang can simplify the challenge of high availability. You can address this challenge without Erlang, but you’ll probably have to work harder to get there.
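As a rough illustration of the two primitives mentioned in this answer, processes and message passing, the following sketch (not taken from the book or the interview) splits a trivial computation over 10,000 processes. The variable names and the squaring workload are assumptions chosen only for the example.

```elixir
# Minimal sketch: one lightweight process per unit of work.
# The workload (squaring a number) is a stand-in for real work.
parent = self()

# Spawn 10,000 isolated processes; each shares nothing with the others
# and reports its result back to the parent via a message.
pids =
  for i <- 1..10_000 do
    spawn(fn -> send(parent, {:done, i * i}) end)
  end

# Collect exactly one result message per spawned process.
squares =
  for _pid <- pids do
    receive do
      {:done, square} -> square
    end
  end

total = Enum.sum(squares)
IO.puts("Collected #{length(squares)} results, total = #{total}")
```

The messages may arrive in any order, which is why the sketch only aggregates them rather than relying on their sequence.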

InfoQ: One of the key ideas in Erlang is using a very lightweight process model. This makes context switching very cheap. On the other hand, on many systems where threads are used to handle scalability, thread context switching is usually the bottleneck, and to avoid this a fully asynchronous model with just a small thread pool is sometimes successfully used. Could you elaborate on the advantages that Erlang’s approach provides in comparison with the fully async method?

Saša: I think Erlang gives us a nice and clean abstraction for building highly concurrent systems, and every system that has to constantly perform many different tasks is by its nature concurrent. The Erlang approach maps nicely to such problems. You always use processes for any kind of task, be it I/O or CPU bound, and you can rely on the VM to spread the load efficiently. There are fewer gotchas, and fewer ways of shooting yourself in the foot. I think this reduces the mental load and allows us to focus on the problem at hand. In contrast, if you roll your own thread pooling, you have to worry about many problems yourself. For example, if you’re performing a long computation in a thread, you’ll block all other activities pending on that same thread. If a thread dies due to a single bug, all other activities on that thread will fail. This can of course be resolved, but you may end up investing a lot of work to implement an approximation of what the Erlang VM does. The question is then, why not simply rely on a proven solution? Perhaps if raw processing speed and/or memory consumption are extremely important, a custom implementation might be beneficial, but in the situations I’ve encountered, this wasn’t the case.

InfoQ: Another fundamental tenet of Erlang’s that is preserved in Elixir is “let it crash”, which goes to the extent of routinely killing processes as a way to always ensure that the system is tolerant to such events. How important is this policy to make an Erlang/Elixir system fault-tolerant?

Saša: Erlang was designed with the premise that things will go wrong in production, but the system as a whole should never fail: it should preserve as much of the service as possible, and self-heal from the failure as soon as possible. Let-it-crash plays a central role here. It is a simple technique that allows us to systematically handle the scenario when something goes wrong. In such a situation, we simply let the process crash, and rely on the supervisor to fix the problem. Then, the main code of the process can be liberated from error handling constructs, such as try-catch or “if err != nil”. It follows the happy path and we can even use pattern matching to elegantly assert our expectations. I think this approach works better than try-catch-ignore, because when the process terminates its state disappears. And perhaps the cause of the failure lies in the corrupt state. After the restart, the new process starts with a fresh, stable state, so it should work again, at least for some time, until the state is again corrupted. This keeps the faulty part of the system afloat. It provides most of the service with occasional failures until the cause is fixed. Another part of the equation is recovery via supervisors. If you establish a fine-grained supervision tree, then you can reduce the area which is restarted. When something fails, you try to restart a small part of the system, and if that doesn’t help, you gradually widen the area until you restart the “rotten” part of the system. In contrast, if you just try-catch-ignore, you may end up with a permanently corrupt state, and experience a recurring failure loop.
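The following is a minimal sketch of that idea, not code from the book. The `Stack` and `Demo` modules and all of their functions are hypothetical names invented for this illustration. The worker contains no error handling: a guard asserts the expected input, an unexpected value crashes the process, and a supervisor restarts it with fresh state.

```elixir
# Hypothetical worker: a GenServer holding a list of integers.
defmodule Stack do
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, [], name: __MODULE__)
  def push(item), do: GenServer.cast(__MODULE__, {:push, item})
  def size, do: GenServer.call(__MODULE__, :size)

  @impl true
  def init(items), do: {:ok, items}

  # Happy path only: the guard asserts that the item is an integer.
  # Any other value matches no clause, so the process crashes.
  @impl true
  def handle_cast({:push, item}, items) when is_integer(item) do
    {:noreply, [item | items]}
  end

  @impl true
  def handle_call(:size, _from, items), do: {:reply, length(items), items}
end

# Hypothetical helper that polls until a replacement process is registered.
defmodule Demo do
  def await_restart(name, old_pid) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_restart(name, old_pid)
    end
  end
end

{:ok, _supervisor} = Supervisor.start_link([Stack], strategy: :one_for_one)

Stack.push(1)
Stack.push(2)
size_before = Stack.size()

old_pid = Process.whereis(Stack)
ref = Process.monitor(old_pid)

# Unexpected input: the worker crashes instead of handling the error.
Stack.push(:not_an_integer)

receive do
  {:DOWN, ^ref, :process, ^old_pid, _reason} -> :ok
end

# The supervisor starts a new process with a fresh, empty state.
new_pid = Demo.await_restart(Stack, old_pid)
size_after = Stack.size()

IO.inspect({size_before, size_after, new_pid != old_pid})
```

Running the script also prints an error report for the crashed process, which is expected here. The script itself keeps running because only the worker failed, and the restarted worker starts again from an empty list.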

InfoQ: Is “let it crash” a special Erlang-recipe? Could it be easily ported to other environments not using the Erlang-VM?

Saša: Excellent question! First I’d like to stress that OTP is built in pure Erlang, and relies on the foundational stuff provided by the VM. This is very important, because I sometimes see notions that OTP can somehow be “ported” to other runtimes. I don’t think this is quite possible, unless the target runtime provides some strict guarantees. In particular when it comes to let-it-crash and supervisors, these are the important guarantees baked into the VM:

1. The process state is its own private affair. When a process terminates, it doesn’t leave any garbage behind which might corrupt other processes.
2. When a single process crashes, no one is disturbed. Other processes run without interruption, unless explicitly specified otherwise.
3. Any other process can be notified about the process crash and do something about it.
4. It is possible to unconditionally terminate a process (even if it is in the middle of an intensive CPU computation).
5. External resources (e.g. file handles, sockets) are owned by a process. When a process terminates, the resources it owns are closed.

The first two points help us limit the effects of failures: when something goes wrong, most of the system still provides the service. The third guarantee allows us to react to a failure. When something crashes, someone else (a supervisor) can take corrective measures. The last two points ensure a proper cleanup. Without them, we might end up with orphan processes or dangling resources. I don’t think you can have Erlang fault tolerance without these guarantees. You might get close, but it will never be 100% there, and there will always be some hidden gotchas. That doesn’t mean you need the Erlang VM for let-it-crash, only that you need a VM that provides these guarantees.
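Some of these guarantees can be observed directly from a short script. The sketch below is an illustration added here, not code from the book, and the `Busy` module is a hypothetical name. It starts a process stuck in an endless CPU-bound loop, terminates it unconditionally, and shows that another process is notified while remaining undisturbed.

```elixir
# Hypothetical module whose only job is to spin forever.
defmodule Busy do
  def loop(n), do: loop(n + 1)
end

# Start an isolated process in the middle of an endless computation.
pid = spawn(fn -> Busy.loop(0) end)

# Ask the VM to notify this process when the other one terminates.
ref = Process.monitor(pid)

# Unconditional termination: the :kill reason cannot be trapped or ignored.
Process.exit(pid, :kill)

reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :timeout
  end

IO.inspect(reason, label: "exit reason")
IO.inspect(Process.alive?(pid), label: "busy process alive?")
IO.inspect(Process.alive?(self()), label: "observer alive?")
```

Because the busy process was started with `spawn` rather than `spawn_link`, its termination does not propagate to the observer. A supervisor builds on the same kind of notification to decide when to restart a child.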

InfoQ: Can you share any insights you have about how Elixir is currently used in the industry?

Saša: Although Elixir is a fairly young language, its foundation (Erlang) is very stable and has been proven in large and diverse systems for over two decades. The examples include WhatsApp, RabbitMQ, Riak, real-time bidding (AdRoll), financial systems (Klarna), and so on. When it comes to Elixir, I’ve seen increasing mentions of it being used in production in various solutions, such as gaming backends or IoT. A curated list of companies using Elixir in production can be found here. It will be interesting to see how this list will grow with time.

Discount code ‘iqelixir’ for 39% off ‘Elixir in Action’, valid for all editions at manning.com

About the Book Author

Saša Jurić is a developer with extensive experience using Elixir and Erlang in high-volume, concurrent server-side systems.