So, in general, there are three levels of code organization in an Elixir project:

“Service level”: the most obvious way to split a complex system into separate Elixir applications (datasets, models, utils).

“Context level”: breaks up responsibility inside a particular service by implementing “context modules” (Datasets.Fetchers, Datasets.Collections).

“Implementation level”: particular modules that define data structures and functions (Datasets.Fetchers.Aws, Datasets.Fetchers.Kaggle).
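To make the two inner levels concrete, here is a minimal sketch of a context module sitting in front of two implementation modules. The module names come from the list above; the fetch/1 and fetch/2 functions and their return values are assumptions made up for illustration, not the project’s real API.

```elixir
# Implementation level: each module knows how to talk to one source.
# The bodies are placeholders; real modules would call the AWS or Kaggle APIs.
defmodule Datasets.Fetchers.Aws do
  def fetch(name), do: {:ok, %{source: :aws, name: name}}
end

defmodule Datasets.Fetchers.Kaggle do
  def fetch(name), do: {:ok, %{source: :kaggle, name: name}}
end

# Context level: one entry point that hides which implementation is used.
defmodule Datasets.Fetchers do
  alias Datasets.Fetchers.{Aws, Kaggle}

  # fetch/2 is a hypothetical function, not part of the original project.
  def fetch(:aws, name), do: Aws.fetch(name)
  def fetch(:kaggle, name), do: Kaggle.fetch(name)
  def fetch(other, _name), do: {:error, {:unknown_source, other}}
end
```

Callers inside the datasets application only need to know Datasets.Fetchers; the Aws and Kaggle modules can change freely behind it.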

Umbrella project pros and cons

As mentioned above, the main advantage of an “umbrella project” is that you have all the code in one place and can run it together in the development and test environments. You can play around with the whole system and, most importantly, write integration tests that exercise the components together. This is very important at the early stage of project development!

At the same time, your project is already split into relatively independent parts and is ready for scaling.

Compare this with the approach common in many other programming languages, where you usually start with a monolithic project and later try to extract some parts into separate applications, because starting with a microservice approach tremendously complicates the development process.

But it’s time to start worrying about encapsulation!

You may have noticed that including all the apps in the main application’s dependencies is not such a good idea. And you are right!

The Elixir language doesn’t have enough constructs for proper encapsulation. There are only modules and functions (public and private). If you add another project as a dependency, all of its modules become available to you, so you can call any public function. A naive implementation of Zillow data fitting in the main application will look like:

    defmodule Main.Zillow do
      def rf_fit do
        Datasets.Fetchers.zillow_data()
        |> Utils.PreProcessing.normalize_data()
        |> Models.Rf.fit_model()
      end
    end

Here Datasets.Fetchers, Utils.PreProcessing and Models.Rf are modules from different applications. This freedom to use modules from another application without a second thought will couple your services and turn the system back into a monolith!

So, there are two sides. We still want all the parts of the project to be accessible during development and testing, but we need to somehow forbid cross-application coupling.

The only way to do so is to agree on conventions about which functions from one application may be used in another. The best way is to extract all “public” functions into separate “interface” modules.

Interface modules

Interfaces

The idea is to move all of an application’s “public” functions (functions that can be called by other applications) into separate modules. For example, the datasets application has a special “interface” module for the functions of Fetchers:

    defmodule Datasets.Interfaces.Fetchers do
      alias Datasets.Fetchers

      defdelegate zillow_data(), to: Fetchers
      defdelegate landsat_data(), to: Fetchers
    end

In this simple implementation, the interface module just delegates function calls to the corresponding module. But in the future, when we decide to run the datasets application on another node, this module will hold the main part of the communication logic.

Doing the same for the other applications, we can rewrite the Main.Zillow module:

    def rf_fit do
      Datasets.Interfaces.Fetchers.zillow_data()
      |> Utils.Interfaces.PreProcessing.normalize_data()
      |> Models.Interfaces.Rf.fit_model()
    end

Generally speaking, the convention is: if you want to call a function from another application, you must do it through an “interface” module.

This approach still allows easy development and testing, but it creates a set of simple rules that protect the code from tight coupling and lay the basis for future scaling!
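The convention can be tried end to end in a single script. The sketch below uses stand-in implementations (the returned data and the normalization rule are invented for the demo) and shows that Main.Zillow touches only the Interfaces modules:

```elixir
# Stand-in implementation modules; the return values are made up for the demo.
defmodule Datasets.Fetchers do
  def zillow_data, do: [3.0, 1.0, 2.0]
end

defmodule Utils.PreProcessing do
  # Hypothetical normalization: scale every value by the largest one.
  def normalize_data(data) do
    largest = Enum.max(data)
    Enum.map(data, &(&1 / largest))
  end
end

defmodule Models.Rf do
  def fit_model(data), do: %{model: :rf, points: length(data)}
end

# Interface modules: the only entry points other applications may call.
defmodule Datasets.Interfaces.Fetchers do
  defdelegate zillow_data(), to: Datasets.Fetchers
end

defmodule Utils.Interfaces.PreProcessing do
  defdelegate normalize_data(data), to: Utils.PreProcessing
end

defmodule Models.Interfaces.Rf do
  defdelegate fit_model(data), to: Models.Rf
end

defmodule Main.Zillow do
  def rf_fit do
    Datasets.Interfaces.Fetchers.zillow_data()
    |> Utils.Interfaces.PreProcessing.normalize_data()
    |> Models.Interfaces.Rf.fit_model()
  end
end
```

Because the callers depend only on the interface modules, the bodies behind defdelegate can later be swapped for remote calls without touching Main.Zillow.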

Scale to distributed system

Interface applications

Imagine that data processing becomes time-consuming, so we decide to run models on a separate node. To do that, we need to remove the {:models, in_umbrella: true} dependency from the main application and run that application on another node.

If you run the Elixir console (iex -S mix) from the main application folder, you won’t have access to the models application’s modules anymore:

    iex(1)> Models.Interfaces.Rf.fit_model("data")
    ** (UndefinedFunctionError) function Models.Interfaces.Rf.fit_model/1 is undefined (module Models.Interfaces.Rf is not available)

The code of the models application is still inside the umbrella project, but it no longer runs with the main application, so it is not accessible. The models modules and functions exist only on the other node, which runs only this application.

But, you know, the BEAM VM is designed for distributed applications, so there are many ways to access code running on another machine.

:rpc

It is easy to run a function on a remote node using the Erlang :rpc module. :rpc uses the Erlang Distribution Protocol for communication between nodes.

You can reproduce a simple experiment: run the main project with the --sname main option in one terminal tab

    iex --sname main -S mix

and the models project in another tab:

    iex --sname models -S mix

Now you can run calculations:

    iex(main@ip-192-168-1-150)1> :rpc.call(:"models@ip-192-168-1-150", Models.Interfaces.Rf, :fit_model, ["data"])
    %{__struct__: Models.Rf.Coefficient, a: 1, b: 2, data: "data"}
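One detail worth knowing: :rpc.call/4 does not raise when something goes wrong. It returns a {:badrpc, reason} tuple instead, for example {:badrpc, :nodedown} when the target node is unreachable. A minimal sketch of a wrapper that makes this explicit follows; SafeRemote is a hypothetical helper, not part of the project:

```elixir
defmodule SafeRemote do
  # Hypothetical helper, not part of the original project.
  # Wraps :rpc.call/4 so that failures come back as {:error, reason}
  # instead of a {:badrpc, reason} tuple mixed in with normal results.
  def call(node, module, fun, args) do
    case :rpc.call(node, module, fun, args) do
      {:badrpc, reason} -> {:error, reason}
      result -> {:ok, result}
    end
  end
end

# Calling the local node keeps the example runnable in a single session.
SafeRemote.call(node(), String, :upcase, ["data"])
```

With two named nodes running, the first argument would be the remote node name, such as :"models@ip-192-168-1-150".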

So what changes we need to make in our project to utilize this approach?

The idea is very simple: we need to add one more application to our project that implements the communication logic, models_interface.

models_interface/

config/

lib/

models_interface/

models_interface.ex

lm.ex

rf.ex

mix.exs

This is a very thin layer that helps main access the Models.Interfaces functions. There are a couple of small modules that just mirror the functions of the Interfaces modules:

    defmodule ModelsInterface.Rf do
      def fit_model(data) do
        ModelsInterface.remote_call(Models.Interfaces.Rf, :fit_model, [data])
      end
    end

This module just calls the Models.Interfaces.Rf.fit_model/1 function through remote_call. The implementation of remote_call is in the ModelsInterface module:

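    defmodule ModelsInterface do
      # Note: the default Mix.env() is evaluated at call time, and Mix is not
      # available inside releases, so this default only suits nodes run via mix.
      def remote_call(module, fun, args, env \\ Mix.env()) do
        do_remote_call({module, fun, args}, env)
      end

      def remote_node do
        Application.get_env(:models_interface, :node)
      end

      # In tests, call the module directly on the local node.
      defp do_remote_call({module, fun, args}, :test) do
        apply(module, fun, args)
      end

      # Everywhere else, run the function on the configured remote node.
      defp do_remote_call({module, fun, args}, _) do
        :rpc.call(remote_node(), module, fun, args)
      end
    end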

The module gets the node location from the configuration and makes a remote procedure call. You may notice the environment-specific implementation of do_remote_call; it simplifies testing, which we will discuss later.
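The configuration entry that remote_node/0 reads is not shown here. A minimal sketch of what it might look like, assuming the node name from the :rpc experiment and the usual config/config.exs location (both are assumptions):

```elixir
# config/config.exs of models_interface (assumed location)
# On Elixir versions before 1.9 the first line would be `use Mix.Config`.
import Config

config :models_interface,
  node: :"models@ip-192-168-1-150"
```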

The next quick refactoring: just replace Models.Interfaces with ModelsInterface in the main application and we are done! Just don’t forget to add the models_interface application to the dependencies of the main application.

    defp deps do
      [
        {:datasets, in_umbrella: true},
        {:models, in_umbrella: true, only: [:test]},
        {:models_interface, in_umbrella: true},
        {:utils, in_umbrella: true},
        {:espec, "1.4.6", only: :test}
      ]
    end

Again, I left the models dependency, but only in the test environment. This allows making direct calls to the application in the test environment.

That’s it. Now we are able to access models via the iex console:

    iex(main@ip-192-168-1-150)1> ModelsInterface.Rf.fit_model("data")
    %{__struct__: Models.Rf.Coefficient, a: 1, b: 2, data: "data"}

Let’s summarize! The only change we made is a new, simple interfacing application. We still have all the code in one place, and all the tests still pass!

Distributed tasks

Direct remote procedure calls are useful if you need a simple synchronous interface to another application. But if you want to run asynchronous code on the remote node efficiently, you’d better choose distributed tasks.

Elixir has a dedicated Task.Supervisor that can be used to supervise tasks dynamically. This supervisor is started inside the remote application and supervises the tasks that execute code. Let’s use distributed tasks to access the datasets application!

First of all, we need to add a Task.Supervisor to the children of the datasets application supervisor:

    defmodule Datasets.Application do
      @moduledoc false

      use Application
      import Supervisor.Spec

      def start(_type, _args) do
        children = [
          supervisor(Task.Supervisor,
            [[name: Datasets.Task.Supervisor]],
            [restart: :temporary, shutdown: 10000])
        ]

        opts = [strategy: :one_for_one, name: Datasets.Supervisor]
        Supervisor.start_link(children, opts)
      end
    end
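The Supervisor.Spec helpers used above are deprecated in current Elixir releases. A rough equivalent using child spec tuples might look like the sketch below; it assumes a recent Elixir version and keeps the original restart and shutdown options via Supervisor.child_spec/2:

```elixir
defmodule Datasets.Application do
  @moduledoc false
  use Application

  def start(_type, _args) do
    children = [
      # {module, options} is turned into a child spec by Task.Supervisor;
      # Supervisor.child_spec/2 then overrides restart and shutdown.
      Supervisor.child_spec(
        {Task.Supervisor, name: Datasets.Task.Supervisor},
        restart: :temporary,
        shutdown: 10_000
      )
    ]

    opts = [strategy: :one_for_one, name: Datasets.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

# Starting the tree by hand, outside of a Mix project, for a quick check:
{:ok, _pid} = Datasets.Application.start(:normal, [])
```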

The DatasetsInterface module (which lives in a separate interfacing application):

    defmodule DatasetsInterface do
      def spawn_task(module, fun, args, env \\ Mix.env()) do
        do_spawn_task({module, fun, args}, env)
      end

      defp do_spawn_task({module, fun, args}, :test) do
        apply(module, fun, args)
      end

      defp do_spawn_task({module, fun, args}, _) do
        Task.Supervisor.async(remote_supervisor(), module, fun, args)
        |> Task.await()
      end

      defp remote_supervisor do
        {
          Application.get_env(:datasets_interface, :task_supervisor),
          Application.get_env(:datasets_interface, :node)
        }
      end
    end

So we use the async/await pattern here. The difference is that tasks are spawned on the remote node and supervised by the remote supervisor. The name and location of the supervisor are set in the configuration file:

    config :datasets_interface,
      task_supervisor: Datasets.Task.Supervisor,
      node: :"models@ip-192-168-1-150"

And, again, the same trick is used for the test environment!
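Note that DatasetsInterface awaits each task immediately, so every call is effectively synchronous. When several remote jobs are independent, they can be started first and awaited afterwards. The sketch below assumes a locally started supervisor named Demo.TaskSupervisor (a made-up name) standing in for the remote one, and String.upcase/1 standing in for real work:

```elixir
# A Task.Supervisor started locally stands in for the remote one.
{:ok, _sup} = Task.Supervisor.start_link(name: Demo.TaskSupervisor)

# {name, node} is the same addressing form used by remote_supervisor/0;
# with node() it points at the current node, so no second machine is needed.
supervisor = {Demo.TaskSupervisor, node()}

# Start all tasks first, so that they run concurrently...
tasks =
  for word <- ["zillow", "landsat"] do
    Task.Supervisor.async(supervisor, String, :upcase, [word])
  end

# ...and only then collect the results, each with an explicit timeout.
results = Enum.map(tasks, &Task.await(&1, 10_000))
```

The explicit timeout matters for long-running jobs, because Task.await/2 defaults to 5000 milliseconds.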

Other protocols

RPC and distributed tasks are built-in Erlang/Elixir abstractions that allow nodes to communicate using Elixir terms without any additional serialization and deserialization. But if you need to communicate with applications that are not written in Elixir, you need a more common approach, such as the HTTP protocol.

As an example, let’s implement a simple HTTP interface for our utils application. Again, the first thing we need is a new utils_interface application.

The UtilsInterface module has a structure similar to that of ModelsInterface, but do_remote_call/2 looks like this:

    defp do_remote_call({module, fun, args}, _) do
      {:ok, resp} = HTTPoison.post(remote_url(), serialize({module, fun, args}))
      deserialize(resp.body)
    end

For this example I’ve used simple Erlang term_to_binary and binary_to_term serialization:

    defp serialize(term), do: :erlang.term_to_binary(term)
    defp deserialize(data), do: :erlang.binary_to_term(data)

The utils project needs an HTTP server to listen for external requests. I’ve used cowboy with plug for this:

    defp deps do
      [
        {:cowboy, "~> 1.0.0"},
        {:plug, "~> 1.0"},
        {:espec, "1.4.6", only: :test}
      ]
    end

The plug module which is responsible for handling requests:

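    defmodule Utils.Interfaces.Plug do
      use Plug.Router

      plug :match
      plug :dispatch

      post "/remote" do
        {:ok, body, conn} = Plug.Conn.read_body(conn)
        {module, fun, args} = deserialize(body)
        result = apply(module, fun, args)
        send_resp(conn, 200, serialize(result))
      end

      # The same helpers as on the client side.
      defp serialize(term), do: :erlang.term_to_binary(term)
      defp deserialize(data), do: :erlang.binary_to_term(data)
    end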

It just deserializes the {module, fun, args} tuple, calls the function and sends the result back to the client.

And don’t forget to start the plug via the cowboy server in the utils application:

    children = [
      Plug.Adapters.Cowboy.child_spec(:http, Utils.Interfaces.Plug, [], [port: 4001])
    ]

Please note that it is not good practice to call functions directly from deserialized data: anyone who can reach the endpoint can make the server run arbitrary code. I did it only to simplify the example. In the real world, you need a more sophisticated approach!
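As one possible direction, and only a sketch, the server could decode with the :safe option and check the requested call against an explicit allowlist before applying it. SafeDispatch and the entries in @allowed are hypothetical:

```elixir
defmodule SafeDispatch do
  # Hypothetical allowlist of {module, function, arity} that remote callers
  # may invoke; a real service would list its own interface functions here.
  @allowed [{String, :upcase, 1}, {Enum, :sum, 1}]

  def handle(binary) do
    # The :safe option makes binary_to_term/2 refuse input that would create
    # new atoms or external function references.
    case :erlang.binary_to_term(binary, [:safe]) do
      {module, fun, args} when is_atom(module) and is_atom(fun) and is_list(args) ->
        if {module, fun, length(args)} in @allowed do
          {:ok, apply(module, fun, args)}
        else
          {:error, :not_allowed}
        end

      _other ->
        {:error, :bad_request}
    end
  rescue
    # binary_to_term/2 raises ArgumentError on malformed or unsafe input.
    ArgumentError -> {:error, :bad_request}
  end
end

request = :erlang.term_to_binary({String, :upcase, ["data"]})
SafeDispatch.handle(request)
```

This narrows what a caller can trigger, but it does not replace authentication or a language-neutral format such as JSON when non-Elixir clients are involved.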

Limiting concurrency with poolboy

The last feature I want to describe in this post lets you protect your application and its resources from being overloaded. Imagine, for example, that the models application uses quite a lot of memory for model fitting, so we want to limit the number of clients that can access the models application at the same time. To do this, we will create a limited pool of worker processes at the interface level using the poolboy library.

poolboy needs to be started by the application supervisor:

    defmodule Models.Application do
      use Application

      def start(_type, _args) do
        pool_options = [
          name: {:local, Models.Interface},
          worker_module: Models.Interfaces.Worker,
          size: 5,
          max_overflow: 5
        ]

        children = [
          :poolboy.child_spec(Models.Interface, pool_options, [])
        ]

        opts = [strategy: :one_for_one, name: Models.Supervisor]
        Supervisor.start_link(children, opts)
      end
    end

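You can see the poolboy options here: the name of the pool, the worker module, the size of the pool, and max_overflow (the number of extra temporary workers allowed when all regular workers are busy).

The worker module is a simple GenServer that just calls the corresponding function: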

    defmodule Models.Interfaces.Worker do
      use GenServer

      def start_link(_opts) do
        GenServer.start_link(__MODULE__, :ok, [])
      end

      def init(:ok), do: {:ok, %{}}

      def handle_call({module, fun, args}, _from, state) do
        result = apply(module, fun, args)
        {:reply, result, state}
      end
    end

And the last change is in the Models.Interfaces.Rf module. Instead of delegating the function, it checks out a worker process from the pool:

    defmodule Models.Interfaces.Rf do
      def fit_model(data) do
        with_poolboy({Models.Rf, :fit_model, [data]})
      end

      def with_poolboy(args) do
        worker = :poolboy.checkout(Models.Interface)

        try do
          GenServer.call(worker, args, :infinity)
        after
          # Always return the worker to the pool, even if the call fails.
          :poolboy.checkin(Models.Interface, worker)
        end
      end
    end

That’s it! Now you can be sure that the models application handles only a limited number of requests at a time.

Conclusion

As a conclusion, I want to give you some recommendations:

Start with microservices from the very beginning. It is very easy to do with an Elixir umbrella project.

Use “context” and “implementation” modules to organize logic inside an application.

Think carefully about an application’s interfaces. Do not allow direct calls to implementation functions between applications.

When scaling to a distributed system, place the “communication” logic in a separate application. Use the Erlang Distribution Protocol for communication between BEAM applications.

I hope the approaches and abstractions described in this article will help you write better code with Elixir!

Hit the 👏 if you enjoyed the article and do not hesitate to contact me if you have questions or proposals!

Have a wonderful week,

Anton