The Internet of Things is upon us, and being able to efficiently communicate with those things is going to become more and more difficult as more things get connected. I’ve been working as a consultant for Nexia Home Intelligence for the last year, and they were kind enough to let me blog a bit about what it is we’re doing to help them handle the large number of connected devices involved in running a hosted home automation solution.

TL;DR – Elixir/Erlang are Awesome

After trying out several Ruby-based solutions to the issues we faced handling the crush of 50k internet-connected devices, including EventMachine and Celluloid, running on MRI and JRuby, we spiked a solution in Elixir, which runs on the Erlang virtual machine. Leveraging the Erlang VM and out-of-the-box OTP frameworks, along with a few more advanced tricks, we’ve built a system that can easily handle our target workload.

Connections, Connections, Connections

There are lots of “things” in the world, and with IPv6 slowly gaining traction, many more of them will have an IP address (or at least internet connectivity of some kind). In many cases, the communication with these devices may be mostly one-way, where the device simply reports information back to a central server somewhere. In those cases, a simple HTTP-based API may well be able to handle the communications needs of your devices, and normal scaling suggestions for web servers will probably serve you just fine. However, there are certain classes of devices (locks, cameras, and thermostats, for example), that require bidirectional, near-real-time communication, and it is this kind of device that we’re dealing with.

In the Beginning…

Our original implementation leveraged Ruby’s EventMachine, an evented I/O system for Ruby (similar to Node.js, if that helps). Devices connected directly to the Ruby TCP/IP server via SSL and communicated with our back-end Rails-based system via Resque. Outbound data from the Rails app was sent to RabbitMQ with a header identifying the specific device to which the message was to be directed. The device servers all subscribed to the RabbitMQ exchange, and whichever one held the connection to the device would send out the results. This system held up well under steady-state load in the 3,000-6,000 connection range. However, there were some issues…
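To make the header-based routing concrete, here is a hypothetical sketch using the Elixir amqp package (our original implementation was Ruby, and the exchange name “devices” and the “device_id” header are invented for illustration). The back-end publishes to a fanout exchange with a header naming the target device, and every device server consumes all messages but acts only on those for devices it currently holds a connection to:

```elixir
# Hypothetical sketch of the header-based routing described above, using the
# Elixir `amqp` package. Names ("devices", "device_id") are invented.
{:ok, conn} = AMQP.Connection.open()
{:ok, chan} = AMQP.Channel.open(conn)
:ok = AMQP.Exchange.declare(chan, "devices", :fanout)

# The back-end publishes with a header identifying the target device:
:ok = AMQP.Basic.publish(chan, "devices", "", "lock front door",
                         headers: [{"device_id", :longstr, "abcd-1234"}])

# Each device server binds its own exclusive queue to the exchange, so every
# server sees every message and filters on the header:
{:ok, %{queue: queue}} = AMQP.Queue.declare(chan, "", exclusive: true)
:ok = AMQP.Queue.bind(chan, queue, "devices")
{:ok, _tag} = AMQP.Basic.consume(chan, queue, nil, no_ack: true)

receive do
  {:basic_deliver, payload, meta} ->
    {"device_id", :longstr, device_id} =
      List.keyfind(meta.headers, "device_id", 0)
    # Look up device_id in a local registry of live connections and, if this
    # server holds that connection, write `payload` out over its SSL socket.
    IO.puts("message for #{device_id}: #{payload}")
end
```

The fanout-plus-filter approach is simple, but it does mean every server receives every outbound message; a headers or direct exchange could narrow delivery at the broker instead.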

The Stampede

There are several kinds of problems you’ll run into when you try to build a system that can handle tens to hundreds of thousands of simultaneous TCP/IP connections. The first is: how do I handle all of these things connecting in rapid succession? This was, in fact, the original impetus for splitting the TCP/IP handling from the business logic. And, because all problems in computer science can be solved with an additional level of indirection, we decided to add a layer in front of the business logic that simply terminates SSL connections and routes raw data to a pool of workers, which handle the business logic and communication with the back-end. Given our familiarity with EventMachine, it seemed a no-brainer to split the original implementation in two, with some additional logic to handle things like message ordering, since data flowing to/from the device now transited a second hop over RabbitMQ instead of the more direct path from before. So that’s what we did.

Houston, we have a problem…

We quickly found that the new system was unable to keep up with the load of even 5,000 devices, much less our goal of something closer to 50,000 devices on a single machine. Things were not looking good. Using ruby-prof, we found (and fixed) several performance issues in the amqp gem, which yielded significant improvements. At that point, however, we were CPU-bound with no obvious single bottleneck left to attack. It appeared that the single-threaded nature of EventMachine, along with some remaining amqp-gem performance issues (even when load-balancing across many instances on a multi-core machine), was going to sink this implementation.

A Side-Track Through Celluloid::IO

Given that our core application was a Ruby on Rails application, we really wanted to stay on Ruby for this solution, so we spent some time spiking a solution on Celluloid::IO to see if its actor-based, multi-threaded model (running on JRuby as our underlying Ruby implementation) would resolve our issues. Using the march_hare gem, a lightweight wrapper around the Java-based AMQP client, we hoped to hit our targets. And, while I must admit we abandoned this effort without a significant amount of profiling, it only got us to about 15k connections before it fell over, mostly due to huge memory consumption (on the order of 30 GB). My understanding (learned much later) is that this may indicate we did something with Celluloid that we shouldn’t have, but we didn’t have time to continue down this route and track down the memory leak.

Elixir to the Rescue

Finally, we felt it was necessary to try something a bit more drastic. Having been introduced to Elixir a few months earlier, and understanding that the Erlang runtime was designed in large part for exactly this kind of problem, we approached the team at Nexia with the suggestion that we take a few weeks and spike a solution in Elixir. We chose Elixir over Erlang because its syntax is much closer to the Ruby that the developers of the existing Rails application already knew, which should make transitioning this work to their core team easier. So, we started to learn more about Elixir, Erlang, and the OTP framework that promised to help us build a scalable, robust system that could, conceivably, provide 99.9999999% uptime (ok, maybe we’re not that good, but Erlang can be).

A quick note – Elixir is a great language on top of the amazing Erlang runtime, and the OTP framework and libraries really provide most of the features we’re leveraging for this application, so you can mostly replace Elixir with Erlang in this post, and I’ll try to call out Elixir-specific stuff if there is any.

Welcoming the Herd With Open Arms…

or, how do you accept 1,000 connections per second on a single machine, with one OS process? Most of the example TCP/IP (or, in our case, SSL) servers you find on the internet do something like:

1. Create a listen socket
2. Block on that socket until a client connects (accept)
3. Hand off the newly-accepted socket to some other process
4. Goto 1

In Elixir, the loop above would look something like this tail-recursive function, for an SSL-based connection:

```elixir
defp do_listen(listen_socket) do
  {:ok, socket} = :ssl.transport_accept(listen_socket)
  :ok = :ssl.ssl_accept(socket)
  endpoint = TcpSupervisor.start_endpoint(socket)
  :ssl.controlling_process(socket, endpoint)
  :gen_server.cast endpoint, {:start}
  do_listen(listen_socket)
end
```

Notice that you’ve now single-threaded your application’s ability to accept new connections, which will eventually cause the operating system to simply refuse new connections on your listening port if you can’t keep up with accepting them in a timely manner. There are some things you can tweak to buy time (in particular, increasing the listen backlog for your service to allow more pending connections), but eventually you’re going to have to do something about that single listener. In Erlang/Elixir, the answer is to spin up multiple “acceptor” processes, each of which blocks on the same listen socket (yes, you can do this!). When a new connection arrives, it wakes the next available waiting process, and that process handles the connection. This pattern has allowed us to reach 1,000 connections/second on a single server quite easily (and we haven’t really found the upper limit). The code is obviously a bit more complex. First, we have a supervisor that owns the listen socket, and its children will be the acceptor processes:

```elixir
defmodule Owsla.TcpListenerSupervisor do
  use Supervisor.Behaviour

  def start_link(port, acceptor_count, backlog) do
    :supervisor.start_link({:local, :listener_sup}, __MODULE__,
                           [port, acceptor_count, backlog])
  end

  def init([port, acceptor_count, backlog]) do
    :ssl.start()
    {:ok, listen_socket} = create_listen_socket(port, backlog)
    spawn(fn ->
      Enum.each(1..acceptor_count, fn (_) -> start_listener() end)
    end)
    tree = [ worker(Owsla.TcpAcceptor, [listen_socket], restart: :permanent) ]
    supervise(tree, strategy: :simple_one_for_one)
  end

  def create_listen_socket(port, backlog) do
    tcp_options = [
      :binary,
      {:packet, :line},
      {:reuseaddr, true},
      {:active, false},
      {:backlog, backlog}
    ]
    :gen_tcp.listen(port, tcp_options)
  end

  def start_listener() do
    :supervisor.start_child(:listener_sup, [])
  end
end
```

Next, we have the acceptors themselves:

```elixir
defmodule Owsla.TcpAcceptor do
  use GenServer.Behaviour

  @ssl_options [
    {:certfile, "deviceserver.crt"},
    {:keyfile, "deviceserver.key"},
    {:ciphers, [
      {:dhe_rsa, :aes_256_cbc, :sha256}, {:dhe_dss, :aes_256_cbc, :sha256},
      {:rsa, :aes_256_cbc, :sha256},     {:dhe_rsa, :aes_128_cbc, :sha256},
      {:dhe_dss, :aes_128_cbc, :sha256}, {:rsa, :aes_128_cbc, :sha256},
      {:dhe_rsa, :aes_256_cbc, :sha},    {:dhe_dss, :aes_256_cbc, :sha},
      {:rsa, :aes_256_cbc, :sha},        {:dhe_rsa, :'3des_ede_cbc', :sha},
      {:dhe_dss, :'3des_ede_cbc', :sha}, {:rsa, :'3des_ede_cbc', :sha},
      {:dhe_rsa, :aes_128_cbc, :sha},    {:dhe_dss, :aes_128_cbc, :sha},
      {:rsa, :aes_128_cbc, :sha},        {:rsa, :rc4_128, :sha},
      {:rsa, :rc4_128, :md5},            {:dhe_rsa, :des_cbc, :sha},
      {:rsa, :des_cbc, :sha}
    ]}
  ]

  def start_link(listen_socket) do
    :gen_server.start_link(__MODULE__, listen_socket, [])
  end

  def init(listen_socket) do
    # Setting the process priority to high /seems/ to improve performance of
    # incoming connection rate, but it also /seems/ to slow down processing
    # of messages. For now, we punt and leave the priority at the default setting.
    # Process.flag(:priority, :high)
    :gen_server.cast self, {:listen}
    {:ok, listen_socket}
  end

  def handle_cast({:listen}, listen_socket) do
    do_listen(listen_socket)
  end

  defp do_listen(listen_socket) do
    case :gen_tcp.accept(listen_socket) do
      {:ok, socket} ->
        case :ssl.ssl_accept(socket, @ssl_options) do
          {:ok, ssl_socket} ->
            endpoint = Owsla.TcpSupervisor.start_endpoint(ssl_socket)
            :ssl.controlling_process(ssl_socket, endpoint)
            :gen_server.cast endpoint, {:start}
            do_listen(listen_socket)
          {:error, :closed} ->
            do_listen(listen_socket)
        end
      {:error, :closed} ->
        do_listen(listen_socket)
      {:error, _} ->
        {:stop, :error, []}
    end
  end
end
```

On a successful SSL accept, do_listen above:

1. Starts a new process to handle the individual TCP/IP connection (the TcpSupervisor.start_endpoint call)
2. Transfers control of the SSL connection to that process (this is an Erlang thing)
3. Starts the endpoint listening for messages on its connection

and then, just like before, makes a tail-recursive call to listen again.

However, now we have 1000 of these running at a time, with a TCP/IP backlog of 2000 connections, and we have no issue handling 1000 connections/second. Note that we also haven’t tweaked those numbers at all – this was our first guess, and it “just worked” so we left it alone (but configurable). It’s possible these are non-optimal, YMMV, IANAL, etc.
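The endpoint process itself isn’t shown in this post, so here is a hypothetical sketch of what one might look like (the module name Owsla.TcpEndpoint and the details are invented; the real implementation, including its RabbitMQ forwarding, differs):

```elixir
# Hypothetical sketch of an endpoint process like the one started by
# TcpSupervisor.start_endpoint above. The AMQP forwarding is elided.
defmodule Owsla.TcpEndpoint do
  use GenServer.Behaviour

  def start_link(ssl_socket) do
    :gen_server.start_link(__MODULE__, ssl_socket, [])
  end

  def init(ssl_socket) do
    {:ok, ssl_socket}
  end

  # The {:start} cast arrives only after the acceptor has transferred
  # ownership of the socket to this process, so it is now safe to go active.
  def handle_cast({:start}, ssl_socket) do
    :ok = :ssl.setopts(ssl_socket, [{:active, true}])
    {:noreply, ssl_socket}
  end

  # With the socket active, inbound lines arrive as Erlang messages.
  def handle_info({:ssl, _socket, data}, ssl_socket) do
    # ...publish `data` to RabbitMQ for the back-end here...
    {:noreply, ssl_socket}
  end

  def handle_info({:ssl_closed, _socket}, ssl_socket) do
    {:stop, :normal, ssl_socket}
  end
end
```

The key design point is the ownership hand-off: only the socket’s controlling process receives its active-mode messages, which is why the acceptor must call :ssl.controlling_process before the endpoint flips the socket to active.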

From there, the individual endpoint processes forward messages across RabbitMQ to our back-end systems. There were some additional challenges there, which I’ll talk about in another post.