Introducing Caravan: Distributed Elixir For The Cloud - Part 1

This is the first in a two part series covering how to run a Distributed Elixir cluster in a typical cloud environment. This first part will cover the basics of how distribution works on the BEAM VM and the problems posed by modern deployment scenarios. The second part will walk through using Caravan, a library I wrote that helps smooth out some of the rough edges of using distribution in modern scenarios. So, if you already understand all the challenges of doing Distributed Elixir/Erlang in something like AWS, then skip this and check out part 2.

We’re going back in time to…1986?

An important point to make up front about Elixir is how tightly it is integrated with Erlang and the BEAM VM. In fact, Elixir code translates directly to Erlang at compile time and doesn’t try to depart too far from its general design, so we don’t experience a runtime hit like we might when running a dynamic language on the Java Virtual Machine. With that in mind, here’s a little primer on Erlang history 1

Erlang was developed at Ericsson to develop and then run telephone exchanges back in the 1980’s. In these sorts of systems, it’s important to have high uptime and high availability (don’t drop phone calls, users get dial tones when they pick up their phones). Erlang achieves this by using a shared-nothing memory model, on top of which pre-emptive schedulers manage the running of millions of lightweight processes. For example, you could model each phone call as a process, which communicates with other processes by message passing. Because memory isn’t shared, a single process having a fault rarely cascades to other processes (ie. one dropped call doesn’t cause all calls to drop). Ericsson was able to achieve systems that had less than a minute of downtime a year with this architecture. The Open Telecom Platform (OTP) is a set of libraries and applications that are used to help with process supervision, in memory key-value stores, even static type analysis (dialyzer). Today, Erlang and OTP can be used interchangeably for the most part. The BEAM VM, is a virtual machine that compiles and executes Erlang code, and is mostly synonymous with the Erlang Run Time System (ERTS).

Distributed Erlang

One part of OTP is Distributed Erlang, which allows multiple ERTS’s to communicate with each other. Before I dive into how that works, we need to talk a bit about how Distributed Erlang was used by Ericsson.

So I mentioned above that Erlang was used to develop telephone exchanges. To be more specific however, I should say that Erlang ran on custom, dedicated hardware deployed out in the field (think sheds in the middle of nowhere or conduits running in sewers or basements). For Ericsson, the primary use of Distributed Erlang was enabling hot fail-over to redundant hardware running in the same physical machine, not across server racks or datacenters that many of us think of when we hear “distributed systems”. This explains a lot of the design decisions that make running Distributed Erlang in public clouds a little clunky. 2

What exactly makes it “clunky”? Here’s 4 that I could come up with, based on personal experience:

  1. Node discovery is not dynamic.3 To connect two nodes together, code must be written to make an explicit connection or a config file with a list of hostnames must be provided.
  2. Nodes need to be named with a routable long name if different machines are to be used. I’ve had issues with some ops tooling putting “localhost’ or a private IP here, which can be tricky to debug.
  3. Each node requires a dedicated TCP port for distribution traffic to travel through. By default, a process called the Erlang Port Mapper Daemon (epmd) starts on a well known port whenever an Erlang VM is started, and assigns each VM a port. Epmd maintains a mapping between node name and port that remote nodes query. There’s some limited ways that the port can be “set” for a VM, but I speak from experience when I say that this whole scheme makes system operators very nervous.
  4. Every node in the cluster4 maintains a connection with every other node, which creates a mesh network topology that has trouble scaling as the number of nodes increases.

In spite of these challenges, Distributed Erlang remains an attractive and ultimately still very effective solution for soft real time systems at web-scale. That’s because the core design features of independent processes, message passing, and immutable data translates in a near-transparent way to running on multiple machines communicating over a network. This means that many programs can leverage the CPU and memory of dozens of machines, with few changes. This can simplify system architectures as you no longer require a separate service to share state between application instances. For example, I find that instead of using Redis to cache data I more readily use GenServers or ETS tables. Once you see a system working in production, it’s very exciting.

Caravan was designed to help make it easier to get you to production usage in AWS faster and with less pain by doing two things:

  1. Replacing EPMD by supplying an alternative distribution mechanism that uses a simple node naming convention to discover the port. This allows us to decide up front what port our application will use.
  2. Providing an optional libcluster strategy that gets node names from a DNS record and performs a Node.connect() to each of them, automating cluster formation.

Next time, we’ll look more closely at Caravan’s design and usage.

  1. I’m going to be upfront that I’m not an expert on the history of Erlang and Ericsson. I’ve tried to stick to statements that I can verify or are well documented, but if I get something wrong I apologize. [return]
  2. This covered in more depth by the excellent chapter on distribution in Learn you Some Erlang For Great Good. [return]
  3. A “Node” is a running Erlang VM started in “distribution mode” by either providing the --sname or --name parameters. [return]
  4. For our purposes, a cluster is just 2 or more Erlang nodes that are connected to one another. [return]