Functional DevOps in a Dysfunctional World

Posted on 4 July 2019

This is a pseudo-transcript of a presentation I gave at the linux.conf.au 2018 Real World Functional Programming Miniconf.

What is DevOps about? For me it’s about my relationship to the phrase

It works on my machine.

I’ve been guilty of saying this in the past, and quite frankly, it isn’t good enough. After the development team has written their last line of code, some amount of work still needs to happen in order for the software to deliver value.

A few jobs ago I was at a small web development shop, and my deployment workflow was as follows:

1. Log on to the development server and take careful notes on how it had diverged from the production server.

2. Carefully set aside some time to ‘do the deploy’.

3. Log on to the production server and do a git pull to get the latest code changes.

4. Perform database migrations according to the notes you made earlier.

5. Manually make any other required changes.

Despite my best efforts, I would inevitably run into issues whenever I did this, resulting in site outages and frustrated clients. This was far from ideal, but I wasn’t able to articulate why at the time.

I posit that a better deployment process has the following properties:

Automatic: instead of a manual multi-step process, it has a single step, which can be performed automatically.

Repeatable: instead of only being able to deploy to one lovingly hand-maintained server, it can deploy reliably multiple times to multiple servers.

Idempotent: if the target is already in the desired state, no extra work needs to be done.

Reversible: if it turns out I made a mistake, I can go back to the previous state.

Atomic: an external observer can only see the new state or the old state, not any intermediate state.
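To make the idempotent property concrete, here is a minimal shell sketch that is not part of the original talk. The ensure_line helper, the app.conf file name, and the port=3000 setting are all made up for illustration. The point is only that running the step twice leaves the target in the same state as running it once.

```shell
#!/bin/sh
# Hypothetical helper: make sure FILE contains LINE.
# If the line is already present, nothing is changed.
ensure_line() {
    file=$1
    line=$2
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

workdir=$(mktemp -d)
conf="$workdir/app.conf"          # illustrative path, not from the post

ensure_line "$conf" "port=3000"   # first run appends the line
ensure_line "$conf" "port=3000"   # second run finds it and does nothing

line_count=$(wc -l < "$conf" | tr -d ' ')
echo "lines in config: $line_count"
```

A naive `echo "port=3000" >> app.conf` would add a second copy on every run, which is exactly the kind of drift that made my manual deploys unreliable.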

I hope to demonstrate how the Nix suite of tools (Nix, NixOS, and NixOps) fulfill these properties and provide a better DevOps experience.

To make things easier, I’m not assuming that you already run NixOS. Any Linux distro should do, as long as you’ve installed Nix. macOS users will be able to follow along until I get to the NixOps section.

Shipping it

Packaging

Suppose we have been given a small Haskell app to get up and running:

Main.hs

    {-# LANGUAGE OverloadedStrings #-}

    import Web.Scotty

    import Data.Monoid (mconcat)

    main = scotty 3000 $ do
      get "/:word" $ do
        beam <- param "word"
        html $ mconcat ["<h1>Scotty, ", beam, " me up!</h1>"]

blank-me-up.cabal

    name:                blank-me-up
    version:             0.1.0.0
    license:             BSD3
    build-type:          Simple
    cabal-version:       >=1.10

    executable blank-me-up
      main-is:             Main.hs
      build-depends:       base >=4.9 && <5
                         , scotty
      default-language:    Haskell2010

(This example is taken straight from Scotty’s README.)

Our first step is to build this app and quickly check that it works. We’ll need Nix and cabal2nix, which turns .cabal files into configuration for the Nix package manager. Assuming Nix is already installed, we can install cabal2nix with nix-env:

    $ nix-env -i cabal2nix
    <a lot of output>
    created <number> symlinks in user environment

How do we know it worked? Try nix-env -q (short for --query):

    $ nix-env -q
    cabal2nix

Okay, assuming the app is in the app subdirectory, let’s create a directory called nix to store our .nix files and begin:

    $ cd nix
    $ cabal2nix ../app/ --shell > default.nix

default.nix might look something like:

    { nixpkgs ? import <nixpkgs> {}, compiler ? "default" }:

    let

      inherit (nixpkgs) pkgs;

      f = { mkDerivation, base, scotty, stdenv }:
          mkDerivation {
            pname = "blank-me-up";
            version = "0.1.0.0";
            src = ../app;
            isLibrary = false;
            isExecutable = true;
            executableHaskellDepends = [ base scotty ];
            license = stdenv.lib.licenses.bsd3;
          };

      haskellPackages = if compiler == "default"
                           then pkgs.haskellPackages
                           else pkgs.haskell.packages.${compiler};

      drv = haskellPackages.callPackage f {};

    in

      if pkgs.lib.inNixShell then drv.env else drv

Now we can build our project by running nix-build, which tries to build default.nix in the current directory if no arguments are provided:

    $ nix-build
    <lots of output>
    /nix/store/<hash>-blank-me-up-0.1.0.0

There should also be a new result symlink in the current directory, which points to the path above:

    $ readlink result
    /nix/store/<hash>-blank-me-up-0.1.0.0

Notice that we’ve built a Haskell executable without having to directly deal with any Haskell-specific tooling (unless you count cabal2nix). Nix works best if you allow it full control over builds, as we do here.

What happens if we run nix-build again without changing anything?

    $ nix-build
    /nix/store/<hash>-blank-me-up-0.1.0.0

It should be nearly instantaneous and not require rebuilding anything. Nix treats a build output as a pure function of its inputs, and since our inputs are unchanged, it is able to give us back the same path that it did before. This is what I mean when I say Nix is declarative.
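As a rough illustration of "output path as a pure function of inputs", here is a toy shell model. It is not how Nix actually computes store paths: the store_path helper and the /toy-store prefix are invented for this sketch, and it assumes sha256sum from GNU coreutils. It only shows that unchanged input yields the same path, while changed input yields a different one.

```shell
#!/bin/sh
# Toy model only: derive an output path purely from a hash of the source file.
# Real Nix hashes the whole derivation (sources, dependencies, build script).
store_path() {
    name=$1
    src=$2
    hash=$(sha256sum "$src" | cut -c1-32)
    printf '/toy-store/%s-%s\n' "$hash" "$name"
}

workdir=$(mktemp -d)
printf 'main = putStrLn "hello"\n' > "$workdir/Main.hs"

first=$(store_path blank-me-up "$workdir/Main.hs")
second=$(store_path blank-me-up "$workdir/Main.hs")   # unchanged input

printf 'broken\n' >> "$workdir/Main.hs"
third=$(store_path blank-me-up "$workdir/Main.hs")    # changed input

echo "$first"
echo "$second"
echo "$third"
```

The first two paths are identical, so a build system working this way can skip the second build entirely. The third differs, so the broken version can never overwrite the working one.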

What if we break our app:

    --- a/functional-devops/app/Main.hs
    +++ b/functional-devops/app/Main.hs
    @@ -4,6 +4,8 @@ import Web.Scotty
     
     import Data.Monoid (mconcat)
     
    +broken
    +
     main = scotty 3000 $ do
       get "/:word" $ do
         beam <- param "word"

and try to build again?

    $ nix-build
    <...>
    Building executable 'blank-me-up' for blank-me-up-0.1.0.0..
    [1 of 1] Compiling Main             ( Main.hs, dist/build/blank-me-up/blank-me-up-tmp/Main.o )

    Main.hs:7:1: error:
        Parse error: module header, import declaration
        or top-level declaration expected.
      |
    7 | broken
      | ^^^^^^
    builder for '/nix/store/<hash>-blank-me-up-0.1.0.0.drv' failed with exit code 1
    error: build of '/nix/store/<hash>-blank-me-up-0.1.0.0.drv' failed
    <...>

It fails, as one would hope, but more importantly the previous symlink at result is still available! This is because nix-build completes the build before atomically updating the symlink at result to point to the new artifact. This way, we can move from one known working state to another, without exposing our users to any intermediate brokenness.
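The "finish the build first, then swap the symlink" idea can be sketched in plain shell. This is one common way to get an atomic swap and not necessarily how nix-build implements it. The swap_link helper and the build-1 and build-2 directories are hypothetical, and mv -T assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: repoint LINK at TARGET in a single atomic step.
swap_link() {
    target=$1
    link=$2
    tmp="$link.tmp.$$"
    ln -s "$target" "$tmp"    # create the new symlink under a temporary name
    mv -T "$tmp" "$link"      # rename over the old link; observers see old or new
}

workdir=$(mktemp -d)
mkdir "$workdir/build-1" "$workdir/build-2"

swap_link "$workdir/build-1" "$workdir/result"
before=$(readlink "$workdir/result")

# Stand-in for a failing build: the swap is never reached,
# so result keeps pointing at the last good artifact.
if false; then
    swap_link "$workdir/build-2" "$workdir/result"
fi
after_failure=$(readlink "$workdir/result")

# A successful build swaps the link.
swap_link "$workdir/build-2" "$workdir/result"
after_success=$(readlink "$workdir/result")

echo "$before"
echo "$after_failure"
echo "$after_success"
```

Deleting the old link and then creating a new one would leave a short window in which result does not exist. Renaming a ready-made link over it avoids that window.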

Service Configuration

Okay, now that we’re able to successfully build the app, let’s configure a service file so that systemd can manage our app. I don’t know of any tools that automatically generate this so I always find myself copying and pasting from an existing service file. Here’s one I prepared earlier.

nix/service.nix

    { config, lib, pkgs, ... }: #1

    let #2
      blank-me-up = pkgs.callPackage ./default.nix {}; #3
    in {
      options.services.blank-me-up.enable = lib.mkEnableOption "Blank Me Up"; #4

      config = lib.mkIf config.services.blank-me-up.enable { #5
        networking.firewall.allowedTCPPorts = [ 3000 ]; #6

        systemd.services.blank-me-up = { #7
          description = "Blank Me Up";
          after = [ "network.target" ];
          wantedBy = [ "multi-user.target" ];
          serviceConfig = {
            ExecStart = "${blank-me-up}/bin/blank-me-up";
            Restart = "always";
            KillMode = "process";
          };
        };
      };
    }

This isn’t intended to be a Nix language tutorial, but there are a few interesting things that I want to point out. For a more comprehensive overview of the language, see here or here.

1. These are the arguments to this expression that the caller will pass. Another way to think of this is as a form of dependency injection.

2. let expressions work similarly to Haskell.

3. This is the equivalent of our nix-build from before.

4. We define a single option that enables our service.

5. The config attribute contains service configuration.

6. We expose port 3000.

7. If you squint this looks a lot like a regular unit file. More on this below.

It would be useful to look at the systemd service file that gets generated from this configuration. To do this, we’ll need one more file:

ops/webserver.nix

    { ... }: {
      imports = [ ../nix/service.nix ];

      services.blank-me-up.enable = true;
    }

This is a function that imports the above configuration and enables the blank-me-up service. With this in place, we can do

    $ nix-instantiate --eval -E '(import <nixpkgs/nixos/lib/eval-config.nix> { modules = [./ops/webserver.nix]; }).config.systemd.units."blank-me-up.service".text'

We’re using nix-instantiate to evaluate (--eval) an expression (-E) that uses eval-config.nix from the library to import the file we created and output the text of the final unit file. The output of this is pretty messy, but we can use jq to clean it up:

    $ nix-instantiate --eval -E '(import <nixpkgs/nixos/lib/eval-config.nix> { modules = [./ops/webserver.nix]; }).config.systemd.units."blank-me-up.service".text' | jq -r

Here’s what that looks like on my machine:

Generated systemd service

    [Unit]
    After=network.target
    Description=Blank Me Up

    [Service]
    Environment="LOCALE_ARCHIVE=/nix/store/<hash>-glibc-locales-2.27/lib/locale/locale-archive"
    Environment="PATH=/nix/store/<hash>-coreutils-8.30/bin:/nix/store/<hash>-findutils-4.6.0/bin:/nix/store/<hash>-gnugrep-3.3/bin:/nix/store/<hash>-gnused-4.7/bin:/nix/store/<hash>-systemd-239.20190219/bin:/nix/store/<hash>-coreutils-8.30/sbin:/nix/store/<hash>-findutils-4.6.0/sbin:/nix/store/<hash>-gnugrep-3.3/sbin:/nix/store/<hash>-gnused-4.7/sbin:/nix/store/<hash>-systemd-239.20190219/sbin"
    Environment="TZDIR=/nix/store/<hash>-tzdata-2019a/share/zoneinfo"

    ExecStart=/nix/store/<hash>-blank-me-up-0.1.0.0/bin/blank-me-up
    KillMode=process
    Restart=always

Hopefully at this point you’re convinced that Nix can take some quasi-JSON and turn it into a binary and a systemd service file. Let’s deploy this!

Deploying

First, we install NixOps:

    $ nix-env -i nixops

We also have to set up VirtualBox, which I’ll be using as my deploy target. If you’re using NixOS this is as simple as adding the following line to configuration.nix:

    virtualisation.virtualbox.host.enable = true;

and running sudo nixos-rebuild switch. If you’re using another Linux distro, install VirtualBox and set up a host-only network called vboxnet0.

We’ll be using the instructions from the manual as our starting point. Create two files:

ops/trivial.nix

    {
      network.description = "Web server";
      network.enableRollback = true;

      webserver = import ./webserver.nix;
    }

ops/trivial-vbox.nix

    {
      webserver =
        { config, pkgs, ... }:
        { deployment.targetEnv = "virtualbox";
          deployment.virtualbox.headless = true; # don't show a display
          deployment.virtualbox.memorySize = 1024; # megabytes
          deployment.virtualbox.vcpu = 2; # number of cpus
        };
    }

We should now be able to create a new deployment:

    $ cd ops
    $ nixops create trivial.nix trivial-vbox.nix -d trivial

and deploy it:

    $ nixops deploy -d trivial

and assuming that everything goes well, we should see a lot of terminal output and at least one mention of ssh://root@<ip>, where <ip> is the IP address of our target.

We should then be able to go to http://<ip>:3000 and see our web app in action!

    $ curl http://<ip>:3000/help
    <h1>Scotty, help me up!</h1>

NixOps also allows us to SSH in for troubleshooting purposes or to view logs:

    $ nixops ssh -d trivial webserver
    <...>
    [root@webserver:~]# systemctl status blank-me-up

Responding to change

This is fantastic, but deployments are rarely fire-and-forget. What happens when our requirements change? In fact, there’s a serious issue with our application, which is that it hardcodes the port that it listens on. If we wanted it to listen on a different port, or to run more than one instance of it on the same machine, we’d need to do something differently.

The correct solution would be to talk to the developers and have them implement support, but in the meantime, how should we proceed?

Patching

Nix gives us full control over each part of the build and deployment process, and we can patch the software as a stopgap measure. Although this scenario is somewhat contrived, I have in fact had to take matters into my own hands like this in the past when the development team hasn’t been able to prioritise fixing a production issue.

Our new expression looks like this:

nix/patched.nix

    args@{ nixpkgs ? import <nixpkgs> {}, compiler ? "default" }:
    (import ./default.nix args).overrideAttrs (old: {
      postPatch = let
        oldImport = ''
          import Web.Scotty
        '';
        newImport = ''
          import Web.Scotty
          import System.Environment (getArgs)
        '';
        oldMain = ''
          main = scotty 3000 $ do
        '';
        newMain = ''
          main = getArgs >>= \(port:_) -> scotty (read port) $ do
        '';
      in ''
        substituteInPlace Main.hs --replace '${oldImport}' '${newImport}'
        substituteInPlace Main.hs --replace '${oldMain}' '${newMain}'
        cat Main.hs
      '';
    })

I’ve added that cat Main.hs at the end to:

confirm that the file was correctly patched

emphasise that arbitrary shell commands can be executed

We can create a new service definition to use this expression:

nix/service-patched.nix

    { config, lib, pkgs, ... }:

    let
      blank-me-up = pkgs.callPackage ./patched.nix { nixpkgs = pkgs; };
      cfg = config.services.blank-me-up;
    in {
      options.services.blank-me-up.enable = lib.mkEnableOption "Blank Me Up";

      options.services.blank-me-up.port = lib.mkOption {
        default = 3000;
        type = lib.types.int;
      };

      config = lib.mkIf cfg.enable {
        networking.firewall.allowedTCPPorts = [ cfg.port ];

        systemd.services.blank-me-up = {
          description = "Blank Me Up";
          after = [ "network.target" ];
          wantedBy = [ "multi-user.target" ];
          serviceConfig = {
            ExecStart = "${blank-me-up}/bin/blank-me-up ${toString cfg.port}";
            Restart = "always";
            KillMode = "process";
          };
        };
      };
    }

We make sure to pass the configured port in on startup and open the firewall appropriately.

Deploying (Again)

We update webserver.nix to use the patched service and specify a different port:

    { ... }: {
      imports = [ ../nix/service-patched.nix ];

      services.blank-me-up.enable = true;
      services.blank-me-up.port = 3001;
    }

And we can deploy again!

    $ nixops deploy -d trivial

The service should now be running on http://<ip>:3001 instead of http://<ip>:3000.

    $ curl http://<ip>:3001/pull
    <h1>Scotty, pull me up!</h1>

If we made a mistake, rolling back is easy:

    $ nixops list-generations -d trivial
       1   <timestamp>
       2   <timestamp>   (current)

    $ nixops rollback -d trivial 1
    switching from generation 2 to 1
    webserver> copying closure...
    trivial> closures copied successfully
    <more output>

and in fact nothing needs to be copied to the target, because the previous deployment is still there.
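Why a rollback needs no copying can be sketched with a toy model of generations. This is an illustration only and not how NixOps actually stores its state. The deploy and switch_to helpers and the directory layout are invented here, and mv -T again assumes GNU coreutils.

```shell
#!/bin/sh
# Toy model: every deployed version stays on disk, each generation is a
# symlink to one of them, and "current" is a symlink to a generation.
workdir=$(mktemp -d)
mkdir -p "$workdir/store/app-port-3000" "$workdir/store/app-port-3001" \
         "$workdir/generations"

# deploy N PATH: record PATH as generation N and make it current.
deploy() {
    ln -s "$2" "$workdir/generations/$1"
    switch_to "$1"
}

# switch_to N: repoint "current" at an existing generation. Nothing is copied.
switch_to() {
    ln -s "generations/$1" "$workdir/current.tmp"
    mv -T "$workdir/current.tmp" "$workdir/current"
}

deploy 1 "$workdir/store/app-port-3000"
deploy 2 "$workdir/store/app-port-3001"
after_deploy=$(basename "$(readlink "$workdir/current")")

switch_to 1   # rollback: generation 1 and its files are still there
after_rollback=$(basename "$(readlink "$workdir/current")")

echo "current after deploy:   $after_deploy"
echo "current after rollback: $after_rollback"
```

Because old versions are never modified in place, going back is only a matter of pointing at them again, and going forward to generation 2 afterwards is just as cheap.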

Conclusion

As demonstrated, the Nix ecosystem allows us to impose order on the usually messy and ad-hoc practice of packaging and deploying software at scale. I’m satisfied that this is the way forward and hope that you will consider using these tools to tackle problems of your own!