…Simple and efficient threading, correctness and speed

If you program in JavaScript, C++, C, Python, Java and so on, you are programming in an imperative language: a language where your code contains step-by-step instructions for what you want the computer to do. "If this, do that." You have variables and objects that change while the program runs. Functions are very powerful and can reach out and read or update the state of any other object in the program or the operating system.

Imperative programming is the most common way of programming a computer today. It ships games and apps and webpages. You feel powerful using it. It’s also a mess.

The old rival to imperative programming is functional programming, which works in a completely different way. There are a lot of cool things going on in the world of functional languages, but also a tendency to put theory before practicality. Still, there are two extremely powerful concepts that we imperative programmers should run in and Indiana Jones out of their evil temple RIGHT NOW: pure functions and immutable data.

Pure Functions

A pure function is a function that given the same input arguments always returns the same result and that has no side effects.

All it does is use its input arguments to calculate a result. Simple. Realise that the inputs and result can be complex objects or collections, not just numbers.

It cannot read or modify any global variables, even indirectly. It cannot query the operating system or libraries or cause changes in those. It cannot modify its input arguments.
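To make the rules concrete, here is a small C++17 sketch contrasting impure and pure functions. All the function names are made up for illustration. `scale_in_place()` breaks the rules by modifying its argument, `next_id()` breaks them by reading and writing a global, while `scaled()` and `average()` only read their inputs and return a fresh result.

```cpp
#include <cassert>
#include <vector>

// Impure: modifies its argument in place, so every other holder of
// `values` sees the change.
void scale_in_place(std::vector<double>& values, double factor) {
    for (double& v : values) v *= factor;
}

// Impure: reads and writes hidden global state, so the same input
// gives a different result on every call.
int g_call_count = 0;
int next_id(int base) {
    return base + ++g_call_count;
}

// Pure: reads only its arguments and returns a new collection.
std::vector<double> scaled(const std::vector<double>& values, double factor) {
    std::vector<double> result;
    result.reserve(values.size());
    for (double v : values) result.push_back(v * factor);
    return result;
}

// Pure: same input, same output, no side effects.
double average(const std::vector<double>& values) {
    assert(!values.empty() && "average() needs at least one value");
    double sum = 0.0;
    for (double v : values) sum += v;
    return sum / static_cast<double>(values.size());
}
```

Calling `scaled(v, 2.0)` leaves `v` untouched, so code that kept a reference to `v` never gets surprised.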

Those simple sacrifices on the altar of purenessness give your function these magic abilities:

- Even a monkey can test it. Just call it with some inputs and check the output. No setup needed.
- It's simple to write.
- It is very reusable, since it has few dependencies and makes few assumptions.
- It's very, very simple to use, since it has no side effects. Even for less gifted monkeys.
- Level 2 magic: it is thread agnostic! You can call it from many threads at once and it cannot cause problems. No need for locks!
- Speed magic: you can easily cache its result or replace calls to it with a lookup table. Or shortcut it for common special inputs.
- Speed magic: the hardware loves data that is only read and never partially changed. Every core can keep its own cached copy without invalidating the others.
- Speed magic: no aliasing problems, since nothing the function reads can change behind its back.
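Two of those abilities, the lookup table and the lock-free threading, can be sketched in a few lines of C++17. This is an illustration under my own assumptions, not a benchmark: `bits_in_byte()` is a pure bit-counting function, so all 256 answers can be computed once (here at compile time) and calls replaced by a table lookup. `count_in_parallel()` then calls the pure function from several threads with no mutex, because every thread only reads shared data and writes its own result slot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Pure: counts the set bits in the low 8 bits of b.
constexpr int bits_in_byte(unsigned b) {
    int count = 0;
    for (int i = 0; i < 8; ++i) count += static_cast<int>((b >> i) & 1u);
    return count;
}

// Because bits_in_byte() is pure, its results can be precomputed.
constexpr std::array<int, 256> make_bit_table() {
    std::array<int, 256> table{};
    for (unsigned i = 0; i < 256; ++i) table[i] = bits_in_byte(i);
    return table;
}

constexpr std::array<int, 256> kBitTable = make_bit_table();
static_assert(kBitTable[0xFF] == 8, "table is built at compile time");

// Pure: four table lookups instead of 32 bit tests.
int bits_in_word(std::uint32_t w) {
    return kBitTable[w & 0xFFu] + kBitTable[(w >> 8) & 0xFFu] +
           kBitTable[(w >> 16) & 0xFFu] + kBitTable[w >> 24];
}

// One thread per word, purely to show that no locking is needed:
// the table and the inputs are only read, and each thread writes
// a different element of `results`.
std::vector<int> count_in_parallel(const std::vector<std::uint32_t>& words) {
    std::vector<int> results(words.size());
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < words.size(); ++i) {
        threads.emplace_back([&words, &results, i] {
            results[i] = bits_in_word(words[i]);
        });
    }
    for (auto& t : threads) t.join();
    return results;
}
```

A real program would use a thread pool rather than one thread per item. The point is that nothing in the pure code had to change to run it concurrently.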

Wait a minute! If the inputs cannot be changed, then every result must be a full copy! A pure function that "changes" a 10 MB object must return a copy of all those 10 MB, even if only a single byte was changed!? And another thing! Objects?! Can't I use objects with member functions? Read on!

About Immutability

Immutable data is data that doesn’t change while the program is running. It has surprising and counter-intuitive super-powers (CISP)!

Let’s think about constants for a moment. Let’s say a constant string. It’s dead simple to use a constant string in an imperative language and it’s practically impossible to mess things up. Why is that?

It is always the same so you can be sure you don’t need to track if / when it changes to keep things in sync. You know it will be properly initialised before you can read it, so no timing or order of function calls can complicate things. You know you can keep a reference or pointer to it and know for sure that no other code can ever change or invalidate it. No need to copy it to be sure! There can be no thread data races on a string that never changes - it is thread agnostic.

Fantastic, right? But it turns out: you can get all the advantages of constant data for all data in your program!

Just always create variables and objects (and collections) in a correct initial state and then never allow them to change or be deleted! Now the full checklist above applies to this data too. But you can create them while your program is running!
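Here is one way this can look in C++17, as a sketch. The `email_address` class is hypothetical. It validates itself in the constructor so it can never exist in a bad state, it has no setters, and a "change" returns a new object. Because nothing can mutate it, it is safe to hand out `std::shared_ptr<const email_address>` to any number of owners on any thread, just like a pointer to a constant string.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

class email_address {
public:
    // The object is either created in a correct state or not at all.
    explicit email_address(std::string text) : text_(std::move(text)) {
        if (text_.find('@') == std::string::npos)
            throw std::invalid_argument("not an email address: " + text_);
    }

    const std::string& str() const { return text_; }

    // "Modification" builds a new object; *this is left untouched.
    email_address with_domain(const std::string& domain) const {
        return email_address(text_.substr(0, text_.find('@') + 1) + domain);
    }

private:
    std::string text_;  // private and never modified after construction
};

// Shared, read-only access: no copying needed to protect the data.
using shared_email = std::shared_ptr<const email_address>;

shared_email make_shared_email(std::string text) {
    return std::make_shared<email_address>(std::move(text));
}
```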

Let this sink in for a while and you will think about programming in a different way from now on!

Multi-threading in Imperative Programs

In imperative programs, every function can potentially reach out and call the OS, or read or modify some other data via functions that use singletons, globals etc.

To make a program thread safe you need to try to figure out exactly where data races could happen between threads. Then you need to insert lock objects / mutexes next to the problematic data and alter all code that accesses that data to also control the locks properly.
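For comparison, a minimal sketch of that classic locking approach in C++17 (the `inbox` class is a made-up example). The mutex sits next to the data and every accessor has to remember to take it. Note that every `inbox` instance carries and pays for the lock, even the ones that never leave a single thread.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

class inbox {
public:
    void add(const std::string& message) {
        std::lock_guard<std::mutex> lock(mutex_);
        messages_.push_back(message);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return messages_.size();
    }

    // Must return a copy: handing out a reference would let callers
    // read the vector without holding the lock.
    std::vector<std::string> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return messages_;
    }

private:
    mutable std::mutex mutex_;  // locked on every call, shared or not
    std::vector<std::string> messages_;
};
```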

Trouble is, the exact spots where data races happen are a consequence of how the program is put together right now, and changing the client code can (and will) move those spots. Now you need to fix those issues somewhere else! It is easy to leave messy, unused locking code behind this way!

Also, most instances of those classes might never actually be involved in a data race - but every instance still pays the performance cost of locking! In a traditional multi-threaded program, what percentage of the lock instances actually protect data that has race conditions, and how much is just wasted clock cycles?

Locks also introduce the risk of deadlocks and other lock-related problems.

Another common approach to solving this is to raise big Berlin walls right through the program (often quite a costly process) and use special techniques to smuggle information over the walls: agents, messages etc. (See what I did there with the metaphor?)

Multi-threading in Programs using Immutability and Pure Functions

An immutable / pure function design does the complete opposite. All code and data are thread safe by default. There can be no threading problems.

Then carefully introduce mutable state in a handful of places in your program where you need to advance the state of your program.

These few spots need proper thread synchronisation - and now your entire program is thread safe.
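A sketch of what one of those few mutable spots might look like in C++17, under my own assumptions (the names `app_state`, `with_message` and `state_store` are hypothetical). The state itself is immutable and advanced by a pure function; the only mutable thing is the pointer to the current version, and only swapping that pointer is synchronised. Readers grab a snapshot and can use it for as long as they like without further locking.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Immutable application state: never modified after it is published.
struct app_state {
    std::vector<std::string> messages;
};

// Pure: computes the next state from the previous one.
// (A persistent vector would share storage here instead of copying.)
app_state with_message(const app_state& s, const std::string& message) {
    app_state next = s;
    next.messages.push_back(message);
    return next;
}

// The single mutable spot in the program.
class state_store {
public:
    explicit state_store(app_state initial)
        : current_(std::make_shared<app_state>(std::move(initial))) {}

    // Returns a snapshot; it stays valid and unchanged forever.
    std::shared_ptr<const app_state> get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return current_;
    }

    // Applies a pure step and publishes the result. Holding the lock
    // during the step serialises writers, so no update is lost.
    template <typename F>
    void update(F pure_step) {
        std::lock_guard<std::mutex> lock(mutex_);
        current_ = std::make_shared<app_state>(pure_step(*current_));
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const app_state> current_;
};
```

Everything except `state_store` is ordinary single-threaded code that happens to be safe to call from anywhere.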

About Persistent Data Structures

OK, you use immutable data and pure functions. Your code is elegant, simple and uses loads of threads with no worries. But it copies data around like hell. That function that sets a pixel in a bitmap always copies the entire bitmap! 10 MB of pixels copied every time set_pixel() is called!?
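Here is that naive set_pixel() as a C++17 sketch, assuming a simple flat bitmap of 32-bit pixels (the `bitmap` struct is hypothetical). It is perfectly pure and thread safe, and it copies the whole pixel buffer to change one pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct bitmap {
    int width = 0;
    int height = 0;
    std::vector<std::uint32_t> pixels;  // width * height pixels, row by row
};

// Pure but naive: the old bitmap is untouched, at the price of
// copying every pixel for every call.
bitmap set_pixel(const bitmap& src, int x, int y, std::uint32_t color) {
    assert(x >= 0 && x < src.width && y >= 0 && y < src.height);
    bitmap result = src;  // full copy of the pixel buffer
    result.pixels[static_cast<std::size_t>(y) * src.width + x] = color;
    return result;
}
```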

Persistent data structures fix a lot (but not all) of the performance problems of an immutable / pure function design without losing all the advantages we just got!

Also remember that we have already received some performance potential from pure functions and immutable data: code is much easier to optimise. Cache function calls, pre-calculate data in worker threads (simple now!). No need to ever copy data to make sure it's not changed somehow. It's simple to use threading to accelerate the program etc.

A persistent data structure is immutable (just the way we love it), but it still has operations for making changes(!), just like a standard C++ mutable data structure, such as push_back() on a std::vector. The trick is that every change returns a new version of the data structure, leaving the old version as-is.

A naive implementation of a persistent data structure would be very inefficient: just copy the original object, perform the modification on the copy, then return the copy. Convenient for client code, and immutable, but inefficient.
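That naive version is easy to write, which makes it a good way to see the interface clients get. This C++17 sketch (`naive_persistent_vector` is a made-up name, not the Steady API) is immutable and copies cheaply, since copies just share a pointer, but every push_back() copies all elements, so each change costs O(n).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

template <typename T>
class naive_persistent_vector {
public:
    naive_persistent_vector() : data_(std::make_shared<std::vector<T>>()) {}

    // Returns a new version; *this is unchanged.
    naive_persistent_vector push_back(const T& value) const {
        auto copy = std::make_shared<std::vector<T>>(*data_);  // full copy
        copy->push_back(value);
        return naive_persistent_vector(std::move(copy));
    }

    const T& operator[](std::size_t i) const { return (*data_)[i]; }
    std::size_t size() const { return data_->size(); }

private:
    explicit naive_persistent_vector(std::shared_ptr<const std::vector<T>> d)
        : data_(std::move(d)) {}

    std::shared_ptr<const std::vector<T>> data_;  // never mutated once shared
};
```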

A good persistent data structure solves this internally without exposing the details to its clients. When you perform a change, the original and the new, modified version share most of their internal state, which saves time, memory and other resources.
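To give a feel for structural sharing, here is a deliberately simple C++17 sketch applied to the bitmap example. It is not how Clojure's vector or Steady work (they use a wide tree); it is a two-level layout I chose for illustration, and `shared_bitmap` is a hypothetical class. The bitmap is a table of pointers to immutable rows. Setting a pixel copies the pointer table and the one affected row, and every other row is shared between the old and the new version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

class shared_bitmap {
public:
    using row = std::vector<std::uint32_t>;

    shared_bitmap(int width, int height, std::uint32_t fill) : width_(width) {
        // All rows can start out as the same shared, immutable row.
        std::shared_ptr<const row> blank =
            std::make_shared<row>(static_cast<std::size_t>(width), fill);
        rows_.assign(static_cast<std::size_t>(height), blank);
    }

    std::uint32_t get(int x, int y) const { return (*rows_[y])[x]; }

    // Returns a new version: copies the row pointers plus one row.
    shared_bitmap set_pixel(int x, int y, std::uint32_t color) const {
        assert(x >= 0 && x < width_ && y >= 0 && y < static_cast<int>(rows_.size()));
        shared_bitmap result = *this;                     // copies pointers only
        auto new_row = std::make_shared<row>(*rows_[y]);  // copies one row
        (*new_row)[x] = color;
        result.rows_[y] = std::move(new_row);
        return result;
    }

    // Exposed only so the sharing can be observed.
    const row* row_address(int y) const { return rows_[y].get(); }

private:
    int width_;
    std::vector<std::shared_ptr<const row>> rows_;
};
```

With this layout, a 10 MB bitmap split into 1000 rows would copy about 10 KB of pixels plus 1000 pointers per set_pixel() instead of 10 MB. Tree-based designs push the same idea further so that both levels stay small.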

This gets you all the advantages of immutability and much of the efficiency of mutability and allows us to both easily and efficiently write those golden pure functions!

Shameless self-promotion: I have written a C++ adaptation of Clojure’s cool persistent vector class. It’s free, solid and ready for your consumption: Steady C++ Vector

This is a Pipe Dream! I Program in Reality!

You are making iPhone apps (or something else). You need to work with Cocoa callbacks and event loops. (Or you are writing a game engine, embedded software or a node.js program.) There is no way you can use immutability and pure functions, right?

Well, what you can do is make your low-level code use these concepts, even if the top-level code that talks to Cocoa can't. Factor out as much of your core functionality as possible into code that uses pure functions and immutability.

What you cannot do is make a pure function that internally calls impure functions. Not without great care, anyway.
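A minimal C++17 sketch of that split, with made-up names. `on_button_pressed()` is the pure core: it holds the logic and can be tested without any UI framework. `handle_click()` stands in for whatever framework callback you actually have (Cocoa, a game loop, node.js); it is the thin impure shell that stores the new state and talks to the outside world, here just by printing.

```cpp
#include <cassert>
#include <iostream>
#include <string>

struct counter_view {
    int count;
    std::string label;
};

// Pure core: all decisions happen here.
counter_view on_button_pressed(const counter_view& current) {
    int next = current.count + 1;
    return counter_view{next, "Pressed " + std::to_string(next) + " times"};
}

// Impure shell: a stand-in for a real framework callback.
void handle_click(counter_view& state) {
    state = on_button_pressed(state);
    std::cout << state.label << '\n';  // side effect stays in the shell
}
```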

Objects, What About My Objects?

I get the input arguments and result thing. But what about member functions? The answer is quite simple. A member function has a hidden argument (called this in C++) that specifies the object we're operating on. For a const member function (your object is immutable, remember?) this is just another immutable input argument. No problem. It's just syntactic sugar that makes it look special.

class person { ... string get_email(int index) const { return _email_addresses[index]; } };
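Filled out into a compilable C++17 sketch (the constructor, `emails()` and the free function `email_at()` are my additions for illustration, and the index is a `std::size_t` rather than an `int`), the equivalence becomes visible: the const member function and the free function taking a `const person&` do exactly the same thing. The hidden `this` is just one more read-only input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class person {
public:
    explicit person(std::vector<std::string> emails)
        : _email_addresses(std::move(emails)) {}

    // `this` points to a const person: a read-only input argument.
    std::string get_email(std::size_t index) const {
        assert(index < _email_addresses.size());
        return _email_addresses[index];
    }

    const std::vector<std::string>& emails() const { return _email_addresses; }

private:
    std::vector<std::string> _email_addresses;
};

// The same operation with the hidden argument made explicit.
std::string email_at(const person& p, std::size_t index) {
    assert(index < p.emails().size());
    return p.emails()[index];
}
```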

Summary

Start writing your C++, Java and JavaScript code using pure functions and immutability NOW. It gives your code super powers - threading, performance, simplicity and more - and will bring you happiness and joy. True, you will not be able to write all your code like this - but probably more of it than you think! Also remember it's fine to use OOP and classes and structs as usual, just avoid non-const member functions!

References

Persistent data structures: Clojure’s persistent vector class https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentVector.java

Persistent data structures: HAMT, Phil Bagwell: en.wikipedia.org/wiki/Hash_array_mapped_trie

Excellent explanation of Clojure's vector class, by Jean Niklas L'orange: hypirion.com/musings/understanding-persistent-vector-pt-1

Another in-depth explanation of persistent data structures and Clojure's vector class, by Bartosz Milewski: bartoszmilewski.com/2013/11/13/functional-data-structures-in-c-lists

My own C++ adaptation of Clojure’s magical vector class - go get it: Steady C++ Vector