Do you want to know how deep the rabbit hole goes?

Beyond Programming Languages

In the old days, developers relied on fancy programming languages to express computation. Given the wild proliferation of languages over the years, it’s clear a new approach is needed. Programming languages hide a lot of sausage-making.

Programming languages are unfortunately ‘monolithic’: if you need more than one hardware box or programming language … well, too bad … programming languages just aren’t built for that.

What we really want is more control over the computation (compute) graph that actually gets run on the hardware. So what if we could treat programs more like data structures?

Enter “Functional Programming” or FP.

Function Calls Grow Up

The selling point sounds alluring: writing *functions* is easy… After all, functions are the basic widget of programming (going clear back to the 1930s lambda calculus). Regardless of the programming language, you could almost always count on “functions” as basic building blocks.

The venerable function call

Now, coders traditionally would ‘assemble’ functions by having one function ‘call’ another. Or, better yet, introduce a ‘parent’ function that orchestrates the calls to the child functions. Unfortunately, you basically end up with a rigid function call graph/tree. This works fine in most situations, but as foo() and bar() grow in complexity over time, it starts to create some issues.
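To make this concrete, here is a minimal Python sketch of the ‘hardwired’ pattern (foo, bar and parent are the hypothetical functions from this article, with made-up bodies):

```python
# A minimal sketch of the 'hardwired' call pattern described above.
# foo() and bar() are the hypothetical functions from the text.

def bar(x):
    return x * 2          # some leaf computation

def foo(x):
    return bar(x) + 1     # foo is hardwired to call bar directly

def parent(x):
    # a 'parent' orchestrator that fixes the call order at write time
    y = foo(x)
    z = bar(y)
    return z

print(parent(3))  # the call graph parent -> foo -> bar is frozen in the source
```

Notice the graph is baked into the source text itself: there is no way to rewire parent → foo → bar without editing the code.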

Pulling Rabbits out of Hats

Eventually, say, you might want to assign more than one processor to a function, or maybe use a Python library with your Scala. This requires splitting foo() and bar() apart somehow in a rather violent manner. The developer must go back into the code, “break” the hardwired calls, convert them into a queue (say RabbitMQ or whatever), and then set up whatever magic is needed to pull it off:

Yes FP fans, another word for queue is “pipe”
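A rough sketch of the decoupled version, with a plain in-process queue.Queue standing in for RabbitMQ (foo and bar and their bodies are still hypothetical):

```python
import queue

# Instead of foo() calling bar() directly, foo pushes its output onto a
# queue and bar pulls from it -- the 'pipe' mentioned above. A plain
# in-process Queue stands in for RabbitMQ or any other broker.

pipe = queue.Queue()

def foo(x):
    pipe.put(x * 2)       # foo no longer calls bar; it just publishes

def bar():
    x = pipe.get()        # bar consumes whenever work shows up
    return x + 1

foo(10)
print(bar())  # 21 -- same result as bar(foo(10)), but the hard call is severed
```

The payoff: foo and bar no longer need to live in the same process, on the same box, or even in the same language.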

The bigger problem here is that we want more flexibility with our code.

Hey Silicon Valley — we hate to interrupt the wine tasting but the most basic “building block” of the software industry … the lambda… is in trouble.

Functions as First Class Objects

The first step toward greater flexibility is realizing that functions probably should be treated more like data objects, not unlike “widgets” in the LEAN manufacturing world. AWS Lambda heads in this direction … but do we really want Amazon’s billing department watching what we are doing?
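As a rough sketch (with made-up function names), treating functions as data can be as simple as putting them in a list you can rearrange at runtime:

```python
# Functions treated as plain data objects: stored, inspected, and
# shuffled around like widgets. The names here are invented for
# illustration; this is not any real pipeline API.

def double(x):
    return x * 2

def increment(x):
    return x + 1

# a 'pipeline' is now just a list -- a data structure we can edit at runtime
pipeline = [double, increment]

def run(steps, x):
    for f in steps:
        x = f(x)
    return x

print(run(pipeline, 3))   # 7
pipeline.reverse()        # rewire the computation without touching any code
print(run(pipeline, 3))   # 8
```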

Where DO Those Functions Live, Anyway?

Something even bigger is lacking. Our little chain of functions might actually need its own data structure.

Function call graph

Hah! We are so used to letting compilers etc. handle this that we’ve forgotten that functions don’t just float around in space. Although functions are first-class objects in many programming languages, they still need some surrounding infrastructure to connect them and make them usable.

Category = Compute Graph

Mathematically speaking, functions are supposed to be held in something called “categories” … or “cats”. In computing, most categories manifest as compute graphs. For a better explanation of graph versus category, see here.

Meow meow.

[ WARNING: Wonky terminology ahead — may we suggest this executive summary instead ]

Categories are function containers
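A toy illustration of that idea (every name here is invented for the sketch): a ‘category’ as an explicit container where morphisms carry their source and target objects, and composition is only legal when the objects line up:

```python
# A toy 'category' as an explicit function container: objects are Python
# types, morphisms are (source, target, function) triples, and composition
# is only allowed when the types line up. All names are hypothetical.

class Morphism:
    def __init__(self, src, dst, fn):
        self.src, self.dst, self.fn = src, dst, fn

    def __call__(self, x):
        return self.fn(x)

def compose(g, f):
    # g after f -- legal only when f's target object matches g's source
    assert f.dst == g.src, "objects don't line up"
    return Morphism(f.src, g.dst, lambda x: g(f(x)))

parse = Morphism(str, int, int)              # a morphism str -> int
double = Morphism(int, int, lambda n: n * 2) # a morphism int -> int

pipeline = compose(double, parse)            # str -> int, built as data
print(pipeline("21"))  # 42
```

The point of the container: the wiring between functions is now a checkable data structure instead of something frozen into source code.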

Hidden Cats

But when was the last time you actually stumbled across a “category” while coding? You probably haven’t. We tend to chain functions directly, or the categories are ‘baked’ deep inside the walls of programming languages so we don’t even think about them. Cats are a bit like The Matrix in that they are all around us, yet developers are rarely allowed to play with them directly. That’s a justifiable criticism of Haskell, which claims to be all about Category Theory yet does a pretty good job of hiding things.

Haskell programmers trying to find the Cat

Functional Programming

However, in Functional Programming (FP), you have probably run across something mysterious called a “monad” that helps wire functions together.

Handing things off to infrastructure

What is a Monad?

Monad? The “M” word. Yeah I’m tossing out some strange terminology.

In FP, you are either (1) writing functions or (2) doing something else (like gluing or wiring functions together). Simply put, a monad is an industry-generic term for that “something else”.

Monads have two jobs:

Orchestration of function execution

Shuttling data around between functions

Technically, monads are mathematical handlers for ‘computation’. They are used with categories.
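A minimal Maybe-style sketch of those two jobs, in plain Python with hypothetical names (a real monad library would be far more polished):

```python
# A minimal Maybe-style monad sketching the two jobs above: bind()
# orchestrates whether the next function runs at all, and it shuttles
# the value between functions. Hypothetical illustration only.

def unit(x):
    return ("Just", x)            # wrap a plain value

def bind(m, f):
    tag, val = m
    if tag == "Nothing":
        return m                  # job 1, orchestration: skip downstream work
    return f(val)                 # job 2, shuttling: hand val to the next fn

def safe_div(x):
    return ("Nothing", None) if x == 0 else ("Just", 10 / x)

print(bind(unit(2), safe_div))    # ('Just', 5.0)
print(bind(unit(0), safe_div))    # ('Nothing', None) -- failure short-circuits
```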

The Elusive Kleisli

How do we use these monads? (Actually, I will use the upper-case “M” because Functional Programming does something special with them.) You see, FP won’t let you create a Cat from scratch, but it does offer a special (if somewhat hidden) compute graph called a Kleisli. Technically, each monad crunches on bits of code and spits out its own Kleisli “effects” on the final compute graph.

The idea is that you first package your code into Monads™ and then they build the final compute graph.
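Here is a rough sketch of Kleisli composition, with hypothetical names: each function returns a wrapped (‘monadic’) value, and kleisli() glues two such functions into one bigger arrow, which is exactly the kind of edge that ends up in the final compute graph:

```python
# A sketch of Kleisli composition: each function returns a wrapped value,
# and kleisli() composes a -> M b with b -> M c into a -> M c. All names
# here are invented for illustration.

def bind(m, f):
    return ("Nothing", None) if m[0] == "Nothing" else f(m[1])

def kleisli(f, g):
    # the composed Kleisli arrow: run f, then feed its unwrapped result to g
    return lambda x: bind(f(x), g)

def half(x):
    # only even numbers can be halved 'cleanly' in this toy world
    return ("Just", x // 2) if x % 2 == 0 else ("Nothing", None)

quarter = kleisli(half, half)
print(quarter(12))   # ('Just', 3)
print(quarter(6))    # ('Nothing', None) -- 3 is odd, so the pipeline halts
```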

That’s still a lot of sausage-making that makes PyTorch trivial by comparison.

So — much like TensorFlow — you don’t really touch the category / compute graph directly but instead let monads do the processing.

Monads are kinda like a code proxy. They live “one level down” from the programming language and convert pieces of code into sections of the compute graph.

Various programming languages kinda take this approach all the time when converting high-level code into assembly, so don’t be confused by the terminology … really, any “boilerplate” feature of a programming language is kinda ‘monadic’ of sorts.

T-Algebras/Eilenberg-Moore

Okay, fine: the Kleisli stuff goes into our compute graph. But isn’t Functional Programming named for Functions, not Monads?

The software industry tends to conflate a functional programming language with the compute category. They are obviously related but NOT quite the same thing.

Programming world and compute graph are not quite the same

Programming languages themselves are also a flavor of “cats”. Specifically, code lives in “syntax” categories and we use “monads” to map them to the compute cats.

This is actually the main reason why programmers bother with all this ugly Category Theory terminology in the first place: to describe how to map operations from the programming world to the machine world.

Unfortunately, as you can see, the terminology is rather daunting … by convention we say that programming languages (quasi algebras) live in a strange world called Eilenberg-Moore (or E-M) … kinda like a “library” … and you use Monads to pick goodies from E-M and stuff them into your compute graph. Although it sounds like math fluff, this is a more engineering-like approach to building software.
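A toy version of that mapping, with invented names: the ‘syntax’ side is a program represented as plain data, and an interpreter plays the monad’s role of mapping it into the compute world:

```python
# A toy version of the syntax-to-compute mapping described above: 'code'
# lives as a syntax tree (the syntax side), and an interpreter maps it
# into an actual computation (the compute side). Names are hypothetical.

# syntax side: a program is just nested tuples -- pure data
program = ("add", ("lit", 2), ("mul", ("lit", 3), ("lit", 4)))

def interpret(node):
    # compute side: walk the syntax tree and actually do the work
    op = node[0]
    if op == "lit":
        return node[1]
    left, right = interpret(node[1]), interpret(node[2])
    return left + right if op == "add" else left * right

print(interpret(program))  # 14 -- the same term, mapped into the compute world
```

Because the program is data, you could just as easily hand it to a different interpreter (a pretty-printer, an optimizer, a remote executor) without touching the ‘code’ itself.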

Monads map bits of code to sections of the compute graph

Each monad can do different things to the compute graph

Beyond Monolithic Programming Languages

Functional Programming probably would have gone differently if we could push Monads to the cloud directly like in TensorFlow etc., but it never worked out that way. For starters, FP is intimately tied to a single programming language, yet complex tasks often need more than one language.

More than one programming language ?!? I hinted at this earlier, but foo() and bar() don’t have to be in the same language. The dirty secret is that Cats are bigger than any single programming language (sorry, Haskell fans!).

So the temptation of reaching for a shiny new programming language to work with Category Theory doesn’t quite make sense. Cats require a radical new approach.

Categories as First Class Objects

The better answer is to start treating Cats as first class objects and build them yourself. Then you are free to add what you want. This requires a new type of machine which we will explain in a moment. But first we need to explore the data side more.

Cats Have Dual Use

As you may have guessed, Cats are really data models. In fact, Cats are awesome data models: most obviously a graph, but also a relational model. The notion of a relational “join” is even the same in SQL and in Functional Programming. That’s pretty crazy, right? We normally don’t think of joining code the same way as joining data.

But what if we could?

Enter Convergent Hardware

The key to understanding where Category Theory is really going is to look at major trends in hardware. As persistent storage and transient RAM converge, the difference between the database world and traditional programming gets fuzzy. In fact, “Persistent Memory” is a creature that plays in both the traditional programming and database worlds simultaneously. We call this a “fixed point” in math terms.

Duality of Code and Data

This implies that databases and traditional programming are going to collide someday, if not merge. Much wow. That allows us to clean up a lot of redundancies in the old Silicon Valley tech stack, which is getting quite outdated. But what sort of new programming model will arise from that?

Categories are unique in that they can play in both database and traditional programming worlds at the same time.

Hence cats are all the rage.

How It Works

Function memoization illustrates this well. It’s not perfect but you get the idea:

In a nutshell, function calls and relational joins are trying to achieve much the same thing. Imagine function parameters as table columns. Imagine function “calls” as relations. It gets pretty cosmic, but it means convergent memory is possible.
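A rough sketch of memoization-as-a-table, with made-up names: the parameters act as key columns and the cached result as the value column:

```python
# Function memoization viewed as a relational table, as described above:
# the parameters are the key columns, the return value is the value
# column. A plain dict stands in for the table; names are hypothetical.

calls = {}   # the 'relation': (x, y) -> result

def multiply(x, y):
    if (x, y) not in calls:      # a lookup ('join') before computing
        calls[(x, y)] = x * y    # miss: insert the new 'row'
    return calls[(x, y)]

multiply(3, 4)
multiply(3, 4)                   # second call is a pure table lookup
print(calls)                     # {(3, 4): 12} -- one row, queried twice
```

Squint and a memoized function is just a table you populate lazily, which is exactly the code/data duality being claimed here.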

Now that we have established a “correspondence” between traditional programming and database — and convergent memory is the proof — we can start to see some big holes with “Functional Programming”…

First Major Flaw of FP: No Schema

Specifically, Category Theory starts to ask about functional-relational design. Uh oh.

Before you build a database you need to build a good data model or ‘schema’

But how is the schema represented? In a Category of course!

But Functional Programming lacks Categories

So where are you gonna put your schema? Nowhere good, that’s for sure. Frustrated, developers have been chasing advanced type theory like HoTT and leaving Haskell for Idris and other languages that at least have some type support, if not ‘schema’. Eventually, they will run full circle back to boring old Entity-Relationship diagrams for in-memory structures and try to repackage them as “new”.

Second Major Flaw of FP: Side Effects

Lack of good schema design is forgivable, but the next flaw can be much more costly. Database convergence means that Functional Programming can no longer ignore side effects. That is, it must properly manage transactions. Functional Programming ‘experts’ routinely embarrass themselves by screaming about “stateless purity” and “rigor” while simultaneously ignoring a huge chunk of applied Categories in the database industry.

“We don’t handle trivial things like state management but if you find something that works please let us know”

Yes, FPers, transactions and databases do actually exist for a reason, and not just as some mysterious Monad in the shadows. Again, Category Theory and convergent hardware are exposing a lot of the cargo cult nonsense.

Cats Have Been Around

These wild ideas aren’t new. Cats have been lurking around computer science for a long time. However now we have a better understanding of categories, plus the right hardware to pull it off.

The elephant in the room

Where Do Cats Live? The CAM

Because of the convergence property, Categories should be managed in something called a Categorical Abstract Machine (CAM). A CAM is a bit of an eldritch creature that is (1) part database, (2) part operating system and (3) part programming language. Naturally, Cats work better with a persistent memory architecture.

OCaml (descended from Caml, literally the “Categorical Abstract Machine Language”) tried to build a CAM, but it was too inflexible. In truth, lots of programming languages and systems have gone down this road and borrow some elements of each … operating systems in particular (Windows PowerShell, for instance, was originally code-named “Monad”). NixOS is also going in this direction with a heavy functional focus.

Perhaps the most famous example is 1960s Multics, which we believe stumbled into CAMs without a good grasp of category theory or the right hardware (and got in over its head). Nonetheless, Multics is still regarded as the genesis project for modern computing.

More recently, Wall Street risk analytics systems (which are heavily functional) have been going down this road for running simulations.