All right, let’s finish this thing off!

First, I want to summarize, second I want to describe a whole lot of interesting stuff that I did not get to, and third, I want to give a selection of papers and further reading that inspired me in this series.

If you come away with nothing else from this series, the key message is: probabilistic programming is important, it is too hard today, and we can do a lot better than we are doing. We need to build better tools that leverage the type system and support line-of-business programmers who need to do probabilistic work, the same way that we built better tools for programmers who needed to use sequences, or databases, or asynchrony.

We started this series with me complaining about System.Random , hence the name of the series. Even though some of the implementation details have finally improved after only some decades of misleading developers, we are still dealing with random numbers like it was 1972.

, hence the name of the series. Even though some of the implementation details have finally improved after only some decades of misleading developers, we are still dealing with random numbers like it was 1972. The abstraction that we’re missing is to make “value drawn from a random distribution” a part of the type system, the same way that “function that returns a value”, or “sequence of values”, or “value that might be null” is part of the type system.

The type we want is something like a sequence enumeration, but instead of a “move next” operation, we’ve got a “sample” operation.

If we stick to simple distributions where the support is finite and small and the “weights” are integers, and restrict ourselves to pure functions, we can create new distributions from old ones using the same operations that we use to create new sequences from old ones: the LINQ operators Select , SelectMany and Where .

, and . Moreover, we can compute distributions exactly at runtime, without doing rejection sampling or other clever techniques.

And we can even use query comprehension syntax, which makes me happy.

From this we can see that probability distributions are monads; P(A) is just IDistribution<A>

is just We also see that conditional probabilities P(B given A) are just Func<A, IDistribution<B>> — they are likelihood functions.

are just — they are likelihood functions. The SelectMany operation on the probability monad lets us combine a likelihood function with a prior probability.

operation on the probability monad lets us combine a likelihood function with a prior probability. The Where operation lets us compute a posterior given a prior, a likelihood and an observation.

This is an extremely useful result; even domain experts like doctors often do not have good intuitions about how an observation should change our opinions about the probability of an event having occurred. A positive result on a test of a rare disease may be only weak evidence if the test failure rate is close to the disease rate.

Can we put these features in the language as well as the type system? Abusing LINQ is clever but maybe not the best from a usability perspective.

We could in fact embed sample and condition operators in the C# language, just as we embedded await . We can then write an imperative workflow, and have the compiler generate a method that returns the correct probability distribution, just as await allows us to write an asynchronous workflow and the compiler generates a method that returns a task!

and operators in the C# language, just as we embedded . We can then write an imperative workflow, and have the compiler generate a method that returns the correct probability distribution, just as allows us to write an asynchronous workflow and the compiler generates a method that returns a task! Unfortunately, real-world probability problems are seldom discrete, small, and completely analyzable; they’re more often continuous and approximated.

Implementing Sample efficiently on arbitrary distributions turns out to be a hard problem.

efficiently on arbitrary distributions turns out to be a hard problem. We can use our tools to generate Markov chains.

We can use Markov chains to implement Sample using the Metropolis Algorithm

using the Metropolis Algorithm If we have a continuous prior (like a mint that produces coins of a certain quality with a certain probability) and a discrete likelihood (like the probability of a coin flip coming up heads), we can use this technique to compute a continuous posterior given an observation of one or more flips.

This is a very useful technique in many real-world applications.

Computing the expected value of a distribution given a function is a tricky problem in of itself, but there are a variety of techniques we can use.

And if we can do that, we can solve some high-dimensional integral calculus problems!

That was rather a lot, and we still did not get to everything I wanted to.

By far the biggest thing that I did not get to, and that I may return to in another series of posts, is: the connection between await as an operator and sample as an operator is deeper than you might think.

As I noted above, you can put sample and condition operations in a language and have the compiler build a method that when run, generates a simple, discrete distribution. But it turns out we can actually do a pretty good job of dealing with sample operators on non-discrete distributions as well, by having the compiler be smart about using some of the techniques that we discussed in this series for continuous distributions.

What you really need is the ability to pick a sample, run a little bit of a routine, remember the result, back up a bit and try a different sample, and so on; from this we can build a distribution of program traces, and from that we can build an approximation of the distribution of the output of a probabilistic method!

This kind of control flow is tricky; it’s sort of a generalization of coroutines where you’re allowed to re-run code that you’ve run before, but with different values for the variables to see what happens.

Obviously it is crucially important that the methods be pure! It’s also crucially important that you spend most of your time exploring high-likelihood control flows, because the number of unlikely control flows is nigh infinite. If this sounds a bit like training up an ML model and then using that model in production later, that’s because it is basically the same thing, but applied to programs.

I know what you’re thinking: that sounds bonkers. But — here is the thing that really got me motivated to write this series in the first place — we actually did it.

A while back my colleague Michael and I hacked together (ha ha) an implementation of multi-shot-continuations for Hack, our PHP variant and showed that we could in fact do probabilistic workflows where we have a distribution, and we sample from it some number of times, and trace out what happens in the program as we do so.

I then went on to work on other projects, but in the meanwhile a team of people who understand statistics far, far better than I do actually built an industrial-strength probabilistic control flow language with a sample operator.

You can read all about it at their paper Hack PPL: A Universal Probabilistic Programming Language.

Another point that I very much wanted to get to in this series but did not is: we can do the same thing in C#, and in fact we can do it today.

The C# team added the ability to return types other than Task<T> from asynchronous workflows; and it turns out you only need to abuse that feature a small amount to convince C# to “go back a bit” in the workflow – back to the previous await operation, which becomes a stand-in for sample — and re-run portions of it with different sampled values. The C# team could add probabilistic workflows to C# tomorrow.

The C# team has historically done a great job of finding useful monads and embedding them into the control flow of the language; monadic probabilistic workflows with multi-shot continuations could be the next one. How about it team?

Finally, here’s a very incomplete list of papers and web sites that were inspiring to me in writing this series. I learned a lot, and there is plenty more to learn; I’ve just scratched the surface here.

The idea that we can treat probability distributions like LINQ queries was given to me by my friend and director Erik Meijer; his fun and accessible paper Making Money Using Math hits many of the same points I did in this series, but I did it with a lot more code.

The design of coroutines in Kotlin was a great inspiration to me; they’ve done a great job of making features that you would normally think of as being part of the language proper, like yield and await , into library methods. The first thing I did in my multi-shot coroutine hack was verify that I could simulate those features. (I was also very pleased to discover that much of this work was implemented by my old colleague Nikov from the C# team!)

and , into library methods. The first thing I did in my multi-shot coroutine hack was verify that I could simulate those features. (I was also very pleased to discover that much of this work was implemented by my old colleague Nikov from the C# team!) An introduction to probabilistic programming is a book-length work that uses a Lisp-like language with sample and observe primitives.

Church is another Lisp-like language often used in academic studies of PPLs.

The webppl.org web site has a Javascript-based implementation of a probabilistic programming language and a lot of great supporting information at dippl.org.

The next few papers are more technical.

And these are some good ones for the math:

There are many more that I am forgetting, and I’ll add to this list as I recall them.

All right, that was super fun; I am off on my annual vacation where I have no phone or internet, so I’m going to take a bit of a break from blogging; we’ll see you in a month or so for more fabulous adventures in coding!