Much of statistical modelling involves shuffling around operations on stochastic variables — i.e. on their probability distributions. This can be tricky stuff: it either involves loads of manual lifting with fully discrete distributions or clever analytic methods for continuous ones. This is complicated enough that most of basic and intermediate-level applied statistical analysis is done working with a normality assumption; the normal distribution has some simple linearity properties regarding affine functions of normal stochastic variables, and is thus used to avoid dicking around with convolutions and the jacobian method.

Like with Pancito, this is something I once independently tried to do, but got lost trying to model dense spaces and measure theory. Martin Erwig did the actual work so far, even though there’s no work in continuous spaces at all. I’m working on cleaning his code up — that is, making it more like the other GHC libraries, since this was done for an academic paper. I’d like to eventually turn this into Control.Stochastic; it’s not a Data.* library, as it allows for transititions, functions whose outcome depends on stochastic variables. This together with sigfpe’s CA comonad could lead to a very compact library for simulating percolation systems! In general, stochastic programming has wide, deep applications that make this econometrician’s eyes shine with hope.

So after this short status report (apparently I finally manage to become interested enough to work on a project for a significant ammount of time), let’s take a look ath the centerpiece of Erwig’s PFP library: the Dist monad.

First of all, probability is a measure over an algebra of events. Erwig doesn’t mull over too much on algebras, sigma-algebras and measure theory — I admire his restraint. Instead, we have merely the concept of an event: something that might or not happen.



type Event a = a -> Bool



A probability measure, by definition, is a real between 0 and 1. Martin eschews the bounded number problem (you can’t define a type that enforces such a constraint) by making an abstract datatype that will later be hidden (though he didn’t really create an export list).



newtype Probability = P ProbRep

type ProbRep = Float



We omit here the Show method for Probabilities. It involves a somewhat confusing rounding procedure I haven’t studied yet, and isn’t essential for our purpose here. We instead jump directly to the Distribution type. A distribution is a list of singleton events together with their probabilities; the probabilities should sum one, and this will be handled by later handling functions.



newtype Dist a = D {unD :: [(a,ProbRep)]}



Record syntax is used to automatically define a destructor function; this kind of reads “wrong” if you’re used to records where destructor functions describe fields, but is really inconsequential. Now we turn to instancing Dist as a functor:



instance Functor Dist where

fmap f (D d) = D [(f x,p) | (x,p) <- d]



Let’s say you have a Bernoulli probability distribution with paramter 0.5. That is, with 50% probability you will turn to 1, and with 50% probability it’ll be 0. What happens if I want to model a head-or-tails game where I _lose_ one dollar if tails comes out and win one if heads comes up? We then have f(1) with probability 0.5 and f(0) with probability 0.5, where f(x)=2*x-1. This is precisely the definition above; apply the function to the events and leave the probabilites alone.

Now, the monad.



instance Monad Dist where

return x = D [(x,1)]

d >>= f = D [(y,q*p) | (x,p) <- unD d, (y,q) <- unD (f x)]

fail _ = D []



The unit of this monad is merely a singleton certain (proper jargon is almost certain when it can only assume one value with probablity 1) distribution. Let’s now remember the type of (>>=) so we’re sure what we’re looking for.

(>>=) :: (Monad m) => m a -> (a -> m b) -> m b

d >>= f = D [(y,q*p) | (x,p) <- unD d, (y,q) <- unD (f x)]



The content of d is a list of (Event, probability) pairs hidden inside the D constructor; first we extract the pair with the destructor function; then, a function of type (a -> m b) , in this case (a -> D b) is applied to the event extracted from d ; that function shall return another distribution, which we faithfully destruct into an (Event, probability) pair. Finally, we take the event that function returned and multiply the probabilities.

Let’s examine this all again. Dist is a functor — an endofunctor on the category of Haskell types with functions as morphisms — that takes any type t to a type D t (consisting of a hidden list of event/probability pairs); fmap lifts functions that act on event types to functions that act on the probability distributions by applying the deterministic event->event function to the event part and keeping the probability.

Dist is also a monad; it has a trivial unit, and binds a function of type (pure type -> probability distribution) to a probability distribution itself; the event part of that distribution results from applying the function to the pure type part of the d distribution and keeping the pure type part and multiplying the probabilities. Multiplying probabilities is natural when you’re intersecting events, but why does it happen here? We’ll see this by examining liftM2 . Let’s switch to do-notation for this.

First, we sugar d>>=f into



do { z <- d ; f z}



Multiplying probabilities feels more natural in this context; a stochastic value is taken from a distribution, and applying another stochastic function to it can only be interpreted in terms of an intersection of events. But let’s take a look at liftM. What liftM is supposed to do is lift a function from ordinary types to monadically-qualified types. This clearly is the very same thing as fmap , and only has this synonym for consistency: for higher-arity functions, we have the liftM n family:



liftM :: (a1 -> r) -> m a1 -> m r

liftM2 :: (a1 -> a2 -> r) -> m a1 -> m a2 -> m r

liftM3:: (a1 -> a2 -> a3 -> r) -> m a1 -> m a2 -> m a3 -> m r

liftM4 :: (a1 -> a2 -> a3 -> a4 -> r) -> m a1 -> m a2 -> m a3 -> m a4 -> m r



(In all these, the (Monad m) => context has been omitted). The semantics of fmap/liftM, as we’ve seen, are clear. What semantics does one expect from liftM2 ? How about taking a function from two ordinary types to an ordinary type, and transforming it into a function from two distributions to a distribution? This is the stuff stochastic calculus is made of: operations over random variables. Of course, there’s deep theory on how to do this for continuous types, but let’s think of an easy example. What’s the sum of two dice? Each dice can assume the values 1..6 with a 1/6 probability. Let’s construct the distribution for the sum of them. Each (a,b) combination has a 1/36 = 1/6 x 1/6 probability of happening. Let’s count’em.

Value Combinations that yield it Probability 1 … 0 2 (1,1) 1/36 3 (1,2), (2,1) 2/36 4 (1,3), (2,2), (3,1) 3/36 5 (1,4), (2,3), (3, 2), (4, 1) 4/36 6 (1,5), (2,4), (3, 3), (4, 2), (5,1) 5/36 7 (1,6), (2,5), (3, 4), (4, 3), (5,2), (6,1) 6/36 8 (2,6), (3, 5), (4, 4), (5,3), (6,2) 5/36 9 (3, 6), (4, 5), (5,4), (6,3) 4/36 10 (4, 6), (5, 5), (6,4) 3/36 11 (5,6), (6,5) 2/36 12 (6,6) 1/36 >12 … 0

We’d like to be able to liftM2 (+) two dice so it calculates our probabilities. Let’s first see what a dice looks like



dice = D [(1,1/6), (2,1/6), (3,1/6),(4,1/6),(5,1/6),(6/16)]



What happens when we fmap (that is, liftM a function into that? Remember fmap f (D d) = D [(f x,p) | (x,p) <- d] . Hence,

fmap (*2) dice == D [ 2*x, p | (x,p) <- unD d] == D [(2,1/6), (4,1/6), (6,1/6),(8,1/6),(10,1/6),(12/16)]

That’s the probability distribution for twice the result of the same dice. Now what’s the definition of liftM2 ?

liftM2 = do { x1 <- m1; x2 <- m2; return (f x1 x2) }



or, desugaring,



m1 >>=( \ x1 -> m2 >>= (\ x2 -> return (f x1 x2)))



Let’s take this for one event in the join distribution: (2,1). By the definition above, the probabilities of 2 and 1 are multiplied, yielding 1/36; the sum of these ( f x1 x2 ) is 3. The final distribution then is written down with (3, 1/36). When later (1,2) comes up, another (3,1/36) will be written down. As a result, the internal representation of this distribution is actually



[(2,1/36), (3,1/36), (3, 1/36), (4, 1/36), (4, 1/36), (4, 1/36), ..... (12, 1/36)]



We fix this by writing a query function (??) , defined as



(??) :: Event a -> Dist a -> Probability

(??) p = P . sumP . filter (p . fst) . unD



where



sumP :: [(a,ProbRep)] -> ProbRep

sumP = sum . map snd



For example, a simple query over the two-dice distribution is



(==7) ?? (liftM2 (+) dice dice)



where dice is defined as above. The dice distribution is destroyed, liftM2 is applied so to produce the “inner representation” we’ve shown before and then the query combinator filters through the instances of 7 and sums their probabilities. A suitable Show method can be (it actually is, but it’s confusing and spans two different files (in my opinion needlessly)) defined to quickly visualize the table we’ve manually constructed. Evidently, liftMn works accordingly. Bingo: stochastic calculus in a monad. Suitable combinators for defining probability distributions without working directly with the D constructor are defined; for example, dice = uniform [1..6] . Of course, uniform [1,2,2,3] is suitably adjusted for all queries, with 2 being more likely.

This is a financial modeller’s wet dream. To think you can use the empirical distribution function of past data directly as your input, with no ugly binomial/normality assumptions! Of course, distribution assumptions yield important theoretical results, but this stochastic calculus is an extremely powerful modelling tool in its own right. And wait until I start raving on stochastic transitions 😉