To make this more concrete, I will add the Python code I wrote to execute the steps described above.

The definition of a POMDP. For simplicity, inputs and outputs are assumed to be natural numbers. The transition matrices corresponding to the input symbols are stored in alist (where alist[i] is the transition matrix for input symbol i). As before, the matrix that maps state probabilities to observation probabilities is given by c. The initial state distribution is stored in init.

```python
class MarkovModel(object):
    def __init__(self, alist, c, init):
        self.alist = alist          # transition matrices, one per input symbol
        self.c = c                  # observation matrix
        self.ns = c.shape[1]        # number of states
        self.os = c.shape[0]        # number of observation symbols
        self.inps = len(alist)      # number of input symbols
        self.init = init            # initial state distribution
```
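As a quick sanity check, here is how the class might be instantiated for a toy model. All numbers below are made up for illustration; the class definition is repeated so the snippet runs standalone.

```python
import numpy as np

# Repeated from above so this snippet is self-contained.
class MarkovModel(object):
    def __init__(self, alist, c, init):
        self.alist = alist
        self.c = c
        self.ns = c.shape[1]
        self.os = c.shape[0]
        self.inps = len(alist)
        self.init = init

# Toy model: 2 states, 2 inputs, 2 observations.
# a[i, j] = P(next state i | current state j, given input): columns sum to 1.
a0 = np.array([[0.9, 0.3],
               [0.1, 0.7]])
a1 = np.array([[0.5, 0.5],
               [0.5, 0.5]])
# c[o, s] = P(observation o | state s): columns sum to 1.
c = np.array([[0.8, 0.1],
              [0.2, 0.9]])
init = np.array([[0.5], [0.5]])   # uniform initial state distribution

m = MarkovModel([a0, a1], c, init)
print(m.ns, m.os, m.inps)
```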

In the sections above, the procedure was stated in recursive form. It is, however, not advisable to implement the algorithm as an actual recursion, as this leads to poor performance. A better approach is dynamic programming: the α and β values are stored in matrices where

each row corresponds to one point in time

each column corresponds to one state

This is where most of the magic actually happens.

The function make_tableaus will generate

a tableau for α

a tableau for β

and a tableau for γ

To make the computation of α and β more efficient, I also calculate the common normalization factors N as derived above:

```python
def make_tableaus(xs, ys, m):
    alpha = np.zeros((len(ys), m.ns))
    beta = np.zeros((len(ys), m.ns))
    gamma = np.zeros((len(ys), m.ns))
    N = np.zeros((len(ys), 1))
    # Initialize:
    gamma[0:1,:] = m.init.T*m.c[ys[0]:ys[0]+1,:]
    N[0,0] = 1/np.sum(gamma[0:1,:])
    alpha[0:1,:] = N[0,0]*gamma[0:1,:]
    beta[len(ys)-1:len(ys),:] = np.ones((1, m.ns))
    # Forward pass: fill gamma and alpha from left to right.
    for i in range(1, len(ys)):
        gamma[i:i+1,:] = m.c[ys[i]:ys[i]+1,:]*np.sum(m.alist[xs[i-1]].T*alpha[i-1:i,:].T, axis=0)
        N[i,0] = 1/np.sum(gamma[i:i+1,:])
        alpha[i:i+1,:] = N[i,0]*gamma[i:i+1,:]
    # Backward pass: fill beta from right to left.
    for i in range(len(ys)-1, 0, -1):
        beta[i-1:i] = N[i]*np.sum(m.alist[xs[i-1]]*(m.c[ys[i]:ys[i]+1,:]*beta[i:i+1,:]).T, axis=0)
    return alpha, beta, N
```
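To see what the initialization step computes, here is the first step of the forward pass done by hand for a toy model (the numbers are made up): γ holds the unnormalized joint probabilities of the first observation and each state, and multiplying by N turns them into a posterior that sums to one.

```python
import numpy as np

init = np.array([[0.5], [0.5]])   # initial state distribution P(state)
c = np.array([[0.8, 0.1],         # c[o, s] = P(observation o | state s)
              [0.2, 0.9]])
y0 = 0                            # first observation

gamma0 = init.T*c[y0:y0+1, :]     # unnormalized joint: [[0.4, 0.05]]
N0 = 1/np.sum(gamma0)             # normalization factor
alpha0 = N0*gamma0                # posterior P(state | y0), sums to 1

print(gamma0, alpha0)
```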

These tableaus can be used to solve many inference problems in POMDPs. For instance: given a sequence of inputs and the corresponding sequence of observations, estimate the probability of being in each state at each time step. Put differently, the function state_estimates will calculate the posterior distribution over all latent variables.

I included an optional tableaus parameter. This ensures that, if we want to solve several inference problems, we only need to calculate the tableaus once.

```python
def state_estimates(xs, ys, m, tableaus=None):
    if tableaus is None:
        tableaus = make_tableaus(xs, ys, m)
    alpha, beta, N = tableaus
    return alpha*beta
```

With the formulas derived above and the tableaus at hand, this becomes very simple. Note that the *-operator in numpy performs element-wise multiplication, not matrix multiplication.
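A minimal standalone illustration of that difference:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[10, 20],
              [30, 40]])

ew = a*b    # element-wise product: each entry multiplied separately
mm = a @ b  # matrix product: rows of a dotted with columns of b

print(ew)
print(mm)
```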

A bit more sophisticated is the following inference problem: given a sequence of inputs and the corresponding sequence of observations, estimate the probability of transitioning between each pair of states at each time step.

```python
def transition_estimates(xs, ys, m, tableaus=None):
    if tableaus is None:
        tableaus = make_tableaus(xs, ys, m)
    alpha, beta, N = tableaus
    result = np.zeros((m.ns, m.ns, len(ys)))
    for t in range(len(ys)-1):
        a = m.alist[xs[t]]
        result[:,:,t] = a*alpha[t:t+1,:]*m.c[ys[t+1]:ys[t+1]+1,:].T*beta[t+1:t+2,:].T*N[t+1,0]
    # Last time step: no further observation constrains the transition.
    a = m.alist[xs[len(ys)-1]]
    result[:,:,len(ys)-1] = a*alpha[-1:,:]
    return result
```

The next function again takes an input sequence and an output sequence and, for each time step, computes the posterior probability of being in a given state and observing a certain output.

Note that, as the sequence of outputs is already given, this function does not add much: if an output of 1 was observed in time step t, the probability of observing a 2 in time step t is obviously zero.

```python
def stateoutput_estimates(xs, ys, m, sestimate=None):
    if sestimate is None:
        sestimate = state_estimates(xs, ys, m)
    result = np.zeros((m.os, m.ns, len(ys)))
    for t in range(len(ys)):
        result[ys[t]:ys[t]+1,:,t] = sestimate[t:t+1,:]
    return result
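To see what the scatter inside the loop does for a single time step, here is a standalone sketch with made-up numbers: all posterior mass lands in the row of the output that was actually observed, and every other row stays zero.

```python
import numpy as np

sestimate_t = np.array([[0.7, 0.3]])   # state posterior at time t, shape (1, ns)
os_, ns = 2, 2
y_t = 1                                # output observed at time t

result_t = np.zeros((os_, ns))
result_t[y_t:y_t+1, :] = sestimate_t   # only the observed output's row is filled
print(result_t)
```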

Having defined these functions, we can implement the Baum-Welch style EM update procedure for POMDPs. It simply calculates

the posterior state estimates for each time step

the posterior transition estimates for each time step

the posterior joint state/output estimates for each time step

This procedure can be repeated until the model converges (for some definition of convergence). The return value of this function is a new list of transition matrices and a new matrix of output probabilities. Together, these define an updated POMDP model which should explain the data better than the old one.

```python
def improve_params(xs, ys, m, tableaus=None):
    if tableaus is None:
        tableaus = make_tableaus(xs, ys, m)
    estimates = state_estimates(xs, ys, m, tableaus=tableaus)
    trans_estimates = transition_estimates(xs, ys, m, tableaus=tableaus)
    sout_estimates = stateoutput_estimates(xs, ys, m, sestimate=estimates)
    # Count the occurrences of each input symbol in the input sequence.
    nlist = [0]*m.inps
    for x in xs:
        nlist[x] += 1
    # Average state posterior per input symbol.
    sstates = [np.zeros((m.ns, 1)) for i in range(m.inps)]
    for t in range(len(ys)):
        sstates[xs[t]] += estimates[t:t+1,:].T/nlist[xs[t]]
    # Estimator for transition probabilities
    alist = [np.zeros_like(a) for a in m.alist]
    for t in range(len(ys)):
        alist[xs[t]] += trans_estimates[:,:,t]/nlist[xs[t]]
    for i in range(m.inps):
        alist[i] = alist[i]/(sstates[i].T)
        # Keep the old transition probabilities for states we never visited.
        np.putmask(alist[i], (np.tile(sstates[i].T == 0, (m.ns, 1))), m.alist[i])
    # Estimator for output probabilities
    c = np.zeros_like(m.c)
    for t in range(len(ys)):
        x = xs[t]
        c += sout_estimates[:,:,t]/(nlist[x]*m.inps*sstates[x].T)
    # Fall back to the output probabilities of the original model if
    # we have no state observation at all.
    sstatem = np.hstack(sstates).T
    mask = np.any(sstatem == 0, axis=0)
    np.putmask(c, (np.tile(mask, (m.os, 1))), m.c)
    return alist, c
```
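The np.putmask calls above fall back to the old parameters wherever a state was never visited, so the update never divides a column by zero. A minimal standalone illustration with toy arrays:

```python
import numpy as np

new = np.array([[np.nan, 0.6],     # column 0 was never estimated (state unvisited)
                [np.nan, 0.4]])
old = np.array([[0.5, 0.5],
                [0.5, 0.5]])
visited = np.array([[0.0, 1.0]])   # plays the role of sstates[i].T

mask = np.tile(visited == 0, (2, 1))  # broadcast the column mask to full shape
np.putmask(new, mask, old)            # replace masked entries with old values
print(new)
```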

How to test that? Well, we’ve seen how to calculate the log-likelihood of the data under a model: