Introduction

So I’ve learned a little bit about Bayesian Hierarchical Modeling at the FDA and decided to put down my thoughts and write about it to reinforce what I’ve learned. I also want to try out some new JavaScript data visualization libraries.

A great book I’ve found is “Introduction to Hierarchical Bayesian Modeling for Ecological Data” by Parent and Rivot1.

While at the FDA I coded my own model in R without using any MCMC framework, and it was very slow. I realized I needed an MCMC framework in my toolbelt. After some research I decided on Stan, using the rstan R package.

The Graph (not a DAG; not a Bayesian network)

The graph represents the salmon migration and birth cycle. Each edge represents the passing of a year. The nodes are squares because their values are given: the information comes from previous knowledge.

Variable: Definition
W_t: Salmon eggs
0+: Young-of-the-year (hatched)
PSm: Pre-smolts
Sm1: Smolt after 1 year
Sp1: Returns one year earlier than Sp2
Parr1: Smaller juveniles left behind by Sm1
Sm2: Smolt after 2 years
Sp2: Returns one year after Sp1

The Models - Introducing Probability into the Graph

We’re going to take the graph that represents the salmon’s migration cycle and introduce uncertainty into it (model it via probability). By doing this we create a new graph that is completely different from the migration cycle graph: it is a graph based on a probabilistic view.

We go through the square nodes one by one and apply a model (a probability distribution) to each.

The model is time based; t will represent a particular year.

Sp_t = Sp1_t + Sp2_t = # of spawners in year t

W_t = # of eggs spawned by the adults returning in year t

0+_t = young-of-the-year in year t

PSm_t = pre-smolts in year t

Sm1_t = 1+ smolts (1 year to smolt) in year t

P1_t = Parr1 = smaller juveniles left behind by Sm1 in year t

Sm2_t = 2+ smolts (2 years to smolt) in year t

Spawners -> Eggs

W_t = Sp_t ⋅ P_f ⋅ fec

W_t = # of eggs spawned by the adults returning in year t

Sp_t = # of spawners = Sp1_t + Sp2_t

P_f = proportion of females

fec = mean fecundity (fertility)
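The egg-deposition equation is plain arithmetic, so here is a tiny Python sketch (the spawner count, female proportion, and fecundity values are made up for illustration, not from the book):

```python
# Made-up illustrative values, not data from the book.
spawners = 120            # Sp_t = Sp1_t + Sp2_t
p_f = 0.5                 # P_f: proportion of females
fec = 4500                # fec: mean fecundity (eggs per female)

eggs = spawners * p_f * fec   # W_t = Sp_t * P_f * fec
```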

Eggs -> 0+ juveniles

This is the Ricker curve model with parameters (α, β), a classic discrete population model.

0+_{t+1} = α ⋅ W_t ⋅ e^(−β ⋅ W_t) ⋅ e^(ε_t), where ε_t ~iid N(0, σ²)

0+_{t+1} = freshwater production of juveniles resulting from the reproduction of the spawners returning in year t
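To see how the Ricker curve behaves, here’s a minimal Python sketch (stdlib only; the α, β, σ values are made up by me for illustration):

```python
import math
import random

def ricker(eggs, alpha, beta, sigma, rng):
    """Ricker curve with lognormal noise: alpha * W * exp(-beta * W) * exp(eps)."""
    eps = rng.gauss(0.0, sigma)   # eps_t ~ N(0, sigma^2)
    return alpha * eggs * math.exp(-beta * eggs) * math.exp(eps)

rng = random.Random(42)
# Made-up parameter values, for illustration only.
cohort = ricker(eggs=100_000, alpha=0.05, beta=1e-5, sigma=0.3, rng=rng)
```

Note the density dependence: because of the e^(−β⋅W_t) term, juvenile production eventually declines as egg deposition grows, which is the defining feature of the Ricker model.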

0+ juveniles -> Smolts

PSm_{t+2} ~ Binomial(0+_{t+1}, γ_0+) = # of 0+_{t+1} that survive and migrate to become PSm_{t+2}

Sm1_{t+2} ~ Binomial(PSm_{t+2}, θ_Sm1) = # of PSm_{t+2} that survive and migrate as 1+ smolts (Sm1)

Sm2_{t+3} ~ Binomial(Parr1_{t+2}, γ_parr1) = # of Parr1_{t+2} that survive and migrate as 2+ smolts (Sm2)

PSm_{t+2} = young-of-the-year 0+_{t+1} that survive to the next spring, year t+2

γ_0+ = survival rate of 0+

θ_Sm1 = proportion of pre-smolts that migrate as 1+ smolts (survival rate)

γ_parr1 = survival rate of Parr1
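The chain of binomial survival steps can be sketched as a forward simulation in Python (stdlib only; the cohort size and rates are numbers I made up, and the `survive` helper is mine, not the book’s):

```python
import random

def survive(n, rate, rng):
    """One Binomial(n, rate) draw via independent coin flips (stdlib only)."""
    return sum(rng.random() < rate for _ in range(n))

rng = random.Random(1)
# Made-up cohort size and survival rates, for illustration only.
zero_plus = 2_000                       # 0+ juveniles in year t+1
psm = survive(zero_plus, 0.20, rng)     # gamma_0+: survive to pre-smolt, year t+2
sm1 = survive(psm, 0.60, rng)           # theta_Sm1: migrate as 1+ smolts
parr1 = psm - sm1                       # the rest stay behind as Parr1
sm2 = survive(parr1, 0.30, rng)         # gamma_parr1: migrate as 2+ smolts, year t+3
```

Notice that Parr1 is not drawn separately: once Sm1 is drawn, Parr1 is just the remainder of the pre-smolts, which is exactly why Sm1 and Parr1 end up sharing a parent in the graph.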

Smolts -> Returning Spawners

Sp1_{t+3} ~ Binomial(Sm1_{t+2}, γ_Sm)

Sp2_{t+4} ~ Binomial(Sm2_{t+3}, γ_Sm)

γ_Sm = survival rate

Learning from observations

These two are observed and given:

C_Sm1,t = observations = # of smolts caught in the downstream trap

π Sm = trap efficiency

Using these data points we can figure out the unknowns.

Our unknowns, the parameters, are: α, β, σ, γ_0+, θ_Sm1, γ_Parr1, and γ_Sm.

The number of smolts caught in the downstream trap can be modeled with a binomial distribution: each smolt is either caught or not.

C_Sm1,t ~ Binomial(Sm1_t, π_Sm)

*Note (advanced): the observations are assumed to satisfy the Bayesian property of exchangeability.

The Models - Creating a probability graphical model

0+_{t+1} = α ⋅ W_t ⋅ e^(−β ⋅ W_t) ⋅ e^(ε_t), where ε_t ~iid N(0, σ²)

0+_{t+1} = freshwater production of juveniles resulting from the reproduction of the spawners returning in year t

PSm_{t+2} ~ Binomial(0+_{t+1}, γ_0+) = # of 0+_{t+1} that survive and migrate to become PSm_{t+2}

γ_0+ = survival rate of 0+

Sm1_{t+2} ~ Binomial(PSm_{t+2}, θ_Sm1) = # of PSm_{t+2} that survive and migrate as 1+ smolts (Sm1)

PSm_{t+2} = young-of-the-year 0+_{t+1} that survive to the next spring, year t+2

θ_Sm1 = proportion of pre-smolts that migrate as 1+ smolts (survival rate)

Sm2_{t+3} ~ Binomial(Parr1_{t+2}, γ_parr1) = # of Parr1_{t+2} that survive and migrate as 2+ smolts (Sm2)

Sp1_{t+3} ~ Binomial(Sm1_{t+2}, γ_Sm)

Sp2_{t+4} ~ Binomial(Sm2_{t+3}, γ_Sm)

Now we put all the parts together into a graph.

Okay, now that we have the probability graphical model down, we can work out the joint probability distribution.

P(J_t) = ?

Step 1. Looking at the graph, we start with all the nodes that have no parent: α, β, σ, W_t, γ_0+, θ_Sm1, γ_Parr1, and γ_Sm.

P(J_t) = P[α] ⋅ P[β] ⋅ P[σ] ⋅ P[W_t] ⋅ P[γ_0+] ⋅ P[θ_Sm1] ⋅ P[γ_Parr1] ⋅ P[γ_Sm] ⋅ …

Step 2. Now we look at the nodes with parents.

0+_{t+1} gives P[0+_{t+1} | W_t, α, β, σ]

PSm_{t+2} gives P[PSm_{t+2} | 0+_{t+1}, γ_0+]

Sm1_{t+2} & Parr1_{t+2} give P[Sm1_{t+2}, Parr1_{t+2} | PSm_{t+2}, θ_Sm1]. Notice how complex this one is: Sm1 and Parr1 share the same parent and parameter.

Sp1_{t+3} gives P[Sp1_{t+3} | Sm1_{t+2}, γ_Sm]

Sm2_{t+3} gives P[Sm2_{t+3} | Parr1_{t+2}, γ_Parr1]

Sp2_{t+4} gives P[Sp2_{t+4} | Sm2_{t+3}, γ_Sm]

P(J_t) = P[α] ⋅ P[β] ⋅ P[σ] ⋅ P[W_t] ⋅ P[γ_0+] ⋅ P[θ_Sm1] ⋅ P[γ_Parr1] ⋅ P[γ_Sm] ⋅ P[0+_{t+1} | W_t, α, β, σ] ⋅ P[PSm_{t+2} | 0+_{t+1}, γ_0+] ⋅ P[Sm1_{t+2}, Parr1_{t+2} | PSm_{t+2}, θ_Sm1] ⋅ P[Sp1_{t+3} | Sm1_{t+2}, γ_Sm] ⋅ P[Sm2_{t+3} | Parr1_{t+2}, γ_Parr1] ⋅ P[Sp2_{t+4} | Sm2_{t+3}, γ_Sm]
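To make the factorization concrete, here is a sketch (in Python rather than R/Stan, stdlib only; all function and variable names are mine, not the book’s) of the conditional part of the log joint. The priors P[α] … P[γ_Sm] are left out, since they depend on which distributions you choose; and note the Sm1/Parr1 pair needs only one binomial term, because Parr1 = PSm − Sm1 is determined once Sm1 is drawn.

```python
import math

def log_binom(k, n, p):
    """log of the Binomial(n, p) pmf at k, via lgamma."""
    if not (0 <= k <= n) or not (0.0 < p < 1.0):
        return float("-inf")
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def log_conditionals(s, q):
    """Conditional terms of log P(J_t); s = state counts, q = parameters."""
    # Ricker with lognormal noise => log(0+) ~ Normal(log(alpha*W) - beta*W, sigma^2)
    mu = math.log(q["alpha"] * s["W"]) - q["beta"] * s["W"]
    z = (math.log(s["zp"]) - mu) / q["sigma"]
    lp = (-0.5 * z * z - math.log(q["sigma"] * math.sqrt(2 * math.pi))
          - math.log(s["zp"]))                      # P[0+ | W, alpha, beta, sigma]
    lp += log_binom(s["psm"], s["zp"], q["g0"])     # P[PSm | 0+, gamma_0+]
    lp += log_binom(s["sm1"], s["psm"], q["th1"])   # P[Sm1, Parr1 | PSm, theta_Sm1]
    lp += log_binom(s["sm2"], s["parr1"], q["gp"])  # P[Sm2 | Parr1, gamma_parr1]
    lp += log_binom(s["sp1"], s["sm1"], q["gs"])    # P[Sp1 | Sm1, gamma_Sm]
    lp += log_binom(s["sp2"], s["sm2"], q["gs"])    # P[Sp2 | Sm2, gamma_Sm]
    return lp

# Example evaluation with made-up, mutually consistent counts:
state = {"W": 50_000, "zp": 1_800, "psm": 400, "sm1": 240,
         "parr1": 160, "sm2": 50, "sp1": 20, "sp2": 5}
params = {"alpha": 0.05, "beta": 1e-5, "sigma": 0.3,
          "g0": 0.2, "th1": 0.6, "gp": 0.3, "gs": 0.1}
lp = log_conditionals(state, params)
```

An MCMC framework like Stan builds essentially this sum for you; writing it out once by hand makes clear why each node contributes exactly one factor conditioned on its parents.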

Okay so what? Where’s the Bayesian network?

Not yet. The book needs to introduce the concept of a simple model vs a hierarchical model and some terminology.

So far we haven’t introduced any observational variable (random variable) at all.

θ represents parameters

Z represents latent parameters

Y represents the output random variable (lowercase y represents the realization/sample of the random variable Y).

The left graph is a simple model; the right graph is a hierarchical model.

Z represents latent variables (nuisance variables): variables we don’t really care about, and which are hidden, so we don’t observe them directly the way we do Y. Y represents observations. Observations are random, so Y is capitalized, and lowercase y is the realization of Y (a sample of Y). The node is pink because it is observable.

P[θ, Z, Y] = P[θ] ⋅ P[Z | θ] ⋅ P[Y | θ, Z]
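That factorization reads as a generative recipe: draw θ from its prior, then Z given θ, then Y given θ and Z. A stdlib-only Python sketch, with a uniform prior and toy numbers I made up:

```python
import random

rng = random.Random(7)

# theta ~ P[theta]: a survival rate, with a uniform prior (an arbitrary choice here)
theta = rng.uniform(0.0, 1.0)

# Z | theta ~ P[Z | theta]: latent survivors out of a known cohort of 500
Z = sum(rng.random() < theta for _ in range(500))

# Y | theta, Z ~ P[Y | theta, Z]: observed catches, with a known trap efficiency
pi_trap = 0.4
Y = sum(rng.random() < pi_trap for _ in range(Z))
```

This is the downward direction through the graph; training the model, as discussed later, runs the same structure in reverse.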

Well, first, what’s the experiment?

For this it’s salmon captures, and they’re modeled via a binomial distribution: either you catch the fish or you don’t.

Notice the C stands for catches.

C_{0+,t+1} ~ Binomial(0+_{t+1}, π_0+)

C_{Sm1,t+2} ~ Binomial(Sm1_{t+2}, π_Sm)

C_{Sm2,t+3} ~ Binomial(Sm2_{t+3}, π_Sm)

C_{Sp1,t+3} ~ Binomial(Sp1_{t+3}, π_Sp)

C_{Sp2,t+4} ~ Binomial(Sp2_{t+4}, π_Sp)

Once again, the squares represent known/given values (the π’s are given). The pink circles mean observed values. Pink in general means known, either given or observed. The purple boxes represent grouping, collecting the nodes into their respective groups.

Okay. Finally, we’ve got a Bayesian network. Really, what now?

How does y (the sample or realization of Y) fit into this fancy graph?

What happens when an observation is available?

Before that, notice how we built the model and in which direction. The direction is downward: from the salmon cycle toward the latent variables and then toward the observations.

Why did the book bring this up? Because when you train the model using the available data/observations, you go in the opposite direction.

You start at Y (the observation layer; Y is a random variable), and now Y = y, since lowercase y is the realization of the random variable Y: y is a sample of Y, the data (actual values, not just a placeholder variable). Then you go up to the latent layer, and from there to the parameter layer.

Let’s see it mathematically:

Here’s the joint probability:

P[Y, θ, Z]

Now here’s the joint probability with Y = y, i.e. when we have data to train the model and find the parameters.

P[θ, Z | Y = y]

Given Y = y, the observations propagate upward: from the observation layer to the latent layer to the parameter layer.

This is how you train the model once you are done building it.

You can see the Bayes’ Rule connection too, right? We’re always dealing with joint probability and conditional probability.

Bayesian modeling makes the layers conditionally independent given their parents. This is one of the properties of Bayesian statistics.

This is now a posterior distribution, "posterior" meaning after the data. The prior distribution is before the data.

P[θ, Z | Y = y] = posterior distribution

I’m going to repeat it again.

Posterior is after the data have been input.

Prior is before the data. It is your prior belief.

In Bayesian statistics you need to supply a belief in the form of a prior distribution. It’s weird, but don’t worry: if you don’t know anything, you can use a noninformative prior distribution.

The prior is also a way to encode expert belief.

So given what we have now, we just apply Bayes’ Rule to the conditional probability, and you get your parameter values.

Bayes’ Rule

P[θ, Z | Y = y] = P[θ, Z, Y = y] / P[Y = y]

Do some stats here and you get:

P[θ, Z | Y = y] ∝ P[θ] ⋅ P[Z | θ] ⋅ P[Y = y | θ, Z]
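For a single unknown you can watch Bayes’ Rule in action with a brute-force grid: evaluate prior times likelihood at a grid of parameter values, then normalize. Here is a stdlib-only Python sketch estimating one survival-type rate from a single binomial count (the n and y values are made up by me):

```python
import math

def log_binom(k, n, p):
    """log of the Binomial(n, p) pmf at k, via lgamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

n, y = 200, 58                          # e.g. 58 recaptures out of 200 (made up)
grid = [i / 100 for i in range(1, 100)] # candidate rates 0.01 .. 0.99
# Flat prior on (0, 1): the prior term is constant, so only the likelihood varies.
log_post = [log_binom(y, n, p) for p in grid]
m = max(log_post)                       # subtract the max for numerical stability
weights = [math.exp(lp - m) for lp in log_post]
total = sum(weights)
post = [w / total for w in weights]     # normalized posterior over the grid
post_mean = sum(p * w for p, w in zip(grid, post))
```

With the flat prior, the grid mean lands near the Beta-posterior mean (y+1)/(n+2) ≈ 0.292; swapping in an informative prior is just one extra log-prior term added at each grid point.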

Conclusion

I highly recommend this book. Andrew Gelman’s BDA book is more PhD level, and his approach is not graphical like this but more math-heavy. Being visual, this book helps a lot in tying things together.

There were no observations/data and no code for this chapter. Ah, dang it. Well, until next time, stay tuned for the next episode of Bayesian man.

1Buy the book if you like what you see on the post. This is basically my notes on chapter 1 of the book. It’s an amazing book and I highly recommend it.

What did I learn about myself

I’m glad I’m reviewing this chapter of the book again. I have a confession to make: if I want to understand a material/subject, I need to read it three times, do projects on it, and review what I’ve learned. I need tons of practice. I guess this is one of the reasons why I started this blog.

This chapter ties together DAGs, Bayes’ Rule, and conditional probabilities. A good refresher, and it cleared up things I was wrong about. Especially the salmon breeding cycle: I didn’t think about the fact that it isn’t a DAG, and that from that model we create a Bayesian graphical model (a DAG).

I think I’ll go through each chapter of this book as a refresher while playing with JavaScript graphical libraries and hopefully learning Stan. I need to make sure I didn’t miss anything from the first reading.

What Did I Get to Practice? (for me)

Bayesian Hierarchical Modeling using rstan. Tried out a JavaScript data visualization library, cytoscape.js, for modeling graphs. Got to refresh Bayesian graphical models (Bayesian networks).

Rough Roadmap for Bayesian HM

Finish off this book, Introduction to Hierarchical Bayesian Modeling for Ecological Data (Chapman & Hall/CRC Applied Environmental Statistics).
Read Gill, Jeff, Bayesian Methods: A Social and Behavioral Sciences Approach, CRC Press (2014) (Statistics in the Social and Behavioral Sciences series) for Hamiltonian Monte Carlo.
Read https://arxiv.org/abs/1111.4246, an implementation of HMC.
Watch measure theory videos.
Reread BDA3.
Learn the Dirichlet Process.

Etc..

The book link is Amazon affiliated. If you get it at CRC publishing you can get it 20 bucks cheaper with a discount code; it just takes longer to ship. Also note I would recommend reading “Doing Bayesian Data Analysis” first before even trying to get into hierarchical modeling. I would like to thank this website for all the HTML mathematical notations.

The salmon sushi picture was taken from Pixabay under a Creative Commons license.

