Bayesian Model

Since we want to solve this problem with Bayesian methods, we need to construct a model of the situation. The basic set-up is that we have a series of observations, 3 tigers, 2 lions, and 1 bear, and from this data we want to estimate the prevalence of each species at the wildlife preserve. That is, we are looking for the posterior probability of seeing each species given the data.

Before we begin, we want to establish our assumptions:

1. Treat each observation of one species as an independent trial.

2. Our initial (prior) belief is that each species is equally represented.

The overall system, in which we have 3 discrete choices (species), each with an unknown probability, and 6 total observations, follows a multinomial distribution. The multinomial distribution is the extension of the binomial distribution to the case where there are more than 2 outcomes. A simple application of a multinomial is 5 rolls of a die, each of which has 6 possible outcomes.

A probability mass function of a multinomial with 3 discrete outcomes is shown below:

Probability Mass Function (PMF) of a multinomial with 3 outcomes
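For reference, the multinomial PMF with 3 outcomes takes the standard form, where n is the number of trials, x_i is the count of outcome i, and p_i is its probability:

```latex
P(x_1, x_2, x_3 \mid n, p_1, p_2, p_3)
  = \frac{n!}{x_1!\, x_2!\, x_3!}\; p_1^{x_1}\, p_2^{x_2}\, p_3^{x_3},
  \qquad x_1 + x_2 + x_3 = n
```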

A Multinomial distribution is characterized by k, the number of outcomes, n, the number of trials, and p, a vector of probabilities for each of the outcomes. For this problem, p is our ultimate objective: we want to figure out the probability of seeing each species from the observed data. In Bayesian statistics, the parameter vector for a multinomial is drawn from a Dirichlet Distribution, which forms the prior distribution for the parameter.
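As a quick sketch of how k, n, and p fit together, we can draw counts from a multinomial with NumPy. The equal probabilities here are illustrative placeholders, not the quantity we are trying to infer:

```python
import numpy as np

# A multinomial is defined by k outcomes, n trials, and a probability
# vector p. Here: n = 6 observations, k = 3 species, and p is assumed
# equal across species purely for illustration.
n = 6
p = [1 / 3, 1 / 3, 1 / 3]  # ordering: [lions, tigers, bears]

rng = np.random.default_rng(42)
counts = rng.multinomial(n, p)  # one simulated set of 6 observations
print(counts)                   # a length-3 array of counts summing to 6
```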

The Dirichlet Distribution, in turn, is characterized by k, the number of outcomes, and alpha, a vector of positive real values called the concentration parameter. Alpha is called a hyperparameter because it is a parameter of the prior. (This chain can keep going: if alpha comes from another distribution, then that is a hyperprior, which could have its own parameters called hyper-hyperparameters!) We’ll stop our model at this level by explicitly setting the values of alpha, which has one entry for each outcome.

Hyperparameters and Prior Beliefs

The best way to think of the Dirichlet parameter vector is as pseudocounts, observations of each outcome that occur before the actual data is collected. These pseudocounts capture our prior belief about the situation. For example, because we think the prevalence of each animal is the same before going to the preserve, we set all of the alpha values to be equal, say alpha = [1, 1, 1].
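A minimal sketch of what alpha = [1, 1, 1] means in practice: every draw from the Dirichlet is a valid probability vector over the 3 species, and with equal pseudocounts the draws average out to [1/3, 1/3, 1/3], matching our equal-prevalence prior:

```python
import numpy as np

alpha = [1, 1, 1]  # equal pseudocounts: no species favored a priori

rng = np.random.default_rng(0)
samples = rng.dirichlet(alpha, size=10_000)

# Each row is a probability vector: entries are non-negative and sum to 1.
print(samples.sum(axis=1)[:3])

# With symmetric alpha, the mean draw is close to [1/3, 1/3, 1/3].
print(samples.mean(axis=0))
```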

Alternatively, if we expected to see more bears, we could use a hyperparameter vector like [1, 1, 2] (where the ordering is [lions, tigers, bears]). The exact values of the pseudocounts reflect the level of confidence we have in our prior beliefs. Larger pseudocounts will have a greater effect on the posterior estimate, while smaller values will have a smaller effect and will let the data dominate the posterior. We’ll see this when we get into inference, but for now, remember that the hyperparameter vector contains pseudocounts, which in turn represent our prior belief.
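Because the Dirichlet is conjugate to the multinomial, the posterior is simply another Dirichlet whose parameter is the prior pseudocounts plus the observed counts, which makes the effect of alpha's magnitude easy to check. The counts below are the observations from this problem; the two alpha vectors are illustrative weak and strong symmetric priors:

```python
import numpy as np

counts = np.array([2, 3, 1])  # observed [lions, tigers, bears]

# Dirichlet-multinomial conjugacy: alpha_posterior = alpha + counts.
weak_prior = np.array([1, 1, 1])      # small pseudocounts: data dominates
strong_prior = np.array([15, 15, 15])  # large pseudocounts: prior dominates

for alpha in (weak_prior, strong_prior):
    alpha_post = alpha + counts
    posterior_mean = alpha_post / alpha_post.sum()
    print(alpha, "->", posterior_mean)
```

With the weak prior, the posterior mean is [3/9, 4/9, 2/9], close to the raw observed proportions; with the strong prior, every species is pulled back toward 1/3.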

A Dirichlet distribution with 3 outcomes is shown below with different values of the hyperparameter vector. Color indicates the concentration weighting.

Effect of the hyperparameter vector alpha on the Dirichlet Distribution (source).

There’s a lot more detail we don’t need to get into here, but if you’re still curious, see some of the sources listed below.