Previous post in the series (or click above on Class). REVIEW!

Download the code: mcmc.pred.R, mcmc.pred.examples.R. If you downloaded before, download again. This is version 0.3! Only the example code changed since last time.

For an explanation of the theory behind all this, which is markedly different from classical views, get this book: Uncertainty.

Let’s return to the despised CGPA data. I know, I know, but it’s a perfect way to demonstrate tobit regression, which is exactly like ordinary regression except the Y can be censored above or below.

We saw with ordinary regression that there was substantial probability leakage in many scenarios because CGPA can only live on 0 to 4. We had large probabilities for CGPAs both greater than 4 and less than 0. Well, why not disallow it, censoring at these levels?
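What leakage looks like numerically, in a toy sketch with made-up numbers (this is not the fitted CGPA model, just an ordinary normal predictive centered near the top of the scale):

```r
# Toy sketch (not the CGPA model): a normal predictive centered near
# the top of the 0-4 scale puts non-trivial probability outside it.
set.seed(42)
draws = rnorm(1e5, mean = 3.6, sd = 0.5)   # pretend predictive draws
leak = mean(draws > 4) + mean(draws < 0)   # mass outside [0, 4]
round(leak, 2)
```

With these made-up numbers about a fifth of the predictive probability sits on impossible CGPAs; censoring is one way to stop that.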

Start by reading the data back in:



x  # the CGPA data.frame, read in as in the previous posts

As with multinomial, we have to track the model formula, a limitation of MCMCpack.



# cgpa can only exist between 0 and 4; the ordinary regression

# above exhibits strong probability leakage

form = formula('cgpa~hgpa*sat')

fit = MCMCtobit(form, below = 0, above = 4, data = x, mcmc = 30000)



You can see the places for censoring, and also that I upped the MCMC samples a bit. Just to show how it's done.

Okay, now on to scenarios. Why? Surely you reviewed! Because we want, as always:

Pr(Y | new X, old X&Y, Model & Assumptions)

where here Y = "CGPA = c", X is the HGPA and SAT scores, and the model, including its assumptions about priors and so forth, is as we already discussed.
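Once you have predictive draws of Y in hand, any probability of that form is just the fraction of draws landing in the event. A toy sketch with made-up draws standing in for what MCMCtobit.pred returns:

```r
# Toy draws standing in for the real predictive samples: Pr(Y >= c | ...)
# is just the fraction of draws at or above c.
set.seed(11)
p = pmin(pmax(rnorm(1e5, 2, 1), 0), 4)  # made-up draws, censored to [0, 4]
pr3 = mean(p >= 3)                      # Pr(CGPA >= 3 | data, model)
pr3
```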



p=MCMCtobit.pred(fit,form,x[6,],0,4)

hist(p,300)

quantile(p,c(0.05,.5,.95))



I'm not showing the histogram here. You do it.

If you saw an error about truncnorm, you will need to install that package with its dependencies, like this:



install.packages('truncnorm', dependencies = TRUE)



Why x[6,] and not x[1,] like we always do? No reason whatsoever. I think I aimed for 1 and hit 6, and was too lazy to change it. Anyway, this is what I got from the quantile function:

       5%       50%       95% 
0.2269157 0.8828441 1.6033544 



So there's a 90% chance a NEW person like x[6,] will have a CGPA between 0.23 and 1.6. Your results will differ slightly, as usual. If they differ a lot, increase that mcmc = 30000 in the model fit function.
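One quick way to see whether you need a larger mcmc setting is to compare quantiles computed from each half of the draws. A sketch with toy draws standing in for p = MCMCtobit.pred(...):

```r
# Monte Carlo stability check: split the draws in half and compare
# quantiles. Big disagreement means you want more mcmc samples.
set.seed(3)
p = rnorm(30000, 0.88, 0.42)   # toy stand-in for the predictive draws
half1 = quantile(p[1:15000], c(0.05, .5, .95))
half2 = quantile(p[15001:30000], c(0.05, .5, .95))
round(abs(half1 - half2), 3)   # small gaps = stable enough
```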

Person 6 had an SAT of 756 and an HGPA of 0.33. What if we wanted probabilities for a fixed scenario? As we've done before:



w = x[6,]

w$sat = 1e6

p=MCMCtobit.pred(fit,form,w,0,4)

hist(p,300)

quantile(p,c(0.05,.5,.95))



Gives basically a 100% chance of CGPA = 4. Everybody who thinks this is the wrong answer, raise your hands. One, two, ... Hmm. You all fail.

This is the right answer. Even before, with probability leakage, the model gave the right answer. Unless you have a typo in your code, models always give the right answers! That's because all models are conditional on only the information you provide, which includes information on the model structure. Here there is NO information about limits of SAT scores, only limits on CGPA. So we can specify whatever we like for SAT, even a million.

Outside of this model we know it's BS. But that makes our model assumptions wrong, not our model.

Now that that's understood finally and for all time, let's do our trick of looking at all the scenarios in the data.

# Has to be a loop! apply, sapply, lapply, and etc. all convert
# the data.frame to a matrix or an unusable list!
# R has no utility to step through a data.frame while maintaining class

# all the old x
q = matrix(0, nrow(x), 3)
for(i in 1:nrow(x)){
  q[i,] = quantile(MCMCtobit.pred(fit, form, x[i,]), c(0.1, .5, .9))
}

We discussed before why the loop is necessary. You reviewed, from the beginning, so I know you already know. So there was no reason for me to mention it here.

I went with an 80% predictive interval and not 90% as above. Why? 90% is harsh. Most people for many decisions will be happy with 80% (conditional) certainty. Feel free to change it---AS YOU MUST---to suit your decisions.

Look at the answers:

plot(x$sat, q[,2], ylab='cgpa', ylim=c(min(q), max(q)))
for(i in 1:nrow(x)){
  lines(c(x$sat[i], x$sat[i]), c(q[i,1], q[i,3]), col=3)
}

# or

plot(x$hgpa, q[,2], ylab='cgpa', ylim=c(min(q), max(q)))
for(i in 1:nrow(x)){
  lines(c(x$hgpa[i], x$hgpa[i]), c(q[i,1], q[i,3]), col=3)
}

Same as we did for ordinary regression---which you remember, since you reviewed. So I won't show those here. You do them.

How about scenarios for probabilities of CGPA greater than 3? Same as before:

# all the old x
q = NA
g = 3
for(i in 1:nrow(x)){
  p = MCMCtobit.pred(fit, form, x[i,])
  q[i] = sum(p >= g)/length(p)
}

# Then
plot(x$sat, q, ylab='Pr(CGPA>g|old data,M)')

# Or
plot(x$hgpa, q, ylab='Pr(CGPA>g|old data,M)')

You can color the dots using hgpa, or put them into decision buckets, or whatever. You can redo the 3D plots, too.
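A hypothetical sketch of that coloring idea: bucket hgpa into thirds with cut() and color the probability dots by bucket. Fake data stands in for the x and q computed above, so you can see the mechanics without refitting anything.

```r
# Hypothetical "decision buckets" sketch: color Pr(CGPA > g) dots by
# hgpa tercile. Made-up x and q stand in for the real ones above.
set.seed(7)
x = data.frame(sat = runif(50, 400, 1600), hgpa = runif(50, 0, 4))
q = plogis((x$sat - 1000)/200 + x$hgpa - 2)  # made-up Pr(CGPA>g) values
buckets = cut(x$hgpa, 3)                     # three equal-width hgpa bins
plot(x$sat, q, col = as.numeric(buckets), ylab = 'Pr(CGPA>g|old data,M)')
legend('topleft', legend = levels(buckets), col = 1:3, pch = 1)
```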

We've reached the end of the pre-packaged MCMCpack extensions! Next up, JAGS.
