By Jeff Ernsthausen

When we first began the long work of assembling the data and writing the code that would become the Georgia Legislative Navigator, we quickly realized that we had stumbled upon one of those rare opportunities to bring the quantitative methods of social science into the newsroom, much as we had done with the Atlanta Public Schools cheating scandal. Putting together a working application to offer the public insight into the legislative process meant assembling an enormously detailed data set on legislators and the bills they introduced stretching back to 2001. With over 14,000 bills labeled with votes, sponsorship information and text summaries, we had many of the ingredients we would need to build a working statistical model of the General Assembly over the years. We decided to give it a shot.

How we approached the problem

We began our modeling work by consulting with experts. Our first call was to Professor Charles Bullock, chair of the Political Science Department at the University of Georgia and a close observer of the Georgia legislature. We asked him what type of model he might use on a problem like this and what factors he might include. He recommended a technique known as logistic regression, a model originally developed by biostatisticians for predicting the chances that a person would develop a disease. This type of model is highly useful when, as in our case, the outcome is either positive or negative: you either have a disease or you don’t; a bill either passes or it doesn’t.
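
To make that concrete, here is a minimal sketch in R of how a logistic model turns a weighted sum of factors into a probability. The intercept and weights below are invented purely for illustration; they are not values our model actually estimated.

    # A logistic model combines its inputs into a score, then squeezes
    # that score into a probability between 0 and 1. The intercept and
    # weights here are made up for illustration only.
    intercept    <- -2.0
    b_majority   <-  0.8   # hypothetical weight: sponsor in majority party
    b_leadership <-  1.1   # hypothetical weight: leadership co-sponsor

    score <- intercept + b_majority * 1 + b_leadership * 0
    prob  <- 1 / (1 + exp(-score))   # the logistic function
    prob                             # about 0.23, a 23 percent chance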

So as we explored and amassed our data set, we also read up on logistic regression. Our Bible became a book written by Professor Frank Harrell, chair of the Department of Biostatistics at Vanderbilt, who generously provided advice and reviewed our work as we went along, and who also wrote the R software we used to run the model.
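
For readers who want to see roughly what fitting such a model looks like, here is a sketch using Harrell’s rms package, presumably the software in question. The data frame and column names are hypothetical stand-ins, not our production code, and “passed” is assumed to be coded 1 for passage and 0 otherwise.

    # Sketch: fitting a logistic regression with the rms package on a
    # hypothetical data frame of bills, one row per bill.
    library(rms)

    fit <- lrm(passed ~ sponsor_majority + cosponsors_gop + cosponsors_dem +
                 leadership + days_to_sine_die,
               data = bills, x = TRUE, y = TRUE)
    print(fit)   # the printout includes the C (ROC) statistic discussed below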

What factors went into it

We based our model on a number of obvious factors, then began looking around for factors that were less so. In the first category, we’d count things like which party the sponsor belonged to, how many co-sponsors the bill attracted from each party, and whether a leadership figure sponsored or co-sponsored the bill.

In the not-so-obvious category, we’d count things like how many days before the end of the session the bill was submitted, along with a range of information gleaned from the summary of the bill. For instance, we discovered that bills whose summary text indicated they were aimed at amending existing acts tended to pass at a disproportionately high rate, so we included that as a variable.

We also included a variable for whether the summary contained certain “social issue” terms referring to abortion, gun control, prayer in school, alcohol or controlled substances.

Then, because so much of the legislation moving through the General Assembly is local, and local bills typically pass at a much higher rate, we included variables for terms like “city of” and “county of” that serve as a proxy for local legislation. In the end, we focused our effort on making sure the model did well on both local and non-local bills.
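
To give a flavor of how the text-derived flags described above might be computed, here is a sketch. The column name “summary” and the exact term lists are illustrative assumptions, not the precise rules we used.

    # Sketch: deriving yes/no text flags from bill summaries. The column
    # names and search terms are illustrative assumptions.
    bills$amends_act   <- grepl("amend", bills$summary, ignore.case = TRUE)
    bills$social_issue <- grepl("abortion|gun|prayer|alcohol|controlled substance",
                                bills$summary, ignore.case = TRUE)
    bills$local_bill   <- grepl("city of|county of", bills$summary,
                                ignore.case = TRUE)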

What factors did not go into it

Right now, the most important factor we are not considering is a bill’s current status. Our model doesn’t update as a bill clears various legislative hurdles, though we hope to incorporate that information in the future. The percentage chance that you see takes into account only the sponsors, text and timing of a bill, not whether it has, for example, already made it out of committee.

So does it actually work?

The result is a model that does a reasonably good job of predicting the fate of the bulk of bills that move through the legislature each year. The graph below, known as a calibration curve, shows how bills from the 2011-12 session fared compared with how a model built on data from the 2007-08 and 2009-10 sessions predicted they would fare. It essentially clumps together bills to which the model gave a similar chance of passing and then calculates the percentage of those bills that actually passed (represented by the triangles in our chart).

It would be like taking 100 random days to which a weather forecaster gave about a 20 percent chance of rain, then seeing on how many of those days it actually rained. If it rained on around 20 of those days, you’d say the forecaster did a good job, at least when it comes to predicting a 20 percent chance of rain. A calibration curve does the same calculation with bills given odds anywhere from near zero percent to nearly 100 percent, and draws a line comparing the predicted chances with the actual outcomes.
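
For the statistically inclined, that calculation can be sketched in a few lines of R. Here “predicted” and “actual” are hypothetical vectors holding the model’s probabilities and the observed outcomes, coded 1 for passage and 0 otherwise.

    # Sketch: a binned calibration check. Group bills by predicted chance,
    # then compare each group's average prediction with the share that
    # actually passed. The two input vectors are hypothetical.
    bins <- cut(predicted, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
    data.frame(
      mean_predicted = tapply(predicted, bins, mean),
      share_passed   = tapply(actual, bins, mean),
      n_bills        = as.vector(table(bins))
    )

    # The rms package can draw the full calibration plot in one call:
    # val.prob(predicted, actual)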

With a perfectly calibrated model, the line labeled “non-parametric” would follow almost exactly along the straight dotted line labeled “ideal,” meaning that bills with a 1 percent chance would pass 1 percent of the time, bills with a 2 percent chance 2 percent of the time, and so on.

Our model tracks that ideal line pretty well, except for bills given odds in the range of about 50-60 percent. In that range, our model overestimates the chances of passage by several percentage points. It is important to note, though, that relatively few bills fall in that range (which you can gauge by the height of the black bars along the bottom of the graph), and the model does much better where the bulk of bills actually fall. (In case you are curious, the bills given odds in the 60-80 percent range are largely local bills, affecting a specific county or city, which by convention generally are not challenged if the delegation of legislators from that region supports the measure.)

Another way to evaluate our model is to look at the indexes provided in the top left of the graph. Take the C (ROC) statistic, on which our model scored 0.836. A model with no discriminatory power whatsoever would score 0.5, while one that could pick perfectly (giving a 0 percent chance to bills that actually failed and a 100 percent chance to those that actually passed) would score 1. A commonly used rule of thumb is that once you get above 0.8 or so, and you have a decent calibration curve, you start to have a model with useful predictive power.
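
Put another way, the C statistic is the probability that a randomly chosen bill that passed was given a higher predicted chance than a randomly chosen bill that failed. It can be computed with the somers2 helper from the Hmisc package, using the same hypothetical “predicted” and “actual” vectors as above.

    # Sketch: the C (ROC) statistic measures how often a randomly drawn
    # passing bill outscores a randomly drawn failing one.
    library(Hmisc)
    somers2(predicted, actual)["C"]   # 0.5 = coin flip, 1.0 = perfect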

Purely as a gut feeling, I suspect our model for this session will not do quite as well as the model we made for the last session. We tested the same approach, using the two previous sessions to predict the “current” one, on every session for which we had data, and it produced its best index scores on the 2011-12 session (interestingly, a session in which a lot of new blood was supposed to shake up the way things worked in the Capitol).

We are reasonably confident, however, that we will not get results in the next month or two that differ drastically from what we observed for previous sessions.
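
For the curious, the back-testing approach described above can be sketched as follows. The data frame, its columns and the abbreviated formula are again hypothetical stand-ins for our actual setup.

    # Sketch: rolling back-test. Fit on two sessions, score the next and
    # report the C statistic for each held-out session.
    library(Hmisc)
    sessions <- sort(unique(bills$session))
    for (i in 3:length(sessions)) {
      train <- subset(bills, session %in% sessions[(i - 2):(i - 1)])
      test  <- subset(bills, session == sessions[i])
      fit   <- glm(passed ~ sponsor_majority + leadership + days_to_sine_die,
                   family = binomial, data = train)
      pred  <- predict(fit, newdata = test, type = "response")
      cat(sessions[i], "C =", somers2(pred, test$passed)["C"], "\n")
    }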

What’s next?

The current version of Predict-a-bill is a prototype, a beta version, if you will. We are eager to get feedback on it from experts and the public, including factors we might add to the model or other ideas for making it more useful. We’re also making our data available to the public, including researchers who might be able to improve upon it. In the next version, we hope to take our approach a step further, offering an indication of how a bill’s chances of passage change as it clears various legislative hurdles. We may even try to build a model that hazards a guess at how individual legislators will vote, or how many votes a bill will ultimately get.

If you have ideas for improving our model or making it more interesting, please contact Jeff Ernsthausen.