Co-Authored by Sam Bydlon, Salil Malkan, and Marty Santalucia

Since last year’s election in Pennsylvania, when voters sent a Democratic governor and a historically large number of Republican legislators to Harrisburg, there have been lawsuits, political posturing, conflicting press statements, and a budget impasse that has dragged on for months past the state’s June 30 deadline. If the voters were telling leaders in Harrisburg to “get along,” we can say pretty definitively that the message was not received. What has become clear is that navigating a bipartisan path through this environment requires a deeper understanding of what makes Harrisburg tick.

Using data pulled from the Sunlight Foundation’s Open States project and other sources, a Stanford-trained scientist, a software engineer, and a professional campaigner set out to break down Pennsylvania’s General Assembly to understand how a deal is made. With a few Python applications we built, Excel, and Gephi (a freeware network graphing tool), we’ve been able to find some interesting features of voting habits and influence in the Pennsylvania General Assembly. This blog is where we will explore and discuss what we’ve found.

This first post is largely an introduction to some of our core ideas, so to guide this discussion we’ll start by exploring the basic premise of how our methods might help Governor Tom Wolf and his team increase the efficiency of their negotiations with the state legislature.

There is no way to get around the fact that relationships rule both our government and our politics. This is why understanding, and quantifying, the relationships of the General Assembly was a critical first step on our journey to decipher how deals are made in the State Senate and State House. With this in mind, and without setting the bar too high, the first thing we set out to do was to find a way to “calculate” partisanship. We wanted our algorithms to separate the Democrats and Republicans without us ever explicitly giving the program that information.

We figured that successfully analyzing a legislative body’s behavior would require finding and acting on the subtle relationships between legislators. If our approach couldn’t tease out the most significant differences, it wouldn’t hold much promise for finding relationships that weren’t as obvious. We looked at data from both the Senate and the House, and the results were exactly what we had hoped for.

Relationship Network of the Pennsylvania Senate

Relationship Network of the Pennsylvania House

These images deserve some explanation. Our code takes the data we put in and spits out a list of pairs of Senators or Representatives that it has determined have a relationship with each other, along with a measure of how strong each relationship is. To draw these relationships, and the network they create, we simply draw a line between any two members who have a relationship with each other. Strong relationships are represented by thick blue lines while weaker relationships are represented by thin red lines. Finally, we adjusted the “shape” of the network so that members with stronger connections to each other are also closer to each other on the graph. The easiest comparison is a social network, where a relationship between two members would be the same as those two members “friending” each other on Facebook. Stronger relationships simply indicate that those two members interact with each other more, while weaker relationships indicate less interaction.
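To make the idea of a relationship list concrete, here is a rough sketch of one way such a list could be built. We are assuming, purely for illustration, that the strength of a relationship is the share of roll-call votes on which two members voted the same way; this post does not spell out the actual measure. The function name, the threshold, and the member and bill labels are all made up for the example.

```python
from itertools import combinations

def agreement_edges(votes, min_shared=2, threshold=0.5):
    """Build a weighted edge list from roll-call votes.

    votes maps member -> {bill_id: "yes" or "no"}.
    Strength here is an ASSUMED measure: the fraction of shared
    votes on which two members voted the same way.
    """
    edges = []
    for a, b in combinations(sorted(votes), 2):
        shared = votes[a].keys() & votes[b].keys()
        if len(shared) < min_shared:
            continue  # too few common votes to say anything
        agree = sum(votes[a][bill] == votes[b][bill] for bill in shared)
        strength = agree / len(shared)
        if strength >= threshold:
            edges.append((a, b, strength))
    return edges

# Hypothetical members and bills, not real voting records.
sample_votes = {
    "Member A": {"HB1": "yes", "HB2": "yes", "HB3": "no", "HB4": "yes"},
    "Member B": {"HB1": "yes", "HB2": "yes", "HB3": "no", "HB4": "no"},
    "Member C": {"HB1": "no", "HB2": "no", "HB3": "yes", "HB4": "no"},
    "Member D": {"HB1": "no", "HB2": "no", "HB3": "yes", "HB4": "yes"},
}

for a, b, strength in agreement_edges(sample_votes):
    print(a, b, strength)
```

Notice that party labels never enter the calculation. Any member with no pairing above the threshold simply does not show up in the edge list, and the resulting list of (member, member, strength) rows is the kind of table a tool like Gephi can draw as a network.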

What we ended up with were two networks that were each split into two “communities” which we could easily identify as Democrats and Republicans.

The resulting networks went beyond partisanship, though, and started hinting at those deeper relationships we were looking for. In the Senate, between Senator Tomlinson and Senator Wozniak, was a single thin line. In the House, there was a burst of connections from Representative DiGirolamo into the Democratic Caucus, as well as a single connection between Representative White and Representative Sainato.

Those with experience in Pennsylvania politics will recognize the significance of these cross-party relationships. It makes sense that Senator Wozniak, a conservative Democrat, would get along with Senator Tomlinson, one of the most liberal Republicans. Representative DiGirolamo, a Republican from the southeastern part of the state, has strong ties with the local organized labor community. As a result of these connections, he is a fairly reliable vote on labor issues when they come up, and this means that he finds himself working with the Democratic Caucus more than the average Republican. Representative White, a Republican from Philadelphia, is in a similar situation. Until White was elected this past summer, her district had been represented by a Democrat, and in 2014 her constituents favored Governor Wolf (D) over Governor Corbett (R) 66% to 34%. It isn’t surprising that during her brief time in office she has already developed a bipartisan streak while she balances a conservative Republican caucus in Harrisburg and a more liberal electorate at home.

Digging deeper, we reasoned that if we could identify distinctions between parties as well as subtle bipartisan relationships, we might also be able to find communities within the parties themselves. We split our data by party and took a look at each group by itself. The House has 203 members (versus the Senate’s 50). This means that each member can have more connections, so the network can be much larger. The size difference gives us much more material to work with and, frankly, will make a better blog post, so we’ll be focusing on the House during this discussion and work with the Senate in later posts.

Pennsylvania House Democrats

Pennsylvania House Republicans

*** In the Democratic network, Representatives Bishop, DeLuca, and Thomas are not included because they did not have any relationships strong enough to connect them to the network. In the Republican network, Representatives Gingrich, Maher, McGinnis, and O’Neill are not included because they did not have any relationships strong enough to connect them to the network. ***

Looking at members in a single party gives us a network that is more visually compact because a member will have much more in common with members of their own party than with members of the other party. This gives us a more “complete” network, and while these networks don’t have the striking gap between groups that we saw when we looked at the full House, there are a number of interesting features that hint at what we might see if we look closer. Experienced politicos may have already spotted a few surprising results, and we promise that we’ll touch on those shortly.

Returning to our premise of helping Governor Wolf, we need to find a way to get more information out of these networks by identifying which members might be more connected in the network. Just from looking at the graph, it is evident that not every member is connected to every other member. This means that even without a large division between communities, we might still be able to find more subtle groups now that the data isn’t overshadowed by a distinct partisan divide.
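For readers who want a feel for how groups can fall out of a network on their own, here is a small sketch using label propagation, one of the simpler community-detection techniques. It is a stand-in chosen for brevity, not a claim about the method behind our graphs. Each member starts with a unique label and repeatedly adopts the label carrying the most edge weight among their neighbors. The function name and the toy edge list are hypothetical.

```python
def label_propagation(edges, max_rounds=100):
    """Group members by weighted label propagation.

    edges is a list of (member_a, member_b, strength) tuples.
    Returns a list of sets, one per community found.
    """
    neighbors = {}
    for a, b, weight in edges:
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight

    label = {member: member for member in neighbors}
    for _ in range(max_rounds):
        changed = False
        for member in sorted(neighbors):  # fixed order keeps runs repeatable
            totals = {}
            for other, weight in neighbors[member].items():
                totals[label[other]] = totals.get(label[other], 0.0) + weight
            best = max(totals.values())
            candidates = [lab for lab, total in totals.items() if total == best]
            if label[member] in candidates:
                continue  # current label is already among the strongest
            label[member] = min(candidates)  # deterministic tie-break
            changed = True
        if not changed:
            break

    groups = {}
    for member, lab in label.items():
        groups.setdefault(lab, set()).add(member)
    return sorted(groups.values(), key=min)

# Two tight triangles joined by one weak tie (made-up data).
toy_edges = [
    ("A", "B", 1.0), ("A", "C", 1.0), ("B", "C", 1.0),
    ("X", "Y", 1.0), ("X", "Z", 1.0), ("Y", "Z", 1.0),
    ("C", "X", 0.1),
]
print(label_propagation(toy_edges))
```

On the toy data, the weak tie between C and X is not enough to pull the two triangles together, so they come out as separate communities. Real caucus networks are far denser, so results there will depend heavily on the method and its settings.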

Pennsylvania House Democrats by Community

Pennsylvania House Republicans by Community

Running an analysis on each network produced these graphs, which split each caucus into three communities. The results raise the question of whether these communities mean anything, or if they are simply the result of random connections.

To address that question, we turned to Boris Shor of Georgetown University and Nolan McCarty of Princeton University who have done a lot of work on quantifying the ideology of legislators. You can check out their papers and datasets here, if you’re curious. Their state legislative dataset was last updated in June 2015 with members who were either in office or elected in 2014 so it is missing members elected in 2015, but it is the best state legislative data available and is complete enough to illustrate our point.

Shor and McCarty assign each legislator a number called an NPAT Common Space Score, with lower scores being more progressive and higher scores being more conservative. We averaged the scores of each group to validate our model and confirm that the communities we identified were composed of ideologically similar members: liberal members are grouped with other liberals, moderates with moderates, and so on.
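The averaging step is simple enough to show in a few lines. This is a minimal sketch: the community names, member names, and scores below are invented for the example and are not taken from the Shor and McCarty dataset. Members without a score (for instance, anyone elected after the dataset was last updated) are skipped.

```python
from statistics import mean

def average_score_by_community(community_of, score_of):
    """Average an ideology score over each community.

    community_of maps member -> community name.
    score_of maps member -> score (lower = more progressive).
    Members with no score on file are left out of the average.
    """
    grouped = {}
    for member, community in community_of.items():
        if member in score_of:
            grouped.setdefault(community, []).append(score_of[member])
    return {community: mean(scores) for community, scores in grouped.items()}

# Invented example values.
example_communities = {
    "m1": "red", "m2": "red",
    "m3": "green", "m4": "green", "m5": "green",
}
example_scores = {"m1": -1.0, "m2": -0.5, "m3": 0.5, "m4": 1.5}  # m5 has no score

print(average_score_by_community(example_communities, example_scores))
```

If the communities were random, their averages should land close together. Clearly separated averages are a first hint that the grouping tracks ideology.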

Communities by Average NPAT Score

Significance Between Communities

After looking more closely at the results with some additional statistical analysis, we were able to rule out the “blue” Republicans as a statistically distinct group. So while the Democrats have three communities, the Republicans really only have two.
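One way to check whether two communities really differ is a permutation test on the gap between their average scores. To be clear, this is an illustrative choice and not necessarily the analysis behind our chart. The sketch below tries every possible way of splitting the pooled scores into two groups of the original sizes and reports how often a reshuffled gap is at least as large as the observed one. The function name and the scores are made up.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means.

    Enumerates every split of the pooled scores, so it is only
    practical for small groups; larger groups would need a random
    sample of shuffles instead.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), len(group_a)):
        picked = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in picked]
        total += 1
        # Small tolerance guards against floating-point noise.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Invented scores for two hypothetical communities.
community_one = [-1.0, -0.9, -1.1, -0.95]
community_two = [0.5, 0.6, 0.4, 0.55]
print(permutation_p_value(community_one, community_two))
```

A small value suggests the gap between the two communities is unlikely to be a fluke of who happened to land in which group. A large value suggests the two groups are not meaningfully different on that score.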

Members of these communities tend to act together, or at the very least, tend to act in a similar way. If the Governor were to approach a member of the “red” community and secure their vote, he may have also increased his odds with other members of that same community.

This method isn’t a silver bullet, though, because what one community wants could contradict the demands of another. Additionally, the influence of a member is diluted as you get farther away from them in the network; as an idea travels through the network from member to member, everyone adds their own twist to what they pass on to others. Fortunately for the Governor, a member who is grouped with one community will probably still have many connections to the other communities. So rather than randomly picking someone to talk to, the Governor is best served by targeting someone who links as many other members as possible, ideally a member who sits on the shortest paths between many pairs of colleagues. In network analysis this characteristic is called “betweenness centrality,” and each member can be assigned a number that represents how central they are in the network.
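Betweenness centrality counts, for each member, how many shortest paths between pairs of other members pass through them. Below is a compact sketch of Brandes’ algorithm, a standard way to compute it. For simplicity it treats every relationship as equally strong, which is an assumption of the sketch and not something our weighted networks require. The toy network is made up.

```python
from collections import deque

def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph.

    adjacency maps each member to the set of members they are tied to.
    """
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        order = []                                # members in order of discovery
        preds = {v: [] for v in adjacency}        # predecessors on shortest paths
        paths = {v: 0 for v in adjacency}         # number of shortest paths
        dist = {v: -1 for v in adjacency}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                              # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):                 # farthest members first
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each pair was counted from both ends, so halve the totals.
    return {v: s / 2 for v, s in score.items()}

# Two triangles bridged by the C-D tie (made-up network).
toy_network = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
for member, value in sorted(betweenness_centrality(toy_network).items()):
    print(member, value)
```

In the toy network, C and D score highest because every path between the two triangles runs through them, even though neither has dramatically more ties than anyone else. That is the property we care about: a member can be central without being the most connected.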

Pennsylvania House Democrats, Centrality

Pennsylvania House Republicans, Centrality

Larger circles represent more central members.

The larger circles are our targets, and they are unlikely suspects, for sure. Some of these members are freshmen and they are all rank-and-file, so these are not the typical people meeting with the Governor to negotiate deals. What is very important to understand is that while these people aren’t the power-brokers, their ideas are powerful, and we argue that they are worth listening to. What we’ve done here is go past the traditional legislative power structures to build a network based on ideology and influence, then demonstrate that these members sit in uniquely central positions. Therefore, influencing their vote may have an outsized impact on many other members around them.

The Governor’s team already goes into negotiations having considered what leadership can sell to their party. Our methodology takes this process a step further by identifying which members might be more important to the success of a final deal. Negotiators can use this information to be more efficient, since quantifying the mechanics of the legislature could reduce the guesswork and human error that often plague our system and stall discussions.

While this is an exciting first step in opening up the behavior of the state legislature, every analysis is going to be limited or driven by the quality and type of data used. To write this post we didn’t filter our data by topic so we could focus on a very general scenario. In future posts it would be interesting to see how the communities come together on taxes versus environmental issues, and how those new communities change the suggested targets.