Andrew Rae

Every election season, pollsters try to figure out the demographic makeup of the electorate in an election that hasn’t happened yet. And every election season, pollsters are greeted with charges that their estimates are wrong. Republicans criticize 2012 polls that assume that African-American turnout will remain at its 2008 level. Democrats criticize 2012 polls that assume African-American turnout will be lower than it was. And that’s just one demographic group.

It’s hard to predict voter turnout because people are reluctant to admit that they will not vote. How reluctant? One recent estimate suggests that as many as two-thirds of people who will end up not voting tell pollsters that they will.

In my work in economics, I use anonymous, aggregate data from millions of Google searches in hundreds of media markets in the United States to measure variables on sensitive topics — racism, drug dealing and child abuse, for example — where people tend to be less forthcoming in surveys (to put it mildly).

My research suggests that by comparing Google search rates for voting information so far this year with search rates on comparable dates from previous elections, we might already be able to get a pretty good idea of the composition of the 2012 electorate.

Despite the ubiquity of Google searching, and searchers’ demonstrated willingness to share their true feelings and unbridled thoughts on Google, what Americans are typing when they search remains surprisingly underutilized in political analysis. But Google can often offer insights unavailable elsewhere.

Some of what we learn is pretty silly. Every month, about 5,000 people ask Google about Mitt Romney’s underwear choice (devout Mormons wear temple garments). But some of what we learn is disturbing. On Election Day in 2008, roughly 1 in 100 searches that included “Obama” also included “KKK” or “nigger.”

Our thoughts are often superficial. “Paul Ryan shirtless” is currently Googled 9 times more often than “Paul Ryan budget.” Don’t ask me why, but “Paul Ryan shirtless” is Googled more frequently in blue states than in red. When we search for “Michelle Obama,” we include the word “ugly” three times as often as the word “beautiful.”

Politicians can map the geography of their popularity by looking at what they’re called on Google. “Obama” is Googled more frequently in blue states, but “Barack Hussein Obama” is Googled more often in red states — just as “Willard Mitt Romney” is in the blue states.

How frequently people in a state searched for “Obama jokes” almost perfectly predicted the vote share of Mr. Obama’s 2008 opponent, John McCain. “Romney jokes,” which typically focus on his wealth, are popular in Iowa and Ohio, two swing states in which Mr. Romney has struggled to connect with working-class voters. Never mind favorability; maybe what we need is a jokeability index.

Google search data also offer some evidence that last-minute rumors had negative effects in the 2008 election. In a number of states, like Oklahoma, Tennessee and Kentucky, Mr. Obama's vote share came in slightly below his numbers in the final polls. Google search data offer one rather interesting correlation: these states had some of the largest search volumes for “Obama Muslim.” And those searches, while not uncommon throughout the summer and early fall, rose substantially in the final days of the campaign, after many of the final polls were conducted.

Comparing the timing of our Google searches to outside events is often intriguing. Searches for “McCain life expectancy” rose to unprecedented levels the day of his controversial choice of the Alaska governor Sarah Palin as his running mate. They rose again after Ms. Palin’s poorly received interview with Katie Couric.

Google data may also help us predict the composition of the 2012 electorate. Individuals may systematically deceive pollsters regarding their intentions, but actual voters are far more likely to Google phrases like “how to vote” or “where to vote” before an election.

By the middle of October, the frequency with which Google searches include “vote” or “voting,” compared with the frequency on the same days four years earlier, strongly predicts where turnout will rise, stay the same or fall. If search rates for voting information were higher in the first half of October 2008 than in the first half of October 2004, voting rates tended to be higher in 2008 than in 2004. It’s true for midterm elections, too. If search rates for voting information were higher in the first half of October 2010 than in the first half of October 2006, voting rates tended to be higher in 2010 than in 2006.
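For readers who want to see the shape of that comparison, here is a minimal sketch in Python. It is not the analysis behind this article: the market names, the search rates and the 5 percent tolerance are all made up for illustration, and the function names are hypothetical.

```python
# Illustrative sketch only: every market name and number below is made up,
# and the 5 percent tolerance is an arbitrary assumption.

def search_rate_change(rate_then, rate_now):
    """Fractional change in the voting-related search rate between two cycles."""
    return (rate_now - rate_then) / rate_then


def predicted_direction(change, tolerance=0.05):
    """Turn a change in search rate into a coarse turnout call."""
    if change > tolerance:
        return "rise"
    if change < -tolerance:
        return "fall"
    return "stay the same"


# Hypothetical share of searches that include "vote" or "voting" in the
# first half of October: (four years earlier, current cycle).
rates = {
    "Market A": (0.0020, 0.0030),
    "Market B": (0.0020, 0.0020),
    "Market C": (0.0020, 0.0015),
}

predictions = {
    market: predicted_direction(search_rate_change(then, now))
    for market, (then, now) in rates.items()
}

for market, call in predictions.items():
    print(market, "->", call)
```

The point of the sketch is only that each area is compared with itself four years earlier, so differences in overall Google usage between areas wash out.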


This predictive power was significantly stronger than that of other variables we might use to predict area-level turnout, like changes in registration rates or movement in early voting.

By comparing changes in search volume to area-level demographics, we can use this information to make predictions about turnout rates among different demographic groups.

To see how this works, consider what Google search data would have shown us by this time in 2008. Search rates for voting information that month were slightly lower than they were in October 2004. However, states in which Google searches were higher than they were four years earlier were overwhelmingly the states with some of the highest African-American populations: North Carolina, Georgia and Mississippi. Within states, media markets with higher African-American populations (places like the Raleigh-Durham area in North Carolina, Augusta, Ga., and Jackson, Miss.) saw the biggest increases in voting-related searches from October 2004 to October 2008. The Jackson media market, with a 47 percent African-American population, saw a 56 percent increase.

An analysis of Google search data, in other words, would have made the unsurprising, and ultimately correct, prediction that black turnout was going to be substantially higher in 2008 than it was in 2004.
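The demographic step can be sketched the same way: line up each media market's population share against its change in voting-related searches and measure how closely the two move together. The example below is a simplified illustration, not the method used for this article. It uses a plain Pearson correlation, and apart from the last pair of values (the 47 percent and 56 percent figures cited above for Jackson), every number is invented.

```python
from math import sqrt

# Illustrative sketch only. The last pair of values echoes the Jackson figures
# quoted in the text; the other three markets are made up.


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# African-American share of each hypothetical media market's population.
black_share = [0.05, 0.15, 0.30, 0.47]
# Fractional change in voting-related searches, October 2004 to October 2008.
search_change = [-0.10, 0.05, 0.25, 0.56]

r = pearson(black_share, search_change)
print(f"correlation: {r:.2f}")
```

A strongly positive correlation across markets is what would point toward higher turnout for that group; a value near zero, as with the youth-heavy markets discussed below, would point toward little change.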

This methodology would have also correctly predicted a slight increase in Hispanic turnout. While the relationship was not nearly as strong as it was among African-Americans, parts of the country with greater Hispanic populations were Googling for voting information at elevated rates in 2008 compared with 2004.

Google search data, in October 2008, would not have predicted major changes in the age composition of the electorate. Before the election, some were claiming that Mr. Obama’s presence on the ticket would lead to a substantial increase in youth turnout. However, in October, the media markets with the greatest proportion of individuals between the ages of 18 and 34 — Gainesville, Fla., and Salt Lake City, for example — were not Googling for voting information at significantly elevated rates. Youth turnout did not rise as much as it was expected to in 2008.

So what does Google suggest about 2012 this far into October? There is little evidence for a 2012 electorate significantly more favorable to Democrats or Republicans than the one that obtained in 2008. The data suggest that turnout, as always, will be elevated in some parts of the United States and depressed in others. Interestingly, turnout might be expected to be higher in Ohio in 2012 than it was in 2004 or 2008.

Of course, there are still two weeks of searches to come. And the methodology I am using is new and subject to many caveats. But the differences we see so far do not appear to predict large changes in demographics that might substantially affect the outcome of the election.

Areas with the largest black populations are, on average, Googling for voting information at rates similar to their 2008 levels rather than their 2004 levels. By this metric, it does seem that pollsters should assume a black share of the electorate similar to that of 2008, when African-Americans made up an estimated 12 percent of the electorate, rather than that of 2004, when they made up 11 percent. That would be a good sign for Mr. Obama.

There is nice news for Mitt Romney in the Google data, too: voting searches are higher in Idaho Falls and Salt Lake City, the two media markets with the largest Mormon populations. While neither Idaho nor Utah is a swing state, increased Mormon turnout might help Mr. Romney somewhat in two important swing states: Nevada (7 percent Mormon) and Colorado (3 percent Mormon).

Mr. Romney’s supporters might also be pleased with the search rates for voting information in some areas with high evangelical populations. Google data predict that Lubbock, Tex., and Paducah, Ky., for example, may see increased turnout. This might alleviate the concern among Republicans that evangelical voters will turn out in lower numbers because of suspicions about Mr. Romney’s Mormon faith or his lack of commitment to conservative social causes.

The Google data offer little evidence for any big changes in the age composition of the electorate: there is no meaningful change in voting-information search rates in areas with a high proportion of individuals 18 to 34, who tend to support Mr. Obama, or of individuals 65 and older, who generally lean toward Mr. Romney.

Search rates thus far in October 2012, compared to the same days in October 2008, are a bit lower in areas with larger Hispanic populations. While Mr. Obama is more popular among Hispanics than Mr. Romney, the size of the correlation does not, as of yet, present a huge concern for Mr. Obama. Monitoring the data over the next two weeks, not to mention looking at Spanish-language voting searches, might tell us if this should become a significantly larger concern for his campaign.

Mr. Obama’s opponents hope that the 2012 electorate will be less favorable to Democrats, more like the 2004 electorate. My early analysis of Google search data says: don’t count on it.