By Rob Knies, Managing Editor, Microsoft Research

It’s a presidential election year in the United States, and that, we’ve learned, means that pollsters are on the prowl. The electorate for the forthcoming balloting will be sampled, questioned, categorized, sliced, and diced a zillion different ways between now and Nov. 6, so if you’re interested in gender polling by age bracket in Wirt County, W.Va., for the time being, you’re in luck.


So is David Rothschild, an economist at Microsoft Research New York City. Trained during an academic career that culminated with a Ph.D. in applied economics from the Wharton School of Business at the University of Pennsylvania, Rothschild is also an avid follower of the political scene.

He has gained considerable renown this year for his work using prediction markets to harness big data in its many and varied forms to calculate and disseminate his prediction for who will be elected president. His research reflects Microsoft’s deep expertise in using machine learning to recognize complex patterns, make intelligent, data-based decisions, and open avenues of exploration that previously were unattainable. This provides the foundation for techniques that promise to unlock the power of social-media data and to transform political-forecasting models.

Pundits and talking heads flock to his posts on a pair of blogs, PredictWise and The Signal. He is breathing the rarefied air encountered only when an individual moves from being an interested observer of the political process to becoming an influential participant in that arena.

“David’s work offers a unique and innovative method for predicting election outcomes,” says Sunshine Hillygus, associate professor of political science at Duke University. “By aggregating and correcting data from state-level polls and election markets, he produces a forecast that is far more useful than the simple national polling estimates that dominate media coverage.”

David Pennock, assistant managing director of Microsoft Research New York City, couldn’t agree more.

“David is incredible,” Pennock says. “I’ve described him as a force of nature. The amount he can get done, his ideas and insights … you get excited just watching him. It’s groundbreaking research.”

Rothschild joined Microsoft in May as a founding member of the New York City lab, and he has spent his time since then building prediction and sentiment models and organizing novel, experimental polling and prediction games. Indeed, his research centers on prediction markets, and on PredictWise, you can find his analyses of projected box-office receipts for upcoming movies, the state of the U.S. economic recovery, or the winner of this year’s baseball World Series.

His passions, though, run deepest in the political sphere. It’s all a matter of navigating and analyzing massive amounts of data to uncover meaningful patterns or relationships that previously were hidden.

“All these projects stem from a general research idea of thinking about all the data we have,” he says. “It can be external, such as Facebook or Twitter: individual-level information people provide to the world. It can be internal: search or page views, things that people provide to Microsoft. And it can be things like polling and prediction markets, where people actively get more information to solve particular questions.

Meaningful, Aggregated

“How do you combine all that data and turn it from raw data into meaningful, aggregated outcomes?”

The answer, Rothschild asserts, is to take two initial steps. The first is to use all of this data to make it efficient to create predictions, sentiment indexes, and interest indexes that match the needs of stakeholders. The second is to enable people to absorb the information, using data visualizations or other techniques to make it impactful.

He already is putting this to work for Microsoft. Since August, he has been working with Xbox LIVE to provide polling guidance for that service’s Election 2012 hub, which enables members to interact in real time during the three presidential debates and the vice presidential debate, and to take part in daily polling conducted with YouGov. The polling is providing a snapshot of how Xbox LIVE’s passionate, technically savvy user base is reacting to campaign developments.

“If we’re going to be polling people,” Rothschild says, “can we learn anything from how people understand things and how efficient the data is to create polls and prediction markets that are even more effective at gathering the right information?”

That’s the sort of reflection that is changing the game of understanding voters’ intent—an effort sorely in need of a fresh perspective. As Rothschild explains it, the science of political polling until recently was stuck in a rut three quarters of a century old.

“In the mid-’20s up until the early ’30s,” he says, “people got this idea to poll as many people as possible on who they would be voting for in the upcoming election. This, they thought, would provide some indication of what was going to happen.”

That technique worked for The Literary Digest—for a while. In four consecutive U.S. presidential elections, from 1920 to 1932, its straw poll correctly predicted the winner. In 1936, though, things changed dramatically. In one of the classic pratfalls in U.S. political history, the magazine published its poll indicating that Alf Landon, Republican governor of Kansas, would be a big winner. On Election Day, incumbent Franklin Delano Roosevelt carried 46 states, Landon two. Shortly thereafter, its credibility in tatters, The Literary Digest closed its doors for good.

During the same election cycle, an upstart named George Gallup was able to predict Roosevelt as the winner by an astute use of representative samples of each state. His creation, the Gallup Poll, remains influential to this day.

The Gold Standard

“That became the gold standard,” Rothschild says. “For the next 75 years, the idea of the most efficient thing to do was to take a sample that represented likely voters or registered voters and report the raw data.”

Daily polls, though, are notoriously noisy, even if aggregating the numbers from recent polls increases accuracy significantly. In addition, the wording used in the polls was faulty: “Who would you vote for if the election were held today?” The problem is that the election almost never is held “today.” Known factors such as the anti-incumbent bias, in which incumbents poll more poorly around Labor Day than they do on Election Day, skew the numbers.
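Why aggregation helps can be sketched with the textbook formula for sampling error. This is only an illustration, not Rothschild’s method: it assumes every poll is an independent simple random sample of the same electorate, and it ignores wording effects and drift over time. The function names and sample sizes are hypothetical.

```python
import math

def standard_error(share, sample_size):
    """Sampling standard error of a candidate's share in one poll."""
    return math.sqrt(share * (1.0 - share) / sample_size)

def pooled_standard_error(share, sample_sizes):
    """Standard error after pooling several independent polls.

    Assumption: pooling is treated as one large simple random sample.
    """
    return standard_error(share, sum(sample_sizes))

# Hypothetical numbers: one poll of 1,000 people versus five such polls.
single = standard_error(0.50, 1000)
pooled = pooled_standard_error(0.50, [1000] * 5)

print(f"one poll:   +/- {1.96 * single:.3f}")
print(f"five polls: +/- {1.96 * pooled:.3f}")
```

Under these assumptions, pooling five polls of equal size shrinks the margin of error by a factor of the square root of five, from roughly three percentage points to under one and a half.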

“Even if you correct for that,” Rothschild notes, “you’re creating an expected vote share, but most Americans don’t really care about vote share. George W. Bush in 2000 had no less political capital to spend after his razor-thin election than Ronald Reagan after his ’84 landslide. What people actually care about is who’s going to win. That’s the only thing that matters, and that’s what we focus on when we gather data and create predictions.

“It’s amazing to me to think about how novel it is to think, ‘Let’s forecast the thing people actually care about.’ That’s one of the major things with which I’m trying to approach all these things, thinking about what are the most efficient things I can create, and what does the end user really want and need? Can we create that? How close can we get to that?”
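The difference between a vote share and a chance of winning can be made concrete with a small sketch. The article does not say how Rothschild’s models make this conversion. The version below simply assumes a two-candidate race in which the forecast error in the vote share follows a normal distribution, and `forecast_sd` is a hypothetical parameter for the size of that error.

```python
import math

def win_probability(expected_share, forecast_sd):
    """Probability that the final two-party share exceeds 50 percent.

    Assumption: forecast error is normally distributed with standard
    deviation forecast_sd (a hypothetical input, not from the article).
    """
    z = (expected_share - 0.5) / forecast_sd
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The same expected share implies different odds at different horizons.
for sd in (0.05, 0.03, 0.01):
    print(f"sd={sd:.2f}: P(win) = {win_probability(0.52, sd):.2f}")
```

The same expected share of 52 percent translates into a modest edge when the uncertainty is large and a near certainty when it is small, which is why a win probability carries information that a raw vote share does not.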

Accuracy is paramount, of course, but so is timing. Presidential forecasts, typically, are evaluated the night before the election. But, as Rothschild says, such forecasts are “pretty darn worthless.” What really has value is a forecast two months before an election.

“It’s the same thing with marketing-type questions,” he explains. “It doesn’t do much good if I can tell you which jeans are going to be popular the day before you put the jeans onto the market. You’ve already produced those jeans. If I can forecast what type of jeans are going to be popular two months beforehand, then you can make the right investment strategy.

Information When We Want It

“I hope to get people to grade and judge people’s forecasts and people’s data streams when the people need it. We expect to have information all the time, when we want it. Five, 10 years from now, it’s going to seem anachronistic that we thought about economic indicators on a monthly basis. I’d be very surprised if very strong tracking on a minute-by-minute basis has not been developed.”

Pennock has seen that sort of nearly instantaneous data analysis play out in reality.

“The great thing is that David is providing predictions in real time,” he says. “You can see these reactions within minutes. After Rick Perry made his ‘oops’ mistake in that GOP debate, you could literally watch the predictions crash in almost real time.”

Beyond the interest generated by working in such a high-profile area as presidential-election predictions, what are the research benefits of such work?

“Number one is forecasting,” Rothschild says. “You have a goal of creating the most accurate forecast at any given moment, because that will help create a more efficient world. Economists generally want to make a more efficient universe, and accurate forecasts on a regular basis help to do that.

“The second goal is to understand the world. It’s a research goal that is both beneficial to greater research as well as beneficial to decision-makers. It’s understanding why things happen. Granular, correct, and efficient forecasts can help you understand the effect of a debate, the effect of a $10 million ad buy. You can see movement as things happen.”

To provide accurate forecasts and to gain a greater understanding of the world around us, Rothschild relies on data.

“You want to be able to aggregate as much information as possible and create a prediction about what’s going to happen,” he says. “With prediction markets, you can get a self-selected group of people who have a lot more information than those in traditional polling. These are people who know a lot about elections. I got into this by thinking about polls versus prediction markets: What are we learning from these different things?”

Flocking to Xbox LIVE

That musing led him to create hybrid approaches, and that’s what’s happening on Xbox LIVE. Users of the service are not a perfect representation of the U.S. populace, but by asking unique questions and combining the answers in new ways, Rothschild is adding new ingredients to the prognosis stew. It’s certainly popular: As many as 10,000 people per day are participating in Xbox LIVE’s daily polls.

Back to that standard polling question: “If the election were held today …” It’s static, it’s easy, it’s computationally trivial. And then there’s the Rothschild approach.

“Asking people the probability something is going to happen—‘What do you think is going to happen?’—is a lot trickier, because we don’t have a track record of asking these questions, and they don’t implicitly translate into anything very clean.”

Even so, such probabilistic probing has one key advantage: It works.

“By asking somebody, ‘Who do you think is going to win the election?’ it touches on their intention, the intentions of their friends and family and those people they discuss elections with.

“We found a sampling of 345 times where potential voters were asked who they were going to vote for, who they thought was going to win, and when those questions got different results. When the results were different, more than half of the voters said they wanted candidate A to win, but that they expected candidate B to win. Seventy-five percent of the time, candidate B won.”

That’s not all.

“Asking a person’s expectations has a multiplicative effect,” Rothschild states. “It’s the equivalent of asking 10 random voters who they were going to vote for and reporting back a binary result of a poll of 10 people.”
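One way to picture that multiplicative effect is a toy calculation, offered here as an assumption-laden sketch rather than as Rothschild’s model. Suppose each respondent’s expectation is simply the majority view of a hypothetical circle of 10 voters drawn at random from the electorate, with ties settled by a coin flip. The binomial distribution then gives the chance that a single answer points to the true leader.

```python
import math

def majority_probability(leader_share, circle_size):
    """Chance that a majority of circle_size random voters back the leader.

    Assumptions: voters are independent draws from the electorate, and an
    even split is resolved by a fair coin flip.
    """
    total = 0.0
    for k in range(circle_size + 1):
        p_k = (math.comb(circle_size, k)
               * leader_share ** k
               * (1.0 - leader_share) ** (circle_size - k))
        if 2 * k > circle_size:
            total += p_k          # clear majority for the leader
        elif 2 * k == circle_size:
            total += 0.5 * p_k    # tie, split evenly
    return total

# Hypothetical race in which the leader holds 52 percent of the vote.
intention = majority_probability(0.52, 1)     # one voter's own intention
expectation = majority_probability(0.52, 10)  # majority of a circle of 10
print(f"intention: {intention:.3f}  expectation: {expectation:.3f}")
```

In this toy setting, a single expectation answer is more likely to name the eventual winner than a single intention answer, because it summarizes several voters at once.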

That’s not all.

“We’re able to show that even with an incredibly biased group of people,” he adds, “if you ask them their expectations, you can turn that into a meaningful forecast that something’s going to happen.

Lopsided Expectations, Accurate Forecasts

“If you just take those people who claim that they’re going to vote for the Democratic candidate, or just those people who claim they’re going to vote for the Republican candidate, by seeing how lopsided their expectations are for their candidate to win, you can make a very strong expectation of whether a candidate is going to win.”

The polling work Rothschild has done with Xbox LIVE has helped refine such techniques.

“We have younger people and more males,” he explains. “One of the ways that we’re attacking that is by asking questions about people’s social network. There are ways in which we can take a biased sample and have them give us information about a less-biased sample of people they may know.”

Such experimentation, of course, must be conducted with stringent privacy restrictions to protect individual users. For Rothschild, though, the goal is not so much to predict a particular election, even one as momentous as that for a U.S. president, as to gain knowledge about making future models more robust.

“Nothing I do ever is calibrated with the 2012 election in mind,” he says. “No model I’ve ever created, no set of data I’ve ever considered, do I consider it for how this works for the 2012 election. I do it to determine how this works in a total, historical view and how it works in a universal view.”

He’s hoping he can extend his techniques to continuous forecasts for all 435 seats in the U.S. House of Representatives in 2014. He also wants to apply his exploration into the realm of economic indicators with the goal of providing accurate, meaningful predictions that shed more light on the underpinnings of the economy.

Giving such potentially revolutionary research its best chance at success requires understanding and commitment. Microsoft, Rothschild says, is delivering those in spades.

“Microsoft has made a very strong commitment to me and a few of us in New York,” he says. “Microsoft understands that it’s important for it to be seen as a leader in these fields. That allows us to produce better research.

“Microsoft has afforded us the ability to sit back and think about this in the long run: What are the implications of this information? How do we utilize it? How do we make it more efficient? What are the next steps? It’s very exciting.”