The Data Demystifier

Nate Silver is halfway through his cheeseburger and a few sips into his third Coca-Cola when he pauses to say, “I think . . .” before trailing off, midsentence, to reach for a french fry. Silver tends to follow this pattern throughout a meal: Talk, stop, chew. Talk, stop, chew. Time ticks by. As he waits to finish a thought, you start paying attention to the ambient music drifting in the air by the bar. The restaurant’s ventilation system has just switched on, you notice, then off.

The question on the table is whether big data–that is, the accumulation and manipulation of massive quantities of information–will change our world, or whether it’s just another overhyped technology with a too-good-to-be-true story line. Silver is arguably peerless at interpreting data in the domains of sports and politics. Early in his career, he created an analytical model for baseball stats known as PECOTA, which did an exceedingly good job of identifying the minor-league prospects most likely to perform well in the majors. More recently, his FiveThirtyEight.com blog famously parsed polling and economic data to predict the results of the 2008 presidential election (calling 49 of 50 states correctly) and the 2012 election (going 50 for 50). He has since dabbled in predictions of Oscar winners, NCAA basketball champions, and the geographic distribution of support for gay marriage. For at least the past five years, his methods and models have been questioned, doubted, and ridiculed. But in response he has shown every doubter, with almost unerring consistency, that a skinny supernerd with a big data set and a killer algorithm can be a swashbuckler, too. Silver’s Wikipedia entry now clocks in at more than 6,100 words–about double the length of the entry on Bob Woodward, the legendary Watergate reporter. No journalist has ever become so famous so fast, let alone so fluent in statistics. You would therefore think he would be rapturous about the extraordinary possibilities of data science and a new era–“a singularity,” as he puts it dryly–that would merge turbocharged computing power and analytics. He is not. 
When he finishes the french fry, Silver says again: “I think . . .” And then he pauses. He seems to be crunching the data about big data himself. “Obviously,” he remarks at last, “I think this is an important technology.” Computers do keep getting more powerful, he notes, and he recounts the claim by Eric Schmidt, the Google chairman, that on a single day, modern society generates more information than all of civilization had created before 2003. But Silver adds a crucial caveat: The flood of data means more noise (i.e., useless information) but not necessarily more signal (i.e., truth). In his airplane reading, Silver says he has noticed of late an incredible proliferation of magazine ads that suggest the awesome and incontrovertible prognosticating power of big data. The gist of the ads is that the data will help businesses know what consumers will do before they actually do it. “But I don’t see it to be as much of a paradigm shift as some people think,” Silver says. “People sometimes get the idea that you put all this data into a machine, and you press a button and out come miraculous ideas that help your business make a quick 10% profit margin every year and your share price will double.” He’s not a pessimist, he adds. “I’m a guarded optimist. But I do like to weigh against naive optimism. And to be honest, there’s a lot of ways to take a lot of data, mangle what you’re doing with it, not ask good questions, and get yourself in trouble.” Indeed, if the worshipers at the altar of Nate Silver forget one thing, it’s his belief that data and predictions can only be as good as we are. And while Silver happens to be very, very good at what he does, he likewise stands out as a believer in what he calls “the limiting ability” of humans. We are a fallible, biased, and slipshod bunch (especially in the political realm). We live in a complex world that barely makes sense. Often, we expect too much of computers and not enough of ourselves. 
“People blame the data,” he tells me, “when they should be asking better questions.” So is big data going to change the world or not? “The revolutions we recognize in retrospect,” Silver says, “aren’t usually the ones we recognize in advance.” He’s right, as usual, but he’s not precisely addressing the question. Then he laughs, picks up his cheeseburger, takes a big bite, and begins to chew. It may seem odd to say we have arrived at a moment when data and creativity are bound together in the same vocation, not to mention the same person. Silver doesn’t have much of a problem with the idea, as incongruous as it might sound. “I think there are two types of creativity,” he says. The first is what he calls “pure expression”–a phrase to describe the work of musicians, poets, actors, dancers, and the like. “The other kind,” he says, “is finding different ways to approach and solve a problem. I’m not sure of the first kind, but I think I have a lot of the problem-solving type of creativity.” Math, as he once put it, “is a different language you can use to think through problems.” And the fact that he works in math (or to be more precise, in math, prose, and infographics, which comprise the building blocks of his blog and recent book, The Signal and the Noise) only means that he has found a fittingly personal and wholly contemporary mode for creative solutions in an era of information overload.

Silver is by no means the first to mine interesting conclusions from big data sets. Nor is he the first to become known for using statistical models as an innovative tool. Depending on how you define it, big data has been around for a while. It was a crucial element in tracking patterns in early epidemics (such as the black plague in London in the 1600s) as well as trends in the U.S. census (beginning in the late 1800s). You might consider the D-Day invasion at Normandy, or even the Apollo lunar missions, as strategic triumphs of complex problem solving and big-data analytics. In the early 1970s, a group of academics published a book called The Limits to Growth that used a fairly sophisticated statistical model to test the sustainability of Earth. (The planet and society are likely doomed, the program concluded.) In his book and in conversation, Silver is quick to point out that the most familiar, and arguably most successful, applications of big data involve National Weather Service predictions and hurricane warnings, which rely on huge data sets and wizardly models and have become increasingly accurate and precise. But other familiar examples abound too. The quants on Wall Street have been helping hedge funds interpret complex trading data for years. Watson, the IBM computer that won at Jeopardy! and is now being applied to medical treatment and financial planning, is a success with a certain kind of big data–“unstructured data,” as IBM likes to call it, which describes information formatted as natural language rather than numerical figures. Palantir, a willfully obscure company that crunches big data in the name of national security, is another. 
Above all are Amazon, Facebook, Google, and Twitter, which stand as the foremost practitioners at making informed conclusions from customer data. By vacuuming up the exhaust from web users, such companies have made extraordinary gains in efficiency, trend-spotting, sales, and–at least in Google’s case–research that sometimes translates into societal rather than corporate advantages. “Google is doing a better job predicting the flu than the CDC,” observes D.J. Patil, the former chief data scientist at LinkedIn, who now works at venture capital firm Greylock Partners. Silver is a freelance data scientist.

He just happened to pick a hot topic. Silver is taking on such challenges as a solo practitioner, though he places his work more in the realm of “medium data,” involving, say, hundreds of thousands of data points rather than the millions or billions mined by researchers at Google or Amazon. But the size of the information pile matters less than the measure of clarity it can yield. As a kid in East Lansing, Michigan, Silver grew up a sports fanatic but wasn’t much of an athlete. “I played soccer up through eighth grade,” he tells me. “It was my least worst sport.” After earning a bachelor’s degree in economics at the University of Chicago, he took a job with a consulting company that left him frustrated and unfulfilled. So he began to work on his PECOTA statistical system in the evenings. The choice of baseball, a sport unusually rich in statistics and measurement, was fortuitous (this abundance is why the sport also lent itself, more famously, to Billy Beane’s predictive calculations chronicled in Michael Lewis’s book Moneyball). After gaining a reputation for expertly dicing baseball stats, Silver wondered whether he could do a better job of predicting political elections than the Beltway pundits. In 2007, he started sifting through poll data and posted his analyses anonymously, at first on the Daily Kos blog under the name Poblano. (A fan of Mexican food, he once created a website to rate Chicago’s burritos.) Eventually Silver revealed himself as the author, set up the independent FiveThirtyEight blog (named after the number of voters in the electoral college), and became a minor celebrity outside the insular world of baseball statistics. A few years later, the editor of The New York Times Magazine ran into Silver on a train platform in Boston and invited him to bring his now high-traffic blog to the Times, which is where he remains, for the moment. 
At some point during his upward trajectory, Silver’s fans began thinking of him less as an analyst and more as an oracle. One of the curious things about his success, though, is that he has never really fit–or tried to fit–into the contours of 21st-century punditry. It isn’t just that as a low-key Midwesterner he seems to lack the temperament for bombast. And it isn’t just that he prefers to discuss the future in terms of probabilities, whereby he hastens to acknowledge the significant uncertainty in any of his predictions. More crucial is the fact that little of what he says is counterintuitive. Silver happens to be amused by the marketing of so many best-selling books–Freakonomics or The Black Swan, for example–that promise the revelation of “hidden” insights. “It’s like they’ll be revealing a mysterious truth,” he says sarcastically. “I’m more one of those guys who says we can oversimplify things, and the devil is in the details. But at the same time, I think the big mistakes people are making are often quite obvious.” He admits he likes to look “for elephants in the room that people often disregard.” Sometimes these animals can be found in the predictor’s bias–for instance, local weather forecasters are, in his analysis, intent on pushing a good regional story, and therefore consistently overpredict the likelihood of rain. (In the mornings, you’re better off relying on National Weather Service data.) But other kinds of obvious mistakes are out there, too. In Silver’s view, when the credit agencies undervalued the probability of failure in the U.S. housing markets in the mid-2000s, they based their assumptions on reams of data–but data they had culled mainly from housing stats during the boom years. It was a recipe for egregiously wrong predictions. 
“They had lots and lots of observations,” Silver says, “but not a lot of variance to tell you how this housing system will function under different conditions.” Evaluating the ingenuity of Silver’s mathematical models can be difficult. But it’s easy to admire his accuracy and the savvy with which he chooses the problems he wants to solve. “I’ve tried to pick fields where the competition was not that good,” he says, laughing. He’s actually serious. When he decided to try politics, he felt that a few good competitors did exist–Simon Jackman at Stanford, for instance, and Drew Linzer at Emory. But having grown up during the “moneyball” wars in baseball–when statistical geeks argued for rationality against major-league scouts who argued for the superiority of experience and intuition–Silver was largely surprised by the statistical ignorance, and outright clubbiness, of most political experts. He seems to distrust and dislike cliques. The publication Politico, a frequent target of his, seems to strike him as especially fatuous, gossipy, groupthinky, and sometimes just wrong. Silver tells me, “Political journalism had been a lazy industry for a long time, and laziness allows people who have an innovative idea to achieve success more easily.” When I ask whether he’s interested in predicting the ups and downs of the stock market, Silver sounds less enthusiastic. It’s already a crowded field; stock picking is not really his game. “Ninety-five percent of the time I buy index funds,” he admits. But recently he made an exception when he smelled a whiff of bias and irrationality in the markets. “I bought some Facebook stock a little while ago because I had friends on Wall Street who were trashing it. 
I felt that was more about their being angry over the aesthetics of the IPO, and angry at themselves for having overrated the stock previously.” The story has a point: It suggests that regardless of domain, Silver enjoys capitalizing anytime he thinks someone is making a decision for the wrong reasons. The story has a moral, too: He bought Facebook stock near its absolute low. And the share price has since done very well.

It isn’t lost on Silver that he arrives at his moment of fame just as his field is debating whether the newest statistical tools are truly transformative or whether the expectations for big data, already quite high, might always outpace the reality. The fact that large data sets have existed in one form or another for decades, if not centuries, doesn’t mean that nothing new and significant has happened in the past year or two. If you ask a half dozen of the country’s leading data scientists, including Silver, you can arrive at a rough consensus that things are indeed changing. But why? As Silver sees it, “We’ve gotten a lot better in a few things, and a little better in a lot of things.” For starters, far more data is available to us today, thanks in large part to the information, records, and measurements generated by cell phones, sensors, and web traffic. We have more computer-processing power and at a lower cost. The interplay between different kinds of databases is more robust, helping to reveal patterns–about consumers, politics, sports, disease, markets, media–that were harder to discern before. And the ability to get specific data in real time and course-correct quite quickly is growing too. Practitioners also have a greater sense of limitations and possibilities. Rayid Ghani, who worked as the chief scientist of the Obama 2012 campaign, dismisses the most extreme promises of big data. “The expectation,” he says, “is that if I have enough data, I can predict anything with it.” Some things, he points out, are inherently unpredictable–a hurricane a year in advance, for example, which a potential client once asked him to forecast. Still, Ghani saw firsthand during the campaign that his statistical work could make significant differences in a number of arenas, even if it alone didn’t constitute a magic bullet. 
His work led to improvements in voter targeting (by finding who was likely to vote for Obama and then getting them to the polls); helped the campaign allocate its resources better (by constantly crunching data in battleground states to determine whether money would be better spent on voter persuasion or on turnout); and boosted fundraising (by tailoring voter appeals with more precision). “We probably helped them raise 20% more money than they would have otherwise,” he says. Others who traffic in big data are more effusive. Probably the most eloquent exponents of the idea that data is about to turn the world into a different place–and, if we can work out the not-insignificant privacy issues, a better place–are Viktor Mayer-Schoenberger and Kenneth Cukier, authors of the recent book Big Data. “In some ways, these are data techniques that were done in the past,” says Mayer-Schoenberger. “But now, instead of it taking $3 billion and a decade, it takes a week, or a day, and costs nothing.” Decoding the human genome, he offers, is a good case in point. To his writing partner Cukier, who is also the data editor at The Economist, the fact that big data has seen its initial applications in e-commerce is not a reason to think its largest impact, or even its most disruptive effects, will be in business. In his view, the techniques are proliferating first in commerce because businesses have the incentives and the data and because nothing prevents them from using the data in an innovative way. But Cukier already sees important applications in health care and social services. “Seeing this as some sort of crass commercialism,” he maintains of the impending big-data era, “is totally missing the point.”
Without question, some businesses already use our data in a manner that goes well beyond the quest to boost profits or sell customers more crap. Apart from its efforts to track infectious diseases such as the flu, Google has used its vast trove of data to create a state-of-the-art language-translation program. IBM has applied its data-crunching abilities to identify previously undetectable health risks in premature babies. General Electric is creating new jet engines with sensors that can collect and transmit mind-boggling amounts of information about performance and thereby help flag potential problems. In the meantime, a host of companies with less familiar names are mining similar ore. Osito, a Silicon Valley startup, has an app that gathers data about the location and daily patterns of its users to provide them with helpful information throughout the day. (If the roads are clogged, Osito might tell you to leave early for your next appointment.) Or there’s Kaggle, a company that identifies “data challenges” from corporations and not-for-profits and puts tens of thousands of data scientists into competition with one another to solve them. Recently, in response to a challenge posed by Cornell University and an oceanographic big-data company called Marinexplore, Kaggle asked its users to come up with an algorithm for improving buoy systems to prevent ships from colliding with endangered whale species. (The prize was $10,000.) Another competition asked users to create an algorithm that analyzes patient health records to predict how many days they will spend in the hospital in the next year. (The prize was $3 million.) Such endeavors suggest that however successful big data may prove to be in predicting consumer behavior, its utility in the less technologically sophisticated (and more poorly funded) social sector–in education and medicine, especially–is both largely untested and extremely promising. 
Some of this promise will no doubt be fulfilled by private companies such as Google and IBM that use their data in creative ways for the public good.

A fair amount of disruption might also arise under no one’s aegis, coming instead from a less obvious source: freelance data scientists such as Nate Silver who are simply interested in applying their talents to creative problem solving. This summer, Ghani, the Obama campaign’s data-science guru, begins teaching a class at the University of Chicago on the use of data science for social purposes. “At a very high level,” says Ghani, “there is no difference between using these techniques for predicting what you’re going to buy and whether you’re going to drop out of high school or get a disease or commit a crime.” Ghani maintains that what the stat geeks are doing at Google, Facebook, and Wall Street hedge funds could easily help us understand why talented students in poor communities have difficulty finding or applying to a suitable college. Or they could analyze patterns in childhood obesity or energy consumption. “The problems are all very similar,” says Ghani. “The issue is that the people who are capable of solving these problems might not be aware they exist. And these people don’t know where to go to help organizations that have needs.” Ultimately, Ghani’s goal is to serve as a connector between the private and public big-data initiatives of the future. As it happens, he already has some company. In 2011, a New York data scientist named Jake Porway was tired of the feeling that he wasn’t doing much for the world. Working with big data had to be about more than tracking ad clicks or creating recommendation engines for consumers. “I thought, Let’s see if we can get some people together and spend a weekend hacking on some medical data,” he recalls. “So I put up a blog post to my friends: ‘If you’re in the data community in New York, I’d like to know if you’re interested.'” The word spread. “I don’t have a big readership on my blog,” Porway says, “so I didn’t assume anyone would really see it. 
But at the end of the week, I had 300 people signed up around the world asking, ‘I want to do this, how do I get involved?’ The White House called me. It was shocking. And that’s when I realized it was more than just me and my friends. This was a potential movement.” In July, Porway and a few colleagues set up Datakind in Brooklyn, an outfit that serves as a bridge between social and “mission-driven” organizations needing help interpreting their data and data scientists intent on using their talents, usually pro bono, usually for noncommercial ends. They’ve been busy, taking on work for the Grameen Foundation in Africa and collaborating with the Sunlight Foundation in the United States to explore the influence of lobbyists on legislators. This sort of work has been done before, but by combing and comparing far larger databases–of congressional votes, fundraisers, parties, donations, and all of the House of Representatives transcripts going back to the early 1800s–analysts have far more potential to explore (and expose) the issues. Datakind is likewise working with several medical organizations to discover weak links in so-called cold chains–the transportation routes for vaccinations and organ transplants. At the moment they are interpreting temperature data culled from Android phones strapped to the transport trucks. Already, several chapters of Datakind have sprung up in other cities around the world. “How cool is that?” asks Porway.

Nate Silver is now trying to see what’s coming next for him. He has just turned 35. His interest in politics, always more intellectual than emotional, seems nearly exhausted by the election season. “I definitely get tired of the politics stuff,” he tells me. “Or at least I’m tired of it now. 
You basically have a lot of sociopaths and crazy people who work in the politics industry who are kind of enabled by it being such a strange profession. Just a lack of. . . .” Silver stops to reach over for a french fry, eat it, and think. “I mean, well, the fact that it’s seen as so optional to actually be truthful?” It offends his sensibilities as a data scientist in pursuit of truth. “You know,” he continues, “whereas business can be amoral, I think politics is actively immoral on many occasions. So people will ask if I will go work for a campaign and I say, ‘No way.’ I can make a lot more money working for a hedge fund and it would be a lot less actively evil. At least you’re not trying to manipulate people’s belief systems.” But he has no plans to go to Wall Street. And while he has done some business consulting in the past–for a Hollywood movie studio and for ESPN–he doesn’t sound interested in that path, either. He’d rather write or blog, he tells me, or take on occasional speaking gigs, some paid, some not. “Right now,” he says, “to be able to do something that’s creatively fulfilling is really valuable. I mean, I’ve had years in the past 10 where I made basically no income, like when I was playing poker. And I’ve had years now where I’ve made quite a bit of income.” His point is that he doesn’t want to compromise in the name of getting rich–or, to be precise, richer. At the moment, he spends most of his disposable income on restaurants. And only occasionally does he contemplate luxury living. He’s considering buying a partial season-ticket package to the Knicks, for instance, but first has to decide if he’s still too annoyed at the team’s management for the Jeremy Lin fiasco. “Adding more means is nice,” he concludes. “But if I were making 10 times more money, it would be a somewhat marginal improvement at this point.”

His next book, he thinks, might be about belief. “Ideology, in some sense, is a series of assumptions that people make and hold very deeply,” he explains. “Sometimes it can be very deeply thought out, but especially in politics it can be wafer thin.” He is curious how people formulate those beliefs and why they hold on to them so vehemently. And he is also thinking about doing some analytical work on education, which he considers another area where prediction is underutilized and poorly executed. There’s a lot of education data, Silver says–but much of it is quite noisy. Can he find the signal? It would be a field where good analysis could make a real difference in the world, he agrees. “The stakes are pretty high.” Being Nate Silver can meanwhile have its downsides. People are always asking him–calling, emailing, texting–to try and predict things he has no interest in predicting, such as the lottery, or for a book tour in the United Kingdom, Kate Middleton’s future. “That’s not really the message I’m trying to send out,” he says. He’s had fun with some things–predicting the next Pope, or the Oscars–and he will try to work up a decent mathematical model in pursuit of amusement. But he seems to worry about tarnishing what he calls “the FiveThirtyEight brand” by being too mercenary or trivial. “For the most part I’m saying, Look, actually the world is pretty unpredictable, or at least we human beings are not all that able to predict it,” he tells me. What he really wants is for the world to have a serious conversation about the science of statistics, and how belief and bias figure in. And as for the prospect that big data will make human behavior completely predictable–or even mostly predictable? The idea not only seems wrong to Silver, but it also seems unpleasant. “History shows that people keep making mistakes,” he says. 
“And the good thing is that people who are creative and entrepreneurial and innovative can make a name for themselves or make a lot of money for their company if they do something a little different.” For a moment, he seems unnerved by imagining a society without mistakes and bias–no overfed political pundits bloviating on the airwaves about can’t-miss candidates, no sports hacks making pregame NFL predictions based on anecdote and intuition. “I always find utopia very boring,” he says after a moment, “but that’s probably my ingrained bias.” You can see that he has trouble figuring out how, precisely, he could work the angles of a perfect world. “Utopia would be really . . . fucking . . . boring,” he says again, this time for emphasis, “because there would be no edge to it.”

–Additional reporting by J.J. McCorvey and Jillian Goodman

How Big Data Helps The World

1. CRIME, STOPPED

We’re still a ways from Minority Report, but several cities have teamed up with IBM to analyze crime history and strategically deploy police officers. Memphis cut its crime rate by 14% in the past year. Chicago just adopted the method, to avoid a repeat of its 500 homicides in 2012.

2. MEDICINE, MADE SAFER

In March, researchers from Microsoft and Stanford revealed that they had mined search data of millions of users to successfully identify unreported side effects of certain medications. And last year, Merck held a Kaggle competition to create an algorithm that predicts the side effects of a drug–before it’s produced.

3. ENERGY, SAVED

Smarter meters (and the software developers who make them) are shaming us gluttons into cutting back. For example, Opower lets some utility company users compare consumption with their neighbors’. In 2012, that helped 15 million customers cut usage by 2 billion kilowatt-hours (about $220 million in energy).

4. EDUCATION, EXTENDED

Systems such as Degree Compass keep kids in school with predictive analytics–recommending courses based on their majors, transcripts, and past students’ success rates. In March, Degree Compass had a 92% accuracy rate (across four universities) in predicting the grade a student would get in a course.

[Photos by Jeff Brown; illustrations by Justin Mezzell]