Searching for sneezes (Image: Cultura/Colin Hawkins/Getty Images)

Google may be a master at data wrangling, but one of its products has been making bogus data-driven predictions. A study shows that Google’s much-hyped flu tracker has consistently overestimated flu cases in the US for years. It’s a failure that highlights the danger of relying on big data technologies.

Google Flu Trends, which launched in 2008, monitors web searches across the US to find terms associated with flu activity such as “cough” or “fever”. It uses those searches to predict up to nine weeks in advance the number of flu-related doctors’ visits that are likely to be made. The system has consistently overestimated flu-related visits over the past three years, and was especially inaccurate around the peak of flu season – when such data is most useful. In the 2012/2013 season, it predicted twice as many doctors’ visits as the US Centers for Disease Control and Prevention (CDC) eventually recorded. In 2011/2012 it overestimated by more than 50 per cent.

The study’s lead author, David Lazer, of Northeastern University, says the fixes for Google’s problems are relatively simple – much like recalibrating weighing scales. “It’s a bit of a puzzle, because it really wouldn’t have taken that much work to substantially improve the performance of Google Flu Trends,” he says. Merely projecting current CDC data three weeks into the future yields more accurate results than those compiled by Google Flu Trends. Combining the two resulted in the most accurate model of all. Lazer says Google Flu Trends does have promise, especially at predicting flu trends over smaller areas than the CDC takes into account, which could enable individual cities or states to prepare.
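Lazer’s baseline amounts to a simple lagged model: use the most recent CDC figure as the forecast a few weeks out, then blend it with the Google Flu Trends estimate. A minimal sketch of that idea, with hypothetical numbers (the study fits its combined model statistically, so this is an illustration, not the paper’s method):

```python
def lagged_projection(cdc_history, lag=3):
    """Naive baseline: the forecast for week t is simply the CDC
    figure from week t - lag, i.e. 'projecting current CDC data
    three weeks into the future'."""
    return cdc_history[-lag]

def combined_estimate(gft_prediction, cdc_history, lag=3, weight=0.5):
    # Simple equal-weight blend of the two signals; the study's
    # actual combination is fitted, not a fixed 50/50 mix.
    return weight * gft_prediction + (1 - weight) * lagged_projection(cdc_history, lag)

# Hypothetical weekly flu-visit rates (% of doctor visits that are flu-related)
cdc = [1.2, 1.5, 1.9, 2.4]   # recent CDC data, oldest first
gft = 4.8                    # an (overestimated) Google Flu Trends prediction

baseline = lagged_projection(cdc)       # CDC figure from three weeks ago: 1.5
blended = combined_estimate(gft, cdc)   # 0.5 * 4.8 + 0.5 * 1.5 = 3.15
```

Even this crude blend pulls an inflated search-based estimate back towards the slower but better-grounded CDC signal, which is the intuition behind the study’s finding that the combination beat either source alone.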


Neil Richards, a data and privacy lawyer at Washington University in St Louis, says the study offers an important insight into the immense power that the analysis of large data sets has afforded technology companies like Google and Facebook – and why that power is dangerous. “We now have technology companies with power that rivals the state in some respects, and which have much more of an impact on our daily lives,” Richards says. Understanding the failings and the function of technology companies is becoming increasingly important as they wield ever more power.

Evan Selinger, a technology ethicist at Rochester Institute of Technology in New York, says Google Flu Trends’ failures hint at a larger problem with the algorithmic approach taken by technology companies to deliver services we all want to use. The problem lies with the assumption that the data gathered about us, or the algorithms used to process it, are neutral. The main concern of tech companies like Google, or data brokers like Acxiom, is to find patterns in that data that can make them money (like where an advert should be placed on screen to maximise the chance that men in Boston aged between 25 and 30 will click it). How they go about that has a huge impact on our lives, and yet we have no idea how it works.

“Algorithmic accountability is one of the biggest problems of our time,” Selinger says. “More and more decisions made about us are computed in processes we don’t have access to.”

Selinger points out that other areas of life where people make decisions that affect our welfare are much more transparent – we have the right to a fair trial when accused of breaking the law, for instance, and can challenge our credit score if we feel it’s wrong. In contrast, a business that finds itself pushed to the second page of Google’s search results or delisted from Google Maps can never know the reason – the algorithms which make that decision are Google’s property.

Unfortunately, the power of Google’s algorithms is hard to substantiate. “We all have some sense of what could really go wrong,” Selinger says. “But when it comes to big data and data brokers and companies there are only a few little things on the radar – [US retailer] Target figuring out its customers are pregnant, for instance.”

The issues are serious enough for President Obama to order a review, led by John Podesta, of the risks big data poses to privacy. The first of three conferences examining the legal, technical and sociological issues surrounding big data took place earlier this month at the Massachusetts Institute of Technology.

The more that real-world decisions are based on algorithms, the more important transparency into those processes becomes, Richards says. Already, automated systems determine whether people receive loans or jobs. Algorithmic analysis may be used to determine no-fly lists. The Intercept, drawing on the documents leaked by Edward Snowden last year, has reported that analysis of data held by the NSA has been used to target drone strikes.

“Now that data scientists have achieved a remarkable level of social power, I hope we’ll see them recognising that with that power comes a great degree of professional responsibility, the same way doctors and lawyers and journalists do,” says Richards.

Journal reference: Science, DOI: 10.1126/science.1248506