Psychologists can't seem to agree on what technology is doing to our sense of well-being. Some say digital devices have become a bane of modern life; others claim they’re a balm for it. Between them lies a shadowy landscape of non-consensus: As the director of the National Institutes of Health recently told Congress, research into technology's effects on our thoughts, behaviors, and development has produced limited, and often contradictory, findings.

As if that uncertainty weren't vexing enough, many of those findings have sprung from the same source: Giant data sets that compile survey data from thousands or even millions of participants. "The problem is, two researchers can look at the same data and come away with completely different findings and prescriptions for society," says psychologist Andrew Przybylski, director of research at the Oxford Internet Institute. "Technological optimists tend to find positive correlations. If they’re pessimists, they tend to find negative ones."

In the latest issue of Nature Human Behaviour, Przybylski and coauthor Amy Orben use a novel statistical method to show why scientists studying these colossal data sets have been getting such different results, and why most of the associations researchers have found, positive and negative, are very small and probably not worth freaking out about.

Consider the Millennium Cohort Study. A long-term study on the health outcomes of kids born in the UK in 2000 and 2001, the survey contains dozens of questions whose answers a researcher could reasonably interpret as relevant to a person's well-being. Those questions span topics as disparate as self-esteem, suicidal thoughts, and overall life satisfaction. "But different researchers have different conceptions of well-being and can choose different questions to fit that conception," Orben says.

Whether they realize it or not, a researcher who chooses to focus only on certain questions is making a decision to pursue one analytical path to the exclusion of many, many others. How many? In the case of the MCS, combining the survey's questions on well-being with those on things like TV watching, videogame habits, and social media use produces a total of 603,979,752 analytical paths a researcher could take. Combine them with questions directed to the caregivers of study participants, and that figure balloons to 2.5 trillion.

Granted, the vast majority of those 2.5 trillion results are not all that interesting. But the sprawling nature of these data sets allows for associations to emerge that are technically statistically significant but are very, very small. In science, large sample sizes are generally considered to be a good thing. Yet when you combine the large number of analytical paths afforded by subjective survey questions with an enormous number of survey participants, it opens the door to statistical skullduggery like p-hacking—the practice of fishing for favorable results in a large set of data.
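To see how a tiny association can still clear the bar for statistical significance, here is a minimal Python sketch. It is an illustration, not anything from the study: the correlation of 0.02 is a made-up value, the function name is hypothetical, and the p-value uses a normal approximation that is only reasonable for large samples.

```python
import math

def approx_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a Pearson correlation r
    observed in n participants (normal approximation to the t test)."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))

r = 0.02  # hypothetical, very weak association (illustrative value only)
for n in (100, 10_000, 1_000_000):
    p = approx_p_value(r, n)
    print(f"n={n:>9,}  r={r}  variance explained={r * r:.2%}  p={p:.3g}")
```

The correlation, and the share of variance it explains, stays the same on every line. Only the sample size changes, and with it the p-value: the same negligible effect that looks like noise among 100 people looks "significant" among a million.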

"Researchers will essentially torture the data until it gives them a statistically significant result that they can publish," Przybylski says. (Not all researchers who report such results do so with the intention to deceive. But researchers are people; science as an institution may strive for objectivity, but scientists are nevertheless susceptible to biases that can blind them to their misuse of data.) "We wanted to move past this kind of statistical cherry-picking. So we decided to look for a data-driven method to collect the whole orchard, all at once."

Przybylski and Orben found that method in a statistical tool called specification curve analysis. Rather than investigate a single analytical path through the Millennium Cohort Study, SCA allowed them to investigate 20,000 of them. It also permitted them to probe all 41,338 paths through two other large-scale data sets, called Monitoring the Future and the Youth Risk and Behaviour Survey, that are commonly used to assess the association between digital habits and adolescent well-being.
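The general idea of a specification curve can be sketched in a few lines of Python. This is a toy version under loud assumptions, not the authors' analysis: the data are synthetic, the column names are hypothetical, each "specification" is just one technology measure paired with one combination of well-being items, and a plain correlation stands in for the regression models a real analysis would use.

```python
import itertools
import random
import statistics

random.seed(0)  # make the synthetic data reproducible

N = 2_000
TECH = ["tv_hours", "gaming_hours", "social_media_hours"]         # hypothetical names
WELLBEING = ["self_esteem", "life_satisfaction", "low_mood_rev"]  # hypothetical names

# Synthetic respondents: each well-being item is mostly noise plus a
# very weak negative dependence on total technology use (assumed value).
rows = []
for _ in range(N):
    row = {t: random.gauss(0, 1) for t in TECH}
    total = sum(row[t] for t in TECH)
    for w in WELLBEING:
        row[w] = -0.03 * total + random.gauss(0, 1)
    rows.append(row)

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def outcome_subsets(items):
    """Every non-empty combination of well-being items."""
    for k in range(1, len(items) + 1):
        yield from itertools.combinations(items, k)

def specification_curve(rows):
    """Run every specification and return the results sorted by effect size."""
    results = []
    for tech, items in itertools.product(TECH, outcome_subsets(WELLBEING)):
        xs = [r[tech] for r in rows]
        ys = [statistics.fmean(r[i] for i in items) for r in rows]
        results.append((pearson(xs, ys), tech, items))
    return sorted(results)  # most negative first, most positive last

curve = specification_curve(rows)
median_r = statistics.median(r for r, _, _ in curve)
print(f"{len(curve)} specifications, median r = {median_r:.3f}")
print("most negative:", curve[0])
print("most positive:", curve[-1])
```

Even with three technology measures and three well-being items there are already 21 specifications, and a researcher who reported only the first or last entry of the sorted list would tell a different story from one who reported the median. Looking at the whole sorted curve at once is what replaces picking a single path.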