Amy’s interest in kids’ screen time

Watkins: Amy has been described as a “clarifying presence” when sensationalist headlines circulate saying that digital technology is the harbinger of a tween apocalypse. Her research uses large-scale data to examine how digital technologies, including social media platforms like Instagram and Twitter, affect the psychological well-being and mental health of adolescents. We started out our conversation by asking Amy how she got interested in this particular topic.

Orben: What happened in 2017 is that we started getting these waves of very scary coverage about social media effects. We had the publication of iGen by Jean Twenge, who was a professor at University of California San Diego, and she and her collaborators published a couple of academic papers as well, which detailed that social media and digital technologies are having these lasting impacts on children. At least that was written in very popular science articles; for example, she wrote an article in The Atlantic called “Why smartphones are destroying a generation.” I haven’t double-checked, but I have been told by a journalist that this was the most read science article in The Atlantic in 2017. And I just remember reading it, and I got annoyed on Twitter in a way that you only get annoyed on Twitter when you have no followers. So I was kind of ranting around on Twitter about this article, but the scientific papers backing this article up hadn’t come out. And then in November, these scientific papers started appearing, and there’s one published in Clinical Psychological Science, and a journalist sent it to me the day before publication. So it was under embargo and he was like, “Amy, do you have any idea, you know, I think you might be interested in this.” And the paper itself was saying that smartphones cause depression, and I knew that the next day there would be headlines, and headlines, and headlines that smartphones cause depression. And I felt like there’s nobody that’s gonna sit down and have a look at this, you know, it’s gotta be me, and I spent the evening and the night downloading the first wave of the Monitoring the Future dataset, which is an openly available US-based database of teenagers. There’s about a hundred thousand every year that they collect data from, and most of what the book is written on, and the paper, is based on this dataset, and you can download it freely from the web.
So I downloaded it and I started looking at it, and mainly I wrote a blog post on Medium saying, you know, these effects are tiny. Social media only predicts 0.096 percent of depression in teenage girls; you know, it’s significant but it’s tiny. So I published a blog post, and, you know, it got I think like a hundred retweets on Twitter. I felt that was really cool, but naturally I had, like, directly criticized a pretty eminent person in my field, and I hadn’t really thought about it, because I just felt like I needed to do this. And, you know, we are funded by public money, so why are we so hesitant to be vocal if we think that something that is going around in the media is actually not correct? So I then posted this and it got a lot of positive feedback. It naturally also had a bit of negative feedback, both in Oxford and more broadly, and I guess I started realizing that, you know, proper scientists might not normally do this sort of thing. And we have a really weird way of criticizing research, and it’s not very direct. So I’m still very happy that I did all this, but I was asked by a young PhD student the other week, you know, if I would do this again and, you know, at the moment when you’re on the job market, you probably wouldn’t. But I kept on toying with the data, and I started realizing that very small changes in the way that I set up the analyses caused quite big changes in the results that I would find.

Origins of the study

Leigh: The effects of digital technologies on children’s developing brains have drawn a lot of attention from scientists and journalists alike, with each article seemingly suggesting a definitive answer to the kinds of questions that most parents have about their kids’ screen time. So Ryan and I were curious to learn what Amy had to say about researching these kinds of questions.

Orben: Every day we hear about social media having some sort of effect on teenagers or on adults or on children, and one day it’s that it’s really, really negative and the next day it’s that it’s positive, or maybe there’s no effect at all. And so there’s a real confusion as to what is actually happening, you know, how is this technological innovation affecting our society, and the youngest and most vulnerable in our society. So this paper really started by trying to figure out how researchers, analyzing the same dataset with the same question, could actually come up with very different conclusions, and this has really led to us implementing some more innovative statistical techniques. So we actually ran every theoretically defensible analysis pathway in which we could have correlated digital technologies and adolescent well-being. So this was over three different datasets in the US and the UK, and that led to up to 200,000 different statistical analyses being performed on one dataset. And it really gives the scientists and the researchers a unique insight into how the way you would have analyzed the data could have changed what you had found. So we ended up with a range of possible results scientists could have found analyzing the same dataset with the same research question. So it was almost like simulating taking 200,000 scientists with subtly different biases and different histories, and giving them the same dataset and the same research question, and seeing what they come up with. And so we found that actually, for example, you could have written probably a hundred thousand papers with negative effects, but then 50,000 with positive effects, or 50,000 with no effect at all, so that you say, oh, there weren’t any significant correlations.

Analysis of big datasets

Watkins: Over the past few years, there’s been plenty of news about “big data” and what might be learned from the patterns that can be found in large datasets. The analysis of such datasets isn’t without its risks and challenges, however. Doug and I were interested in hearing how Amy approaches the analysis of these kinds of data.

Orben: I often thought about data analysis like a sort of magnifying glass that I just put to my data, and I then see what’s in the data. But what the research shows, and what we’ve naturally known, but what I think this paper visualizes, is that data analysis can actually change what you then end up seeing in the data. So it’s really important for us to think about how we can safeguard ourselves from different ways in which we can analyze the data. So this is kind of the methodological aspect of the paper, and the paper is kind of a double whammy because, on the other side, we then actually wanted to look at what’s the correlation between digital technologies and well-being. And what we found is that if we take the average of all these different ways we could have analyzed the data, and all these different results we could have gotten, there is a small, statistically significant negative effect, or rather not effect but association, between digital technologies and well-being. But crucially it’s important to think about the effect size, because we’re working with such large-scale datasets that even incredibly small effects in the data, or correlations in the data, become significant. So what we did to put the size of these associations into perspective was to compare them to other associations we had in these datasets. These datasets are incredibly rich; they’re collected from hundreds of thousands of adolescents with loads of different variables.
So we could actually look at, for example, what is the association between eating breakfast every morning and your well-being, and we found that association is actually really positive, and the association between taking drugs and well-being is negative, and the digital technology use association is negative but it’s a lot smaller than either of these others, and it’s so small that it’s pretty near the association between eating potatoes and well-being, or wearing glasses and well-being, which are both negative and are statistically significant. But we wouldn’t be investing millions of dollars into making sure that children wear glasses less at school or eat fewer potatoes. So it’s almost a satire, this kind of part of the paper, but it’s really trying to highlight that just because an effect is statistically significant doesn’t mean it’s practically significant, and these two different ways of thinking about the data are almost orthogonal, you know, statistical significance versus practical significance.
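
The point about sample size can be made concrete with a small sketch. The numbers below are hypothetical and chosen only for illustration (they are not taken from the paper), and the p-value uses a normal approximation that is reasonable for large samples. The same tiny correlation is nowhere near significant with 100 respondents, yet highly significant with 100,000, while the share of variance it accounts for does not change.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a Pearson correlation r from n observations.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and a normal approximation to the
    t distribution, which is a reasonable simplification for large n.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


def variance_explained(r: float) -> float:
    """Share of variance in one variable accounted for by the other (r squared)."""
    return r * r


# Hypothetical correlation and sample sizes, for illustration only.
r = -0.03
for n in (100, 1_000, 100_000):
    p = correlation_p_value(r, n)
    share = variance_explained(r)
    print(f"n={n:>7}  r={r}  p={p:.3g}  variance explained={share:.2%}")
```

Statistical significance here depends almost entirely on n, whereas the variance explained is a property of the association itself, which is why the two can come apart so sharply in large datasets.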

The Garden of forking paths

Leigh: In 1941, the Argentine writer and poet Jorge Luis Borges penned a short story titled “The garden of forking paths,” which describes a world in which all possible outcomes of an event occur simultaneously, with each one leading to further proliferations of possibilities. More recently, Andrew Gelman, a statistics professor at Columbia University, invoked the idea of Borges’s forking paths as an analogy for the bias that may sneak into research studies through the small decisions made by scientists as they progress through their work. But how can these unique paths influence a researcher’s science? Amy explains.

Orben: So for example, if I’m a researcher trying to figure out the correlation between digital technologies and well-being, I might start walking down this garden and I come to the first decision I have to make, and that might be how do I even define well-being. And I have these datasets where there’s probably up to 25 different questions that could be used to define well-being, and some of them come from predefined scales, others don’t. If I look at the past literature, people have been selectively picking and choosing what they want to use to measure well-being. So we can’t even use past literature to guide us, and so we’re at this fork in this garden needing to make a decision how to define well-being in our analysis. For example, if I would now already be working with the data I might have a look, you know, what happens if I define it one way, and if the results don’t really make sense or they’re not very clean, I might try a different way, and I might try it another way. And so I might end up deciding which path to go down on the basis of what seems to make the most sense having tested on the data. And so I’m wandering down the garden a bit more, and then I come to the next decision point, and there I might need to decide how to measure technology use, and the same thing commences, and I need to decide what to do, and I start wandering down one of those many different paths, and then, you know, there’s even more questions about what controls do I need to include, how do I want to model the data, etc. So in the end, I might have walked down a very specific path in this garden, and I think what often happens to researchers is that we look back, and we think, naturally I’ve taken this path, that was clear from the very beginning.
But actually, if we were looking at this garden of ever more forking paths, there might have actually been millions of possible ways I could have analyzed the data. All of them will probably be defensible in, for example, a peer-review process, as it looks like no peer review found out that people were just picking and choosing what they want to define well-being as out of, for example, 25 different questions. So, at the moment, if researchers don’t specify beforehand how they want to analyze their data, there can be subtle biases that can influence how they analyze their data, especially if they’re looking at their data while they’re analyzing it, which most of us naturally do.

Specification curve analysis

Watkins: Empirically testing scientific hypotheses requires that researchers make a number of data analytic decisions, for example, which variables to use and what observations to include or exclude. The accumulated effect of these decisions often leads different researchers to different conclusions from the same data. Specification curve analysis, or SCA for short, was developed in 2015 to identify what might have happened had a researcher gone down any number of different forking paths while collecting and analyzing their data. Doug and I wanted to learn more about how SCA works, so we asked Amy to tell us more about the technique.

Orben: So, Specification Curve Analysis takes this one analytical pathway which would be reported in a normal paper, and instead of reporting the one analytical pathway it reports this whole garden of forking paths, every single different combination of all the different decisions a researcher could have made. And so how you really start off by doing that is that you sit down and you think about, okay, so I have this research question, I have my data but I won’t look at it, yeah, what sort of decisions will I need to make to actually analyze it, and you note down all the possible ways, for example, you think you could define well-being, all the possible ways you could define digital technologies, all the possible controls you could add, and all the different combinations of controls. And there we found that in these large-scale datasets where we have these hundreds of different variables, there’s actually sometimes more than a million different ways you could have analyzed a simple correlation. And so this garden of forking paths is actually sometimes millions of different paths the researcher could have gone down.
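
As a rough illustration of the idea described above, here is a minimal sketch of a specification curve in Python. It is not the code used in the study. The variable names (`wb1`, `tv_hours`, and so on) and the data are synthetic and hypothetical, and for brevity it varies only two decisions, which well-being items to average and which technology measure to use, leaving out the control variables mentioned in the interview.

```python
import itertools
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def nonempty_subsets(items):
    """Yield every combination of at least one item."""
    for size in range(1, len(items) + 1):
        yield from itertools.combinations(items, size)


def specification_curve(data, wellbeing_items, tech_measures):
    """Run every (well-being definition, technology measure) specification.

    Returns a list of (r, wellbeing_subset, tech_measure), sorted by r.
    """
    results = []
    specs = itertools.product(nonempty_subsets(wellbeing_items), tech_measures)
    for subset, tech in specs:
        # One way to define well-being: the mean of the chosen items.
        outcome = [statistics.fmean(row[item] for item in subset) for row in data]
        predictor = [row[tech] for row in data]
        results.append((pearson_r(predictor, outcome), subset, tech))
    results.sort(key=lambda spec: spec[0])
    return results


# Synthetic data with hypothetical variable names, for illustration only.
rng = random.Random(0)
wellbeing_items = ["wb1", "wb2", "wb3", "wb4"]
tech_measures = ["tv_hours", "social_media_hours"]
data = [
    {name: rng.gauss(0, 1) for name in wellbeing_items + tech_measures}
    for _ in range(500)
]

curve = specification_curve(data, wellbeing_items, tech_measures)
median_r = statistics.median(r for r, _, _ in curve)
print(f"{len(curve)} specifications, median r = {median_r:.3f}")
```

Even this toy setup yields (2^4 - 1) x 2 = 30 specifications. Adding every combination of control variables as a third decision multiplies that count again, which is how the number of paths can run into the millions.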

Developing analytic skills

Leigh: As a new statistical tool, SCA isn’t yet included in many statistical software programs. And applying the method to one dataset is no simple task. So, given that she applied it to three different datasets, Ryan and I wondered how Amy developed the skills to do these kinds of analyses.

Orben: The code for a Specification Curve Analysis, I often liken it to a prosthesis, because it needs to be molded to the analysis that you want to do. So nobody has yet made a package in R to do specification curve analysis, because it’s so hard to actually make one, because it needs to be so individualized. And so I spent, yeah, I spent the spring doing that, making these Specification Curve Analyses. They started becoming ever bigger, and instead of taking, you know, hours to run, they started taking days to run, and then I submitted it and it got into review, and some of the authors of the original Specification Curve Analysis paper were my reviewers. They wanted some very fancy permutation tests done, which meant that I needed to run these Specification Curve Analyses that already took days on my computer; I needed to run 500 of them, times three for the three datasets. So that’s when I started using computer clusters as well. So I learned how to use a supercomputer to send these specification curve scripts to run on the Oxford computer cluster, which was incredibly helpful. Yeah, I then clogged that up, and that was a whole different story, but I ended up actually being able to run these in a couple of weeks, and after two rejections, Nature Human Behaviour was very interested, and the review process went really well. So that’s kind of how Specification Curve Analysis evolved. And so a year and a half ago, if you would have told me that I would spend a year running these sorts of analyses in a semi-crazed state, I would have probably not believed you. But it’s been great, I’ve gained a huge amount of skills, and looking back, I now really view the PhD as the time to develop skills, and I definitely did in that project, being so involved and trying to up my game to really progress the work in the field.
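
The interview does not spell out how the permutation tests were set up, so the following is only a generic sketch of the technique, with synthetic data and hypothetical names. The outcome is shuffled to break any real association, the full set of analyses is rerun on each shuffled copy, and the observed median effect is compared with the shuffled ones. The default of 500 permutations mirrors the number mentioned above; everything else is an assumption.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def median_effect(outcome, predictors):
    """Median correlation across all candidate predictors (the 'specifications')."""
    return statistics.median(pearson_r(p, outcome) for p in predictors)


def permutation_p_value(outcome, predictors, n_permutations=500, seed=0):
    """Share of shuffled datasets whose median effect is at least as extreme.

    Shuffling the outcome removes any true association while keeping the
    distribution of every variable intact.
    """
    rng = random.Random(seed)
    observed = median_effect(outcome, predictors)
    shuffled = list(outcome)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(median_effect(shuffled, predictors)) >= abs(observed):
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)


# Synthetic data for illustration: two related predictors and one outcome.
rng = random.Random(1)
x1 = [rng.gauss(0, 1) for _ in range(200)]
x2 = [x + 0.5 * rng.gauss(0, 1) for x in x1]
outcome = [-0.5 * x + rng.gauss(0, 1) for x in x1]

observed, p_value = permutation_p_value(outcome, [x1, x2])
print(f"observed median r = {observed:.3f}, permutation p = {p_value:.4f}")
```

Because every permutation reruns the whole set of specifications, the cost is the cost of one specification curve multiplied by the number of permutations, which is consistent with the move from a desktop machine to a computer cluster described above.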

Confirmation bias

Watkins: Scientists aren’t exempt from confirmation bias: the predisposition we all have to favor evidence that’s consistent with our beliefs over that which is inconsistent with them. Specification Curve Analysis attempts to assess a researcher’s bias and, in doing so, takes on one of the most challenging aspects of the scientific process: addressing a researcher’s arbitrary but defensible decisions. Here’s what Amy had to say about how SCA might improve our science.

Orben: I think how SCA can help is in two ways. The first is that it’s a really powerful tool to explore which decisions that you make actually make a difference in the results that you find. So, for example, I now know the decision to include control variables is hugely influential to whether you find negative effects of digital technology use or not, which made me a lot more attuned to the fact that there are probably third variables involved, like an adolescent being disadvantaged. Disadvantaged adolescents use more technology, and they also feel worse. So we need to account for that in our models, and it made me see a lot more about the importance of common method variance. For example, we always use adolescent self-reported digital technology use, but we have both the adolescents reporting on their well-being and also the parents reporting on their adolescents’ well-being, and it caused major differences whether we included one or the other. So I’m currently using SCA to have a look at data where I, for example, might have multiple different measures of life satisfaction, and so what sort of life satisfaction might be more or less affected. So I think it’s a really great tool to do exploratory research, and then to create hypotheses to then test later on in confirmatory work, especially if we’re going towards a scientific method which is more akin to what the scientific method should be, and it’s not just labeling everything that we do as confirmatory, but actually valuing the exploratory part. Another thing that I say in the paper is that Specification Curve Analysis isn’t without bias, unconscious or conscious. So a different researcher could set up a different SCA and probably get different results, and that’s actually completely fine. That’s what was actually proposed by those who wrote about Specification Curve Analysis first.

Measuring technology use

Leigh: The idea that digital devices, the Internet, and social media have an enduring influence on how people develop, socialize, and thrive is a compelling one. To study this question, however, researchers have to first determine how to appropriately measure our use of these technologies. Since Amy was analyzing data from across three large-scale data sets, Ryan and I were curious how she approached defining what counts as “technology use.”

Orben: The measurement of technology use is really the Achilles’ heel of the research area, so it’s a really interesting thing to talk about. The focus on digital technology use and how I measured it was very much based on past research, because I wanted this work to directly speak to the existing literature. These datasets often have a couple of questions relating to things that the people who write the questionnaires think are interesting, but which are often not what’s then interesting in a couple of years’ time. So there’s stuff about digital gaming, there’s things about watching television, there is a bit about social media use. But these are all self-report questions and often not on very good scales, you know, how many hours on average do you use social media on a typical school day, and then it’s like none, one to three hours, three to five hours, five plus. So I’m not actually quoting the paper I have in front of me, but this is normally what these measures look like. And because a lot of really highly influential research which was informing political decisions already used exactly these measurements, I used those as well. But I think it is really important to say that research has been coming out for a couple of years now that showed that these self-report measures of adolescent technology use aren’t very good, and that if we actually tracked the amount of time, for example, a teenager spends on social media, and then asked them how much time they spend on social media, there’s only a correlation of about 0.3 normally, which isn’t a very good measure. It’s naturally the best thing that we have at the moment, because all of this trace data, this actual time spent on, for example, Facebook or Twitter or Instagram, that data is locked away in the tech companies, and we are currently trying to figure out how this sort of data could be shared, but we’re not there yet at the moment.
And furthermore, I think what is really important is that they’re not very nuanced, in that I think we would all agree that a teenage girl looking at skinny models on Instagram will probably experience a different effect than a teenage girl looking at cat videos on Instagram. Me asking that teenage girl how much time do you spend on Instagram won’t capture that sort of nuance, and going forward, especially in my own research, I really think we need to ask better questions about technology use, because screen time, time spent on digital technologies, I think is a worthless concept, because I think the important thing is not the time you spend on it, but it’s more what you do on it as well, and the motivations behind it. So that’s my little measurement rant.

Measuring adolescents’ well-being

Watkins: Kids’ use of technology was just one side of the equation in Amy’s research; she also had to deal with variations in how adolescent well-being was measured. Amy talks with us about the challenges of measuring well-being after this short break.

ad: SciencePods.com

Watkins: Here again is Amy Orben.

Orben: In the end, I’m taking an approach which is naturally quite, how do you say, blunt, which is, for the first two, so the US datasets, I again just used what other researchers had used before. In one of them, for example, there’s a kind of a depression-like scale asking about depressive symptoms, and then there is a self-esteem scale, which is the Rosenberg self-esteem scale of six items, and then there’s actually a really neat table in the supplementary materials where I’ve gone through, and I tried to find all of the past papers that had used this dataset to look at something pertaining to well-being, and I showed that people didn’t stick to the questionnaires. So some people would take four questions of the depression measure, or six, but then they add two of the self-esteem measures, or they take two of the self-esteem into the depression, and there seemed to be, you know, no consistency and no regard to how the scales had been constructed. And so in the paper, naturally what I was trying to do is I wanted to note down and to map out every theoretically defensible analysis pathway, and so theoretically defensible could naturally be, well, we should just use the questionnaires as they’ve been designed, and I do that in the supplementary materials, which are very extensive. But actually in the main paper, I decided to use every single combination of these different well-being measures in the dataset, so I might be taking one of the depression items and one of the self-esteem, or just a self-esteem, or a different mixture.
And I do that because I found that people have done that in the past and it seems to be getting through peer review. Going forward, in the work I’m now doing, I’d never use that sort of very blunt approach again, but I think it was important for me to visualize just how much analytical flexibility there is in the field, because I’ve now peer-reviewed quite a few papers using these datasets, and often the first thing I do is I look at the questions and I see whether they’ve actually included all the questions that they could have, or whether they’ve just selected what might fit best for their narrative.
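
To give a sense of how quickly “any combination of items” grows, the short sketch below counts the ways of choosing at least one item from a questionnaire. The function name is hypothetical. The two item counts come from the interview (the six-item self-esteem scale, and the “up to 25 different questions” mentioned earlier); treating every non-empty subset as a possible well-being definition is a simplifying assumption.

```python
import math


def count_item_combinations(n_items: int) -> int:
    """Number of ways to choose at least one of n_items questionnaire items."""
    return sum(math.comb(n_items, k) for k in range(1, n_items + 1))


print(count_item_combinations(6))   # 63
print(count_item_combinations(25))  # 33554431
```

The count is 2^n - 1, so each additional item roughly doubles the number of possible well-being definitions before any other analytic decision has been made.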

Technology use and well-being

Leigh: As it is with adults, adolescents’ use of technology is nuanced, and its impact on their well-being is equally complex. So Ryan and I were eager to hear what Amy believes her study says about the association between the two constructs.

Orben: I don’t think this research says that there are no negative effects of digital technologies on adolescents. I think it does say that, you know, if we take the average of all sorts of digital technologies, and its effect on the average of all possible adolescents, we only find these very minimal effects. But that doesn’t mean that there won’t be certain types of technologies that might actually negatively impact certain types of adolescents, or maybe certain types of technologies that might positively impact certain types of adolescents. Because we’re averaging all of these different reactions together, we might be actually finding something near zero. I’m actually quite critical of technology, but probably there’ll be specific types of technology we’ll need to think about more, and it’s not actually screens in general; you know, doing your homework on screens will be very different to other types of screen use. So I think the ball’s still out in the open, and we don’t actually know what is really happening yet. I think what this work at the moment is trying to say is that actually the evidence base that a lot of these really scaremongering claims are based on is not there, and so at the moment probably what we need to do is we need to be very vocal that we don’t know, and place an increased emphasis on the parents to make these decisions, and not, for example, for policy makers to try to enforce certain screen time limits. And there’s been a lot of debate in the UK currently, because the UK Royal College of Paediatrics and Child Health released their new screen time guidelines a couple of weeks ago, and more or less what they said as well is that we don’t actually have the level of evidence yet to state specific screen time limits; you know, naturally we need to endeavour to build a child’s life, and then put technology use around it, and not take technology use and then build the life around it.

Collaboration with Andy Przybylski

Watkins: Throughout the history of science, researchers have often benefited from the collective intelligence that can emerge from collaboration. Amy’s research is often done in collaboration with Andy Przybylski, who’s also with the University of Oxford. We were curious how they came to work together on this line of research.

Orben: We met actually at a conference in Germany, mainly, so it needed both of us to be abroad for us to really start talking about these issues. It also helped that at the time I was in a bit of a rut with my PhD research. I didn’t really know where I was going to take things, I wasn’t very happy with what I was doing, and so I met Andy for a quick lunch before then. But we were at this conference in the German countryside, and I think there I started just begging him to help me out, because my PhD was going nowhere and I didn’t trust the science I was doing, and I thought that it was pretty crap, and it didn’t adhere to the scientific principles that I really entered science for. So at that point, we started working together, but it only really kicked off when I started analyzing these large-scale datasets that people had already published on. It was a very winding path till we actually started working on things that are now published and that I’m really proud of. Looking back, I’m surprised that we didn’t meet each other earlier, because we were both interested in the same things, we both had this feeling that a lot of the scientific literature was misrepresenting things, but it’s weird that we needed to be in Germany to actually figure out that it’d be good to work together.

Amy’s work in open science

Leigh: In addition to being an experimental psychologist, Amy’s also a fellow podcaster. Her show, ReproducibiliTea, which there’s a link to at parsingscience.org/e47, focuses on reproducible and open science. Since Ryan and I often invite our guests to talk about their perspectives on open science and related issues, we were interested in learning how these initiatives have influenced her own work.

Orben: Just so much of who I am as a scientist, I feel, just resonates with the idea that we need to do things better, and just such a crucial part of that is transparency. I could have only done my work because the datasets that were used were openly available to everyone, and I still had huge trouble computationally reproducing the results, even with the same dataset, and I’m still trying to do it. And I’ve been trying to get the actual datasets the researchers used. So naturally, I think that they got uploaded to SPSS and then they made some variables, and I can’t figure out how those variables were made. We still haven’t computationally reproduced quite a lot of the research using the same datasets. So yeah, transparency is so key, and I think we need to justify everything in our papers, and at the moment we’re just cutting corners. But once we actually start justifying all of these different decisions, we’ll do a lot better science, not just because we’ll have to think about it before we do it, but also because others can then reconstruct your research, can properly criticize, etc. And actually you feel vulnerable making things transparent, but then I think if people do find errors, then at least you’re helping the scientific literature, and it can be seen as an honest mistake, and you can actually action it, and make sure it’s better. And I think the last thing about open science is that when I started doing it, I felt like I was shooting myself in the foot; you know, it took a year for this paper to emerge. Everybody else around me seemed to just be getting on with things, but for me it was just such a massive change. Once I just started doing the best work I possibly could, and didn’t need to worry about the results, I think this is where the Specification Curve Analysis was amazing, because I knew that it would be publishable, because it was new methodology on an interesting question, and whatever result came out I knew it would be okay.
And so I didn’t even worry about the results, I was just trying to do my best. But for me, that was such a massive step change; you know, at the time I was in Germany I was applying for jobs outside of academia, and the moment I started actually being able to learn and do the best possible work and not think about the story, I actually became more and more attached to being an academic, because I think most of the people who start in academia start because we want to figure out part of the truth, or we at least want to help society figure out more things going forward, you know. We’ll have mistakes, we’ll have setbacks, but looking back on our academic careers, we’ll hopefully have one or two things which we actually contributed. So as a grad student, I think I felt like I was actively harming my career, but now at the end of my grad studies, it was really what changed it, and actually made me stay, and gave me the collaborators who are so supportive of everything I do, and all the network. So I could fill two hours with this positivity about open science in a lot of different ways.

Leigh: That was Amy Orben, discussing her article “The Association Between Adolescent Well-being and Digital Technology Use,” which she published with Andy Przybylski on January 14th, 2019, in the journal Nature Human Behaviour. You’ll find a link to their paper at parsingscience.org/e47, along with bonus audio and other materials we discussed during the episode.

Watkins: If you enjoy Parsing Science, consider becoming a patron for as little as $1 a month. And as a sign of our thanks you’ll get access to hours of unreleased audio from all of our episodes so far, as well as the same for all of our future ones. You’ll help us continue to bring you the unpublished stories of researchers from around the globe, while supporting what we hope is one of your favorite science shows. If you’re interested in learning more, head over to: parsingscience.org/support for more information.

Preview of next episode

Leigh: Next time, in episode 48 of Parsing Science, we’ll be joined by Mason Youngblood from the City University of New York. He’ll talk with us about his research into the cultural transmission of digital music samples through collaborative networks of musicians.

Mason Youngblood: So in the case of music sampling, an individual might be more likely to adopt a particular music sample in their own music because of some quality intrinsic to that music sample.

Leigh: We hope that you’ll join us again.
