Jane Hu, writing for Slate, called last week a week of spectacularly bad social science reporting. The “extraordinary claim of the week” went to news reports of hurricanes with feminine names being more ‘deadly’ than hurricanes with masculine names.

“People of the Internet were quick to cite the findings as an example of sexism. […] More skeptical journalists, scientists, and bloggers were quick to point out glaring flaws in the study. Some dissected the study’s methods, Slate re-ran analyses, and others drew up their own models. People dug deep, and the verdict was clear: The study’s data analyses may not have been accurate, and its conclusions were overblown.” – Jane Hu, on Slate

Hu cites the New York Times, Washington Post and Time as examples of the ‘spectacularly bad social science reporting’ evident in many news articles about the female hurricane study published in the Proceedings of the National Academy of Sciences (PNAS).

I agree with Hu that reporting of the rather controversial PNAS hurricane study could and should have been more skeptical and responsible in many cases. Faye Flam, writing for Knight Science Journalism tracker, pointed out in good humor how many traditional news outlets covered the study too uncritically, while others sought needed verification.

“You could: a) Take the study at face value, since it’s in a peer reviewed journal, in this case PNAS. b) Take it seriously but include one or two quotes from “skeptics”. c) Ignore it. d) Make fun of it. Or e) Take it apart and analyze how researchers could have obtained such a result.” -KSJ

But something big is missing here. That something, which I thought I could hunt down, was the story from these journalists’ perspectives. That something was the story of why the PNAS female hurricane study in many cases gave very good science writers fits.

I think that scientists, science news critics and science bloggers often get caught up in criticizing media coverage of science, perhaps without trying to see the problem from legacy journalists’ perspectives. The interviews below are an attempt to understand how traditional reporters grappled with this controversial study.

Updates tell the story better: The Washington Post

“We are a weather blog, and I do the PNAS clippings of the embargoed stories before the journal publishes articles,” said Jason Samenow, the Washington Post’s weather editor. “That one obviously caught my eye, because I cover weather and climate, and historically we’ve written a number of stories on the history of hurricane names. […] For whatever reason, storm names really resonate with people, because of that personal connection I think. It certainly gives storms more of a personal connection. Given that natural interest that people have, I decided that I needed to cover the story.”

Jason’s post on June 2nd, titled “Female-named hurricanes kill more than male hurricanes because people don’t respect them, study finds,” was arguably overly positive about the findings and implications of the PNAS research study. The article’s lede describes “a new groundbreaking study” that finds “[f]emale-named storms have historically killed more because people neither consider them as risky nor take the same precautions.” But fortunately, Samenow’s narrative on the PNAS hurricane study didn’t stop with his first post. A post he wrote the following day, titled “Disbelief, shock and skepticism: Hurricane gender study faces blowback,” did a great job putting the original post in context of critical feedback from the public and the science blogosphere.

“Obviously, part of my job is to draw in readers, so I wanted to present the take-home message of the paper in a provocative way to engage readers. So that’s what I did,” Samenow said of his first post. “I had a good idea that the story was going to attract a lot of attention, it was going to resonate. So I wanted to make sure I put together a credible story with a range of viewpoints on the research.”

Samenow recalled contacting six to eight outside experts for his first post, including meteorologists and social science experts. To get informed outside reviews of the research, he sent the embargoed paper to sources he trusted.

“I think initially, they may have been a little bit conservative in their criticism of the study, just because, you know, I think they don’t want to come out on… I think they wanted to be respectful in their comments and not combative without having had the chance to see what the overall reaction from the community was,” Samenow said. “So I think some of the experts I sent the study to bit their tongue in a sense, and were very diplomatic in their critique of the study.”

Over the next 24 hours, Samenow said, once other science journalists, bloggers and scientists had had the opportunity to digest the substance of the study, criticism of the paper began to harshen.

“So I tried to capture that in the piece I did the next day,” Samenow said, “to sort of give people a sense that the research community did have some major reservations about some of the methodology in the study, not all of it.”

Samenow obviously went to great lengths to seek outside comment in his original piece. Could he have anticipated that some outside experts might originally be too reserved in their criticism, and if so, is there anything he could have done about this? As Samenow describes it, Ed Yong happened to speak to Jeff Lazo, who was very candid in his comments on the study’s methodological shortcomings.

“The researchers I talked to did not provide that sort of feedback,” Samenow said. “And if I had talked to Jeff, the tenor of the story might have been quite a bit different. It’s hard, because you know, this was a study coming from a reputable journal, and so I think… and I’m not a statistical expert, and that’s where a lot of the criticism was… and in some of the conclusions they [the authors] drew based off of a limited data sample. And so, this had gotten through peer-review at a prestigious journal, and the researchers that I queried were not particularly harsh about that aspect of the study. Now if I’d talked to a different set of researchers, they might have been. I think it’s sort of the luck of the draw who you choose to talk to, and their personality. Are they more naturally contrarian and looking for aspects of studies to really try to tear apart, or are they trying to be constructive and respectful?”

But as we will see later, Samenow might have gotten more critical comments by seeking out researchers in the social sciences as opposed to researchers in the physical sciences. The outside experts quoted in Samenow’s original post were largely meteorologists and former officials from the National Hurricane Center and the American Meteorological Society. One outside expert in risk communication, Gina Eosco, quoted in Samenow’s original post did express some concern:

“The focus on the gendered names is one factor in the hurricane communication process, but social science research shows that evacuation rates are influenced by many non-weather factors such as positive versus negative prior evacuation experiences, having children, owning pets, whether a first responder knocked on your door to tell you to evacuate, perceived safety of the structure of your home,” Eosco said. “None of these very important variables were factored into this study.” – Washington Post

I asked Samenow whether he would change how he reported the original story if he could go back.

“Yeah, I think so,” he said. “I think what motivated me to do the second post was the criticisms of the statistical analysis and of the modeling in the study. So I thought it was important, given the very stark conclusions I’d reported the previous day, to make clear that there were a lot of experts who had reservations about that part of the study. If I had talked to different experts from the beginning, I think my story might have brought out those criticisms towards the top of the story so they would have been more prominent, and the tenor of the piece overall might have been a bit more skeptical.”

Interestingly, because Samenow originally reported the story on a blog rather than in print, it may have been easier to roll the narrative into the more skeptical post that came the following day. A print journalist might not have had that option to ‘self-correct.’

“I think blogs tend to report stories in a more iterative fashion,” Samenow said. “That’s the really nice thing about blogs, is that it’s not just one shot, right? So you can report on something, and then after the fact, you can get reactions so that you can provide a greater level of insight than you can if it’s just one story and then done. That opportunity to provide updates is a real benefit I think, in covering a story like this as a blog as opposed to a news story. Had this [the original post] been a print story, and had it been the only version of the story that readers were able to consume, they might have gotten a misleading view of the research.”

But how many readers finished the original post and never looked at the subsequent, more skeptical commentary?

Getting it Right: A Roller Coaster at the Associated Press

“This [story] was a roller coaster of decision making.” – Seth Borenstein, AP science writer

Seth Borenstein has an extremely high bar for covering new research studies. The Associated Press (AP) rarely publishes ‘knock down’ stories on scientific research – the ‘here’s a study and here’s all the problems with it.’ In this way, I suppose we could imagine the AP to be the equivalent of a prestigious scientific journal, where incremental or negative results are rarely published.

“If it’s very inside baseball, if it’s news for other scientists, if it’s incremental, we really don’t write about it,” Borenstein said. “We have a very, very high bar… sometimes it’s too high of a bar. I would say, on average, of stories on studies, at least 9 out of 10 that I look at I say ‘no’ to. […] And my editors say ‘no’ to me at least half the time.”

Borenstein has covered hurricanes since 1989. He is the co-author of two out-of-print books on Hurricane Andrew and hurricane survival. In other words, Borenstein ‘knows’ hurricanes. But this study was different in ways that confused science journalists used to covering weather and climate change.

“PNAS comes out with a tip sheet, I look at it, I look at the title, I open the paper, and my jaw just drops,” Borenstein said. “Oh my god. My first reaction is, how stupid. I noticed that the authors are social scientists, not meteorologists. And so my first instinct is, I’m not going to write about this.”

After contacting study author Sharon Shavitt regarding several issues he had with the methods of the research, including the fact that “Sandy” can be perceived as an androgynous name, Borenstein again essentially decides not to cover the story.

That is, until an editor walks past Borenstein’s desk on the Wednesday afternoon before the paper’s embargo deadline on June 2, sees the paper pulled up on Borenstein’s computer and stops to stare over his shoulder and ask questions about these ‘deadly female hurricanes.’

“It stopped her walking by,” Borenstein said.

He realizes the story is a “talker.” Everyone else is going to write about the ‘deadly female hurricanes,’ and people are going to talk about it. Borenstein is, in his own words, truly conflicted. This is a narrative we don’t often see as science news critics and scholars – the behind-the-scenes intellectual struggle of a science journalist trying to decide whether and how to cover high-interest scientific research in a responsible way.

After talking to his boss in New York, Borenstein realizes why the study is still ‘naggingly’ interesting despite major methodological issues and potential overreaching conclusions on the archival hurricane data front. It’s the psychology component of the paper, versus the ‘which are more deadly,’ question, that captures Borenstein and his editor’s attention.

“If you look at it, there are two parts here. The psychology of how we react, and the issue of were they [female hurricanes] more deadly,” Borenstein said. “If we just pay attention to the psychology part, that’s kind of interesting.”

So that’s how Borenstein approached reporting the story: focusing on the experimental psychology results while being careful to hedge the paper’s findings on death tolls and damage in relation to female vs male named hurricanes.

Borenstein says he thinks he found a happy medium in his coverage of the paper.

“There were a bunch [of writers] whom I respect who were far more snarky than I would have been, and that made me a little nervous,” Borenstein said. “But I think that’s because they got stuck in meteorology, meteorology, meteorology, and it really wasn’t a meteorology study. The more you looked at it, it was a sociology, psychology thing.”

“I think you had such a spread of stories,” Borenstein said. “There were some that I thought were not skeptical enough. The one that stands out to me probably the most, and it sounds awful since it’s my main confederate, is Reuters... They understood that it was a social science study, and that’s good. But they didn’t point out the issue of statistical significance. That’s the one I probably had the most trouble with. I didn’t have as much trouble with the ones who went too much on the other end. I can understand that.”

Sharon Begley wrote the Reuters story on the study, titled “What's in a (hurricane) name? More deaths: study.” (Begley notes, however, that she doesn’t get to craft the headlines). Begley said that like everyone else, she scans the top science journals every week, including PNAS. When she saw the PNAS hurricane study, she recognized a ‘grabby’ subject that she also trusted had been critically peer-reviewed.

“There was no question that it would interest general readers,” Begley said. “I would have loved to reach more people to comment on it. My first efforts were the National Hurricane Center and the World Meteorological Association in Geneva. The National Hurricane Center just flat out refused to comment on the substance of the story of the paper, and they would not let me talk to any of their scientists. That was sort of a brick wall.”

In the end, as Begley tells it, at a wire service where reporters have a few hours to work on a story, she simply ran out of time. She ended up running the story with predominantly positive comments from the study authors themselves.

“The good thing about science is that it’s often, I won’t say always, but often self-correcting,” Begley said, referring to the critical commentary that came in blogs and online news sites in the aftermath of first reports lacking such commentary.

Borenstein, on the other hand, found more time to engage in skepticism.

“I think Jason [Samenow] and myself ended up in the same place, but came at it from different directions,” Borenstein said. “I started skeptical and the more I looked I become less skeptical, and he started not skeptical and the more he looked he became more skeptical.”

“It would be interesting to see how television covered it if they covered it,” Borenstein concluded. “When you look at science writers, you’re looking at a group who very much comprehend the concept of context and peer review. It’s when these things go outside the world of science writing that you’ve got to wonder how well they’re written and covered.”

Would the story have been different if the research hadn’t been published in PNAS, a highly prestigious scientific journal?

“There are essentially three journals that people pay attention to, and it was in one of them,” Borenstein said. “If it were in Bulletin of The American Meteorologists Society it would probably have still gotten attention, or if it were in one of the AGU journals, it would have but it would have been different. What would have happened is one reporter would have noticed it first, and then everyone else would jump on it. With PNAS, just like Science and Nature, one you have that several days of embargo so you can get all the outside comment and do it right. And for most science writers, PNAS is part of your weekly routine.”

The result: a wide swath of news reports coming out at the same time, many with overly positive things to say about a scientific paper that turned out to have major flaws in particular aspects.

Critical Comment – But Why the Sensational Headline at USA Today?

According to Doyle Rice at USA Today, the fact that this study came out of PNAS, a prestigious scientific journal, and the captivating nature of the subject made this story more than worth covering.

“It seemed like a natural for USA Today,” Rice said. “Social science as it relates to weather has been in the news lately, in the meteorological world, of how to better get the message of warnings […] when storms are coming, how to get the message across from a social perspective.”

Rice seemed to “get” that the study was about social science and psychology, not about physical science or meteorology. This was reflected in his choice of outside experts quoted in the story. Rice got outside comment from Hugh Gladwin, associate professor in the department of sociology and anthropology at Florida International University, who called the paper “very problematic and misleading.” He also got outside comment from Jeff Lazo, director of the societal impacts program at the National Center for Atmospheric Research (the expert Ed Yong also consulted in his more skeptical blog post).

Rice said he was also careful to phrase his lede such that the findings were clearly coming from this particular study and did not seem to represent a larger scientific consensus. I personally only find some fault with his story’s headline, “Ladykillers: Hurricanes with female names deadlier.” The headline doesn’t seem to match the skeptical comments appearing later in the story. Compare it to Ed Yong’s more questioning headline: “Why Have Female Hurricanes Killed More People Than Male Ones?”

But Rice said he doesn’t think he would have reported the story any differently given subsequent criticisms of the research expressed in the science media and the blogosphere.

“If I’d written it without getting any perspective, if I was writing as if it was a story that didn’t have any controversy, or didn’t have another perspective… I wouldn’t have wanted to put that story out without having perspective from other people,” Rice said. “But since I did, I felt fine about it.”

Rice does admit that the paper, especially the statistics behind the archival data results, was difficult to work through.

“Of the last 50 years since they’ve been naming hurricanes, for a little over half of that there were only female named hurricanes. There were no male named hurricanes before 1979,” Rice said. “It was still a little puzzling how they accounted for that in the study. Apparently they used some metrics or some way of analyzing the data and evening that out, even though there have been many more female hurricanes than male hurricanes. That’s one thing I would have liked to dig into a little deeper, with some of the methodology of how they did that, but it would have made the article too long, and it may have been getting too much into the weeds.”

Rice certainly reflected these concerns through several critical quotes from outside sources in his USA Today story.

Shankar Vedantam

Wait, what? Hurricanes with female names kill more people than 'male' hurricanes? http://t.co/xZ1J0kbSGR My #NPR story — Shankar Vedantam (@HiddenBrain) June 3, 2014

Shankar Vedantam (@HiddenBrain on Twitter), NPR science correspondent, also seemed, in my opinion, to cover the PNAS hurricane study in an overly positive light. His NPR story does, however, legitimately focus on the psychological results of the paper over the physical implications of hurricane names in terms of deaths. He said this of the study in an e-mail to me:

The hurricane study had two dimensions. One, a correlational piece, which built a model of the “archival death data” caused by hurricanes with masculine and feminine names. Second, it had an experimental portion where volunteers evaluated the danger of storms with different names. Several researchers have criticized the first part of the study. The study researchers have pushed back, and argued their methodology and conclusions are sound. If there is more to say, I expect this will be worked out through the peer review process which, understandably, takes time. […] As far as I can tell, no one has questioned the experimental portion of the study, although some have raised questions about some volunteers being college students. I raised that question with the author Sharon Shavitt and she told me that the volunteers also included a sample of adults from young adults to the elderly. If the study had only involved the experimental data, I wonder whether anyone would have questioned the basic conclusion – that we ought to be careful about subtle biases involving male and female names.

On air for NPR, Vedantam added the following caveat to the study while interviewing Shavitt: “The short answer is there is not direct evidence. Destructive hurricanes are very rare, and we don’t know whether people actually behave differently in hurricanes depending on the name of the storm.” Vedantam did not include outside experts directly in the story.

In Summary: Skeptical

The feelings of the wider science journalism community regarding this paper might best be put in the words of a PhD student who responded to @HiddenBrain’s tweet: “Skeptical. But lab exp[eriment] is interesting.”

The positive I see in the news and science blog reporting of the PNAS hurricane study is that coverage quickly turned to critically picking apart the study, highlighting its meaningful findings as well as its overdrawn conclusions. As for the first news reports, I think the narratives of Samenow, Borenstein and Rice reveal a more complicated story than traditional media journalists getting the story wrong to various degrees and scientists and bloggers rushing in to express needed skepticism. I think the story of the PNAS female hurricane study highlights areas where we could all help more journalists get the story more right:

Be aware as scientists how we are framing our scientific research. The PNAS paper itself has a rather sensational title, “Female hurricanes are deadlier than male hurricanes,” and very strong, and perhaps overreaching, claims within the first page. Can we expect news reports to have more conservative headlines than the paper itself?

Be aware as journalists who we are selecting as sources for outside comment, and exactly why these outside sources are qualified to comment on the study. As with peer review, you generally wouldn’t ask a physicist to review a mass communication research study. This paper created some confusion, as many of the reporters covering it were on climate science and weather beats. But the expert meteorologists were perhaps not the best people to comment on the statistics, findings and implications of a largely social science and psychology study.

Be aware of the constraints of journalists, and understand why particular scientific research is covered prominently in the news. This paper could hardly have come at a better time: the start of hurricane season, when hurricane naming and risk communication are hot news topics. We have to understand that this research garnered coverage first because of the high-interest nature of the topic, and only second for its scientific relevance or (probably lower on the list) its statistical rigor. Understanding that can help us as readers approach these news reports with some skepticism ourselves.

A few notes of my own on the study (Nerd Alert!)

These are my own thoughts on the study, based on my own expertise in mass communication research. Proceed with care.

First of all, calling hurricanes with feminine names “much deadlier” than those with masculine names is a tad misleading. This conclusion is based upon correlational data where minimum pressure, normalized damage, the femininity/masculinity of the hurricane name and interactions between these different variables are entered into a mathematical model for predicting total deaths. The femininity/masculinity variable is NOT a significant predictor on its own in the authors' final model. Instead, only its interaction with damage or minimum pressure is significant, leading the authors to conclude that only relatively strong feminine named hurricanes have been associated with more deaths historically.
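To make the "interaction only" point a bit more concrete, here is a minimal sketch of how an interaction term behaves. It assumes a log-linear model for expected deaths, which is my assumption and not a restatement of the authors' exact specification. Every coefficient below is made up for illustration and none of them is an estimate from the paper; the function names are hypothetical too.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
# They are NOT the paper's estimates.
COEFFS = {
    "intercept": 1.0,
    "damage": 0.8,               # effect of standardized normalized damage
    "femininity": 0.0,           # main effect of the name set to zero on purpose
    "damage_x_femininity": 0.3,  # the interaction term
}


def expected_deaths(damage_z, femininity_z, coeffs=COEFFS):
    """Expected deaths under an assumed log-linear model with one interaction."""
    log_mu = (
        coeffs["intercept"]
        + coeffs["damage"] * damage_z
        + coeffs["femininity"] * femininity_z
        + coeffs["damage_x_femininity"] * damage_z * femininity_z
    )
    return math.exp(log_mu)


def name_ratio(damage_z, coeffs=COEFFS):
    """Expected deaths for a feminine name (+1 SD) relative to a masculine name (-1 SD)."""
    feminine = expected_deaths(damage_z, 1.0, coeffs)
    masculine = expected_deaths(damage_z, -1.0, coeffs)
    return feminine / masculine


if __name__ == "__main__":
    for damage_z in (-1.0, 0.0, 1.0, 2.0):
        print(f"damage z={damage_z:+.1f}  feminine/masculine ratio={name_ratio(damage_z):.2f}")
```

With no main effect for the name, the ratio is exactly 1 for a storm of average damage and only grows as damage increases. That is the sense in which the name "matters" only for the more severe storms. With these made-up numbers the ratio also drops below 1 for weak storms, which is a side effect of using a simple linear interaction term and worth keeping in mind when reading headline claims.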

But just because we find that feminine named hurricanes are associated with more deaths than are male named hurricanes throughout history, I don’t think we can say, as the authors do, that “In other words, […] changing a severe hurricane’s name from Charley […] to Eloise […] could nearly triple its death toll.” The real-life Charley and Eloise likely had many factors that differentiated them, other than their names, even if they were of similar intensity as measured by damage and minimum hurricane pressure. [And what about level of flooding? Storm surge? Similar hurricanes can have very different storm surge intensities.]

What the researchers do next is telling. In order to say why female named hurricanes have been associated with more deaths historically, they go to the lab. I’ll repeat that: the lab. This is important, because while lab environments are great for isolating particular effects and their causes, these same environments are notoriously poor at establishing the causes of real-world phenomena, such as higher death tolls from storms with female names.

Don’t get me wrong, lab experiments are GREAT for finding significant effects for small changes such as whether a hurricane has a male or female name. And the authors of the PNAS study present very compelling experimental results.

We know that when people have very little contextual information, they can still be great at using heuristics, or short-cuts, to arrive at important decisions. It’s why political ideology can determine who you vote for, even if you know next to nothing about the candidates. Political ideology, barring other information, is a great short-cut to tell you how to vote. Similarly, it seems, when you know nothing else about a hurricane, you might use the name as a mental short-cut to tell you how intense it’s going to be, especially if someone asks you to rate the storm intensity based on the name alone. In other words, when the researchers asked participants to make a decision based on limited information about a hurricane, the participants apparently used the masculinity of the name to make their judgments on hurricane intensity and evacuation decisions (in a lab). Why didn’t the authors measure other individual factors that might have influenced evacuation decisions? Experience with previous hurricanes, place attachment, etc.?

I should also note the differences in average perceived intensity and evacuation intention regarding male vs female named hurricanes are relatively small in this study (a move from 5 to 5.5 on a 7 point scale in one instance, and a move from 2.3 to 2.9 on a 7 point scale in another). The effect sizes are also relatively small (around .05 in this study; effect size of 0.1 is considered small, while 0.5 is considered large).
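One rough way to put those two shifts in perspective is to express each as a share of the scale's full range. This is only back-of-the-envelope arithmetic on the means quoted above, not a formal effect size, and the helper name is hypothetical.

```python
def share_of_scale(low, high, points=7):
    """Difference between two mean ratings as a fraction of the scale's full range."""
    scale_range = points - 1  # a 7-point scale spans 6 units end to end
    return (high - low) / scale_range


# The two mean shifts quoted in the text above.
shifts = {
    "first reported shift": (5.0, 5.5),
    "second reported shift": (2.3, 2.9),
}

for label, (low, high) in shifts.items():
    share = share_of_scale(low, high)
    print(f"{label}: {high - low:.1f} points, {share:.1%} of the scale range")
```

Both shifts come out at roughly a tenth of the scale or less. A proper standardized effect size would also need the spread of the ratings, which isn't quoted here.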

But in general, the experimental lab findings of the paper seem solid, even if masculinity/femininity of hurricane name explains only relatively small changes in perceived storm intensity and evacuation intent. The biggest problem arises in trying to take the findings of the lab experiments and apply them to real-life hurricane death tolls. The authors could have looked at so many other outcomes, including for example how many people actually evacuated historically from male vs female hurricanes. If the authors had found that more people evacuated from the same town in south Louisiana in the face of a male vs a female hurricane of similar strength, that might have been more telling. But in the end, a definitive answer might only come from a field experiment, or an experiment with more real-world validity. (Doing this ethically, however, might be difficult.)

Ok, nerd time over! Please leave thoughts below!