By David Mendoza - Monday, January 26, 2015

Last week, reddit hosted an AMA with Chris Ingraham of the Washington Post, David Yanofsky of Quartz, and Ritchie King of FiveThirtyEight. These three data journalists fielded a variety of questions, but this one caught my eye: “Do you say data is or data are?” Yanofsky deferred to his employer’s style guide: Quartz uses “data are.” Despite expressing a preference for “data are,” King is forced to use “data is” by his tyrannical boss, Nate Silver. Ingraham takes a more agnostic approach, preferring to play it by ear.

As you can see in the chart below, however, Americans are definitive in their preference. An overwhelming majority, 77% of respondents, chose “data is” over “data are” in this sample sentence: “Some experts say it’s important to drink milk, but the data is/are inconclusive.” A fifth of people surveyed selected “data are,” though they were wrong. (I explain why they’re wrong below.)

Respondents were nearly equally divided on whether they had ever spent time considering if the word “data” is a singular or plural noun. Unsurprisingly, though, “data are” users were over 30 percentage points more likely to say that they have thought about this topic before. And like good grammar pedants, “data are” users were 24 percentage points more likely than “data is” users to care “a lot” or “some” about this debate.

These results come from a poll Walter Hickey, King’s colleague at FiveThirtyEight, commissioned from SurveyMonkey Audience last summer. The sample size for the survey was 1,129 people. You can see the complete results here and read Hickey’s original post about it here.

Some other differences emerge between people who thought data was singular and those who thought it was plural. Younger respondents were the most likely to use “data is.” Those over 60 were the least likely to. Nearly a third of the most educated used “data are.” Perhaps this is explained by the fact that people with graduate degrees work in professions where they are more likely to encounter the word “datum,” data’s rarely seen singular form. For instance, the American Psychological Association’s style guide requires followers to treat data as plural and therefore use “data are.” Regionally, respondents residing in the West were the most likely to use “data are,” while those in the Northeast were the least likely to.

Now here’s the reason why people who chose “data are” in the sample sentence are wrong. In the context of the sentence, “data” functions as a mass noun. Mass nouns, if you remember what your grade school English teacher taught you, cannot be counted since they come “in variable but conceptually undifferentiated quantities,” as Mark Liberman, a professor of linguistics at the University of Pennsylvania, notes. Thus, like other mass nouns, data is singular. If you substitute a synonymous mass noun like “evidence” for “data,” the sentence only makes sense with “is” (i.e., “but the evidence is inconclusive” vs. “but the evidence are inconclusive”).

Now this isn’t to say that “data is” is always and exclusively correct. There are many instances outside of scientific or technical writing where “data are” could be considered grammatical. Then again, linguist Geoff Nunberg contends that just because “data are” could be considered correct doesn’t mean you won’t sound ridiculous saying it. As he wrote on Language Log,

My own view is that there are contexts where it’s okay to treat data as a plural, but none in which you can’t treat it as a singular—and that contrary to what many “reasonable” usage writers counsel, this isn’t simply a matter of “style and personal preference.” As the Economist example shows, there are times when treating data as a plural makes you sound not simply like a pedant but a fool.

But I’m sure many of you would vehemently disagree.