I’ve been excoriated a few times in comment threads when I’ve asserted that scientific information made accessible through open access (OA) publishing doesn’t necessarily give users access to the information — that actually, unless you’re well-educated and possess specialist knowledge, much of the information in the scientific literature will be inaccessible simply because you can’t understand it, much less put it to use.

With some experience teaching literacy to adults and English to non-English speakers, I know you can’t take language mastery for granted. Speaking isn’t the same as reading, fluency in daily activities doesn’t equate to comprehension of abstract or specialized information, and comprehension goes beyond words and sentences to realms and signals that are often “meta” to the words and sentences used.

Most scientific papers use a vocabulary pitched to a highly educated audience. It’s then sprinkled with jargon for a specialist audience familiar with statistics and trial design (quick, name the differences between a cohort, cross-over, or cluster trial design). Making sense of even the methods requires appreciable amounts of knowledge.

But let’s begin with readability.

A common measure of basic readability is the Flesch score, which is computed from average sentence length and average syllables per word. A score of about 60 or higher indicates Standard reading level, and lower scores mean more difficult reading. Various studies peg the scientific literature at a score of around 30, with one 2002 BMJ study finding that British English is actually more readable than US English. A score that low makes the information difficult or impossible to comprehend for the majority of non-scientific readers.
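For readers who want to see what goes into the number, here is a minimal sketch of the standard Flesch Reading Ease formula in Python. The formula itself (206.835 minus 1.015 times words per sentence, minus 84.6 times syllables per word) is the published one; the sentence splitter and the syllable counter are crude assumptions of mine, not what any of the studies mentioned here used, so scores from this sketch will drift from those of a proper tool.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption, not a dictionary lookup):
    count groups of vowels and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier reading."""
    # Naive sentence split on runs of ., ! or ? (abbreviations will fool it).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    easy = "The cat sat on the mat."
    hard = ("Randomized heterogeneous methodological considerations "
            "necessitate comprehensive statistical interpretation.")
    print(round(flesch_reading_ease(easy), 1))
    print(round(flesch_reading_ease(hard), 1))
```

Short sentences built from one-syllable words can score above 100, and a dense run of polysyllabic jargon can score below zero, which is why the scale is usually discussed in bands rather than as a strict 0 to 100 range.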

But how difficult a challenge is readability for Wikipedia readers? A study in 2010 of cancer information on Wikipedia found that while the information was as accurate as that in a professionally curated database, its advanced reading demands made it much less accessible (the professional database required about a 9th grade reading level, while Wikipedia’s cancer information required about a college sophomore’s reading skills).

Leveraging this and other information, a group of researchers in the Netherlands analyzed the readability of Wikipedia overall. Their paper, published in First Monday, is quite readable.

Wikimedia, the parent of Wikipedia, has known for some time that its core product is too difficult to read — to address this, it introduced a “Simple English” version of Wikipedia in 2003, which contains about 68,000 articles (compared to the 3.5 million in the main wiki). The Basic English suggested for the Simple English Wikipedia consists of about 850 words. In late 2003, the Simple English version had a readability score of 80 (Easy). By 2006, this score had dropped to 70 (Fairly Easy).

One interesting finding in this study is how many Wikipedia articles are very short. In filtering a few million articles, the authors found that 40% consist of five sentences or fewer.

Eliminating these short articles, the authors found that the readability score of the majority of Wikipedia (73.5%) is 51.18 (SD = 13.84), which is below the goal of 60 (Standard). In addition, 45% of the articles could be classified as Difficult or worse.
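The labels attached to these scores (Easy, Standard, Difficult) come from bands on the Flesch scale. As a rough sketch, the mapping below uses the commonly cited Flesch band cutoffs; I am assuming the study's labels follow the same cutoffs, and the function name is my own.

```python
# Commonly cited Flesch Reading Ease bands, listed from easiest to hardest.
# Each entry is (lowest score in the band, label).
FLESCH_BANDS = [
    (90.0, "Very Easy"),
    (80.0, "Easy"),
    (70.0, "Fairly Easy"),
    (60.0, "Standard"),
    (50.0, "Fairly Difficult"),
    (30.0, "Difficult"),
]


def flesch_band(score: float) -> str:
    """Return the band label for a Flesch Reading Ease score."""
    for lower_bound, label in FLESCH_BANDS:
        if score >= lower_bound:
            return label
    # Anything below 30 falls into the hardest band.
    return "Very Confusing"


if __name__ == "__main__":
    print(flesch_band(51.18))  # mean score reported for the main Wikipedia
    print(flesch_band(30.0))   # roughly where the scientific literature sits
```

Under these cutoffs, the average of 51.18 lands in the Fairly Difficult band, one step below Standard, and a score of around 30 sits at the very bottom of Difficult.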

In the Simple English Wikipedia, 60% of the articles consist of five sentences or fewer. Eliminating these from the calculations, the reading score for the Simple English Wikipedia came in at 61.69, lower than the goal of 80. In fact, 94.7% of the articles scored lower than 80, and 42.3% were below the score needed to be considered Standard reading material.

The authors took a further step, comparing articles with the same title in the two versions. For these matched articles, the Simple English Wikipedia had an average score of 61.46 compared to 49.27 for the general Wikipedia, with both well off their reading level goals.

As the authors write in their conclusions:

The results of this study show that the readability of the English Wikipedia is overall well below a desired standard. . . . Moreover, half the articles can be classified as difficult or worse. This finding confirms our hypothesis that numerous articles on Wikipedia are too difficult to read for many people.

The authors also created a tool you can use to test the readability of any entry in Wikipedia.

There are two simple explanations for the declining readability scores in both the standard and Simple English versions of Wikipedia. First, it’s a skeuomorph of an encyclopedia, and encyclopedias are supposed to have formal, passive language and sophisticated vocabulary. That’s what we think it should be. The second explanation, robust borrowing from authoritative resources, also likely contributes to both versions’ difficulty in reaching their target readability scores.

In fact, experts are notoriously bad at writing for non-expert audiences. Readability problems in medicine plague patient-education materials, for example. A 1989 study of the readability of smoking cessation materials found that “a serious disparity existed between the reading estimates of smoking education literature and the literacy skills of patients.” In short, many patients couldn’t read about how to quit. A 2001 study of patient-education materials aimed at parents and discussing pediatric topics found that the reading level was about four grades higher than the intended goal. A 2004 study of patient-education materials written for family medicine patients found a similar disparity. A 2008 study of orthopaedic patient education materials found that only 10% achieved the desired readability level. It seems that if you want to find an area where every medical specialty group is failing, you need look no further than the readability of their patient-education materials.

I once worked on a publication aimed at patients that went through academic editors as part of its approval process. It quickly became clear that the writers, who initially were writing at an appropriate level, were helpless to avoid writing for these academic editors after a time. The feedback from these “experts” was firm and unquestioning: short sentences and a limited vocabulary were leaving out too many nuances and qualifications. Needless to say, the publication folded, having attained a reading level only a graduate student would understand.

Even when they try, advanced readers and writers find it difficult to write at a lower reading level. It feels condescending, like you’re “dumbing it down,” and it also prevents you from using sentence structures and vocabulary that come to you naturally. It’s hard work, and it’s unnatural. But it’s also extremely important to do it right.

Wikipedia is free, yet it presents readability barriers to many of its users. This suggests that the challenge of making scientific information approachable, and therefore accessible, is much deeper, much more difficult, and much more complex than simply removing the cost to readers.