Cloze Testing — The Leading UX Test for Content Comprehension

Use Cloze Deletion Testing to accurately assess content comprehension for product and marketing experiences

Working in the product industry for over a decade, I only learned about Cloze testing in 2016 — and I wish I had learned about it sooner.

Oftentimes, we assess content comprehensibility through correlative metrics: tracking certain events (clicking a button), conversions (signing up), or comparing content variants (A/B testing). We also ask subjective questions in user surveys: “Which wording is easiest for you to understand?”

While somewhat effective, these methods do not directly and objectively test content comprehension. In fact, they typically depend on A/B testing — meaning they require a relative comparison to test effectiveness (e.g., having a user choose between two headers).

To bridge the empirical data gap, I contend that:

Cloze testing measures user reading comprehension directly, elevating objective data over subjective opinion — a data-driven assessment for product and marketing content.

Jakob Nielsen provides a similar definition:

Cloze Tests provide empirical evidence of how easy a text is to read and understand for a specified target audience — measuring reading comprehension, not just a readability score.

Let’s Define

What are legibility, readability, and comprehension? What’s the difference? Are they the same? Well, it gets a bit nuanced.

Legibility — how easily a reader can distinguish individual letters or characters from each other.

Readability — how easily a reader can consume and understand a written text. Typically, the complexity of the content’s vocabulary, syntax, and case impacts the reader’s ability to accurately understand the meaning of the words. Similarly, typographic presentation (font size, line height, and line length) also significantly impacts readability (source). The more readable the content, the less effort the user must expend and the faster the user can read.

Comprehension — how easily a reader can process text, derive meaning, and incorporate that meaning into a broader context. If a passage is easily comprehensible, then the user can:

Understand the meaning of the words

Derive broader meaning from the overall content

Draw inferences from the content

Answer broader questions about the content

Identify key point(s)

Identify intent

Understand the content’s tone, mood, and inflections

The difference between readability and comprehension can be summed up as: a sentence is readable if you can read the words; but, it is only comprehensible if you can understand the meaning and purpose of those words.

Well, this seems quite difficult to measure. But what if we could empirically measure the exact point at which a sentence goes from incomprehensible to comprehensible? From unreadable to readable? Well, we can!

Let’s check it out.

The Background

Cloze testing was first codified by W.L. Taylor in 1953, but it wasn’t widely applied to the UX design and marketing fields until much later. In 2010, TW Bean compared both the Cloze and Maze (timed test) procedures for evaluating reading comprehension and found that “the maze test overestimated students’ ability to cope with an unfamiliar text, while Cloze appeared to be a reasonably accurate estimate.” Hence, the Cloze test remains one of the more accurate measures of reading comprehension.

The premise of a Cloze test is to assess whether the textual content is reasonably comprehensible for the target user spectrum. In other words, we are testing whether a reasonable user would be able to comprehend your content’s meaning.

The Cloze Deletion Test

A Cloze Test is a reading comprehension assessment whereby the participant is presented with a passage that has words or signs missing, and then must fill in the blanks. It tests your user’s ability to decode systematically interrupted and disjointed messages by making the most acceptable substitutions (given the content’s context).

Cloze tests require the ability to understand context and vocabulary in order to identify the correct language or part of speech that belongs in the deleted passages.

The participant is presented with a portion of language with certain words or signs removed. The participant must read the modified text and try to guess the missing words. The participant is scored based on the percentage of correctly guessed words, which yields a comprehension score.

It is important to note that the content does not need to be 100% comprehensible, but rather needs to pass a predefined threshold.

While the threshold can vary, Nielsen uses a benchmark comprehension score of 60%:

If users get 60% or more right on average, you can assume the text is reasonably comprehensible for the specified user profile employed to recruit test participants.

Traditional Cloze Test

Traditional literary assessments typically involve large passages with an associated word bank.

Procedure

Recruit participants within your target demographic. Aim for a randomized sample of at least 10 to increase the statistical significance of the test.

Select a content passage with clear key points and a targeted message. You may use either a printed passage on paper or a digital version where the user can type in the selections.

Time constraints are optional, but are helpful when testing the speed of comprehension. My recommendation is not to impose time constraints, but rather to silently keep track of time.

Omit every Nth word, replacing it with a blank space for the participant to write in the answer. The words you omit should be meaningful and distinct, so try not to omit too many prepositions.

Direct the participant to write only one word in each blank and to try to fill in every blank. Guessing is encouraged — it helps reveal whether multiple participants feel that a sentence means something different from the original intent.

Advise participants that their spelling and score do not matter. This is a way of assessing the content itself, not the participant’s intelligence.
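To make the Nth-word deletion step concrete, here is a minimal sketch in Python. The function name, the default of every 5th word, and the minimum-length heuristic for skipping short words like prepositions are my own illustrative assumptions, not part of any standard Cloze tool:

```python
def make_cloze(text, n=5, min_word_len=3):
    """Blank out every nth qualifying word in a passage.

    Words shorter than min_word_len (articles, most prepositions)
    are never blanked, so the deletions stay meaningful.
    Returns the cloze passage and the answer key.
    """
    out, answers = [], []
    count = 0
    for word in text.split():
        # Strip surrounding punctuation so "context." matches "context".
        core = word.strip('.,;:!?"\'()')
        if len(core) >= min_word_len:
            count += 1
            if count % n == 0:
                answers.append(core)
                out.append(word.replace(core, "_" * len(core)))
                continue
        out.append(word)
    return " ".join(out), answers
```

For example, `make_cloze(passage, n=5)` on a 50-word passage yields roughly 10 blanks plus the list of original words to score against.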

Scoring

In most instances, the exact word must be restored to count as correct. A very similar word with the same meaning can also be counted as correct, and misspellings count when the response clearly conveys the intended meaning. Add up the total score to see if comprehension is greater than 60% (6 out of 10 correctly filled-in blanks). If the score is 60% or above, then it is reasonable to assume that the passage is comprehensible for your audience — and that like content is similarly comprehensible.
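The scoring step above can be sketched the same way — a simple tally of accepted answers against the 60% benchmark. The optional synonyms dictionary for accepting equivalent words is my own illustrative addition:

```python
def score_cloze(answers, responses, synonyms=None):
    """Score a cloze test: the fraction of blanks filled with the
    exact word or an accepted equivalent. Returns (score, passed)
    against the 60% comprehension benchmark.
    """
    if not answers:
        return 0.0, False
    synonyms = synonyms or {}
    correct = 0
    for expected, given in zip(answers, responses):
        # Accept the original word or any listed equivalent, case-insensitively.
        accepted = {expected.lower()}
        accepted |= {s.lower() for s in synonyms.get(expected, [])}
        if given.strip().lower() in accepted:
            correct += 1
    score = correct / len(answers)
    return score, score >= 0.60
```

In practice you would average these per-participant scores across your sample before comparing against the threshold.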

This works well for large passage assessment. In the product world, this would translate to help or support center content, large tooltips, action descriptions, and product descriptions.

Of course, in the UX world, words don’t exist in isolation on a blank page. There are situational and peripheral contexts that frame those words. Hence, the UX Cloze Test.

UX Cloze Test

If you’re looking to test marketing and product content, then I recommend modifying a traditional Cloze test to fit within the context of the webpage.

This provides a few benefits:

You can assess how typographic attributes (line height, size, word wrap) impact comprehension.

You can assess how supplemental visuals, color, and peripheral graphics help or hurt comprehension.

You can test how comprehension varies by device type: stationary web, mobile web, and native mobile.

Procedure

The same as a traditional Cloze test (above), with a few additions:

Choose either an oral or a silent reading. An oral reading can provide more insight into the participant’s reasoning, but may also make the participant nervous and hesitant. A silent reading provides a more natural reading experience, but will not provide insight into the participant’s reasoning.

The participant should observe the content within the context of the webpage. The webpage does not need to be live — it can be a static copy with every Nth word (e.g., every 5th) omitted. Be sure to focus on your product’s key value propositions.

Direct the participant to fill in the missing words, either with or without a word bank.

Ensure that the participant feels comfortable and relaxed. This should not be a stress-inducing activity; it is okay not to get 100%.

You may also add an optional debrief at the end of the test to learn more about the participant’s overall thoughts on the content: What was confusing? What was a guess? What wasn’t?

Scoring

The same as a traditional Cloze test. You can also record other contextual metrics: speed of comprehension, a qualitative assessment of participants’ reasoning, and how different content variants (including page peripherals) impact comprehension.

TLDR

Cloze testing can help product and marketing teams accurately assess the comprehensibility of their content. It measures user reading comprehension directly, elevating objective data over subjective opinion — a truly data-driven assessment for product and marketing content.

A/B testing has its limits because it is inherently relative — it measures how A performs against B. Cloze testing does not require comparable variants; rather, it uses a quantitative score to assess the efficacy of individual passages.