
At Mandarin Companion, we go to great lengths to ensure every book we publish is carefully written to be level-appropriate and as easy to read as possible. The positive feedback we frequently receive from readers gives us a good indication that we are hitting that mark.

However, there had been no independent research on our books until now. We were excited to read a research paper titled “Finding Something to Read: Intelligibility, Readability and Learner Chinese Texts”, written in 2017 by Dr. James C. Loach, a data scientist, particle physicist, and Chinese language enthusiast who used his expertise in analytics to create an algorithm for analyzing the readability of Chinese texts written for second language learners. He is also an advocate of extensive reading for language learning and understands the immense impact fluent reading has on language acquisition. His findings were fascinating.

Readability

There has been very little dedicated research into the readability of Chinese. Dr. Loach defines the readability of a text as the degree to which a learner is able to “read fluently and enjoyably”. This highlights the importance of correctly matching the learner’s reading level with the text, a “non-trivial” task, as research shows that a relatively small difference in reading comprehension can result in a dramatic change in readability. This conclusion is also supported by decades of research from the Extensive Reading Foundation, which can be boiled down to this single chart.

As we know, Chinese is a unique language because of the complexity of its script: one must master several thousand distinct characters to read native texts fluently. Dr. Loach notes:

The consequences for reading [Chinese] are so acute that even upper-intermediate learners, with useful levels of conversational Chinese, can struggle to find meaningful things to read. Basic [native level] texts often use characters that are only known by advanced [second language] learners, and the market for dedicated learner-oriented texts is extremely underdeveloped.

The Methodology

Drawing on his expertise in particle physics, Dr. Loach created an algorithm to assess a text and assign it a “readability” score. Since readability largely depends on the level of the reader, he and his associates used the character lists from the six levels of the HSK standards (the HSK is the standardized Chinese language proficiency test developed by the Chinese Ministry of Education). The HSK levels are tiered by vocabulary and become progressively more advanced. HSK level 3 is considered basic competency, while levels 5 and 6 correspond to an advanced learner.

Each Chinese text analyzed was given a readability score from 0 to 100* relative to a specific HSK level. Dr. Loach and his associates then fed Chinese texts into the algorithm to see how readable they were.

The Findings

For the study, they analyzed six Mandarin Companion books (three at level 1, with 300 characters, and three at level 2, with 450 characters) and six books from the Sinolingua graded reader series, whose levels start at 500 words and go up to 3000 words. The results certainly caught our attention.

With the Mandarin Companion books, there is good consistency between the books at each level and the higher-level books are indeed found to be slightly more difficult.

The results for the Sinolingua books are more surprising, showing that the lexical difficulty of the books does not increase in the way that would be expected based on their titles. In addition (though not shown), the difficulties of the stories inside particular books are found to vary significantly. Manual inspection of the books accords with the results of the algorithm. In particular, the 1000 Word and 2000 Word books do appear to be simpler and easier to read than the 500 Word book.

Our internal analysis at Mandarin Companion has shown that if you are at an HSK 3 level, you should be able to recognize 95.3% of the characters in our level 1 books. Our experience has also shown that those who have passed HSK 3 and are working towards HSK 4 are excellent candidates for our level 2 books. This research paper indicates that our series is leveled appropriately.
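For readers curious about what a character coverage figure like the one above means in practice, here is a minimal Python sketch. It is only an illustration, not Dr. Loach's algorithm and not necessarily how our internal analysis was done. The function names, the tiny `known` set, and the sample sentence are all made up for the example; a real check would load a full HSK character list and the full text of a book.

```python
def is_chinese_char(ch):
    """True for characters in the main CJK Unified Ideographs block (U+4E00 to U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def character_coverage(text, known_chars):
    """Percentage of Chinese character occurrences in `text` that appear in `known_chars`.

    Punctuation, digits, and Latin letters are ignored. Returns 0.0 if the
    text contains no Chinese characters at all.
    """
    tokens = [ch for ch in text if is_chinese_char(ch)]
    if not tokens:
        return 0.0
    recognized = sum(1 for ch in tokens if ch in known_chars)
    return 100.0 * recognized / len(tokens)


# Hypothetical stand-in for a real HSK character list (assumption for illustration).
known = set("我你他是的不好人中国")
sample = "我是中国人，你好吗？"

print(f"{character_coverage(sample, known):.1f}%")  # prints "87.5%"
```

In this toy sentence, seven of the eight characters are in the known set, so coverage is 87.5%. Note that this counts every occurrence of a character, so a frequently repeated unknown character lowers the figure more than a rare one.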

What was perhaps most surprising was the apparently large disparity within the Sinolingua series. The 500 word reader has a readability score similar to, and in some respects lower than, that of the 2500 word reader. Based on Dr. Loach’s analysis, it appears you would need to be at an advanced level of HSK 5 (2,500 vocabulary words) before you could begin reading any of the books in this series, regardless of the word level printed on the cover. Anecdotally, learners we have spoken with who have used this series have shared experiences that support the conclusion of Dr. Loach’s analysis.

Conclusion

It is not easy to write Chinese books that are easy to read! Compiling a list of frequently used characters is just the first step in a very involved process. This latest research paper shows that even if a book uses a small number of characters, it will not necessarily be easy to read.

It also confirms that Mandarin Companion level 1 is highly readable for people at an HSK 3 level, which is very encouraging for students preparing for the test.

This is the first independent study we have seen supporting the Mandarin Companion series, and it is quite possible there will be more to come. In the meantime, we’ll continue to publish books you can read.

*The research paper applied a readability score from 0 to 1. For this article, we multiplied the readability score by 100 to give it a range of 0 to 100.