Imagine that you’ve been studying hard for a couple of years and have finally passed HSK 6. Going by the HSK wordlists, you’d know about 2,600 characters and around 5,000 words.

The Chinese government defines literacy for urban, white-collar workers as knowing more than 2,000 characters (source), so congratulations, you’re no longer illiterate.

Unfortunately, you may still feel illiterate, because if you try to read native-level newspapers and novels, you’ll be overwhelmed by new words and characters.

How overwhelmed, you ask?

Let’s look at the numbers and see how often you’ll come across an unknown character if you know all the characters from the HSK wordlists up to level 6.

Normally words rather than characters are the important unit to consider when learning Chinese, but this article looks at characters because that will let us establish the minimum baseline of unknown words that someone at HSK level 6 is likely to encounter. You might be able to guess the meaning of unknown words you read in context if you know all the characters, but a word with an unknown character in it is guaranteed to be at least partly unknown.

The following tables contain HSK 6 character statistics generated by Chinese Text Analyser for several novels of varying length and difficulty.

Each table contains the total number of characters in the novel (both unique and overall), along with the total number of characters from the novel that are also found in levels 1-6 of the HSK wordlists.
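For readers who want to run a similar count on their own texts, here is a minimal sketch of the idea. It is not how Chinese Text Analyser works internally; it simply counts characters in the main CJK Unified Ideographs block and checks them against a set of known characters. The names `is_cjk`, `coverage_stats`, and `known_chars` are made up for this illustration, and the tiny `known_chars` set stands in for a real list of HSK 1-6 characters that you would need to supply yourself.

```python
from __future__ import annotations

from collections import Counter


def is_cjk(ch: str) -> bool:
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def coverage_stats(text: str, known: set[str]) -> dict[str, float]:
    """Count unique and overall characters, and how many of each are known."""
    counts = Counter(ch for ch in text if is_cjk(ch))
    unique_total = len(counts)
    unique_known = sum(1 for ch in counts if ch in known)
    all_total = sum(counts.values())
    all_known = sum(n for ch, n in counts.items() if ch in known)
    return {
        "unique_total": unique_total,
        "unique_known": unique_known,
        "unique_pct": 100 * unique_known / unique_total if unique_total else 0.0,
        "all_total": all_total,
        "all_known": all_known,
        "all_pct": 100 * all_known / all_total if all_total else 0.0,
    }


# Toy input: one sentence and a hand-picked "known" set (an assumption,
# not the real HSK list). Punctuation is ignored by is_cjk.
sample = "我比现在年轻十岁的时候，获得了一个游手好闲的职业，去乡间收集民间歌谣。"
known_chars = set("我比现在年轻十岁的时候获得了一个职业去收集民间")
stats = coverage_stats(sample, known_chars)
print(f"Unique: {stats['unique_known']}/{stats['unique_total']} ({stats['unique_pct']:.2f}%)")
print(f"All:    {stats['all_known']}/{stats['all_total']} ({stats['all_pct']:.2f}%)")
```

The "Unique" row counts each distinct character once, while the "All" row counts every occurrence, which is why frequent characters pull the second percentage so much higher.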

《活着》

| Characters | Total | In HSK 1-6 | % in HSK 1-6 |
|---|---|---|---|
| Unique | 2,015 | 1,776 | 88.14% |
| All | 81,508 | 79,679 | 97.76% |

《哈利波特与魔法石》

| Characters | Total | In HSK 1-6 | % in HSK 1-6 |
|---|---|---|---|
| Unique | 2,806 | 2,215 | 78.94% |
| All | 132,950 | 128,044 | 96.31% |

《哈利波特与死亡圣器》

| Characters | Total | In HSK 1-6 | % in HSK 1-6 |
|---|---|---|---|
| Unique | 3,221 | 2,421 | 75.16% |
| All | 307,817 | 296,079 | 96.19% |

《天龙八部》

| Characters | Total | In HSK 1-6 | % in HSK 1-6 |
|---|---|---|---|
| Unique | 4,118 | 2,552 | 61.97% |
| All | 1,023,987 | 983,936 | 96.09% |

What’s interesting about the above numbers is that even though the more difficult texts have a much higher percentage of unique unknown characters, the percentage of all characters you’d be able to recognise is roughly similar at 96-97%.

You might look at those numbers and think that 96-97% recognition of all characters seems pretty good, but for the purposes of reading comprehension it’s actually quite terrible. If you do the math, 3-4 unknown characters out of every 100 works out to one unknown character for every 25-33 characters you read. For reference, that’s about this much text:

我比现在年轻十岁的时候，获得了一个游手好闲的职业，去乡间收集民间歌谣。

Or one unknown character per sentence.

A typical Chinese novel has about 500-600 characters per page, and at the above rate, that works out to roughly 20 new characters per page.
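If you’d like to check that arithmetic yourself, here is a small sketch. The function name `unknown_rate` and the default of 550 characters per page (the midpoint of the 500-600 range) are choices made for this illustration only.

```python
def unknown_rate(coverage_pct: float, chars_per_page: int = 550) -> tuple[float, float]:
    """Return (characters read per unknown character, unknown characters per page)."""
    if not 0 <= coverage_pct < 100:
        raise ValueError("coverage_pct must be at least 0 and below 100")
    unknown_fraction = 1 - coverage_pct / 100
    chars_between_unknowns = 1 / unknown_fraction
    unknown_per_page = unknown_fraction * chars_per_page
    return chars_between_unknowns, unknown_per_page


# The two ends of the 96-97% range seen in the novels above.
for pct in (96.0, 97.0):
    gap, per_page = unknown_rate(pct)
    print(f"{pct}% coverage: one unknown every {gap:.0f} characters, "
          f"about {per_page:.1f} per page")
```

Changing `chars_per_page` lets you plug in the page size of whatever edition you happen to be reading.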

That’s what HSK 6 gets you in real terms when reading a Chinese novel: approximately one unknown character per sentence and twenty new characters per page.

If your goal was to have no more than one new character per page of text, you’d need to recognise 99.8% of all characters on the page.

According to the Jun Da character frequency list for imaginative texts, you’d need to know ~4,400 of the most frequent characters to get that level of coverage.
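The lookup behind a figure like that is a cumulative sum over a frequency list. Below is a rough sketch under stated assumptions: the function name `chars_needed` is hypothetical, the input is assumed to be a list of occurrence counts already sorted from most to least frequent, and the toy counts are invented purely to show the mechanics. They are not taken from the Jun Da list, whose file format is not reproduced here.

```python
from __future__ import annotations

from itertools import accumulate


def chars_needed(frequencies: list[int], target: float) -> int:
    """Smallest number of top-ranked characters whose combined share
    of all occurrences reaches `target` (a fraction between 0 and 1).

    `frequencies` must be sorted from most to least frequent.
    """
    total = sum(frequencies)
    for rank, running in enumerate(accumulate(frequencies), start=1):
        if running >= target * total:
            return rank
    return len(frequencies)


# One unknown character per 500-character page means 1 - 1/500 coverage.
target = 1 - 1 / 500  # 0.998

# Invented counts for a five-character "language", for illustration only.
toy_counts = [50, 30, 15, 4, 1]
print(chars_needed(toy_counts, 0.90))    # top 3 cover 95 of 100 occurrences
print(chars_needed(toy_counts, target))  # needs all 5 in this toy list
```

With a real frequency list loaded into `frequencies`, the same call with `target` would give the rank at which coverage first reaches 99.8%.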

In other words, HSK 6 gets you about halfway there.

If that seems disheartening, it gets worse.

Those 2,600 characters you already know get you 96-97% recognition of all characters you are likely to encounter in a novel. The next 2,000 characters will only get you an extra 3-4% recognition, and that makes learning them a long, hard grind with little reward along the way.

The point of this article is not to demotivate you, though. Rather, it’s to help you set expectations.

If you’ve passed HSK 6 and are wondering why you still can’t read novels without being overwhelmed by new vocabulary, don’t worry: it’s completely normal and, as the numbers show, to be expected.