I’m working on a reliable, machine-readable edition of the Jōyō kanji data, and this came up. Can you spot the difference between 𠮟 and 叱? Me neither. Let’s look at the reference image:





…Welp. The left one is a left-to-right stroke stopping at the end, in the model of 七 “seven”; the right one is right-to-left, sweeping at the end, as in 匕 “spoon / sitting person”. But, still. These government people are very thorough, to list these minor variant glyphs of the same character.

Except these are supposed to be different characters altogether.

⁂

Let’s recap: a character is an abstract entity, and a glyph is one concrete shape of that character. The shapes ‘a’, ‘a’ and ‘a’ (each set in a different typeface) are different glyphs of the character LATIN SMALL LETTER A, and font designers can come up with endlessly many more shapes. The text standard for computers, Unicode, assigns one number (“code point”) to each character, not to each glyph; glyph variations are decided by fonts.
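To make the character/code point relationship concrete, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `code_points` is my own, made up for illustration.

```python
import unicodedata

def code_points(text: str) -> list[str]:
    """Return the code point of each character in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in text]

# Whatever font draws it, the letter is stored as the same single number.
print(code_points("a"))        # ['U+0061']
print(unicodedata.name("a"))   # LATIN SMALL LETTER A
```

Nothing in the stored text says which glyph to draw; that choice is left entirely to the font.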

However, in the case of Chinese characters, things get blurry. If a character had variants with significantly different shapes (such as 兑 vs. 兌), it was given one code point for each. Only very minor variations were “unified” in the same code point. Unfortunately, these minor variations tend to be bound to locales – the Japanese cross the blade in 刃, the Koreans don’t – which means that even this timid unification was hugely controversial. One can, of course, use their country’s version of the characters simply by choosing an appropriate font; but computers don’t always choose the appropriate font, which means that from time to time Taiwanese people would stumble upon Japanese-style glyphs which are obviously completely wrong and unacceptable (or the other way around).

A mechanism was designed to pacify this: variation sequences. Special, invisible control characters (variation selectors) can be added after a character to tell the computer which graphical variant is intended. However, most software doesn’t support this mechanism yet.

The Jōyō Kanji standard has a thing for telling people that the glyphs they’re using are wrong. There are two kinds of variants in the document. The first kind are the “acceptable character forms” 許容字体. These are five characters (餌, 遡, 遜, 謎, and 餅) where the de facto glyphs in modern society differ from what the document says is the standard. So the popular glyphs are listed in the table (between brackets) as acceptable. These variants are unified in Unicode, and selectable only by variation selectors; I added the relevant variation sequences to JoyoDB, though, again, most computers won’t display them as of 2016. If you want to try, here they are:

| Variant unspecified | Standard variant | Accepted variant |
|---|---|---|
| U+990C 餌 | U+990C,U+E0103 餌󠄃 | U+990C,U+E0100 餌󠄀 |
| U+9061 遡 | U+9061,U+E0101 遡󠄁 | U+9061,U+E0100 遡󠄀 |
| U+905C 遜 | U+905C,U+E0101 遜󠄁 | U+905C,U+E0100 遜󠄀 |
| U+8B0E 謎 | U+8B0E,U+E0101 謎󠄁 | U+8B0E,U+E0100 謎󠄀 |
| U+9905 餅 | U+9905,U+E0101 餅󠄁 | U+9905,U+E0100 餅󠄀 |
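If you would rather build these sequences from code than copy-paste invisible characters, here is a rough Python sketch. The code points come from the list above; the names `SEQUENCES`, `with_selector` and `strip_selectors` are made up for this example and are not part of JoyoDB.

```python
# base code point -> (standard selector, accepted selector), as listed above
SEQUENCES = {
    0x990C: (0xE0103, 0xE0100),
    0x9061: (0xE0101, 0xE0100),
    0x905C: (0xE0101, 0xE0100),
    0x8B0E: (0xE0101, 0xE0100),
    0x9905: (0xE0101, 0xE0100),
}

def with_selector(base: int, selector: int) -> str:
    """Build a variation sequence: the base character followed by its selector."""
    return chr(base) + chr(selector)

def strip_selectors(text: str) -> str:
    """Drop variation selectors in the range U+E0100..U+E01EF."""
    return "".join(c for c in text if not 0xE0100 <= ord(c) <= 0xE01EF)

standard = with_selector(0x990C, SEQUENCES[0x990C][0])
accepted = with_selector(0x990C, SEQUENCES[0x990C][1])

print(len(standard))                                           # 2
print(standard == accepted)                                    # False
print(strip_selectors(standard) == strip_selectors(accepted))  # True
```

Each sequence is two code points even when it renders as one glyph, so a plain string comparison tells the variants apart; stripping the selectors first is one way to compare only the underlying characters.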

If they look the same to you, that’s too bad. Come back to this post in 10 years. Meanwhile, here are the reference images of what they should look like:

| Standard variant | Accepted variant |
|---|---|
| U+990C,U+E0103 | U+990C,U+E0100 |
| U+9061,U+E0101 | U+9061,U+E0100 |
| U+905C,U+E0101 | U+905C,U+E0100 |
| U+8B0E,U+E0101 | U+8B0E,U+E0100 |
| U+9905,U+E0101 | U+9905,U+E0100 |

So it’s just a matter of a) whether the “moving feet” (shin’nyō) component has one 辶 or two 辶 drops, and b) whether the “food” component, 食, is drawn “square” or in simplified cursive 飠.

The other kind consists of the “popular-use character forms” 通用字体. These are non-unified characters; they got their own, distinct Unicode codepoints. Still, no one uses the recommended forms, so the Introduction gives a passing nod to the existence of the popular alternatives. This is related to the Japanese JIS character sets; the popular characters are the ones that were encoded in the first JIS releases, whence they became well-established after the digital revolution.

| Standard | Popular |
|---|---|
| U+5861 塡 | U+586B 填 |
| U+525D 剝 | U+5265 剥 |
| U+9830 頰 | U+982C 頬 |

Since these are different Unicode codepoints, the difference will show up in all computers; however, they’re still graphical variations of the same fundamental Chinese character.
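Because these pairs are separate codepoints, software that wants to treat them as the same character has to fold one form into the other itself. A minimal Python sketch, assuming only the three pairs listed above; the names `POPULAR_TO_STANDARD` and `to_standard_forms` are hypothetical, chosen for this example.

```python
# Popular codepoint -> standard codepoint, for the three pairs listed above.
POPULAR_TO_STANDARD = {
    0x586B: 0x5861,  # 填 -> 塡
    0x5265: 0x525D,  # 剥 -> 剝
    0x982C: 0x9830,  # 頬 -> 頰
}

def to_standard_forms(text: str) -> str:
    """Replace each popular form with the corresponding standard form."""
    return text.translate(POPULAR_TO_STANDARD)

popular = chr(0x586B)
standard = chr(0x5861)
print(popular == standard)                     # False
print(to_standard_forms(popular) == standard)  # True
```

Whether to fold towards the standard or the popular form is a design choice; this sketch picks the standard forms only because those are what the document recommends.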

And then there’s 𠮟 vs. 叱: U+20B9F vs. U+53F1. At first sight it seems to be the same case as the three characters above. However, the Joyo document insists that U+53F1 is not the well-known Jōyō character with the readings shitsu and shi(karu) (“to scold”). One can confirm that they’re distinct characters in the classic Kangxi dictionary, page 173. Here’s what they were supposed to be:

| Codepoint | On | Phonetic | Kun | Meaning |
|---|---|---|---|---|
| 𠮟 U+20B9F | shitsu | 七 | shi(karu) | to scold |
| 叱 U+53F1 | ka | 匕(< 化) | – | to open the mouth |

What happened was that early computer practice had the shitsu/shikaru character drawn like the ka character. Ka isn’t used in modern Japanese, so no one cared. By the time they codified the distinction, people had already become used to 叱 (with a diagonal-stroked 匕) in the role of shitsu. What’s more, computers are more used to it; U+20B9F is a newer kind of Unicode character, outside the Basic Multilingual Plane (BMP), and software support to this day is still icky (this very blog system gave me trouble preserving it in the main text, and adding it to the title broke everything horribly) – not to mention the lack of font glyphs. Current input methods will choose U+53F1 for shitsu or shikaru, not for ka; and they won’t bring up U+20B9F at all.
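To see why a character outside the BMP trips up older software, here is a small Python sketch comparing how the two codepoints are encoded. The names `SHITSU`, `KA` and `utf16_units` are made up for this example.

```python
SHITSU = chr(0x20B9F)  # outside the BMP
KA = chr(0x53F1)       # inside the BMP

def utf16_units(ch: str) -> list[str]:
    """Return the UTF-16 code units of a string as hexadecimal strings."""
    data = ch.encode("utf-16-be")
    return [data[i:i + 2].hex().upper() for i in range(0, len(data), 2)]

print(utf16_units(SHITSU))  # ['D842', 'DF9F']
print(utf16_units(KA))      # ['53F1']
print(len(SHITSU.encode("utf-8")), len(KA.encode("utf-8")))  # 4 3
```

U+20B9F needs a surrogate pair in UTF-16 and four bytes in UTF-8, so any code that assumes one 16-bit unit (or at most three UTF-8 bytes) per character can mangle or reject it.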

Finally, even if the Japanese standards declare that this character shape is meant for ka/“open mouth”, the Unicode standard declares that the codepoint represents shitsu/shikaru “to scold” – the only concession for the original use being the data field kHanyuPinyin, which draws from the Hànyǔ Dà Zìdiǎn dictionary.

In effect, the two characters were accidentally unified as “to scold”, with the earlier “open mouth” meaning rendered obsolete. The Joyo Kanji document recognizes this, saying that now 叱/ka has become a graphical variant (異体字) of 𠮟/shitsu.