Ruby Markup

Intended audience: HTML coders, script developers (PHP, JSP, etc.), CSS coders, and anyone who wants to know how to mark up ruby annotations.

Ruby is the name given to the small annotations in Japanese and Chinese content that are rendered alongside base text, usually to provide phonetic information, but sometimes to provide other information. We will assume that you are familiar with how you want your ruby to look. (If not, read an introduction to how ruby works first.) For a brief overview of what to expect when the browser renders the markup without CSS support, see the section What can I expect to see? below.

This article only discusses how to use HTML5 markup for ruby text. The aim of the markup is principally to establish the relationships between the base text and the ruby text (the annotations). For information about how to then adjust the default styling of ruby text, see the companion article on styling ruby.

A simple example

Let's start with a simple example; we can discuss more detailed aspects later. Suppose we want to produce the following:

これは日本語です。 (with the annotations に, ほん and ご displayed above 日, 本 and 語 respectively)

Just below is some code that can produce this. The rb element holds the ruby base, and the following rt element holds the associated ruby text.

The example contains spaces so that the source code wraps to the next line. In fact you need to be careful about this. According to the HTML spec, you can have spaces before rt or rtc tags, but not before rb tags. However, some browsers currently add superfluous spaces to the rendered text if there are gaps or line breaks. For now, it's best to keep a whole ruby element on a single line, with no gaps, in the source code.

これは<ruby><rb>日</rb><rt>に</rt> <rb>本</rb><rt>ほん</rt> <rb>語</rb><rt>ご</rt></ruby>です。
Test in your browser.

There are a number of ways to use HTML markup to produce even this simple example. The code above uses opening and closing tags for all elements, and uses tags for rb elements, but we'll see in a moment that there are alternative approaches that use less markup, or move the markup around.

Browsers will automatically position the annotations above the base characters in horizontal writing mode, and to the right in vertical writing mode. To change that you will need to use CSS (ruby-position). That and other ways to alter the rendering of the ruby are discussed in the companion article.
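For script developers who generate pages, one way to follow the single-line advice above is to assemble each ruby element as one string with no whitespace between tags. The following Python sketch is only an illustration: the function name `ruby_interleaved` and the list of (base, annotation) pairs are assumptions made for this example, not part of the HTML specification or of any library.

```python
from html import escape


def ruby_interleaved(pairs):
    """Build one fully tagged ruby element from (base, annotation) pairs.

    The result is a single line with no whitespace between tags, so the
    browser has no gaps or line breaks to turn into stray spaces.
    (Hypothetical helper, written for this example only.)
    """
    parts = ["<ruby>"]
    for base, annotation in pairs:
        # escape() guards against characters such as < or & in the text.
        parts.append(f"<rb>{escape(base)}</rb><rt>{escape(annotation)}</rt>")
    parts.append("</ruby>")
    return "".join(parts)


if __name__ == "__main__":
    markup = ruby_interleaved([("日", "に"), ("本", "ほん"), ("語", "ご")])
    print("これは" + markup + "です。")
```

Generating the element in one pass like this also keeps an editor's automatic source formatting from splitting the sequence across lines, provided the output is written to the page unchanged.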

Before going further, let's look at some general patterns for markup that are defined as valid in HTML. This is about how much markup you need, and the order in which marked up items appear. In the previous example we used rb tags, and we opened and closed each element. If you feel that this makes it harder to read the source code or to create things manually, the HTML5 specification allows a number of alternative approaches.

Interleaved markup

Both of the following alternatives are also valid. Note that the basic pattern here is ruby-base ruby-text ruby-base ruby-text ... , which we'll refer to as interleaved.

これは<ruby>日<rt>に</rt>本<rt>ほん</rt>語<rt>ご</rt></ruby>です。
Test in your browser.

これは<ruby><rb>日<rt>に<rb>本<rt>ほん<rb>語<rt>ご</ruby>です。
Test in your browser.

These two code samples produce ruby annotations above the relevant base text.

The first of the above examples drops the rb tag, and is the style you will see in the examples in the HTML5 specification. The second example drops the closing tags instead, resulting in a similar simplification of markup, but leaving the rb content directly selectable for styling or scripting.

It is important to note, by the way, that you can no longer allow extra spaces to appear inside the ruby element, since they will be produced in the output. Take care that your editor doesn't split the sequence across two or more lines while applying automatic source formatting!

It is generally useful to include the rb tag to help with styling. HTML does allow you to style ruby base content in a couple of alternative ways. You could surround the rb content with a span element, although it's easier and at the same time more semantically meaningful to use the rb tag instead. Another way is to style the ruby element, then style the rt tags differently. However, be aware that if the rb content is not tagged, you won't be able to apply certain styling effects.
(Bear in mind, also, that the need for such styling may not become apparent until a later date.) For example, research for elementary and junior-high students by the Japanese government in 2010 indicated that 0.2% of them have difficulty reading hiragana, and 6.9% have difficulty with kanji. Kanji dyslexia is related to difficulty in visual recognition of complex drawings, and therefore adding ruby adds complexity and makes the kanji even harder to read. The researchers tried several methods to improve readability and found that the best method was to replace kanji (i.e. the rb text) with hiragana (i.e. the rt text). Using CSS, a user or alternate style sheet can replace kanji with its annotation without having to change the markup (by hiding the rb tags and making the rt tags inline and full-size), but only if the rb content is independently selectable. So the conclusion here is that it's always better to use rb tags.

Tabular markup

There is another way of arranging the components of a ruby element. You can group all the ruby bases together and follow them with all the ruby text elements. The basic pattern here, then, is ruby-base ruby-base ... ruby-text ruby-text ... , which we'll refer to as tabular.

One advantage of this approach is that it enables you to style the ruby text to appear inline in such a way that all the ruby text for a word follows that word together. Inline styling can be useful in space-constrained situations, where it would be too difficult to read small ruby characters. For example, if we take the interleaved example above and use CSS to make the ruby text appear inline, we will see something like:

日(に)本(ほん)語(ご)

Usually, it would be preferable, while retaining the mapping of ruby text to individual base characters, to see:

日本語(にほんご)

To achieve that, you can use the tabular approach shown below.

これは<ruby><rb>日<rb>本<rb>語<rt>に<rt>ほん<rt>ご</ruby>です。
Test in your browser. Not all ruby annotations are displayed above the base.
Each ruby base is still mapped to the same ruby text as before, but the order of elements has changed to be more like a table. You could visualise the markup like this:

[ruby]
[rb][rb][rb]
[rt][rt][rt]
[/ruby]

If you wanted, you could surround the sequence of rt elements with an rtc element, but that is mostly needed when working with double-sided ruby (see below).

This tabular approach to marking up ruby also yields the expected results when a reader is searching for text in a page. If you are looking for the word 日本語 you won't find it in ruby that uses the interleaved model (since the text being searched actually reads 日に本ほん語ご). You will find it, however, if the ruby uses the tabular model described here (because the underlying text is 日本語にほんご).

In addition, the CSS Ruby specification, which is in development, will place significant reliance on this model. For example, it is needed if styling is to be applied correctly for jukugo ruby.
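The search behaviour described above comes down to the order of the text in the source. The Python sketch below illustrates this by stripping the tags from the two start-tag-only samples and checking whether 日本語 occurs in what remains. It uses the standard library `html.parser` module; the names `TextCollector` and `underlying_text` are invented for this example, and the code only approximates what a browser's find-in-page feature sees, so actual search behaviour in any given browser remains an assumption.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect character data in source order, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def underlying_text(markup):
    """Return the text of a markup fragment with every tag removed."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.chunks)


INTERLEAVED = "これは<ruby><rb>日<rt>に<rb>本<rt>ほん<rb>語<rt>ご</ruby>です。"
TABULAR = "これは<ruby><rb>日<rb>本<rb>語<rt>に<rt>ほん<rt>ご</ruby>です。"

if __name__ == "__main__":
    for name, markup in (("interleaved", INTERLEAVED), ("tabular", TABULAR)):
        text = underlying_text(markup)
        print(name, text, "日本語" in text)
```

A check like this could be run over generated pages to flag ruby elements whose base text would not be found as a contiguous word.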

What can I expect to see?

As mentioned above, this article is concerned with how you should mark up content to correctly establish relationships between ruby base and ruby text components, rather than how to lay it out when the page is rendered. The HTML5 specification's recommendations for the default style sheet don't go as far as indicating how to position ruby text relative to the base text. That is left to CSS, and we explore it further in the companion article.

Having said that, you would expect the browser to be able to arrange simple ruby items visually in some sensible, albeit basic, way without applying CSS properties. All major browsers (here meaning Chrome, Safari, Firefox, Edge and Internet Explorer) position ruby text above horizontal base text in the simplest case, but handling varies when you move away from that.

At the time of writing, Firefox is by far the most advanced in this regard. It produces sensible results for all the markup described in this page. When two ruby texts are associated with a single base text, however, it displays both on one side. This is fair enough: you are expected to indicate alternative positions using the CSS ruby-position property. It also has a bug when 'double-sided' ruby uses nested markup: it overlaps the two ruby texts.

The other browsers fail to produce a useful layout for 'double-sided' ruby, except that Chrome and Safari do something sensible with nested markup. They also fail to position items appropriately when tabular markup is used, even for simple cases (although they parse the markup correctly). Finally, Edge and IE currently break down if you use only start tags inside the ruby element.

All browsers allow you to style rb tags, which can be useful for the use cases mentioned earlier.

The Internationalization Working Group has produced a set of test results for major browsers, covering the scenarios described in this page. The results are updated from time to time as browsers add more support.

Producing single-sided ruby

This section tells you how to use markup for various types of single-sided ruby annotation, and covers various things to bear in mind. As mentioned above, there is more than one way to apply the markup. To keep the examples on this page clear, we will use a consistent, minimal approach: start tags only, but rb tags included. However, the pages you reach when clicking on "Test in your browser" are fully marked up with start and end tags.

Mono vs. group vs. jukugo

The most common approach when creating ruby is to associate each base character with a single ruby annotation, i.e. mono ruby. (All of the earlier examples illustrate mono ruby.) Mono ruby makes it easy to handle line breaks when justifying text, since the browser can split the line between any two base characters. It also maps base characters and annotations precisely, and allows styling to apply the fine rendering control you may need.

Group ruby, on the other hand, assigns a single annotation to a sequence of base characters, and these base characters can no longer be split at the end of a line. Situations where group ruby is appropriate include sequences of base characters that are associated with a single phonetic sound, or semantic ruby that applies to a whole word, or even a phrase.

Here is an example that shows group ruby on the left, and mono ruby on the right. The two characters on the left are pronounced kyō, which is an indivisible sound. Note the difference in how the annotations are distributed in relation to the base characters.

今日の会議 (きょう annotates 今日 as a group; かい and ぎ annotate 会 and 議 individually)

To mark up group ruby you simply put more than one base character in the rb tag, as shown in the following code sample.

<ruby><rb>今日<rt>きょう</ruby>の<ruby><rb>会<rt>かい<rb>議<rt>ぎ</ruby>。
Test in your browser.

If you want to apply jukugo ruby rules to your ruby text, you should mark up the content in the same way as mono ruby, using the tabular model, and use one ruby element per compound noun.
You don't need to worry about the overlaps in the markup. That will be taken care of by CSS. As previously mentioned, the markup simply establishes the correspondences between base characters and annotations.

Bopomofo

Bopomofo, or zhuyin fuhao, characters used in ruby with Traditional Chinese characters are marked up in exactly the same way as mono ruby. No special markup is needed.

The red coloring of the ruby text here is just to better show the position of the annotations in this example. There would not normally be a color difference. Note, in particular, how the tone marks appear to the right of the other bopomofo characters, even though they are not combining characters.

You cannot expect bopomofo ruby to be aligned by default to the right of the base character. You will need to apply the appropriate CSS property. The markup merely establishes the relationships between the base characters and the ruby text. The positioning of the phonetic characters and tone marks to the right of the base character is achieved by styling.

For example, the markup needed for the characters just above is as follows.

<ruby><rb>第<rt>ㄉㄧˋ<rb>十<rt>ㄕˊ<rb>屆<rt>ㄐㄧㄝˋ</ruby>
Test in your browser.

Gaps in the sequence

Occasionally you may want to mark up a sequence of base characters as a single ruby element when there is a non-kanji character in the middle of a word. Here is an example.

振り仮名 (ふ, が and な annotate 振, 仮 and 名; り has no annotation)

One way to do this would be to use an empty rt element after り. You could do this using an interleaved approach.

<ruby><rb>振<rt>ふ<rb>り<rt><rb>仮<rt>が<rb>名<rt>な</ruby>
Test in your browser. Annotations don't appear above base characters.

However, if you were to render the annotation inline, you'd have to ensure that your styling removed anything that indicated the location of the missing character, so that you don't end up with 振（ふ）り（）仮（が）名（な）.

If you want the inlining to produce annotations grouped into words, you would use the tabular approach.
<ruby><rb>振<rb>り<rb>仮<rb>名<rt>ふ<rt><rt>が<rt>な</ruby>
Test in your browser. Annotations don't appear above base characters. Edge puts all above the base, but the alignment is for group ruby.

However, now the result would be missing a character. You would see 振り仮名（ふがな）, instead of 振り仮名（ふりがな）.

A better alternative would be to repeat the character in both the base and ruby text, and rely on CSS or the browser's default style sheet to automatically hide the annotation when both base and ruby text are the same. See the companion article for details.

<ruby><rb>振<rt>ふ<rb>り<rt>り<rb>仮<rt>が<rb>名<rt>な</ruby>
Test in your browser. The り is shown as an annotation.

<ruby><rb>振<rb>り<rb>仮<rb>名<rt>ふ<rt>り<rt>が<rt>な</ruby>
Test in your browser. Annotations don't appear above base characters. The り is shown as an annotation.

How long should my ruby element be?

Given the ability to string multiple ruby pairings together in a single ruby element, the question arises as to the optimal number of pairings within any given ruby element. You are free to decide this for yourself.

If, however, you want to use a jukugo ruby arrangement some time in the future, you will need to establish clear word boundaries, so that annotations don't overlap adjacent words. This may also be important if you want to produce inline versions of your annotations, and you have used the tabular markup approach: word boundaries ensure that the annotations appear after the words that they refer to. In these cases, you should start a new ruby element for each word.

Here is an example where compound nouns are annotated in separate ruby elements. (In the rendered example, a dotted red line shows the boundaries between the compound nouns.)

常用漢字表 (じょう, よう, かん, じ and ひょう annotate 常, 用, 漢, 字 and 表; the compound nouns are 常用, 漢字 and 表)

And here is one way to code it.

<ruby><rb>常<rt>じょう<rb>用<rt>よう</ruby>
<ruby><rb>漢<rt>かん<rb>字<rt>じ</ruby>
<ruby><rb>表<rt>ひょう</ruby>
Test in your browser.

If you want inlined annotations to appear on a word-by-word basis, you would code this using the tabular method. That would look like this.

<ruby><rb>常<rb>用<rt>じょう<rt>よう</ruby>
<ruby><rb>漢<rb>字<rt>かん<rt>じ</ruby>
<ruby><rb>表<rt>ひょう</ruby>
Test in your browser.
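If the markup is produced by a script, the one-element-per-word rule can be applied mechanically. The Python sketch below is a minimal illustration under stated assumptions: the input is already segmented into words, each word is a list of (base, annotation) pairs, and the function names `ruby_tabular` and `inline_fallback` are hypothetical. It emits the same start-tags-only tabular style used in the examples on this page, plus a plain-text approximation of what word-by-word inline rendering would look like.

```python
from html import escape


def ruby_tabular(word):
    """One ruby element per word: all rb start tags, then all rt start tags."""
    bases = "".join(f"<rb>{escape(base)}" for base, _ in word)
    texts = "".join(f"<rt>{escape(annotation)}" for _, annotation in word)
    return f"<ruby>{bases}{texts}</ruby>"


def inline_fallback(word):
    """Plain-text approximation of inline rendering: the word, then its reading."""
    base = "".join(b for b, _ in word)
    reading = "".join(a for _, a in word)
    return f"{base}({reading})"


# Word segmentation is assumed to be supplied by the author or another tool.
WORDS = [
    [("常", "じょう"), ("用", "よう")],
    [("漢", "かん"), ("字", "じ")],
    [("表", "ひょう")],
]

if __name__ == "__main__":
    # Elements are joined without whitespace to avoid stray spaces in output.
    print("".join(ruby_tabular(word) for word in WORDS))
    print("".join(inline_fallback(word) for word in WORDS))
```

Keeping the word list as the unit of input means the same data can later drive jukugo-style or inline presentation without re-segmenting the text.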