Rethinking text layout in Android and beyond


At the 2015 ATypI conference in São Paulo, Brazil, Google's Raph Levien discussed the recent improvements to the fonts and text-rendering framework used in Android. The most recent update to Android introduces several high-quality paragraph-layout features from the TeX typesetting system. Furthermore, the implementation may yet prove useful to other free-software projects.

Levien is currently the lead engineer for the Android text stack, which (as one would expect) uses a number of free-software libraries, such as FreeType and HarfBuzz. Beyond those libraries, Android also provides its own text widgets for apps, and in recent years the project has invested significantly in producing open-source fonts.

Fonts and features

Addressing changes to the Android fonts first, he noted that the Roboto and Noto families have recently gained several key features. Roboto, which covers European scripts (Latin, Greek, and Cyrillic), was redesigned for the 2014 Android 5.0 ("Lollipop") release, fixing a number of small design problems. The new Roboto release also included the complete build chain for the fonts, which is often a significant missing piece for fonts under open licenses. Noto's goal is to cover every other writing system known to Unicode. The latest release (in concert with Android 6.0 "Marshmallow") added support for Tibetan, Mongolian, and Vai, bringing the number of supported scripts to 60.

In addition, Noto has gained a set of color emoji. Since the emoji are built into the font, they work automatically wherever text can be entered or displayed in an Android app. Not everyone takes emoji seriously, of course, but Noto also includes a set of "keycap" glyphs (to represent keyboard keys) and world flags, both of which can be displayed by activating special ligatures in the browser, text widget, or application. For instance, the Japanese flag is encoded as a ligature of the regional-indicator letters J and P. Finally, in September, the Noto family was re-licensed from the Apache 2.0 license to the SIL Open Font License.
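Unicode flag sequences are pairs of "regional indicator" symbols (U+1F1E6 through U+1F1FF, one for each letter A to Z), which a font such as Noto can replace with a single flag glyph through a ligature. The following is a minimal Python sketch of the mapping from a two-letter region code to that pair. The function name flag_sequence is hypothetical.

```python
def flag_sequence(country_code):
    """Map a two-letter region code (e.g. "JP") to its regional indicator pair."""
    base = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
    code = country_code.upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError("expected a two-letter region code")
    # Each ASCII letter is shifted into the regional indicator block.
    return "".join(chr(base + ord(ch) - ord("A")) for ch in code)


jp = flag_sequence("JP")
print([f"U+{ord(c):04X}" for c in jp])  # ['U+1F1EF', 'U+1F1F5']
```

A font without a flag ligature for a given pair typically falls back to showing the two indicator letters separately.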

Levien said that Android's "Material Design" style guide has added detailed guidelines for typography that are intended to capture best practices for setting text on a page. Starting with Lollipop, Android has supported a number of optional OpenType features, including discretionary ligatures, various numeral styles (lining, non-lining, and tabular), fraction forms, and localized character forms (i.e., region-specific variants for characters, enabled by the system locale setting).

Paragraph optimization

But high-quality typography encompasses far more than individual glyphs. One of the biggest challenges, historically, has been breaking lines of text in a manner that does not produce excess white space between words and does not create lines of distractingly different lengths. It is an age-old problem, but Levien said that the Android team was motivated to implement a solution for it after the release of the Android Wear smart watch. The watch's significantly narrower text fields meant that uneven line lengths and awkward breaks were even more irritating than they are on Android phones.

For a solution, Levien and the others on the Android text team looked to Donald Knuth's TeX. The first improvement was to add support for automatic hyphenation. Deciding when to hyphenate is a strategic question, but enabling it begins with finding a reliable corpus of break-point data for the languages of interest. The Android solution uses the hyphenation patterns maintained by the TeX user community, which are also employed by hunspell, LibreOffice, and several other free-software projects. The data set currently covers 67 languages.
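TeX applies these patterns with Frank Liang's algorithm. Each pattern is a short letter string with digits in some of its gaps. Each gap in the word takes the highest digit from any pattern that matches there, and a gap whose final score is odd may take a hyphen. The Python sketch below implements that matching step. The pattern list is a hand-picked illustration rather than real data from the TeX files. The minimum fragment lengths (2 letters before the hyphen and 3 after, plain TeX's defaults for English) are assumptions. Production code would load thousands of patterns into a trie rather than a dictionary.

```python
def hyphenate(word, patterns, left_min=2, right_min=3):
    """Split word at the hyphenation points allowed by Liang-style patterns."""
    # Parse a pattern such as "hen5at" into its letters ("henat") and the
    # digit in each gap before, between, and after the letters (0 if none).
    table = {}
    for pattern in patterns:
        letters, gaps = "", [0]
        for ch in pattern:
            if ch.isdigit():
                gaps[-1] = int(ch)
            else:
                letters += ch
                gaps.append(0)
        table[letters] = gaps

    text = "." + word.lower() + "."      # "." marks the word boundaries
    scores = [0] * (len(text) + 1)       # scores[i]: gap just before text[i]
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            gaps = table.get(text[start:end])
            if gaps is not None:
                for offset, value in enumerate(gaps):
                    scores[start + offset] = max(scores[start + offset], value)

    # The gap before word[k] is scores[k + 1]; an odd score allows a break,
    # provided at least left_min / right_min letters stay on each side.
    pieces, last = [], 0
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return pieces


# Hand-picked illustrative patterns, not a complete language file.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
print("-".join(hyphenate("hyphenation", PATTERNS)))  # hy-phen-ation
```

Here the gap after "hyphen" collects both a 5 (from "hen5at") and a 2 (from "n2at"). The higher, odd value wins, so a break is allowed there.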

With hyphenation support in place, the next challenge was finding optimal line breaks for a given string of text. TeX's line-breaking algorithm was first described in the 1981 paper "Breaking Paragraphs into Lines", co-authored by Knuth and Michael Plass. In essence, it chooses line breaks so as to minimize the total deviation from the desired line length, summed over the entire paragraph.

This criterion is not quite the same as placing every break point as close as possible to the desired line length, because Knuth and Plass count hyphenation breaks as somewhat less desirable than non-hyphenation breaks. They also treat certain other patterns as undesirable, such as ending two successive lines with a hyphen. The TeX algorithm also differs from other line-breaking algorithms in that it computes optimal break points for the entire paragraph at once. The alternatives (or, at least, those that predate TeX) process text in a monotonic, line-by-line manner.
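The difference between line-by-line and whole-paragraph breaking shows up even in a deliberately simplified model. This Python sketch assumes monospace text, one-character spaces, and a cost equal to the squared leftover space on each line except the last. It shows the dynamic-programming idea behind total-fit breaking, but it is not Knuth and Plass's full box/glue/penalty model: there is no stretchable glue, no hyphenation, and no demerits.

```python
def greedy_breaks(words, width):
    """Fill each line as full as possible, one line at a time."""
    lines, current = [], []
    for word in words:
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = []
        current.append(word)
    lines.append(current)
    return [" ".join(line) for line in lines]


def total_fit_breaks(words, width):
    """Choose breaks minimizing summed squared slack over the whole paragraph.

    Simplified cost: (width - line_length) ** 2 per line; the last line is free.
    Assumes every word fits within width.
    """
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: minimal cost to set words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: index of the first word on the next line
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i, n):
            length += len(words[j]) + 1  # words[i..j] joined by single spaces
            if length > width:
                break
            cost = 0 if j == n - 1 else (width - length) ** 2
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                nxt[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


words = "aaa bb cc ddddd".split()
print(greedy_breaks(words, 6))     # ['aaa bb', 'cc', 'ddddd']  (cost 0 + 16 = 16)
print(total_fit_breaks(words, 6))  # ['aaa', 'bb cc', 'ddddd']  (cost 9 + 1 = 10)
```

The greedy breaker commits to a perfectly full first line and is then forced into a very short second one. The whole-paragraph search accepts a slightly loose first line to avoid that.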

The algorithm starts with the available width of the text column and determines a range of acceptable line lengths. Each possible break point (between words, after punctuation marks, and at potential hyphenation points) is assigned a "penalty" value, and each line is scored for its "badness", a number that roughly reflects how much the inter-word white space must be squeezed or stretched to fit the line into the acceptable length range. The algorithm then chooses the set of line breaks that minimizes the combined badness and penalty total. The inter-word spacing of each line is then adjusted to bring the line as close as possible to the optimal column width, except for the final line of a paragraph, which can be left as short as necessary.
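Knuth and Plass express how far the spaces must stretch or shrink as an adjustment ratio r: the needed change divided by the available stretch (or shrink). TeX's badness is then roughly 100·r³, capped at 10,000. A line whose spaces would have to shrink beyond their minimum is not allowed at all. The Python sketch below follows that shape. The parameter names are illustrative, and TeX's own fixed-point arithmetic differs in detail.

```python
def badness(natural, target, stretch, shrink):
    """Approximate TeX-style badness for one line, or None if it cannot fit.

    natural: line width with inter-word spaces at their natural size
    target:  desired line width
    stretch, shrink: total amount the spaces may grow or shrink
    """
    if natural == target:
        return 0
    if natural < target:
        if stretch <= 0:
            return 10000                 # no stretch available: maximally bad
        ratio = (target - natural) / stretch
    else:
        if shrink <= 0:
            return None
        ratio = (natural - target) / shrink
        if ratio > 1:
            return None                  # spaces would shrink past their minimum
    return min(10000, round(100 * ratio ** 3))


print(badness(90, 100, 10, 3))   # 100: spaces stretched by exactly their allowance
print(badness(80, 100, 10, 3))   # 800: twice the allowance, 2**3 = 8 times as bad
print(badness(104, 100, 10, 3))  # None: would need more shrink than allowed
```

The cubic curve is what makes a few slightly loose lines preferable to one very loose line.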

The secret to getting eye-pleasing results is in choosing quantities like the penalty values and the "badness" formula. Hyphenating words incurs more of a penalty than breaking between words, but it needs to score better than leaving an extra-wide chunk of white space. Knuth and Plass did extensive testing to arrive at their formulas; developers today, fortunately, do not need to repeat the entire process.
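In TeX, badness and penalties are combined into "demerits" for each line: (linepenalty + badness)² + penalty², plus extra demerits for two hyphenated lines in a row. The Python sketch below uses plain TeX's default constants (\linepenalty=10, \hyphenpenalty=50, \doublehyphendemerits=10000). Android's own values were not given in the talk and may differ.

```python
LINE_PENALTY = 10                # plain TeX's \linepenalty
HYPHEN_PENALTY = 50              # plain TeX's \hyphenpenalty
DOUBLE_HYPHEN_DEMERITS = 10000   # plain TeX's \doublehyphendemerits


def demerits(badness, hyphenated, previous_hyphenated=False):
    """Cost of ending a line at a break, in the style of TeX's demerits formula."""
    penalty = HYPHEN_PENALTY if hyphenated else 0
    d = (LINE_PENALTY + badness) ** 2 + penalty ** 2
    if hyphenated and previous_hyphenated:
        d += DOUBLE_HYPHEN_DEMERITS  # discourage two hyphens in a row
    return d


# A loose line (badness 100) without a hyphen versus a snug hyphenated one (badness 12):
print(demerits(100, hyphenated=False))  # 12100
print(demerits(12, hyphenated=True))    # 2984
```

Because badness enters squared, one very loose line soon costs more than the fixed penalty for a hyphen. This is the balance the paragraph above describes.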

It is a complicated endeavor, but Levien said that real-world performance tests on Marshmallow indicate that it does not introduce noticeable delays versus the text layout in Lollipop. Android's implementation does not currently use every optimization or special case discussed by Knuth and Plass, but it does incorporate a few. For instance, it will not hyphenate the next-to-last line of a paragraph (which is regarded as undesirable by professional typesetters), unless that hyphenation will prevent the last line from having only one word on it (which is regarded as even less desirable).
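A whole-paragraph optimizer could express this priority as extra costs on how the paragraph ends, with a one-word last line costing more than a hyphen on the next-to-last line. The Python sketch below is hypothetical, because the talk did not describe how Android encodes the rule. Both constants are illustrative (TeX has a comparable \finalhyphendemerits, 5000 in plain TeX), and treating any trailing "-" as a hyphenation break is a simplification.

```python
FINAL_HYPHEN_COST = 5000          # hypothetical cost: hyphen on the next-to-last line
ONE_WORD_LAST_LINE_COST = 20000   # hypothetical cost; must exceed FINAL_HYPHEN_COST


def ending_cost(lines):
    """Extra cost for how a paragraph ends; lines is a list of line strings."""
    cost = 0
    if len(lines) >= 2 and lines[-2].endswith("-"):
        cost += FINAL_HYPHEN_COST
    if len(lines[-1].split()) == 1:
        cost += ONE_WORD_LAST_LINE_COST
    return cost


choices = {
    "hyphenated": ["a narrow column with-", "out hyphenation"],
    "unhyphenated": ["a narrow column without", "hyphenation"],
}
print(min(choices, key=lambda name: ending_cost(choices[name])))  # hyphenated
```

If an unhyphenated layout left at least two words on the last line, it would cost 0 and win. The hyphen is accepted only to rescue a one-word last line.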

The Android implementation also treats two-line paragraphs as a special case. For such short paragraphs, readers generally prefer to see lines of equal length. As it turns out, this is easy to do in the TeX algorithm: one simply deactivates the rule that, as described above, says "allow the final line to be as short as it needs to be."
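A two-line paragraph has only one break to choose, so a small brute-force Python sketch is enough to show the effect of switching off the "last line may be short" rule. Squared leftover space stands in for TeX's badness here, and the function name break_two_lines is hypothetical.

```python
def break_two_lines(words, width, last_line_free):
    """Pick the single break for a two-line paragraph (monospace widths).

    Cost is the squared leftover space per line. With last_line_free=True the
    second line costs nothing (the usual rule); otherwise both lines count.
    Returns None if no break fits both lines within width.
    """
    best, best_cost = None, float("inf")
    for k in range(1, len(words)):
        first, second = " ".join(words[:k]), " ".join(words[k:])
        if len(first) > width or len(second) > width:
            continue
        cost = (width - len(first)) ** 2
        if not last_line_free:
            cost += (width - len(second)) ** 2
        if cost < best_cost:
            best, best_cost = [first, second], cost
    return best


words = "the quick brown fox jumped".split()
print(break_two_lines(words, 20, last_line_free=True))   # ['the quick brown fox', 'jumped']
print(break_two_lines(words, 20, last_line_free=False))  # ['the quick brown', 'fox jumped']
```

Once the second line's slack is counted, the cost is smallest when the two lines have similar lengths, which produces the balanced result.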

Implementation details

In Marshmallow, the new line-breaking feature can be activated by setting the breakStrategy property on text widgets. There are three possible values: high_quality activates the TeX line-breaking algorithm, while simple uses the simplistic line breaker from earlier Android releases.

The third option, balanced, activates a different strategy: the text-wrap: balance algorithm that Adobe has proposed adding to CSS4. It is akin to one of the intermediate options that Knuth and Plass discussed in their paper before arriving at their final answer. It works line-by-line and uses automatic hyphenation, but it stops short of computing whole-paragraph metrics to find the set of optimal break points. Regardless of how one feels about the algorithm's merits relative to TeX's, though, supporting it may be important to Android app developers if it does get added to CSS.
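The article does not spell out how the balanced strategy works internally. One common way to approximate text-wrap: balance keeps the number of lines a greedy breaker would produce and searches for the narrowest measure that still yields that count. The Python sketch below shows that approach as an illustration only. It is not Android's or Adobe's code, it omits hyphenation, and it assumes monospace widths and words no wider than the column.

```python
def greedy_lines(words, width):
    """Greedy first-fit wrapping; returns None if some word is wider than width."""
    lines, current = [], ""
    for word in words:
        if len(word) > width:
            return None
        candidate = word if not current else current + " " + word
        if len(candidate) <= width:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def balanced_lines(words, width):
    """Keep the greedy line count but narrow the measure as far as possible.

    Binary-search the smallest width at which greedy wrapping still needs the
    same number of lines (greedy line count never rises as width grows).
    """
    target = len(greedy_lines(words, width))
    lo, hi = max(len(w) for w in words), width
    while lo < hi:
        mid = (lo + hi) // 2
        if len(greedy_lines(words, mid)) <= target:
            hi = mid
        else:
            lo = mid + 1
    return greedy_lines(words, lo)


words = "the quick brown fox jumped".split()
print(greedy_lines(words, 20))    # ['the quick brown fox', 'jumped']
print(balanced_lines(words, 20))  # ['the quick brown', 'fox jumped']
```

Each trial width is still wrapped line by line. The search only picks the measure and never weighs breaks across the whole paragraph, which matches the "line-by-line" character described above.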

Currently, Marshmallow sets breakStrategy to high_quality for its display text widgets and to simple for editable text widgets, since having the line breaks jump around as one types is likely to be received poorly by users. The exception to this rule is Android's text-message app, for which hyphenation is turned off because users find it confusing to see hyphenation in an SMS. In any case, those settings are merely the defaults; app developers can change the setting at will in their own code.

The code for Android's line breaker is available in the Android Open Source Project (AOSP) source tree. Notably, though, Levien chose to name it "Minikin" rather than to use a name more in keeping with Android's traditional API and framework names (where it might have been called, say, ParagraphManager). In a discussion after the talk, Levien said that he hopes the library will prove to be reusable in other free-software projects. He added, though, that he does not have the time that would be required to set up and maintain Minikin as an open project.

He also noted that, while the Android team is happy with the results demonstrated so far, there are still plenty of unimplemented ideas that could be incorporated into the code base. In the original paper, Knuth and Plass discuss a number of extensions to the basic algorithm, such as hanging punctuation, automatically indenting code samples, and coping with the peculiar indentations expected of bibliography entries and indexes.

Levien ended the talk by saying that whole-paragraph optimization was originally seen as computationally expensive for resource-constrained mobile devices. Some in the project considered leaving the line-breaking algorithm off by default and, in a sense, designating it as a "pro" feature. But the team ultimately decided that users stood to benefit from having high-quality text layout built into the system. Considering how often text-layout issues, and the niceties of TeX compared to other document formats, come up in discussion these days, many in the free-software community may agree.

