At the 2015 ATypI conference in São Paulo, Brazil, Google's Raph Levien discussed the recent improvements to the fonts and text-rendering framework used in Android. The most recent update to Android introduces several high-quality paragraph-layout features from the TeX typesetting system. Furthermore, the implementation may yet prove useful to other free-software projects.

Levien currently works as the lead engineer for the Android text stack, which (as one would expect) utilizes a number of free-software libraries, like FreeType and HarfBuzz. But Android also provides text widgets for Android apps, and in recent years the project has made a significant investment in producing open-source fonts.

Fonts and features

Addressing changes to the Android fonts first, he noted that the Roboto and Noto families have recently gained several key features. Roboto, which covers European scripts (Latin, Greek, and Cyrillic), was redesigned for the 2014 Android 5.0 ("Lollipop") release, fixing a number of small design problems. The new Roboto release also included the complete build chain for the fonts, which is often a significant missing piece for fonts under open licenses. Noto's goal is to cover every other writing system known to Unicode. The latest release (in concert with Android 6.0 "Marshmallow") added support for Tibetan, Mongolian, and Vai, bringing the number of supported scripts to 60.

In addition, Noto has gained a set of color emoji. Since the emoji are built into the font, they work automatically wherever text can be entered or displayed in an Android app. Not everyone takes emoji seriously, of course, but Noto also includes a set of "keycap" glyphs (to represent keyboard keys) and world flags, both of which can be displayed by activating special ligatures in the browser, text widget, or application. For instance, the Japanese flag is encoded as a ligature for JP. Finally, in September, the Noto family was re-licensed from the Apache 2.0 license to the SIL Open Font License.
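To make the flag example concrete: Unicode encodes country flags as pairs of "regional indicator" characters, and the font supplies a ligature that maps the pair to a single flag glyph. The short Python check below (illustrative only; the ligature substitution itself happens in the shaping engine and font, not in application code) shows that the Japanese flag is really two codepoints:

```python
import unicodedata

# The Japanese flag is not a single codepoint but a pair of regional
# indicator symbols, J and P; a font ligature maps the pair to one glyph.
flag_jp = "\U0001F1EF\U0001F1F5"   # 🇯🇵

assert len(flag_jp) == 2
assert unicodedata.name(flag_jp[0]) == "REGIONAL INDICATOR SYMBOL LETTER J"
assert unicodedata.name(flag_jp[1]) == "REGIONAL INDICATOR SYMBOL LETTER P"
```

A terminal or text widget without the ligature simply falls back to showing the two indicator symbols side by side, which is why flag support can vary between platforms even though the underlying text is identical.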

Levien said that Android's "Material Design" style guide has added detailed guidelines for typography that are intended to capture best practices for setting text on a page. Starting with Lollipop, Android has supported a number of optional OpenType features, including discretionary ligatures, various numeral styles (lining, non-lining, and tabular), fraction forms, and localized character forms (i.e., region-specific variants for characters, enabled by the system locale setting).

Paragraph optimization

But high-quality typography encompasses far more than individual glyphs. One of the biggest challenges, historically, has been breaking lines of text in a manner that does not produce excess white space between words and does not create lines of distractingly different lengths. It is an age-old problem, but Levien said that the Android team was motivated to implement a solution for it after the release of the Android Wear smart watch. The watch's significantly narrower text fields meant that uneven line lengths and awkward breaks were even more irritating than they are on Android phones.

For a solution, Levien and the others on the Android text team looked to Donald Knuth's TeX. The first improvement was to add support for automatic hyphenation. Deciding when to hyphenate is a strategic question, but enabling it begins with finding a reliable corpus of break-point data for the languages of interest. The Android solution uses the hyphenation patterns maintained by the TeX user community, which are also employed by hunspell, LibreOffice, and several other free-software projects. The data set currently covers 67 languages.
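Those TeX pattern files are built for Liang's hyphenation algorithm: patterns interleave letters with digits, the highest digit from any matching pattern wins at each inter-letter position, and odd values permit a break there. The following is a minimal, illustrative Python sketch of that scheme; the patterns used in it are invented for demonstration, while real pattern files contain thousands of entries per language:

```python
import re

def hyphenation_points(word, patterns):
    """Return word indices before which a hyphen may be inserted.

    Implements the core of Liang's pattern scheme: for every pattern
    occurrence, record its digits; at each position the highest digit
    wins, and odd final values permit a break.
    """
    text = "." + word.lower() + "."          # dots anchor word edges
    values = [0] * (len(text) + 1)
    for pat in patterns:
        letters = re.sub(r"\d", "", pat)     # pattern with digits removed
        start = 0
        while (pos := text.find(letters, start)) != -1:
            idx = pos
            for ch in pat:                   # walk pattern, placing digits
                if ch.isdigit():
                    values[idx] = max(values[idx], int(ch))
                else:
                    idx += 1
            start = pos + 1
    # Skip breaks after the first letter and before the last letter.
    return [i - 1 for i in range(2, len(word)) if values[i] % 2 == 1]

def hyphenate(word, patterns):
    """Insert '-' at every permitted break point."""
    points = set(hyphenation_points(word, patterns))
    return "".join(("-" if i in points else "") + ch
                   for i, ch in enumerate(word))
```

With the made-up pattern `"y1p"`, `hyphenate("hyphen", ["y1p"])` yields `"hy-phen"`; adding an inhibiting pattern with a higher even digit, as in `["y1p", "y2p"]`, suppresses the break again. This override mechanism is what lets the real pattern sets encode exceptions compactly.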

With hyphenation support in place, the next challenge was finding optimal line breaks for a given string of text. TeX's line-breaking algorithm was first described in the 1981 paper "Breaking Paragraphs into Lines" co-authored by Knuth and Michael Plass. In essence, it enables software to choose line breaks in a manner that minimizes the grand total of deviations from the average line length, over the entire paragraph.

This criterion is not quite the same as choosing all break points as close as possible to the average line length, because Knuth and Plass count hyphenation breaks as somewhat less desirable than non-hyphenation breaks. They also consider certain other patterns as undesirable, such as ending two successive lines with a hyphen. The TeX algorithm is also distinct from other line-breaking algorithms in that it computes optimal break points for the entire paragraph at once. Alternatives (or, at least, the alternatives that predate TeX) process text in a monotonic, line-by-line manner.

The algorithm starts with the available width of the text column and determines a range of acceptable line lengths. Each possible break point (between words, after punctuation marks, and at potential hyphenation points) is assigned a "penalty" value, and each line is scored for its "badness" quotient—a number that roughly equates to how much squeezing or expanding of the inter-word white space is required to fit the line into the acceptable length range. The algorithm then chooses a set of line breaks so that the sum of the "badness" numbers is minimized. The inter-word spacing of each line is adjusted to bring the line as close as possible to the optimal column width—except for the final line of a paragraph, which can be left as short as necessary.
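The whole-paragraph optimization can be expressed as a small dynamic program. The sketch below is illustrative Python, not Android's actual code (Minikin is written in C++), and it reduces "badness" to the squared leftover space on each line, ignoring penalties, hyphenation, and stretchable spacing:

```python
def break_paragraph(words, width):
    """Choose line breaks minimizing total squared slack over the paragraph."""
    n = len(words)
    INF = float("inf")

    def line_cost(i, j):
        # Cost of putting words[i:j] on one line: squared leftover space,
        # or infinity if the words do not fit at all.
        length = sum(len(w) for w in words[i:j]) + (j - i - 1)  # spaces
        if length > width:
            return INF
        if j == n:          # the final line may be short at no cost
            return 0
        return (width - length) ** 2

    # best[i] = minimal total cost of laying out words[i:]
    best = [INF] * (n + 1)
    best[n] = 0
    split = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            c = line_cost(i, j)
            if c == INF:
                break           # longer lines cannot fit either
            if c + best[j] < best[i]:
                best[i] = c + best[j]
                split[i] = j

    # Reconstruct the chosen lines from the split table.
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:split[i]]))
        i = split[i]
    return lines
```

For the words `["aaa", "bb", "cc", "ddddd"]` at width 6, a greedy breaker would emit "aaa bb" / "cc" / "ddddd", leaving an ugly short middle line; the whole-paragraph optimization instead picks "aaa" / "bb cc" / "ddddd" because the total deviation is smaller. Dropping the `j == n` special case makes the final line count toward the total as well, which corresponds to the balancing behavior for short paragraphs described below.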

The secret to getting eye-pleasing results is in choosing quantities like the penalty values and the "badness" formula. Hyphenating words incurs more of a penalty than breaking between words, but it needs to score better than leaving an extra-wide chunk of white space. Knuth and Plass did extensive testing to arrive at their formulas; developers today, fortunately, do not need to repeat the entire process.

It is a complicated endeavor, but Levien said that real-world performance tests on Marshmallow indicate that it does not introduce noticeable delays versus the text layout in Lollipop. Android's implementation does not currently use every optimization or special case discussed by Knuth and Plass, but it does incorporate a few. For instance, it will not hyphenate the next-to-last line of a paragraph (which is regarded as undesirable by professional typesetters), unless that hyphenation will prevent the last line from having only one word on it (which is regarded as even less desirable).

The Android implementation also treats two-line paragraphs as a special case. For such short paragraphs, readers generally prefer to see lines of equal length. As it turns out, this is easy to do in the TeX algorithm: one simply deactivates the rule that, as described above, says "allow the final line to be as short as it needs to be."

Implementation details

In Marshmallow, the new line-breaking feature can be activated by setting the breakStrategy property on text widgets. There are three possible values: high_quality activates the TeX line-breaking algorithm, while simple uses the simplistic line breaker from earlier Android releases.

The third option, balanced, activates a different strategy—the text-wrap: balance algorithm that Adobe has proposed adding to CSS4. It is akin to one of the intermediate options that Knuth and Plass discussed in their paper before arriving at their final answer. It works line-by-line and uses automatic hyphenation, but it stops short of computing whole-paragraph metrics to find the set of optimal break points. Regardless of how one feels about the algorithm's merits with respect to TeX, though, supporting it may be important to Android app developers if it does get added to CSS.

Currently, Marshmallow sets breakStrategy to high_quality for its display text widgets and to simple for editable text widgets, since having the line breaks jump around as one types is likely to be received poorly by users. The exception to this rule is Android's text-message app, for which hyphenation is turned off because users find it confusing to see hyphenation in an SMS. In any case, those settings are merely the defaults; app developers can change the setting at will in their own code.

The code for Android's line breaker is available in the Android Open Source Project (AOSP) source tree. Notably, though, Levien chose to name it "Minikin" rather than to use a name more in keeping with Android's traditional API and framework names (where it might have been called, say, ParagraphManager). In a discussion after the talk, Levien said that he hopes the library will prove to be reusable in other free-software projects. He added, though, that he does not have the time that would be required to set up and maintain Minikin as an open project.

He also noted that, while the Android team is happy with the results demonstrated so far, there are still plenty of unimplemented ideas that could be incorporated into the code base. In the original paper, Knuth and Plass discuss a number of extensions to the basic algorithm, such as hanging punctuation, automatically indenting code samples, and coping with the peculiar indentations expected of bibliography entries and indexes.

Levien ended the talk by saying that whole-paragraph optimization was originally seen as expensive work to implement for resource-constrained mobile devices. Some in the project considered leaving the line-breaking algorithm off by default and, in a sense, designating it as a "pro" feature. But the team ultimately decided that users stood to benefit by having high-quality text layout built into the system. Considering how often text-layout issues and the niceties of TeX compared to other document formats come up in discussion these days, many in the free-software community may agree.


Open design of its software is a "key OpenStack principle", and the Design Summit is a big part of how that is accomplished, Thierry Carrez said to open an introductory session at the OpenStack Summit in Tokyo. The session, entitled "Design Summit 101", was targeted at newcomers to provide them with an overview of how that summit-within-a-summit functions—and what it is meant to accomplish. In the session, Carrez, who is the director of engineering for the OpenStack Foundation, gave a nice look inside the project and into an important part of how OpenStack comes together.

Design Summit sessions are not supposed to be presentation-oriented and are focused on a specific topic. Depending on the type of session, it may be geared toward discussion or to deciding who is going to do what. Each session has a moderator to introduce the topic and to keep the discussion on track.

Goals

There are a number of goals that may be pursued in these sessions. One is to get feedback on an idea to see if there can be quick convergence of opinion across a broad range of attendees. Another is to push a feature through its implementation phase and to attempt to gather contributors to help do so. The discussion in a session can also lead to an alignment of a team's priorities. In addition, summit sessions provide a venue for participants to meet and socialize.

The Design Summit is held every release cycle in conjunction with the much larger OpenStack Summit. It is targeted at contributors or those who want to contribute and focuses on what to do in the next six-month cycle. Much of the first day of the four-day Design Summit is made up of cross-project sessions (like the distributed lock manager discussion we looked at last week). Those are meant to help avoid duplicating effort between the various OpenStack sub-projects.

There used to be a separate operators summit, but that has been folded into the Design Summit because feedback from operators is an integral part of the design process, Carrez said. There are "Ops" sessions throughout the first two days of the summit (the schedule shows the Design Summit sessions in color, while the rest of the summit is in gray). The idea is that developers will attend the operator sessions and vice versa.

The two middle days also have project team work sessions, where those teams discuss specific items that need to be resolved for the project. The final day is when everyone is "burnt down", Carrez said, so there are informal "meetups" for the projects. People can either do work if they want to on that day or "go outside if you prefer".

The parties are also part of the Design Summit experience. There are fewer than there used to be, which is "probably a good thing", Carrez said with a chuckle. The parties are also shared between the events, so "you might run into salespeople" there.

Fishbowls and work sessions

Within the different types of sessions, there are two styles of interaction that dictate the kinds of rooms that will be used: fishbowls and work sessions. As its name implies, a fishbowl session has its seats arranged in semi-circles around the projector, which is typically used to show the Etherpad for tracking the notes from the session. Etherpads for all of the Design Summit sessions can be found on the OpenStack wiki—the Etherpad for DS101 is here.

In a fishbowl session, those expecting to participate should be seated somewhere near the front. There are usually no microphones, though some rooms still use them. Having microphones is a tradeoff: it is harder to get a lively discussion if participants have to wait until they get the microphone, he said. The rooms for fishbowls are typically fairly large and those sessions are advertised with full descriptions on the schedule page.

There can't be large rooms for everything, he said, so there are working sessions that are generally held in a room with a large conference table and possibly some seats around the edges of the room. These sessions are just listed as a "work session" for a particular team or project on the schedule—clicking through will give more details of the specifics of the session. That is meant to provide a little friction to help limit the attendance to work sessions because attendees "have to care enough to click through", which established contributors will generally be willing to do.

Fishbowls are all about getting feedback from the audience. They should start with a brief reminder of the session topic and then objectives should be defined. While it is important to have someone assigned to take notes in the Etherpad, anyone else can also join in to add notes. As one of the OpenStack community managers, Tom Fifield, noted, those who are not comfortable speaking up can add their concerns into the Etherpad, thus providing feedback without having to jump into a (possibly contentious) discussion.

The moderator in a fishbowl is expected to keep the discussion in focus and to ensure that the session is inclusive of those who want to participate. As the 40-minute session draws to a close, there should be an attempt to identify concrete actions or conclusions that came about in the session. For roughly one-third of these sessions, there won't be a clear outcome, Carrez said, but even in those sessions, the key stakeholders needed for future discussions will be identified.

The work sessions are geared toward getting work done. They are not for newcomers to ask questions or give feedback. The intent is to get people assigned to tasks based on the priorities that have been set for the project.

Tips and tricks

Checking the Etherpad before a session to prepare for it is an important part of attending. In fact, if you don't have time to do that, you should skip the session and prepare for another session later, Carrez said. It is also important to show up on time and for the meeting to end on time—so that participants can get to the next session.

People should edit in the Etherpad without hesitation, Carrez said. Different people's contributions are assigned different colors in the pad so that authors can be distinguished. The Etherpad can also be a place for questions, especially if the question is difficult to articulate from a language or technical perspective. In between sessions, talking with others from the sessions you have attended (in the developers lounge or "hallway track") is also useful.

There were a few questions at the end, mostly focused on where and how newcomers could best contribute. The work sessions are probably not the place, though some of them are interested in user feedback, Fifield said. It is a matter of getting a feeling for the room and what the participants are trying to accomplish before deciding whether you can make a positive contribution.

The cross-project sessions tend to be less technical and are more geared toward those without a strong OpenStack background. Also, the operator sessions can be less technical and more feedback-oriented. The work sessions are in smaller rooms, some of them with only enough space for eight people, so they are much more hands-on and targeted at getting something specific done in the 40 minutes.

The fishbowl sessions are in some ways reminiscent of the format for microconferences at the Linux Plumbers Conference. They also resemble sessions from Linaro Connect, which reflects the common parent of that event and OpenStack Summit: the Ubuntu Developer Summit (UDS). That should probably not come as a surprise, since they are all focused on resolving some kind of technical issue among the people present. Overall, the format seemed to work reasonably well: discussions were generally productive, conclusions and action items were determined, and contributors volunteered for the tasks that needed to be done.

[I would like to thank the OpenStack Foundation for travel assistance to Tokyo for the summit.]


The Nova project is targeted at providing compute resources for OpenStack-based clouds. John Garbutt, who is the Nova project team lead (PTL) for the recent Liberty release, as well as the upcoming Mitaka release, presented an update on the project at the Tokyo OpenStack Summit. He looked at the changes that had come over the last few releases, with an emphasis on Liberty, while also giving a brief glimpse into the future of one of the foundational pieces of OpenStack.

He started out by noting that throughout the Liberty cycle the project had been conscious about trying to make sure that others in OpenStack were clear on what Nova was doing—and why. The mission of the project is stated on its web site: "To implement services and associated libraries to provide massively scalable, on demand, self service access to compute resources." The key piece there is that "Nova is all about compute", he said.

Priorities

But in order to get things done, there is a need to focus on particular features and to make time for them. The Nova team has identified a handful of priorities for the project, starting with having a good API with a strong ecosystem around it. In order to build up that ecosystem, the API needs to be the same in all of the different deployments of OpenStack so that other projects can rely upon it. The project is focusing on "doing a better job of that", Garbutt said.

Next up is "making sure we stay robust and reliable". That means when an API call is made, the right thing happens. That requires testing, fixing whatever bugs are found, and listening to operator and user groups about where the failures are.

The team has gotten a lot of feedback about the need for upgrades that are easy and that work correctly. It has taken many releases to get to where things are today, where Nova has "quite a good story" for upgrades. This will allow deployments to follow new releases more closely. There is a similar need for "scale out"; as deployments grow, Nova must reliably help by scaling out the compute resources seamlessly.

Maintaining the open culture within OpenStack is another priority for Nova. It is important to continue the innovation that the project has already brought to the table; open source and an open culture are major parts of that.

The final priority is to "focus on not expanding our scope". Nova is already a huge project, Garbutt said, but it needs to stay focused. There are a number of projects that could have been done inside Nova, but were spun out into separate projects. The Heat orchestration component is a good example of that.

Another specific example of avoiding scope creep for Nova happened recently. A "semi-high-availability" feature was proposed that would monitor an instance and bring it back up elsewhere if it crashed. The developers did not want to add it to Nova if that could be avoided, so they added some API calls so that an external high-availability tool could be used to implement it. That way, the problem was solved in the ecosystem, rather than within the Nova project itself.

Changes for Liberty

One of the major changes for the Liberty release "does not sound very sexy", but is actually fairly exciting to Nova developers: lots of architecture evolution. The "bowels of Nova" are being rewritten with an eye toward "maintaining stability while increasing velocity", Garbutt said. He acknowledged that stability and development velocity are at odds to some extent, but the project team is trying to find the right set of tradeoffs there. There are three themes to the evolution: API improvements, work on upgrades, and better scheduling and resource tracking.

The Nova API has evolved over time, but the team "stepped back" to try to better understand the API users and what they need from the API. It identified three types of user, each with different needs from the API.

First is the "absent user", who has some scripts that do what is needed to get their application up and running; they want those scripts to keep running even in the presence of upgrades to their cloud. Second is the "active user", who wants to use all of the new API calls and is happy to rewrite their scripts multiple times a year to do so. Those users want to be able to query what APIs are available so that the scripts can take advantage of the newer features. If the Nova team can't continue to change the API, these active users will get bored, Garbutt noted.

The third user type identified is the "multi-cloud user", who has applications that run in multiple public and private OpenStack clouds. Those users have "magical scripts" and SDKs that work on multiple different versions of OpenStack, including some components that are not even released yet. Nova supports installation from the Git trunk, for example, so there may be disparate versions in the various clouds. Upgrades on those clouds will not happen in lockstep. "That should work too", he said.

Beyond the users, the operations and development staff have an interest here as well. They likely have new problems to solve and need newer features from more recent releases, but they also need to know "who is using what" features, versions, and so on. The staff also would like to know, for example, how many active vs. absent users there are. "It would be great if we found a solution for this too".

APIs

The first Nova API was v2.0—an alias for v1.1—which was all "a bit confusing". In any case, there was this idea that it would consist of a base API plus a whole pile of extensions. Users could query to determine which extensions were available. But it is easy to make mistakes when creating an API and there was no way to evolve v2.0, he said.

So now there is a v2.1 of the API that is exactly the same as v2.0, but it got rid of the idea of extensions—they are bundled in with Nova now. All of the extensions are listed as being present for queries, so there is the same API everywhere. Evolution will be handled with "microversioning", so v2.2 will have some newer calls and potentially some deprecated ones; API users can request a particular minimum version. There will continue to be support for the v2.0 API in Nova, but by default Liberty will use v2.1.
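From the client side, a microversion is requested per call via an HTTP header. The sketch below is illustrative (the token value and endpoint in the usage comment are placeholders); X-OpenStack-Nova-API-Version is the header Nova introduced for this negotiation:

```python
# Header Nova uses to negotiate API microversions.
API_VERSION_HEADER = "X-OpenStack-Nova-API-Version"

def nova_headers(token, microversion=None):
    """Build request headers, optionally pinning an API microversion.

    Without the version header, v2.1 behaves exactly like v2.0; with it,
    the server responds with the semantics of the requested microversion.
    """
    headers = {"X-Auth-Token": token, "Accept": "application/json"}
    if microversion is not None:
        headers[API_VERSION_HEADER] = microversion
    return headers

# Usage (endpoint is a placeholder for the deployment's compute URL):
#   requests.get(endpoint + "/servers", headers=nova_headers(token, "2.2"))
```

A client that omits the header keeps the stable v2.1 baseline behavior, which is what lets the "absent user" scripts described earlier survive cloud upgrades while "active users" opt in to newer calls.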

There are also third-party APIs in the tree, such as one for Amazon's EC2, that have been a struggle to keep working and to get people to care about. The team has been working with an external project, which uses the Nova API to interface with EC2, to ensure that the calls needed are available. That will allow Nova to deprecate its EC2 support, which will probably be removed in the Mitaka release.

Upgrades

One of the key tenets of Nova (which it shares with other OpenStack components) is independence for the control and data planes. "That sounds fancy", Garbutt said, but what it really means is that "Nova can die in a fire" and it won't take the hypervisor or virtual machines (VMs) with it. Some downtime on the control plane (Nova) can be tolerated.

Nova supports upgrades from the latest stable branch to the next release, as well as to the next commit within the same cycle. As he noted previously, Nova supports installation from the trunk, which effectively means that it has to support upgrades between commits.

Another important upgrade requirement is that the existing configuration should "just work". It may cause Nova to spit out warnings about problems that will need to be addressed before the next upgrade, but those shouldn't have to be dealt with during the upgrade process. Those warnings are often for deprecated features; the project would rather not deprecate things, but sometimes must as a last resort, he said.

As the Nova PTL, Garbutt felt that he needed to put up a "complicated, scary diagram" of the architecture (which can be seen in the YouTube video of the talk), but he said that what it depicts is actually fairly straightforward. There are REST requests made at the API level that bubble down through the rest of the Nova pieces and eventually result in the creation of a VM on a compute node.

One of those "pieces" is the Nova database, which needs to be dealt with as part of the upgrade process. Both the schema and the data in the database may need to be upgraded. In addition, the Nova control-plane components can be upgraded together, but the compute-node components may be upgraded over time so the conductor (which sits between the compute nodes and the database) must be able to convert the internal remote procedure call (RPC) formats between the old and the new. That requires lots of testing to ensure that "the magic happens" and the upgrade process works.

There is a four-step process for upgrades. First, the database is updated, then the API and control plane are restarted. The database used to be upgraded during that restart, but that turned out to be quite slow, so now the database is dealt with first. Then, the compute nodes (i.e. nova-compute) can be restarted one-by-one at convenient times. Once that is complete, the RPC version that is used can be pinned to the new (upgraded) version. This process has been available since the Kilo release; it stabilized in Liberty and more work is being done to reduce the downtime for Mitaka.

Wrapping up

The project has been working on defining its scope, which will help it avoid the problems of scope creep. For example, Nova does support containers by using LXC and libvirt, but it treats them more like VMs. The Magnum project is geared toward treating containers in a more "container-like" fashion, so that is where the container effort should go. Similarly, nova-docker was removed from the tree because it did not have the testing infrastructure needed to ensure that it did not break when things inside Nova changed.

For Liberty, there has been lots of progress. The architecture evolution has continued and there have been improvements in making upgrades have less of an impact, though there is still lots more work to do there. The API story has gotten better, which should help grow the ecosystem around Nova. In addition, over 60 blueprints were implemented for Liberty and over 400 bugs were fixed.

For Mitaka (and beyond), the Nova team will be working on better support for compute cells, which are meant to support very large deployments. There will be more of a focus on the user experience, which includes better API documentation, better error reporting, and improvements to the scheduler. The Nova team will also be working to keep the development process evolving, trying to reduce the review bottlenecks, and to release more often.

[I would like to thank the OpenStack Foundation for travel assistance to Tokyo for the summit.]
