The vast majority of books and magazines are typeset using hyphenation and justification (written as H&J from here on in). In print, it’s everywhere: All lines of text except the last lines of paragraphs are stretched out to the same length. Flush left and flush right. Hyphens are used to break words at the end of lines to help prevent gaps in word spacing. Like this:

We hold these truths to be self-ev­i­dent, that all men are cre­at­ed e­qual, that they are en­dowed by their Cre­a­tor with cer­tain un­al­ien­a­ble Rights, that a­mong these are Life, Lib­er­ty and the pur­suit of Hap­pi­ness. That to se­cure these rights, Gov­ern­ments are in­sti­tut­ed a­mong Men, de­riv­ing their just pow­ers…

In contrast, nearly all text on the web is set flush left, with no hyphens at the end of lines. (This assumes a left-to-right Latinate language like English.) In the world of print, this is sometimes called “ragged right” or a “hard rag” because of the sawtoothed edge created on the right by the uneven line lengths. Today on the web, it’s nearly universal:

We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers…

It doesn’t have to stay this way. And if the many criticisms of iPad typography are any guide, in design niches like eBooks it shouldn’t, if customer expectations are to be met.

But what to do? Well, few web designers are aware of it, but H&J can be a part of their work today. First, a quick look at the history.

Just a dash, please

The hyphen was carried forward from the world of handwritten manuscripts and into the world of print with Johannes Gutenberg’s system of movable type. However, in movable type, the hyphen also solved a mechanical problem:

The Gutenberg printing press required words made up of individual letters of type to be held in place by a surrounding non-printing rigid frame. Gutenberg solved the problem of making each line the same length to fit the frame by inserting a hyphen as the last element at the right side margin. This interrupted the letters in the last word, requiring the remaining letters be carried over to the start of the line below.

Gutenberg’s hyphen was a short, double line, inclined to the right at a sixty degree angle. It looked like this:

Fig 1. Example of Gutenberg’s hyphen.

For Gutenberg, the hyphen served a dual purpose. It provided the spacer block necessary to bring the line of type flush to the inside of the holding frame, while at the same time, it printed a character that announced its purpose to the reader. The hyphen says to the reader, in effect: “Pardon me while I break this word and end the line right here. I’m doing this to preserve the overall look of the text. Ignore me as best you can.”

In this, the hyphen makes a small demand in exchange for a larger aesthetic payoff. If you take a long look at a column of type from one of Gutenberg’s bibles, you’ll find vibrancy and balance. Now, the mechanical problems of movable type are long gone, of course, and typesetting has been digital for decades. Yet H&J is still predominant: the payoff remains.

The hyphen says: “Hey, it still looks good, right?” And it’s hard to argue with the habits and expectations of readers that have built up over five centuries of practice. If you want the look that says book, hyphenation and justification bring the weight of history to bear.

Using hyphenation and justification today

When it comes to new browser features, Flash-y effects get the glory, so it’s no surprise that support for a special Unicode character called the soft hyphen has gone largely unnoticed. But the soft hyphen is the key to good-looking hyphenation and justification. And over the years it’s gained support in every A-grade browser: IE6+, Opera 7.1+, Safari 2+, Firefox 3+, and Chrome. This, combined with a little JavaScript jiggery, makes H&J a viable design technique today.

The soft hyphen

What’s a soft hyphen? The HTML spec says:

In HTML, there are two types of hyphens: The plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur. Those browsers that interpret soft hyphens must observe the following semantics: If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored. In HTML, the plain hyphen is represented by the “-” character (&#45; or &#x2D;). The soft hyphen is represented by the character entity reference &shy; (&#173; or &#xAD;).

OK. So how does it work and what do you do? Here are the main considerations:

Coding the word breaks

When you insert &shy; (or &#173;) within a word, it signals the browser that it’s okay to break the word in that particular spot if doing so helps preserve the integrity of the word spacing. In other words, when deciding whether to break a word at the end of a line, the browser will give a greater priority to maintaining uniform word spacing. Let’s say, for example, the word is “constitution.” “Constitution” can be carved up at three spots, like this: con-sti-tu-tion.

So HTML like con&shy;sti&shy;tu&shy;tion tells the browser that if it needs to wrap a part of that word to the next line to preserve word spacing, it’s okay to wrap it. And if it does, the word can be broken up at any one of the three spots where &shy; is inserted. (Note: As you’ll see, hard coding it like this in the HTML is not recommended. This is just an explanation of how it works.)
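The same idea can be sketched in JavaScript, where the soft hyphen is written as the escape \u00AD rather than an HTML entity. This is only an illustration: the helper name withSoftHyphens is hypothetical, and it assumes the break points have already been marked with plain hyphens.

```javascript
// Soft hyphen (U+00AD), the character that &shy; and &#173; refer to.
const SOFT_HYPHEN = "\u00AD";

// Hypothetical helper: takes a word whose break points are marked with
// plain hyphens (e.g. "con-sti-tu-tion") and returns the word with a
// soft hyphen at each of those points instead.
function withSoftHyphens(markedWord) {
  return markedWord.split("-").join(SOFT_HYPHEN);
}

const word = withSoftHyphens("con-sti-tu-tion");

// The soft hyphens are real characters in the string, even though a
// browser only shows one when it breaks the line at that spot.
console.log(word.length);             // 15: 12 letters plus 3 soft hyphens
console.log(word.split(SOFT_HYPHEN)); // [ 'con', 'sti', 'tu', 'tion' ]
```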

Hyphens appear where needed automatically

The soft hyphen is an actual character in the font. But the browser will only display it if the word is broken at the end of a line. This show/hide behavior happens automatically.

Apply soft hyphens at all possible breaks

Text on the web can change: Column widths resize along with different window sizes, devices, zoom levels, and text size selections. There is no practical way to predict exactly where and how lines of text will wrap. This is an unavoidable side effect of one of the great features of electronic text.

Completely at odds with the fixed nature of print, this leads inescapably to the right way to apply soft hyphens in HTML: Soft hyphens should be inserted at all possible hyphenation points. Now, at first glance this may seem inelegant and wasteful, but when soft hyphens are added programmatically, as you’ll soon see, it’s not a problem at all.
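To make "added programmatically" concrete, here is a minimal sketch of a search and replace that puts a soft hyphen at every known break point in a piece of text. It is not how any particular library works: the BREAKS list and the hyphenateText function are hypothetical, and the break points are written by hand, where a real hyphenation library would derive them from language-specific patterns.

```javascript
const SHY = "\u00AD"; // soft hyphen

// Hypothetical, hand-written break list for illustration only.
const BREAKS = new Map([
  ["constitution", "con-sti-tu-tion"],
  ["hyphenation", "hy-phen-a-tion"],
  ["justification", "jus-ti-fi-ca-tion"],
]);

// Insert a soft hyphen at every listed break point of every known word.
// Unknown words are left untouched.
function hyphenateText(text) {
  return text.replace(/[A-Za-z]+/g, (original) => {
    const marked = BREAKS.get(original.toLowerCase());
    if (!marked) return original;
    // Walk the marked form, copying letters from the original word so
    // that its capitalization is preserved.
    let out = "";
    let i = 0;
    for (const ch of marked) {
      out += ch === "-" ? SHY : original[i++];
    }
    return out;
  });
}

const result = hyphenateText("Constitution and hyphenation");
console.log(result.split(SHY).join("|")); // Con|sti|tu|tion and hy|phen|a|tion
```

Because every possible break point is marked, the browser is free to pick whichever one suits the current column width, and it ignores the rest.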

As an example, here is a sample page with soft hyphens hard coded into the HTML text. (The online tool Hypho-o was used to insert the soft hyphens.) Resizing the browser window or zooming larger or smaller will reflow the text and show how the browser preserves word spacing while hyphens appear and disappear at the ends of each line as needed.

The downsides of hard coding

Hard coding soft hyphens is a good path to understanding how they work, but a bad thing to do in practice. Soft hyphens make the HTML text hard to read and edit. Additionally, they may create difficulties for search engines. Users can’t turn soft hyphenation on and off with a simple UI widget. Using JavaScript to apply soft hyphens makes a lot more sense and works quite well.

By far the most mature library for hyphenation in HTML is Hyphenator.js by Mathias Nater. Hyphenator.js relies on the same data compression algorithms and hyphenation dictionaries found in products like TeX (for which the algorithm was originally developed by Franklin Liang in 1983), OpenOffice, and Prince, the HTML-to-PDF converter that implements the CSS3 Paged Media Module.

Here is a simple page containing both English and German text. There is a toggle widget in the upper right for turning hyphenation on and off. There is also a bookmarklet version of Hyphenator.js.

Based on a Project Gutenberg HTML edition of Joseph Conrad’s Heart Of Darkness, here are some simple examples of the first chapter, each using the same modified version of Hyphenator.js 2.0 and the Sizzle selector engine, with the font size adjusted for the following devices:

Hyphenator.js also has a merge-and-pack tool for creating an optimized and minified single JavaScript file, as well as instructions for rolling your own. Remember that hyphenation is basically a search and replace. If there’s a lot of hyphenation on the page, some delay in page display may be unavoidable. Hyphenator.js also inserts the zero width space (ZWS) character for intelligent URL line wrapping.

The zero width space (ZWS)

The zero width space is essential to getting a good result with H&J. It’s encoded as &#8203;. Kingdesk Web Design, which has done considerable work on the problem of hyphenation, describes the zero width space this way:

Similar to the soft hyphen, the zero space character communicates allowable line breaks within strings of text. But unlike the soft hyphen, it does not show a hyphen at line’s end. This is ideal for forcing consistent wrapping of long URLs. It also can be used to force line breaks in uncooperative web browsers after hard hyphens in words like “zero-space” and “soft hyphen”.

To control line wrapping problems when long strings are created with “hard” hyphens (or the en dash ( – ) or em dash ( — ) characters), or when the browser might be confused on where to break a string when using characters such as ( )[ ] { } « » % ° · / ! ?, the ZWS can provide the browser with useful hints on what to do.

For example, to preserve readability, the following tells the browser it’s okay to wrap after a hard hyphen but not before:

The zero-&#8203;width space.
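Done in script rather than by hand, the same hint might look like the sketch below. The function name breakAfterHardHyphens is hypothetical, and the rule it applies (a ZWS after any hyphen that sits between two word characters) is a simplifying assumption, not a complete line-breaking policy.

```javascript
const ZWS = "\u200B"; // zero width space, &#8203; in HTML

// Insert a ZWS after each hard hyphen that sits between two word
// characters, so a line may wrap after the hyphen but not before it.
function breakAfterHardHyphens(text) {
  return text.replace(/(\w-)(?=\w)/g, "$1" + ZWS);
}

const hinted = breakAfterHardHyphens("The zero-width space.");
console.log(hinted.split(ZWS)); // [ 'The zero-', 'width space.' ]
```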

For wrapping long URLs, the ZWS is inserted following forward slashes:

http://&#8203;code.&#8203;google.&#8203;com/&#8203;p/&#8203;hyphenator/
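A rough sketch of doing this in script follows. The function name breakableUrl is hypothetical, and the rule it uses (a ZWS after each run of forward slashes and after each dot, except at the very end of the URL) is an assumption chosen to reproduce the example URL here. It is not a description of what Hyphenator.js does internally.

```javascript
const ZERO_WIDTH_SPACE = "\u200B";

// Add a ZWS after each run of slashes and after each dot, as long as
// more of the URL follows, giving the browser places to wrap.
function breakableUrl(url) {
  return url.replace(/(\/+|\.)(?=.)/g, "$1" + ZERO_WIDTH_SPACE);
}

const url = breakableUrl("http://code.google.com/p/hyphenator/");
console.log(url.split(ZERO_WIDTH_SPACE));
// [ 'http://', 'code.', 'google.', 'com/', 'p/', 'hyphenator/' ]
```

The result should only be used for display text. The href attribute of a link needs the original URL, since a ZWS inside it would change the address.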

All of this is preferably done with JavaScript. But as a matter of page load time and practicalities, hard coding the ZWS here and there as you need to doesn’t have any serious downsides.

The soft hyphen is a character in the font with its own Unicode designation. This means that in a copy/paste operation, the soft hyphen travels right along with the other characters.

In a plain text editor it might show up as a question mark. In MS Word, the soft hyphens will be stripped, unless you choose “text only” formatting. Search engines like Google or Bing will ignore them when pasted into the search box.

The bottom line is that browsers, rightly or wrongly, don’t strip out the soft hyphens automatically on copy. And whether the soft hyphens are hard coded or inserted with script makes no difference. The only surefire solution is to strip the soft hyphens on copy using a script. Thankfully, this was worked out in Sweet Justice, an English-only hyphenation script by Facebook developer Carlos Bueno. (Source on GitHub.) This is also the solution in Hyphenator.js as of version 3.0.
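As a rough illustration of stripping on copy, the sketch below uses the standard copy event and its clipboardData object. It is not the code from Sweet Justice or Hyphenator.js, the function name stripBreakHints is hypothetical, and it assumes a browser that supports clipboardData on copy events.

```javascript
// Remove soft hyphens (U+00AD) and zero width spaces (U+200B).
function stripBreakHints(text) {
  return text.replace(/[\u00AD\u200B]/g, "");
}

// Browser-only wiring, skipped where there is no document (e.g. Node).
if (typeof document !== "undefined") {
  document.addEventListener("copy", (event) => {
    const selected = window.getSelection().toString();
    // Put the cleaned text on the clipboard ourselves...
    event.clipboardData.setData("text/plain", stripBreakHints(selected));
    // ...and stop the browser from overwriting it with the raw selection.
    event.preventDefault();
  });
}

console.log(stripBreakHints("con\u00ADsti\u00ADtu\u00ADtion")); // constitution
```

One trade-off of this simple version is that it places only plain text on the clipboard, so any rich formatting in the selection is lost on paste.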

How browsers will handle soft hyphens and other “empty space” characters like the ZWS going forward remains to be seen.

Find on this page

Similar to the select/copy/paste problem is find. As of this writing, only Firefox does this correctly in conformance with the HTML spec: “For operations such as searching and sorting, the soft hyphen should always be ignored.” The browser is supposed to ignore the soft hyphens when searching for a word. But in every browser tested other than Firefox, the search goes wrong after the first syllable “con” in the word “constitution” because of the inserted soft hyphen. Similarly, soft hyphens can also cause unwanted spaces within strings when sending text using right-click context menus and the like. The receiving apps usually ignore the spaces even though they’re visible, but still, it’s unsettling to the user.
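The failure is easy to reproduce with plain string search, which treats the soft hyphen as just another character. The sketch below shows that, along with the behavior the spec asks for. The function name findIgnoringSoftHyphens is hypothetical, and this illustrates the matching rule only. It cannot change how a browser’s built-in find works.

```javascript
// Search as the spec describes: ignore soft hyphens on both sides.
// The returned index refers to the text with soft hyphens removed.
function findIgnoringSoftHyphens(haystack, needle) {
  const strip = (s) => s.replace(/\u00AD/g, "");
  return strip(haystack).indexOf(strip(needle));
}

const page = "The con\u00ADsti\u00ADtu\u00ADtion was signed.";

console.log(page.indexOf("constitution"));                  // -1: naive search fails
console.log(findIgnoringSoftHyphens(page, "constitution")); // 4
```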

The solutions to these annoyances lie squarely with browser makers.

High-res displays like the iPhone Retina, convenient e-reading devices like the iPad, and web fonts have brought a new focus on web typography. Hyphenation and justification is an important and time-honored technique. Hopefully the information here will help make it an option for onscreen reading sooner rather than later.