What's new in TeX, part 1


Development in the world of the TeX document-preparation system is steady, gradual, and solid. This fact reflects the maturity of Donald Knuth's TeX engine, which underlies everything else in the system. TeX has been essentially frozen for about 30 years and is one of the very few large software systems that is generally considered to be bug-free. Development now occurs in the layers that surround the typographic core: formats, which supply higher-level control of TeX for the ready production of various classes of documents, and packages for drawing, slides, sheet music, poetry, and tweaking TeX's behavior.

In this two-part series, we will look at recent developments in the world of TeX (including LaTeX and similar systems). Given the pace of development in the TeX community, by "new" I mean roughly the last five years, although I might mention things that happened even before then. This first part will touch upon typography, programming TeX, and creating diagrams.

TeX basics

Although TeX is still essentially oriented toward the creation of static, paginated documents—and might seem to be losing some relevance in our online world—it is still widely used, especially by mathematicians and scientists in quantitative fields (physics, computer science, etc.). The core reason for this is the same reason that TeX is also popular with various authors and publishing houses in the humanities—those who publish typographically demanding scholarly editions, perhaps mixing languages that employ a variety of alphabets. TeX's purpose is to achieve the best possible automated typography.

This can be seen not only in its unparalleled rendering of mathematical equations, but also in its attention to aesthetics in the setting of prose: TeX contains sophisticated algorithms that adjust line breaking and hyphenation to optimize the appearance of whole paragraphs and pages. This attention to detail becomes critically important in complex documents, where typography becomes part of the expression of ideas.

The official and predominant installation method for TeX is TeX Live, which was traditionally distributed on a DVD and is still available in that form. To get the most recent versions of all its parts, however, you will want to follow the usual procedure and install TeX Live through the network. The versions available through Linux package management systems are usually out of date, but the current release is available from the project site. The release notes for TeX Live 2015 are a list of relatively minor, technical details, so I won't be discussing those changes specifically.

The timeliest and most complete source of documentation for the hundreds of TeX packages will be on your disk after you install TeX Live. Open the file /usr/local/texlive/2015/index.html in your browser for links to manuals and examples in various languages.

For those unfamiliar with TeX, when you process a document with the TeX typesetting system, you do so by invoking one of several TeX engines. The various engines differ in the output format that they produce and in how they implement some of TeX's algorithms—which determines what additional features are available. The original tex engine predates Unicode (so it expected an ASCII file) and produced only DVI (for "device independent") files. DVI was intended to be translated into PostScript or other printer commands with a separate tool. Contemporary TeX engines, though, can produce PDF files (e.g., pdfTeX), can understand Unicode text (e.g., XeTeX), and even incorporate scripting languages (e.g., LuaTeX).

The TeX engines should not be confused with TeX document formats, which are large collections of macros that define a set of higher-level layout commands. The most well-known format is LaTeX; another format (which has become popular for book publishing) is ConTeXt. Formats and engines are orthogonal: the pdftex and pdflatex commands invoke the same engine, but the former will process only plain TeX, whereas the latter supports LaTeX.
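The engine is chosen by the command you run, not by the source file. As a rough illustration, the following document should typeset unchanged under lualatex, xelatex, or pdflatex and report which engine processed it. It assumes the iftex package, which is part of TeX Live and provides the \ifLuaTeX, \ifXeTeX, and \ifPDFTeX conditionals.

```latex
\documentclass{article}
% iftex defines \ifLuaTeX, \ifXeTeX, and \ifPDFTeX, each of which is
% true only when the document is processed by that engine.
\usepackage{iftex}
\begin{document}
This document was typeset by
\ifLuaTeX
  LuaTeX%
\else\ifXeTeX
  XeTeX%
\else\ifPDFTeX
  pdfTeX%
\else
  an unidentified engine%
\fi\fi\fi.
\end{document}
```

The % signs keep line ends from inserting stray spaces before the final period.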

Fonts and Typography

You should probably work with the LuaLaTeX or XeLaTeX engines for new projects (or their plain-TeX equivalents, LuaTeX and XeTeX), unless you must use TeX packages that are incompatible with these engines or you require a particular feature that's only available with the traditional, PDF-based engine pdfLaTeX.

The reason for this advice is that LuaLaTeX and XeLaTeX have both feet firmly in Unicode land, and their font handling is far more flexible and straightforward than that of the venerable alternatives. One of the annoying drawbacks of TeX in the past was that it lived in its own font universe, and could only use the typefaces that were designed for it.

Generally, TeX was blind to all the other beautiful fonts that you might have installed on your computer. With XeLaTeX and LuaLaTeX, though, you can now easily use any OpenType or TrueType font on your system. And, as we shall shortly see, the maturing of the fontspec and unicode-math packages in recent years radically improves the font-handling landscape for TeX users.

Here is a minimal LaTeX document that shows how to make arbitrary font changes, selecting from among several OpenType/TrueType fonts—some in the TeX Live directory tree and some in the system font directories:

\documentclass{article}
\usepackage{fontspec}
\defaultfontfeatures{Scale=MatchLowercase,Ligatures=TeX}
\begin{document}
{\fontspec{Ubuntu}The }{\fontspec{Fetamont Bold10}quick }
{\fontspec{Punk Nova}brown {\bfseries fox }}
{\fontspec{Sawasdee}jumps {\itshape{\bfseries over }}}
{\fontspec{CMU Serif}{\scshape the }}
{\fontspec{Overlock}lazy }{\fontspec{Ubuntu Condensed}dog.}
\end{document}

This file is intended to be processed with the lualatex command, which lets us refer to fonts by their common names, even fonts that live in the TeX Live tree, rather than by their actual filenames. XeLaTeX can find system fonts by name, but it needs the filenames of fonts installed only in the TeX Live tree. This convenience is one of several reasons that lualatex should probably be your preferred typesetting command, unless you need a package or feature that works only with one of the others.

The \defaultfontfeatures command in the third line of the example selects two options for the fontspec package. The Scale=MatchLowercase option scales the various fonts so that their lowercase letters are optically the same height: fonts with the same nominal point size can appear to be different sizes, so this option helps them blend when fonts are mixed within a line. The Ligatures=TeX option enables the familiar TeX ligatures, such as the one that turns `` into an opening quotation mark.
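Both options can also be given per font rather than through \defaultfontfeatures. The sketch below assumes that the CMU Serif and Ubuntu fonts from the first example are installed, and that it is run with lualatex.

```latex
\documentclass{article}
\usepackage{fontspec}
% Ligatures=TeX makes the ASCII sequences ``, '', --, and ---
% come out as curly quotation marks, an en dash, and an em dash.
\setmainfont{CMU Serif}[Ligatures=TeX]
% Scale=MatchLowercase sizes the sans-serif font so that its
% lowercase letters match the height of those in the main font.
\setsansfont{Ubuntu}[Scale=MatchLowercase]
\begin{document}
``Quoted,'' pages 10--12, and a pause---like this.

Serif lowercase and {\sffamily sans-serif lowercase} should line up.
\end{document}
```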

In the code sample, \bfseries selects the boldface variant of the currently selected (or default) font; \itshape and \scshape select the italic and small-caps variants, respectively. The sample also shows how these can be combined, in this case to produce boldface italics.

Here is the result when you process the file with lualatex.

The image was made by cropping the PDF output and converting it to a PNG. You can see where LuaLaTeX has chosen the appropriate font variants in response to the font attribute commands (\bfseries, \scshape, etc.), some of which are nested. This works because the example uses fonts that have these variants available; if the needed variants are not available, those commands will be ignored.

I've actually used this style of ad-hoc font switching when making posters and name tags but, for the more usual kind of document, you will want to select a harmonious set of fonts at the beginning and use them consistently throughout, switching among them with the standard commands for italic, monospace, etc., as required. Here is how you do this with fontspec:

\documentclass{article}
\usepackage{fontspec}
\defaultfontfeatures{Scale=MatchLowercase,Ligatures=TeX}
\setmainfont{Overlock}[BoldFont={* Black},
  BoldItalicFont={* Bold Italic}, SmallCapsFont={* SC}]
\setmonofont{PT Mono}
\begin{document}
The {\bfseries quick}
{\itshape brown fox {\bfseries jumps over}}
{\scshape the {\tt lazy dog.}}
\end{document}

Running this through lualatex gives the result in the second figure.

In the options to the \setmainfont command (which, unusually, come after the main argument), the asterisks stand for the main font name, a convenient shorthand for selecting font variants. The fontspec package is incredibly flexible, allowing you to choose entirely different typefaces for bold, italic, etc., if you want to. You can also choose which font features are activated in each situation; for example, you can decide to use historical ligatures in italics, but not in upright text.
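As a hedged illustration of that flexibility, the following sketch takes its bold face from a different family and turns on historical ligatures for italics only. The fonts (Linux Libertine O and Linux Biolinum O, from TeX Live's libertine package) are assumptions chosen for the example; if a font does not offer a requested feature, fontspec should warn and carry on without it.

```latex
\documentclass{article}
\usepackage{fontspec}
% Assumed fonts: Linux Libertine O for the main text, with bold
% borrowed from the Linux Biolinum O family. Historical ligatures
% are requested for the italic shape only.
\setmainfont{Linux Libertine O}[
  BoldFont       = {Linux Biolinum O Bold},
  ItalicFeatures = {Ligatures=Historic},
]
\begin{document}
Upright text, {\itshape italic text with historical ligatures},
and {\bfseries bold text from another family}.
\end{document}
```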

LuaLaTeX and XeLaTeX both allow you to use Unicode input without including any additional packages. This lets you replace the traditional TeX commands for accents, and it allows the use of any characters available in the font. This is a Turkish translation of the common English pangram we used in the preceding examples:

{\fontspec{CMU Serif} hızlı kahverengi tilki tembel köpeğin üstünden atlar}

When inserted into our minimal document example, it is typeset as in this figure:

Note, though, that this approach will fail if you choose a font without the glyphs required. For example, attempting to set the above line using the Overlock font will simply skip the "ğ", which is missing from that font.
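When you are unsure whether a font covers your text, TeX can check for you. The sketch below, meant for lualatex, uses the e-TeX primitive \iffontchar to test the current font for U+011F (ğ), and sets \tracinglostchars so that any glyphs TeX cannot find are at least recorded in the log file.

```latex
\documentclass{article}
\usepackage{fontspec}
% Report missing glyphs in the log file.
\tracinglostchars=1
\begin{document}
\fontspec{Overlock}
% \iffontchar tests whether the current font (\font) has a glyph
% for a character code; `ğ is the character code of U+011F.
\iffontchar\font`ğ
  This font has a glyph for ğ.
\else
  This font lacks a glyph for U+011F.
\fi
\end{document}
```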

With the addition of the unicode-math package, Unicode input can even be used in equations. This package also builds its typeset mathematical output using Unicode glyphs, and it allows you to select any math font without loading additional packages:

\documentclass{article}
\usepackage{unicode-math}
\begin{document}
Here is the elementary version of Stokes' Theorem:
\medskip
XITS (STIX) Math: \setmathfont{xits-math.otf}
\[ ∫_Σ ∇ ⨯ 𝐅 ⋅ dΣ = ∮_{∂Σ}𝐅⋅d𝐫 \]
\end{document}

The results of running this through lualatex can be seen in the figure below. A longer example also showing other variations is available by clicking on the thumbnail.

Using Unicode math input clearly leads to source files that are easier to read, but it may not be to your liking if your system or text editor makes the input of Unicode too cumbersome. You can, of course, freely mix traditional TeX math markup with direct Unicode input.
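For instance, the Stokes' Theorem example could be written with a mixture of traditional commands and direct Unicode input. This sketch assumes, as before, the XITS math font and the lualatex command; unicode-math maps commands such as \int, \oint, \partial, and \cdot onto the corresponding Unicode symbols.

```latex
\documentclass{article}
\usepackage{unicode-math}
\setmathfont{xits-math.otf}
\begin{document}
% Traditional commands (\int, \oint, \partial, \cdot) mixed with
% direct Unicode input in a single formula.
\[ \int_Σ ∇ ⨯ 𝐅 \cdot dΣ = \oint_{\partial Σ} 𝐅 ⋅ d𝐫 \]
\end{document}
```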

If you do use Unicode characters for math in your source files, you must take care to use the symbols with the correct meaning, rather than merely the correct appearance. In the example file, we've used the uppercase Greek Sigma (U+03A3) to represent the surface of integration. There is, however, another Unicode character that will appear almost identical in the source file, but which is intended to mean the summation operator (U+2211).

When typesetting equations, TeX treats letters (variables) and operators differently, as it must. So, if you accidentally use the summation operator, the size and spacing of the symbol will be incorrect, and the equations will look quite wrong.
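The difference is easy to see with the named commands, which unicode-math maps to those two code points. In this sketch (processed with lualatex), the first display uses the letter, as intended; the second uses the summation operator by mistake.

```latex
\documentclass{article}
\usepackage{unicode-math}
\begin{document}
% Right: \Sigma is U+03A3 GREEK CAPITAL LETTER SIGMA, an ordinary
% math letter, so it is sized and spaced like a variable.
\[ \int_{\Sigma} f \, d\Sigma \]
% Wrong: \sum is U+2211 N-ARY SUMMATION, a large operator, so it
% gets the size and spacing of a summation sign.
\[ \int_{\sum} f \, d\sum \]
\end{document}
```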

(Note that if you get an error upon loading unicode-math, you may have to reinstall TeX Live 2015. There was a conflict with another package that was only fixed a few weeks before I write this—perhaps a counterexample to my advice to download a recent version rather than settling for the one in your distribution's repository.)

Programmability

TeX is not only a system of declarative markup tags for text and equations. It is also a Turing-complete programming language, meaning that it can express arbitrary computations. Many popular LaTeX packages perform computations in TeX in order to work their magic, but it is an arcane and tricky language to program in, and quite difficult to read.

LuaTeX (the engine behind the lualatex command) is a project that embeds the Lua language within TeX. It's still officially in beta, but over the last few years has become stable and mature enough that LuaLaTeX is now considered the preferred engine for new projects. It is the focus of future development, and the ConTeXt project has adopted it as its official engine.

Lua is a scripting language designed specifically for embedding, and is therefore small and efficient. It has a familiar, imperative syntax that is easy to follow even with no previous exposure to the language. After a few minutes with the documentation, anyone who knows Python or any similar language can write basic programs in Lua. LuaTeX embeds Lua in such a way that it has access to the internals of TeX, and it can be used to manipulate the boxes and other elements that make up the typeset result. It can also make the results of Lua calculations available to TeX for typesetting. A simple example should make clear how this can be useful:

\documentclass{article}
\usepackage{luacode}
\begin{document}
\pagestyle{empty}
% esn(n) computes (1 + 1/n)^n; etn(n) sends one table row to TeX.
\begin{luacode*}
function esn (n)
  return (1 + 1/n)^n
end
function etn (n)
  tex.print(string.format('%5d & %1.8f \\\\', n, esn(n)))
end
\end{luacode*}
% In the table, \luadirect runs a Lua loop; etn() already calls
% tex.print(), so the loop only needs to call etn() for each n.
Convergence to $e$:
\begin{tabular}{ll}
\rule[-2mm]{0pt}{4mm}$n$ & $(1 + \frac{1}{n})^n$ \\ \hline
\luadirect{ for n = 10, 110, 10 do etn(n) end }
\hline
\end{tabular}
\end{document}

In the first part of this document, a luacode* environment defines two Lua functions. The first, esn(), maps a number n to the expression (1 + 1/n)^n, which approaches the number e as n goes to infinity. The second, etn(), uses tex.print() to hand TeX one row of the table: n, a column separator, the value of esn(n), and the \\ row terminator.

The LaTeX code begins next, with a line introducing a table that will show values of n and the convergents approaching e. Inside the table, the rows are generated by a \luadirect command, which immediately executes its argument as Lua code; here, a loop calls etn() for n = 10, 20, and so on up to 110. The typeset result is shown in the figure. The ability to perform calculations and typeset the results in a single TeX file, using a language that is simple to program in, opens up a world of new possibilities, especially for authors of mathematical material.
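A standard series expansion, not part of the original example, shows why the convergents in the table creep toward e so slowly: the gap between (1 + 1/n)^n and e shrinks only in proportion to 1/n.

```latex
\begin{align*}
n \ln\Bigl(1 + \frac{1}{n}\Bigr)
  &= n\Bigl(\frac{1}{n} - \frac{1}{2n^{2}} + \frac{1}{3n^{3}} - \cdots\Bigr)
   = 1 - \frac{1}{2n} + \frac{1}{3n^{2}} - \cdots, \\
\Bigl(1 + \frac{1}{n}\Bigr)^{n}
  &= \exp\Bigl(1 - \frac{1}{2n} + O\bigl(n^{-2}\bigr)\Bigr)
   = e\Bigl(1 - \frac{1}{2n} + O\bigl(n^{-2}\bigr)\Bigr).
\end{align*}
```

So, for large n, e − (1 + 1/n)^n is approximately e/(2n).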

Another strong use case for LuaTeX is in the automated creation of PDF documents from assorted data sources—for example, consider forms of database publishing, such as the printing of catalogs from product databases. In these cases, there is no TeX formatting in the original data, so some form of flexible mapping from data structures to TeX concepts leading to the final PDF is required. The embedded scripting provided by LuaTeX makes this easier than the alternatives.
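A minimal sketch of such a mapping, in the style of the luacode example above: the product records are hypothetical and written inline, where a real application would read them from a database or file, and the function name productrows is made up for the illustration.

```latex
\documentclass{article}
\usepackage{luacode}
\begin{document}
% Hypothetical data: a Lua table of records, plus a function that
% turns each record into one row of a LaTeX tabular.
\begin{luacode*}
products = {
  { name = "Widget", price = 2.50 },
  { name = "Gadget", price = 10.00 },
  { name = "Gizmo",  price = 7.25 },
}
function productrows ()
  for _, p in ipairs(products) do
    tex.print(string.format("%s & %.2f \\\\", p.name, p.price))
  end
end
\end{luacode*}
\begin{tabular}{lr}
Product & Price \\ \hline
\luadirect{productrows()}
\end{tabular}
\end{document}
```

As in the convergence table, the rows are produced by \luadirect rather than inside a luacode* environment, so that the & and \\ that Lua prints land directly in the tabular.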

Graphics

The PGF/TikZ package, a huge project that provides a complete solution to creating all sorts of diagrams within a TeX document, has learned several new tricks in recent years. A recent article introduced TikZ's new network-graph facilities, including the exploitation of LuaTeX to implement the automated layout of graph diagrams. Here, we'll show how to combine a little scripting along the lines illustrated above with TeX's graphics packages:

\documentclass{article}
\usepackage{luacode}
\usepackage{pgfplots}
\begin{document}
% etp(n) prints one "(n, value)" coordinate pair for \addplot.
\begin{luacode*}
function esn (n)
  return (1 + 1/n)^n
end
function etp (n)
  tex.print(string.format('(%5d, %1.8f)', n, esn(n)))
end
\end{luacode*}
\begin{tikzpicture}
\begin{axis}[xlabel=$n$, ylabel=$(1 + \frac{1}{n})^n$]
% The convergents, generated by Lua for n = 10, 20, ..., 110.
\addplot coordinates {
  \luadirect{ for n = 10, 110, 10 do etp(n) end }
};
% The limit e, drawn as a red horizontal line.
\addplot[red] coordinates { %
  (0, \luadirect{tex.print(math.exp(1))}) %
  (110, \luadirect{tex.print(math.exp(1))})
};
\end{axis}
\end{tikzpicture}
\end{document}

In this example, we've replaced the second Lua function with one that prints a pair of coordinates in the form that a PGFPlots \addplot command expects. Loading pgfplots in the document preamble also loads TikZ, which provides the tikzpicture environment. The axis environment and the \addplot commands within it invoke the pgfplots subsystem, which provides a specialized language for drawing graphs; its syntax is designed to be more convenient for this purpose than plain TikZ.

The result is shown in the following figure, which illustrates the convergents approaching their limit, the transcendental number e (the red line).
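For simple functions, pgfplots can also evaluate the expression itself, without Lua. The following variant is offered as a sketch that should produce a similar plot with any engine. The trade-off is that PGF's own arithmetic is less precise than Lua's double-precision numbers and cannot call arbitrary code, which is where the LuaTeX approach earns its keep.

```latex
\documentclass{article}
\usepackage{pgfplots}
\begin{document}
\begin{tikzpicture}
  \begin{axis}[xlabel=$n$, ylabel=$(1 + \frac{1}{n})^n$]
    % pgfplots samples the expression at n = 10, 20, ..., 110.
    \addplot[domain=10:110, samples=11, mark=*] {(1 + 1/x)^x};
    % The limit e, drawn as a red horizontal line.
    \addplot[red, domain=0:110] {exp(1)};
  \end{axis}
\end{tikzpicture}
\end{document}
```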

With the advent of these easy-to-learn tools, it has become possible to undertake a project such as an entire mathematics textbook, with all calculations and graphing done within a single TeX document.

TikZ and PGF recently received a major update to version 3.0. Because of the power and relative ease of use of PGF and TikZ, much of the action in the past few years on the TeX graphics front has taken the form of new TikZ packages.
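As a small taste of the LuaTeX-based graph layout mentioned above, the following sketch lets TikZ place the nodes of a tree automatically. It uses the graphs and graphdrawing libraries and the trees graph-drawing library, as documented in the PGF manual; because the layout algorithms are written in Lua, it must be processed with lualatex. The node names are arbitrary.

```latex
\documentclass{article}
\usepackage{tikz}
\usetikzlibrary{graphs, graphdrawing}
% The graph-drawing algorithms run in Lua, so use lualatex.
\usegdlibrary{trees}
\begin{document}
% Only the edges are given; "tree layout" computes node positions.
\tikz \graph [tree layout] {
  a -> { b -> { d, e }, c -> f }
};
\end{document}
```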

A few interesting recent additions or major upgrades are Sa-TikZ, for the automated drawing of switching networks; bondgraph, for making "bond graphs" of physical systems; hf-tikz, which allows you to highlight parts of formulas; the randomwalk package, which calculates and prints 2D random walks; tikzorbital, for drawing colorful pictures of atomic and molecular orbitals; tqft, for topological quantum field theory; and forest, for drawing linguistic trees. All of these packages are documented within the TeX Live installation.
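To give a flavor of these packages, here is a minimal forest sketch: the tree is written in bracket notation, where each bracketed group is a node label followed by its children, and forest works out the layout. The tiny analysis of "the fox jumps" is only an illustration.

```latex
\documentclass{article}
\usepackage{forest}
\begin{document}
% Each [label children...] group is one node of the tree.
\begin{forest}
  [S
    [NP [D [the]] [N [fox]]]
    [VP [V [jumps]]]
  ]
\end{forest}
\end{document}
```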

The world of TeX has come a long way since I started using it, when we edited our files on a terminal attached to a remote computer, and checked our output by jogging to the computer room to pick up our printouts.

The next installment of this series will delve into the interaction between the traditional world of TeX, which began as a way to typeset documents for printing, and our current environment of electronic documents that adapt to an assortment of reading devices.