At least from the perspective of rendering, text is often the most complex part of a traditional two-dimensional user interface. In such an interface, the two main components are rectangular images and text. The rectangular images are often quite static, and can be represented by two triangles and four indexes into a texture atlas that is uploaded to graphics memory once and then retained. This is something that has low complexity and which the graphics hardware has been optimized to handle quickly.

Text starts as a series of indexes into an international database of writing systems (Unicode). It is then, based on some selection algorithm, combined with one or more fonts, which is in principle a collection of shapes and some lookup tables and executable programs that convert said indexes into shapes and relative positions. These shapes, basically filled paths made out of bezier curves, then have to be rasterized at a specified size, and this can range from simple and neat outlines to complex ones with lots of detail. (By rasterization, I mean finding out how much of each target pixel, or subpixel in some cases, is covered by the shape.)

All combined, it is a heavy process. But luckily, instead of redoing every step for every string, we can often cache intermediate results and reuse them later.

For instance, it is possible to rasterize the glyphs the first time they are used, keep this in memory, and then at each subsequent use, render these glyphs the same way images are rendered as described above: By putting the rasterized glyphs in a texture atlas and representing the glyphs by two triangles and indexes into this atlas. In fact, when the Text.NativeRendering render type is in use in Qt Quick, this is precisely what happens. In this case, we will ask the underlying font system (CoreText on macOS, GDI/DirectWrite on Windows and Freetype on Linux) to rasterize the glyphs at a specific size, and then we will upload these to a texture atlas which can later be referenced by the triangles we put on the screen.

There are some limitations of this approach however: Since the font size has to be known before the glyphs are rasterized, we may end up rasterizing and caching the same glyphs multiple times if the text in the UI comes in many different sizes. For some UIs that can be too heavy both for frame rate and memory consumption. Animations on the font size, for instance, can cause us to rasterize the shapes again for every frame. This rasterization is also done on the CPU, which means we are not using the resources of the device to its full potential when preparing the glyphs.

Additionally, transformations on the NativeRendering text will give pixelation artifacts, since they will be applied to the pre-rasterized image of the glyph, not its actual mathematical shape.

So what is the alternative?

For a more flexible approach, we want to actually do the rasterization on the GPU, while rendering our frame. If we can somehow get the shapes into the graphics memory and rasterize them quickly using a fragment shader, we free up CPU resources and allow both transformations and size changes without any additional penalty on performance.

There are several approaches to this problem. The way is is done in Qt is by using so-called distance fields. Instead of storing the rasterized glyphs in texture memory, we store a representation of the shapes in a texture atlas where each texel contains the distance to the nearest obstacle rather than the coverage.

Once these distance fields are created and uploaded to texture memory, we can render glyphs at any font size and scale quickly on the GPU. But the process of converting the shapes from the fonts into distance field is still a bottle neck for startup time, and that in particular is what this blog is about.

So what is the problem?

Creating the distance fields is CPU-bound, and - especially on lower-end hardware - it may be very costly. By setting the QT_LOGGING_RULES environment variable to "qt.scenegraph.time.glyph=true" we can gain some insight into what that cost is. Lets for instance say that we run an example that displays 50 unique Latin characters with the Deja Vu Sans font (the simple and neat outlines further up). With the logging turned on, and on an NXP i.MX6 we have for testing in our QA lab, we get the following output:



qt.scenegraph.time.glyph: distancefield: 50 glyphs prepared in 25ms, rendering=19, upload=6



From this output we can read that generating the necessary assets for these 50 glyphs took 19 ms, over one whole frame, whereas uploading the data to the graphics memory took 6 ms. It is the 19 ms for converting into distance fields that we will be able to reduce. These 19 ms may not seem like a lot, but it will cause the rendering to skip a frame at the point where it happens. If the 50 glyphs are displayed at startup, then those 25 ms may not be as noticeable, but if it is done during an animation, it would be something a user could notice. It is worth mentioning again, though, that it is a one-time cost as long as the font remains in use.

Running the same for the HVD Peace font (linked as the complex font above), we get the following output:



qt.scenegraph.time.glyph: distancefield: 50 glyphs prepared in 1016ms, rendering=1010, upload=6



In this case, we can see that rendering the distance fields takes a full second, due to the high complexity of the outlines in use.

Another use case where we may see high costs of generating distance fields is if the number of unique glyphs is very high. So let us test an arbitrary, auto-generated "lorem ipsum" text in Chinese with 592 distinct characters:



qt.scenegraph.time.glyph: distancefield: 592 glyphs prepared in 1167ms, rendering=1107, upload=60



Again, we see that generating the distance fields takes over one second. In this case, the upload also takes a bit longer, since there is more data to be uploaded into graphics memory. There is not much to be done about that though, other than making sure it is done at startup time and not while the user is watching a smooth animation. As mentioned, though, I will focus on the rendering part in this blog.

So what is the solution?

In Qt 5.12, we will release a tool to help you improve on this for your application. It is called "Qt Distance Field Generator" and you can already find the documentation in our documentation snapshot.

The way this works is that it allows you to pregenerate the distance fields for either a selection of the glyphs in a font or all of them. Then you can append these distance fields as a new font table at the end of the font file. Since this custom font table follows SFNT conventions, the font will still be usable as a normal TrueType or OpenType file (SFNT mandates that unsupported font tables are ignored).

So the font can be used as normal and is still compatible with e.g. Qt Widgets and Text.NativeRendering, where the rasterization will still go through the system.

When the font is used in Qt Quick with Text.QtRendering, however, the special font table will be detected, and its contents will be uploaded directly to graphics memory. The cache will therefore be prepopulated with the glyphs you have selected, and the application will only have to create distance fields at runtime if they are missing from this set.

The result of this can be impressive. I repeated the experiments, but with fonts where I had pregenerated distance fields for all the glyphs that were used in the example.

First example, simple and neat "Deja Vu Sans" font, 50 latin characters:



qt.scenegraph.time.glyph: distancefield: 50 pre-generated glyphs loaded in 11ms



Second example, complex "HVD Peace" font, 50 latin characters:



qt.scenegraph.time.glyph: distancefield: 50 pre-generated glyphs loaded in 4ms



Third example, 592 Chinese characters:



qt.scenegraph.time.glyph: distancefield: 592 pre-generated glyphs loaded in 42ms



As we can see, there is a great improvement when a lot of time is spent on creating the distance fields. In the case of the complex font, we got from 1016 ms to 4 ms. When more data is uploaded, that will still take time, but in the case of the Chinese text, the upload was actually faster than when the distance fields were created on the fly. This is most likely a pure coincidence, however, caused by the order of the glyphs in the cache causing slightly different layouts and sizes.

Another peculiar thing we can see is that the complex font is faster to load than the simple one. This is simply because the glyphs in that font are square and compact, so there is not a lot of unused space in the cache. Therefore the texture atlas is a little bit smaller than for the simpler font. The complexity of the outlines does not affect the loading time of the atlas of course.

Running the same tests on my Windows Desktop workstation, we see that there is not as much overhead for generating the distance fields, but there is still some performance gain to be seen in some cases.

For 50 Latin glyphs with Deja Vu Sans, both tests clocked in at 3 ms, which was mainly spent uploading the data. For HVD Peace, however, generating the distance fields took 131 ms (versus 1 ms for just the upload) and for the Chinese text it took 146 ms (vs 11)

Hopefully this can help some of you get even better performance out of your Qt devices and applications. The feature is already available in the Qt 5.12 beta, so download the package and take it for a test drive right away.