This article presents a high performance approach to string handling and a low level C library. Tools, game engines, and games can benefit from safe, fast string handling. Many string libraries are based on C's standard functions. While the standard C string functions are sometimes adequate, they are often unsafe, slow, or not suited to multi-threaded programming. Often, they thwart safe, clean, efficient problem solving.

Many high level string libraries build on C's standard approach and inherit these problems. This article presents a new, low-level library to address and overcome these issues. In addition, the code for the string library has also been made available here for download.

Strings are a derived data type used to organize sets of characters. Applications of all kinds, including word processors, text editors, programming tools, and games make extensive use of strings. Most games use strings for messages, player information, configuration, scripting, and internal identification of resources.

A string is an array of characters. The most popular approach to string handling dates back to the creation of the C language.[1] This approach uses a pointer to the first character in the string and a zero to signal the end of the string.

If the first character in string is a zero, then the string is said to be empty. A non-empty string can be a single a single character such as “A”, a word like “hello”, a whole sentence, or even an entire file. Standard double quotes are used to demarcate a string and single quotes are used for individual characters.

string characters in the string and the zero-terminator string length

0



















0 "A" 'A' 0

















1 "hello" 'h' 'e' 'l' 'l' 'o' 0









5 "set x 3.14" 's' 'e' 't' ' ' 'x' ' ' '3' '.' '1' '4' 0 10

Chracter Types

The new library defines and uses the type 'chr' for a character. The library can be rebuilt for any power-of-two character size.

Terminating Strings

The new library supports standard zero-terminated C strings and is designed to co-exist with all existing code. The biggest difference between the new library and the standard C library is the addition of an optional pointer-terminator parameter to the string handling routines. The pointer-terminator is a pointer to one character past the last valid character in the string. That is, it is a pointer to the zero-terminator, or where the zero-terminator would be if it were present.

There are significant benefits to supporting pointer-terminated strings. One benefit is that pointer-terminated strings allow you to safely define words in a shared buffer without modifying the buffer. For example, if you are scanning a buffer and find a single word that you would like to operate on as an individual string, you have two options with C's standard routines: Either temporarily poke a zero into the buffer to end the word, or copy the word into a second buffer. Modifying the buffer is sometimes not possible and usually not desirable. Copying the word to a second buffer will typically involve memory allocation and freeing and always increase processing time. Both options are undesirable. With pointer-terminated strings, you avoid these problems by passing pointers to the start and end of the word.

Pointer-terminated strings also allow the programmer to find the length of a string by subtracting the starting pointer from the pointer-terminator. Furthermore, routines that scan backwards through strings are simplified when the end of the string is known.

Forced Zero-Termination

One rule the new library follows is that buffer building functions always add or force a zero-terminator into the buffer. This helps simplify the handing of buffer overflow issues by truncating strings that do not fit entirely into the destination buffer. This also makes it easier to integrate the new library with existing code.

Pointer-Termination Parameters

There are potential drawbacks to using pointer-terminated strings. One drawback is that you often don't have a pointer to the end of the string, and to get that pointer, you have to scan the string yourself. To address this drawback, the pointer-terminator may always be passed as a null pointer. When the pointer-terminator is passed as a null pointer, the routine treats the string as a standard zero-terminated string.

Another slight drawback is the performance hit incurred when passing null pointer-terminators. To address this, many of the routines have multiple versions, some of which do not take a pointer-terminator and rely on the string being zero-terminated.