The ups and downs of strlcpy()


Adding the strlcpy() function (and the related strlcat() function) has been a perennial request (1, 2, 3) to the GNU C library (glibc) maintainers, commonly supported by a statement that strlcpy() is superior to the existing alternatives. Perhaps the earliest request to add these BSD-derived functions to glibc took the form of a patch submitted in 2000 by a fresh-faced Christoph Hellwig.

Christoph's request was rejected, and subsequent requests have similarly been rejected (or ignored). It's instructive to consider the reasons why strlcpy() has so far been rejected, and why it may well not make its way into glibc in the future.

A little prehistory

In the days before programmers considered that someone else might want to deliberately subvert their code, the C library provided just:

char *strcpy(char *dst, const char *src);

with the simple purpose of copying the bytes from the string pointed to by src (up to and including the terminating null byte) to the buffer pointed to by dst.

Naturally, when calling strcpy(), the programmer must take care that the bytes being copied don't overrun the space available in the buffer pointed to by dst. The effect of such buffer overruns is to overwrite other parts of a process's memory, such as neighboring variables, with the most common result being to corrupt data or to crash the program.

If the programmer can with 100% certainty predict at compile time the size of the src string, then it's possible (if unwise) to preallocate a suitably sized dst buffer and omit any argument checks before calling strcpy(). In all other cases, the call should be guarded with a suitable if statement to check the size of its argument. However, strings (in the form of input text) are one of the ways that humans interact with computers, and thus quite commonly the size of the src string is controlled by the user of a program, not the program's creator. At that point, of course, it becomes essential for every call to strcpy() to be guarded by a suitable if statement:

char dst[DST_SIZE];
...
if (strlen(src) < DST_SIZE)
    strcpy(dst, src);

(The use of < rather than <= ensures that there's at least one extra byte available for the null terminator.)

But it was easy for programmers to omit such checks if they were forgetful, inattentive, or cowboys. And later, other more attentive programmers realized that by carefully controlling what was written into the overflowed buffer, and overrunning into more exotic places such as function call return addresses stored on the stack, they could do much more interesting things with buffer overruns than simply crashing the program. (And because code tends to live a long time, and the individual programmers creating it can be slow to learn about the sharp edges of the tools they use, even today buffer overruns remain one of the most commonly reported vulnerabilities in applications.)

Improving on strcpy()

Prechecking the arguments of each call to strcpy() is burdensome. A seemingly obvious way to relieve the programmer of that task was to add an API that allowed the caller to inform the library function of the size of the target buffer:

char *strncpy(char *dst, const char *src, size_t n);

The strncpy() function is like strcpy(), but copies at most n bytes from src to dst. As long as n does not exceed the space allocated in dst, a buffer overrun can never occur.

Although choosing a suitable value for n ensures that strncpy() will never overrun dst, it turns out that strncpy() has problems of its own. Most notably, if there is no null terminator in the first n bytes of src, then strncpy() does not place a null terminator after the bytes copied to dst. If the programmer does not check for this event, and subsequent operations expect a null terminator to be present, then the program is once more vulnerable to attack. The vulnerability may be more difficult to exploit than a buffer overflow, but the security implications can be just as severe.
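The missing terminator is easy to demonstrate. What follows is a minimal sketch rather than a recommended idiom; the helper names is_terminated() and copy_and_terminate() are hypothetical and exist only for this illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the first n bytes of buf contain a null byte. */
static bool is_terminated(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/*
 * Hypothetical helper: strncpy() followed by the manual termination that
 * strncpy() itself omits when src has n or more bytes before its terminator.
 * dst must have room for n bytes. The result may still be truncated.
 */
static void copy_and_terminate(char *dst, const char *src, size_t n)
{
    if (n == 0)
        return;                 /* no room even for a terminator */
    strncpy(dst, src, n);
    dst[n - 1] = '\0';
}
```

With a four-byte buffer, strncpy(buf, "abcdef", 4) stores the bytes 'a', 'b', 'c', 'd' and no terminator, so is_terminated(buf, 4) is false; copy_and_terminate(buf, "abcdef", 4) leaves the string "abc" instead.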

One iteration of API design didn't solve the problems, but perhaps a further one can… Enter strlcpy():

size_t strlcpy(char *dst, const char *src, size_t size);

strlcpy() is similar to strncpy() but copies at most size-1 bytes from src to dst, and always adds a null terminator following the bytes copied to dst.
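Those semantics can be sketched in a few lines of C. This is an illustration under the hypothetical name sketch_strlcpy(), not the BSD or libbsd source; it also returns the length of src, which is the value the real function reports:

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustrative sketch only (hypothetical name): copy at most size-1 bytes
 * of src into dst and always terminate dst when size is nonzero.
 * Returns strlen(src), regardless of how much was actually copied.
 */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size > 0) {
        size_t n = (src_len < size - 1) ? src_len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

Copying "abcdef" into a four-byte buffer with this sketch leaves "abc" in the buffer and returns 6.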

Problems solved?

strlcpy() avoids buffer overruns and ensures that the output string is null terminated. So why have the glibc maintainers obstinately refused to accept it?

The essence of the argument against strlcpy() is that it fixes one problem (sometimes failing to terminate dst in the case of strncpy(), buffer overruns in the case of strcpy()) while leaving another: the loss of data that occurs when the string copied from src to dst is truncated because it does not fit in size bytes, terminator included. (In addition, there is still an unusual corner case where the unwary programmer can find that strlcat(), the analogous function for string concatenation, leaves dst without a null terminator.)

At the very least, (silent) data loss is undesirable to the user of the program. At the worst, truncated data can lead to security issues that may be as problematic as buffer overruns, albeit probably harder to exploit. (One of the nicer features of strlcpy() and strlcat() is that their return values do at least facilitate the detection of truncation—if the programmer checks the return values.)
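Checking for truncation might look like the sketch below. Because strlcpy() is not available everywhere, the sketch assumes snprintf() as a portable stand-in; like strlcpy(), it returns the length the full result would have had. The wrapper name copy_checked() is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical wrapper: copy src into dst (size bytes) and report whether
 * the whole string fit. snprintf() stands in for strlcpy() here.
 * On truncation dst still holds a terminated prefix of src.
 */
static bool copy_checked(char *dst, size_t size, const char *src)
{
    int needed = snprintf(dst, size, "%s", src);

    /* needed excludes the terminator, so it must be strictly less than size. */
    return needed >= 0 && (size_t)needed < size;
}
```

A caller would then write something like if (!copy_checked(buf, sizeof buf, input)) and handle the error explicitly, rather than carrying on with a silently shortened string.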

All of which brings us full circle: to avoid unhappy users and security exploits, in the general case even a call to strlcpy() (or strlcat()) must be guarded by an if statement checking the arguments, if the state of the arguments can't be predicted with certainty in advance of the call.

Where are we now?

Today, strlcpy() and strlcat() are present on many versions of UNIX (at least Solaris, the BSDs, Mac OS X, and Irix), but not all of them (e.g., HP-UX and AIX). There are even implementations of these functions in the Linux kernel for internal use by the kernel code. Meanwhile, these functions are not present in glibc, and were rejected for inclusion in the POSIX.1-2008 standard, apparently for similar reasons to their rejection from glibc.

Reactions among core glibc contributors on the topic of including strlcpy() and strlcat() have been varied over the years. Christoph Hellwig's early patch was rejected in the then-primary maintainer's inimitable style (1 and 2). But reactions from other glibc developers have been more nuanced, indicating, for example, some willingness to accept the functions. Perhaps most insightfully, Paul Eggert notes that even when these functions are provided (as an add-on packaged with the application), projects such as OpenSSH, where security is of paramount concern, still manage to either misuse the functions (silently truncating data) or use them unnecessarily (i.e., the traditional strcpy() and strcat() could equally have been used without harm); such a state of affairs does not constitute a strong argument for including the functions in glibc.

The appearance of an embryonic entry on this topic in the glibc FAQ, with a brief rationale for why these functions are currently excluded, and a note that "gcc -D_FORTIFY_SOURCE" can catch many of the errors that strlcpy() and strlcat() were designed to catch, would appear to be something of a final word on the topic. Those that still feel that these functions should be in glibc will have to make do with the implementations provided in libbsd for now.

Finally, in case it isn't obvious by now, it should of course be noted that the root of this problem lies in the C language itself. C's native strings are not managed strings of the style natively provided in more modern languages such as Java, Go, and D. In other words, C's strings have no notion of bounds checking (or dynamically adjusting a string's boundary) built into the type itself. Thus, when using C's native string type, the programmer can never entirely avoid the task of checking string sizes when strings are manipulated, and no replacements for strcpy() and strcat() will ever remove that need. One might even wonder if the original C library implementers were clever enough to realize from the start that strcpy() and strcat() were sufficient, if it weren't for the fact that they also gave us gets().