TL;DR

This blog post explains how setting an environment variable can save thousands (or in some cases, tens of thousands) of unnecessary system calls that can be generated by glibc over small periods of time.

This has been tested on Ubuntu Precise (12.04) and Ubuntu Xenial (16.04). It likely applies to other flavors of Linux, as well. It is very easy to test if this applies to you and to correct it, if so. Keep reading for more details!

Setup your own NPM registry for free. Sign up!

Quick summary

To avoid extra system calls on server processes where you won’t be updating the timezone (or can restart processes when you do) simply set the TZ environment variable to :/etc/localtime (or some other timezone file of your choice) for a process. This will cause glibc to avoid making extra (and unnecessary) system calls.

To understand why this is and how to test if your processes can benefit, read on!

The problem

In our previous blog post about Linux system calls, we explained the different system call methods and highlighted a very interesting method in particular: the vDSO system call method.

The purpose of the vDSO system call method was to create a way for certain, very frequently used system calls (like gettimeofday , time , clock_gettime , etc) to avoid needing to actually enter the kernel and cause a context switch from user land to kernel land. The result of this method is that certain system calls, like those listed, can be used by programs at much, much lower cost.

What would happen if every time you called one of these fast vDSO system calls (like time ) you also called a normal system call like, say stat , which does not pass through the vDSO?

If you did that, you’d essentially be negating some of the performance improvement you were meant to gain by the vDSO optimization in the first place; you’d be making a slow system call very often.

It turns out that this situation happens rather often with a pair of functions that are commonly used together:

time : A vDSO-enabled system call used to obtain the number of seconds since the epoch, and

: A vDSO-enabled system call used to obtain the number of seconds since the epoch, and localtime : A glibc provided function which converts the output of time to a local time in the user’s timezone. This is not a system call, but internally localtime can make a system call in some cases.

The time vDSO-enabled system call and the localtime function from glibc are often used together in applications either directly by the programmer, or at a lower level unbeknownst to the programmer for formatting dates and times for everything from log messages to SQL queries. This pair is commonly used in Ruby on Rails.

It turns out that the localtime function in glibc will check if the TZ environment variable is set. If it is not set (the two Ubuntus I’ve tested do not set it), then glibc will use the stat system call every time localtime is called.

In other words: your system supports calling the time system call via the Linux kernel’s vDSO to avoid the cost of switching to the kernel. But, as soon as your program calls time , it calls localtime immediately after, which invokes a system call anyway. So, you’ve eliminated one system call with the vDSO, but replaced it with another.

Let’s see a sample program that shows this behavior, how to use strace to detect this, and finally how to prevent it by setting the TZ environment variable.

Sample program showing the issue

Let’s start by creating a simple test program that reproduces this issue: #include <time.h> #include <stdio.h> int main(int argc, char *argv[]) { int i = 0; time_t timep; printf("Greetings!

"); for(i=0; i<10; i++) { time(&timep); localtime(&timep); } printf("Godspeed, dear friend!

"); return 0; }

You can compile this program by simply running gcc -o test test.c . As you can see, this program simply calls time and localtime in a loop 10 times.

Verifying this with strace

Every single call to the localtime glibc function will generate a system call to stat . Don’t believe me? Let’s use strace to prove it using the program shown above: $ gcc -o test test.c $ strace -ttT ./test ... truncated output ... 23:55:16.957457 write(1, "Greetings!

", 11) = 11 <0.000033> 23:55:16.957560 open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 3 <0.000013> 23:55:16.957650 fstat(3, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000007> 23:55:16.957723 fstat(3, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000006> 23:55:16.957797 read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 127 <0.000011> 23:55:16.957904 lseek(3, -71, SEEK_CUR) = 56 <0.000008> 23:55:16.957975 read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 71 <0.000008> 23:55:16.958048 close(3) = 0 <0.000009> 23:55:16.958130 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009> 23:55:16.958215 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009> 23:55:16.958296 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000010> 23:55:16.958375 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009> 23:55:16.958454 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009> 23:55:16.958533 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009> 23:55:16.958611 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009> 23:55:16.958690 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009> 23:55:16.958769 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009> 23:55:16.958848 write(1, "Godspeed, dear friend!

", 23) = 23 <0.000020>

In the strace output above we see a few things:

The /etc/localtime file is opened. fstat is called twice, followed by two reads to the file to pull in the timezone data. Next, 9 calls to stat are made passing /etc/localtime over and over.

Notice that the strace output does not show the call to time . This is expected. System calls made via the vDSO do not appear in strace output. To see them, you’d need to use ltrace instead.

What’s going on here is that the first call to localtime in glibc opens and reads the contents of /etc/localtime . All subsequent calls to localtime internally call stat , but they do this to ensure that the timezone file has not changed.

On many systems /etc/localtime is a symlink to a timezone file. It is conceivable that a program might be running when the /etc/localtime symlink is updated. If this were to happen, glibc would notice this when localtime is called and re-read the file before doing any time conversions.

Preventing extraneous system calls

Many production systems use the UTC timezone and don’t ever need (or want) to change that. For this use case, there’s no reason to stat the /etc/localtime file or symlink over and over and over when localtime is called. The timezone is never going to change from UTC (or if it does, the application can just be restarted).

The easiest way to prevent these stat calls is to set the TZ environment variable. When the TZ environment variable is set glibc will:

Notice you’ve told it explicitly what timezone file to use for your program. Read the file and cache it internally. Never stat or read that file path again, as long as the value of the TZ environment variable is left unchanged.

Let’s set the TZ variable and check with strace : $ TZ=:/etc/localtime strace -ttT ./test 00:14:53.274564 write(1, "Greetings!

", 11) = 11 <0.000051> 00:14:53.274721 open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 3 <0.000038> 00:14:53.274873 fstat(3, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000026> 00:14:53.275001 fstat(3, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000026> 00:14:53.275129 read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 127 <0.000028> 00:14:53.275257 lseek(3, -71, SEEK_CUR) = 56 <0.000027> 00:14:53.275379 read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 71 <0.000033> 00:14:53.275512 close(3) = 0 <0.000029> 00:14:53.275657 write(1, "Godspeed, dear friend!

", 23) = 23 <0.000048>

As you can see, simply setting TZ causes glibc to read /etc/localtime a single time and never again. The same program runs with 9 fewer system calls all because of a single environment variable.

Effect on production systems

The effect of setting TZ on a production system will depend mostly on how often localtime is called. This is application specific and can vary with request load. All that said, if you aren’t changing your timezone often (or ever), it may be worth simply eliminating these unnecessary calls even if your system isn’t making many of them.

On my test system with a real life app (not the example shown above):

Without setting TZ during normal operations yields approximately: 14,925 calls to stat over a 30 second period (or roughly 497 stat s per second).

during normal operations yields approximately: 14,925 calls to over a 30 second period (or roughly 497 s per second). With TZ set during the same time period results in 8 calls to stat over a 30 second period.

So, I’ve eliminated on the order of 10,000 extra system calls (and their associated context switches) without changing anything other than an environment variable. Pretty cool.

Conclusion

Using strace and questioning why patterns of system calls emerge from your infrastructure can help you better understand exactly what your systems are doing and why. You may even be able to remove unncessary system calls and save some system resources, too.

Related posts

If you enjoyed this post, you may enjoy some of our other low-level technical posts: