Finding the Root Cause of a Web Request Latency

When trying to solve complex performance issues, a kernel trace can be the last resort: capture everything and try to understand what is going on. With LTTng, that goes like this:

lttng create
lttng enable-event -k -a
lttng start
...wait for the problem to appear...
lttng stop
lttng destroy

Once this is done, depending on how long the capture is, there are probably a lot of events in the resulting trace (~50,000 events per second on my mostly idle laptop). Now it is time to make sense of them!

This post is the first in a series presenting the LTTng analyses scripts. In this one, we will try to solve an unusual I/O latency issue.

Our test case is a PHP image board (tinyib-mappy) and we want to understand why some client requests are significantly slower than others. In this example, we manually call sync to force the kernel to flush the dirty buffers to disk, which keeps the example simple, but the methodology is the same for more complex use cases.

On the server, we are recording a kernel LTTng session as explained earlier. On the client, we test and measure with this command:

$ while true; do time wget -q -p http://10.2.0.1/; sleep 0.5; done
...
real    0m0.017s
real    0m0.016s
real    0m0.454s
real    0m0.016s
real    0m0.015s
...

The output here shows that one request took much longer than usual (around 430 ms more). Now we will try to understand the source of this latency.

I/O usage

The resulting trace covers around 16 seconds. Our first step, to get an idea of the I/O activity during the trace, is to look at the lttng-iousagetop script:

$ ./lttng-iousagetop analysis-20150115-120942/
Timerange: [2015-01-15 12:18:37.216484041, 2015-01-15 12:18:53.821580313]
Per-process I/O Read
###############################################################################
██████████████████████████████████████████████████  528.0 KB  lttng-consumerd (19964)  0 B file  0 B net  528.0 KB unknown
███████                                               80.0 KB  lttng-consumerd (19964)  0 B file  0 B net  80.0 KB unknown
███                                                  39.45 KB  apache2 (31544)          33.7 KB file  5.33 KB net  424.0 B unknown
██                                                   29.58 KB  apache2 (31591)          25.28 KB file  4.0 KB net  318.0 B unknown
██                                                   29.58 KB  apache2 (31592)          25.28 KB file  4.0 KB net  318.0 B unknown
██                                                   29.58 KB  apache2 (31545)          25.28 KB file  4.0 KB net  318.0 B unknown
██                                                   29.58 KB  apache2 (31517)          25.28 KB file  4.0 KB net  318.0 B unknown
██                                                   29.58 KB  apache2 (31520)          25.28 KB file  4.0 KB net  318.0 B unknown
██                                                   29.58 KB  apache2 (31584)          25.28 KB file  4.0 KB net  318.0 B unknown
██                                                   29.58 KB  apache2 (31589)          25.28 KB file  4.0 KB net  318.0 B unknown
Per-process I/O Write
###############################################################################
██████████████████████████████████████████████████   2.24 MB  find (22785)             0 B file  0 B net  2.24 MB unknown
███████████                                          528.0 KB  lttng-consumerd (19964)  0 B file  0 B net  528.0 KB unknown
████                                                225.24 KB  apache2 (31544)          0 B file  221.3 KB net  3.94 KB unknown
███                                                 168.93 KB  apache2 (31591)          0 B file  165.98 KB net  2.95 KB unknown
███                                                 168.93 KB  apache2 (31592)          0 B file  165.98 KB net  2.95 KB unknown
███                                                 168.93 KB  apache2 (31545)          0 B file  165.98 KB net  2.95 KB unknown
███                                                 168.93 KB  apache2 (31517)          0 B file  165.98 KB net  2.95 KB unknown
███                                                 168.93 KB  apache2 (31520)          0 B file  165.98 KB net  2.95 KB unknown
███                                                 168.93 KB  apache2 (31584)          0 B file  165.98 KB net  2.95 KB unknown
███                                                 168.93 KB  apache2 (31589)          0 B file  165.98 KB net  2.95 KB unknown

In this first section, we see the top 10 processes performing I/O system calls. From the information in the trace, we try to identify whether the requests are disk- or network-related, but sometimes the trace does not contain all the information we need (especially with FD passing between processes). Of course, LTTng itself does some I/O to store the trace but, besides that, we see the apache2 threads receiving our HTTP GET requests, reading some files from disk and answering on the network. We also see a find process that was launched on the system during the capture, but so far, nothing alarming.

Note: you can see that the apache2 processes read less than they write. It is not because Apache magically creates new data; it is a current limitation of the syscall instrumentation. When Apache sends a complete file, it opens it, mmaps it and passes the address of the memory map in a vector to writev, but the current instrumentation does not show us the content of the vectors, so we cannot assume those bytes relate to the last opened file. This can be fixed in the instrumentation and should be soon.
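To see why the instrumentation loses track of those bytes, here is a minimal sketch of the open/mmap/writev pattern described above. This is hypothetical illustration code (a temporary file stands in for the static asset, a pipe for the client socket), not Apache's actual implementation:

```python
import mmap
import os
import tempfile

# Create a small file standing in for the static asset being served.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello from a static file\n")
os.close(fd)

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
    # A pipe stands in for the client socket.
    out_r, out_w = os.pipe()
    # writev only receives an (address, length) vector: from the syscall's
    # point of view there is no link between these bytes and the file that
    # was opened earlier, which is why the tracer cannot attribute them.
    written = os.writev(out_w, [mapped])
    echoed = os.read(out_r, written)
    os.close(out_r)
    os.close(out_w)

os.unlink(path)
```

The file's content reaches the socket without ever passing through a read syscall, so a syscall-level I/O accounting only ever sees the write side.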

Next, we see the same information, but sorted by files, so we can see the files shared by multiple processes (and the respective FD number in each process):

Files Read
###############################################################################
██████████████████████████████████████████████████  304.0 KB  anon_inode:[lttng_stream] (lttng-consumerd) 'fd 28 in lttng-consumerd (19964)'
██████████████████████████████████████████         259.87 KB  /var/www/html/tinyib.db 'fd 14 in apache2 (31591)', 'fd 14 in apache2 (31592)', 'fd 14 in apache2 (31545)', 'fd 14 in apache2 (31517)', 'fd 14 in apache2 (31520)', 'fd 14 in apache2 (31544)', 'fd 14 in apache2 (31584)', 'fd 14 in apache2 (31589)', 'fd 14 in apache2 (31588)', 'fd 14 in apache2 (31590)'
██                                                   12.85 KB  socket (lttng-sessiond) 'fd 44 in lttng-sessiond (19839)'
                                                      6.0 KB  /proc/interrupts 'fd 3 in irqbalance (1337)'
                                                     4.88 KB  /lib/x86_64-linux-gnu/libc.so.6 'fd 3 in find (22785)', 'fd 3 in bash (22784)', 'fd 3 in bash (22790)', 'fd 3 in sync (22791)', 'fd 3 in bash (22796)', 'fd 3 in lttng (22797)'
                                                     4.82 KB  socket:[168432] (ssh) 'fd 3 in ssh (10368)'
                                                     4.45 KB  /proc/stat 'fd 3 in irqbalance (1337)'
                                                     3.25 KB  /lib/x86_64-linux-gnu/libz.so.1 'fd 3 in bash (22784)', 'fd 3 in bash (22790)', 'fd 3 in bash (22796)', 'fd 3 in lttng (22797)'
                                                     3.25 KB  /lib/x86_64-linux-gnu/libpthread.so.0 'fd 3 in bash (22784)', 'fd 3 in bash (22790)', 'fd 3 in bash (22796)', 'fd 3 in lttng (22797)'
                                                     2.51 KB  /usr/share/locale/locale.alias 'fd 3 in find (22785)'
Files Write
###############################################################################
██████████████████████████████████████████████████   2.24 MB  /dev/null 'fd 1 in find (22785)', 'fd 2 in bash (22784)', 'fd 2 in bash (22790)', 'fd 2 in bash (22796)'
██████                                               304.0 KB  /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata 'fd 32 in lttng-consumerd (19964)'
██████                                               304.0 KB  pipe:[180336] (lttng-consumerd) 'fd 18 in lttng-consumerd (19964)'
█                                                    55.33 KB  unknown (origin not found) (apache2) 'fd 12 in apache2 (31590)'
                                                    30.52 KB  /var/log/apache2/access.log 'fd 8 in apache2 (31591)', 'fd 8 in apache2 (31592)', 'fd 8 in apache2 (31545)', 'fd 8 in apache2 (31517)', 'fd 8 in apache2 (31520)', 'fd 8 in apache2 (31544)', 'fd 8 in apache2 (31584)', 'fd 8 in apache2 (31589)', 'fd 8 in apache2 (31588)', 'fd 8 in apache2 (31590)'
                                                    12.85 KB  socket (lttng) 'fd 3 in lttng (22797)'
                                                     2.54 KB  socket:[164921] (sshd) 'fd 3 in sshd (10339)'
                                                     1.69 KB  socket:[164888] (sshd) 'fd 3 in sshd (10236)'
                                                     1.39 KB  /dev/pts/4 'fd 5 in ssh (10368)'
                                                     604.0 B  /dev/pts/1 'fd 0 in bash (10237)', 'fd 2 in bash (10237)', 'fd 1 in lttng (22752)'

Once again, we see some LTTng-related I/O, we see our SQLite database accessed by Apache and the Apache log files, and now we understand where find is writing (find / > /dev/null).

Next, we see the actual block I/O performed, that is, how much data really touched the disk during the trace period (not cached or buffered):

Block I/O Read
###############################################################################
██████████████████████████████████████████████████  38.47 MB  find (pid=22785)
                                                     128.0 KB  (pid=-1)
                                                      28.0 KB  sync (pid=22791)
Block I/O Write
###############################################################################
██████████████████████████████████████████████████   7.57 MB  jbd2/sda1-8 (pid=245)
█████████████████████████████████████████            6.24 MB  (pid=-1)
███                                                  572.0 KB  kworker/u32:1 (pid=21668)
█                                                    264.0 KB  lttng-consumerd (pid=19964)
                                                      56.0 KB  sync (pid=22791)
                                                      40.0 KB  lttng-consumerd (pid=19964)

Here, we are starting to see some interesting things: even though we did not do much I/O as far as the system calls are concerned, some kernel threads wrote around 14 MB of data directly to disk. So there was definitely some I/O activity, but we do not know which process started it.
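To give an idea of how this kind of per-process block I/O accounting can be derived from a trace, here is a small sketch that pairs block-request issue and completion events and sums the completed bytes per command. The event records and field names below are simplified assumptions for illustration, not the real LTTng event layout:

```python
from collections import defaultdict

# Simplified stand-ins for block_rq_issue / block_rq_complete events;
# the dictionary layout is an assumption, not LTTng's actual format.
events = [
    {"name": "block_rq_issue",    "ts_us": 100, "dev": 8, "sector": 4096, "bytes": 8192, "comm": "find",        "pid": 22785},
    {"name": "block_rq_issue",    "ts_us": 110, "dev": 8, "sector": 9000, "bytes": 4096, "comm": "jbd2/sda1-8", "pid": 245},
    {"name": "block_rq_complete", "ts_us": 300, "dev": 8, "sector": 4096},
    {"name": "block_rq_complete", "ts_us": 450, "dev": 8, "sector": 9000},
]

pending = {}                       # (dev, sector) -> pending issue event
bytes_per_comm = defaultdict(int)  # command name -> completed bytes

for ev in events:
    key = (ev["dev"], ev["sector"])
    if ev["name"] == "block_rq_issue":
        pending[key] = ev
    elif key in pending:
        issue = pending.pop(key)
        # The request is attributed to the context that issued it; for
        # writeback, that is a kernel thread (jbd2, kworker), not the
        # process that originally dirtied the pages, hence the mystery
        # writers in the output above.
        bytes_per_comm[issue["comm"]] += issue["bytes"]
```

This also explains the (pid=-1) rows: requests whose issuing context could not be resolved from the trace.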

Finally, the same script gives us the breakdown of network usage per network interface:

Network recv_bytes
###############################################################################
████████████████████████████████████████████████████████  103.9 KB  eth1
███                                                         6.65 KB  eth0
Network sent_bytes
###############################################################################
████████████████████████████████████████████████████████   1.72 MB  eth1
                                                            8.03 KB  eth0

OK, so in this phase we saw the overall I/O usage in the trace: no obvious process was saturating our resources, but we know actual data was written to disk. Now let's have a look at the I/O latencies.

I/O latency

The I/O latency is the delay an I/O system call takes to complete (the time elapsed between the syscall entry and the syscall exit), or the delay for a block request to complete (between block_rq_issue and block_rq_complete). My bash process is always waiting to read one character from my keyboard, so in all the examples I will filter out the syscalls that transfer fewer than 2 bytes with the parameter --minsize 2.
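As a rough sketch of that definition, a syscall latency can be computed by pairing each entry event with the next exit event on the same thread, skipping transfers below the --minsize threshold. The event dictionaries below are invented for illustration and do not reflect LTTng's real event format:

```python
# Made-up entry/exit events on two threads, timestamps in microseconds.
events = [
    {"type": "entry", "tid": 10237, "name": "read", "ts_us": 5.0},
    {"type": "exit",  "tid": 10237, "name": "read", "ts_us": 487.9, "ret": 7},
    {"type": "entry", "tid": 31517, "name": "open", "ts_us": 12.0},
    {"type": "exit",  "tid": 31517, "name": "open", "ts_us": 44.2, "ret": 13},
]

MIN_SIZE = 2      # equivalent of --minsize 2: ignore tiny reads/writes
in_flight = {}    # tid -> pending entry event
latencies = []    # (syscall name, duration in usec)

for ev in events:
    if ev["type"] == "entry":
        in_flight[ev["tid"]] = ev
    else:
        entry = in_flight.pop(ev["tid"], None)
        if entry is None:
            continue  # exit without an entry: trace started mid-syscall
        # For read/write, the return value is the transfer size.
        if ev["name"] in ("read", "write") and ev["ret"] < MIN_SIZE:
            continue
        latencies.append((ev["name"], ev["ts_us"] - entry["ts_us"]))
```

Block request latencies can be paired the same way, keyed on the request (device, sector) instead of the thread id.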

Latency statistics

The first step is to get some statistics:

$ ./lttng-iolatencystats analysis-20150115-120942/ --minsize 2
Syscalls latency statistics (usec):
Type                    Count            Min        Average            Max          Stdev
----------------------------------------------------------------------
Open                    14724          0.250         32.173     437697.753       3607.107
Read                     1034          0.360          2.383        482.860         15.337
Write                    1036          0.298          7.778       2667.346         82.972
Sync                      153         13.393       6601.007     994745.622      80413.186

Disk latency statistics (usec):
Name                    Count            Min        Average            Max          Stdev
-----------------------------------------------------------------------------------------
sda1                    10211          0.002          0.507         24.260       1485.700

Now we are getting somewhere! If we look at the Max column, the Open and Sync rows have particularly large values compared to the other statistics.
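The columns of this table are easy to recompute by hand. The sketch below derives the same five statistics from a plain list of latencies (made-up sample values, in microseconds) and shows how a single outlier inflates Max, Stdev and even the Average:

```python
import statistics

# Made-up sample latencies (usec): four ordinary opens and one outlier.
lat = [0.250, 12.5, 30.0, 45.0, 437697.753]

row = {
    "count": len(lat),
    "min": min(lat),
    "avg": statistics.mean(lat),
    "max": max(lat),
    "stdev": statistics.stdev(lat),  # sample standard deviation
}
```

With one value dwarfing the rest, Average and Stdev say little about the typical case, which is why the frequency distribution in the next section is a better view of the data.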

Latency frequency distribution

To illustrate this, let's look at the frequency distribution of just the open operations, a category which includes most of the syscalls that return a new file descriptor (open, openat, connect, accept, etc.).

$ ./lttng-iolatencyfreq prototyping/analysis-20150115-120942 --minsize 2
Open latency distribution (usec)
###############################################################################
     0.250  ████████████████████████████████████████████████████████████  14723
 21885.125                                                                     0
 43770.000                                                                     0
 65654.875                                                                     0
 87539.751                                                                     0
109424.626                                                                     0
131309.501                                                                     0
153194.376                                                                     0
175079.251                                                                     0
196964.126                                                                     0
218849.002                                                                     0
240733.877                                                                     0
262618.752                                                                     0
284503.627                                                                     0
306388.502                                                                     0
328273.377                                                                     0
350158.252                                                                     0
372043.128                                                                     0
393928.003                                                                     0
415812.878                                                                     1

As we can see, most of the requests (14723) have a latency between 0.250 and 21885.125 usec; only one is way above. If we want to "zoom in" on this graph, we can use the --min and --max parameters to filter out the data.
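The distribution above is just fixed-width binning: split [min, max] into N equal buckets and count the latencies falling into each. A minimal sketch, with made-up latency values:

```python
# Made-up latencies in usec: a cluster of fast opens and one outlier.
latencies = [0.3, 1.2, 4.5, 25.0, 415812.878]
N_BUCKETS = 20

lo, hi = min(latencies), max(latencies)
width = (hi - lo) / N_BUCKETS
counts = [0] * N_BUCKETS

for lat in latencies:
    # Clamp so the maximum value lands in the last bucket.
    idx = min(int((lat - lo) / width), N_BUCKETS - 1)
    counts[idx] += 1

# Render a bar chart similar to the script's output.
for i, c in enumerate(counts):
    print(f"{lo + i * width:14.3f}  {'█' * c}  {c}")
```

With a single extreme outlier, every ordinary latency collapses into the first bucket, which is exactly the shape we see above and the reason --min/--max zooming is useful.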

Top latencies

Now that we know we have some outliers in Open and Sync operations, let's look at the top 10 worst I/O latencies found in this trace:

$ ./lttng-iolatencytop analysis-20150115-120942/ --minsize 2 --limit 4
Top open syscall latencies (usec)
Begin               End                  Name    Duration (usec)      Size  Proc             PID    Filename
[12:18:50.432950815,12:18:50.870648568]  open         437697.753      N/A   apache2          31517  /var/lib/php5/sess_0ifir2hangm8ggaljdphl9o5b5 (fd=13)
[12:18:52.946080165,12:18:52.946132278]  open             52.113      N/A   apache2          31588  /var/lib/php5/sess_mr9045p1k55vin1h0vg7rhgd63 (fd=13)
[12:18:46.800846035,12:18:46.800874916]  open             28.881      N/A   apache2          31591  /var/lib/php5/sess_r7c12pccfvjtas15g3j69u14h0 (fd=13)
[12:18:51.389797604,12:18:51.389824426]  open             26.822      N/A   apache2          31520  /var/lib/php5/sess_4sdb1rtjkhb78sabnoj8gpbl00 (fd=13)
Top read syscall latencies (usec)
Begin               End                  Name    Duration (usec)      Size  Proc             PID    Filename
[12:18:37.256073107,12:18:37.256555967]  read            482.860    7.0 B   bash             10237  unknown (origin not found) (fd=3)
[12:18:52.000209798,12:18:52.000252304]  read             42.506   1.0 KB   irqbalance       1337   /proc/interrupts (fd=3)
[12:18:37.256559439,12:18:37.256601615]  read             42.176    5.0 B   bash             10237  unknown (origin not found) (fd=3)
[12:18:42.000281918,12:18:42.000320016]  read             38.098   1.0 KB   irqbalance       1337   /proc/interrupts (fd=3)
Top write syscall latencies (usec)
Begin               End                  Name    Duration (usec)      Size  Proc             PID    Filename
[12:18:49.913241516,12:18:49.915908862]  write          2667.346   95.0 B   apache2          31584  /var/log/apache2/access.log (fd=8)
[12:18:37.472823631,12:18:37.472859836]  writev           36.205 21.97 KB   apache2          31544  unknown (origin not found) (fd=12)
[12:18:37.991578372,12:18:37.991612724]  writev           34.352 21.97 KB   apache2          31589  unknown (origin not found) (fd=12)
[12:18:39.547778549,12:18:39.547812515]  writev           33.966 21.97 KB   apache2          31584  unknown (origin not found) (fd=12)
Top sync syscall latencies (usec)
Begin               End                  Name             Duration (usec)   Size  Proc             PID    Filename
[12:18:50.162776739,12:18:51.157522361]  sync                  994745.622   N/A   sync             22791  None (fd=None)
[12:18:37.227867532,12:18:37.232289687]  sync_file_range         4422.155   N/A   lttng-consumerd  19964  /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata (fd=32)
[12:18:37.238076585,12:18:37.239012027]  sync_file_range          935.442   N/A   lttng-consumerd  19964  /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata (fd=32)
[12:18:37.220974711,12:18:37.221647124]  sync_file_range          672.413   N/A   lttng-consumerd  19964  /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata (fd=32)

If we look at the first line in the Open category, we clearly see that a PHP session file took 437 ms to open, whereas the other opens take only a few tens of microseconds. If we remember the beginning of this example, we had a request 430 ms longer than usual: we are on the right track! Now we need to understand why this latency occurred. To narrow down the search, we can center the analysis on the interesting timestamp range:

$ ./lttng-iolatencytop analysis-20150115-120942/ --minsize 2 --timerange [12:18:50.432950815,12:18:50.870648568]
Top open syscall latencies (usec)
Begin               End                  Name    Duration (usec)      Size  Proc     PID    Filename
[12:18:50.432950815,12:18:50.870648568]  open         437697.753      N/A   apache2  31517  /var/lib/php5/sess_0ifir2hangm8ggaljdphl9o5b5 (fd=13)
Top sync syscall latencies (usec)
Begin               End                  Name    Duration (usec)      Size  Proc     PID    Filename
[12:18:50.162776739,12:18:51.157522361]* sync         994745.622*     N/A   sync     22791  None (fd=None)
*: Syscalls started and/or completed outside of the range specified

Now here is a very interesting line: we now know that while our open was running, a sync was running concurrently. It started before and finished after the range we specified. We could now widen the range to match the sync and, for example, see the actual disk usage recorded while this syscall was active, or list the other syscalls affected by it.

Note: instead of using the --timerange option, we can control the beginning and the end of the range individually with --begin and/or --end.

Note 2: the time is interpreted in the timezone of the machine running the analysis script, so if you are exchanging results with people in a different timezone, add the --gmt option to have a common time reference.

Note 3: if you are working on a trace that spans multiple days, you will have to specify the date as well.

I/O requests log

Finally, we can also look at the events as a timeline (instead of a top or a frequency distribution). For that, we have the lttng-iolog script (or the --log option in the other scripts), which can give us an understanding of the sequence that led to our problem. Here is an example centered on the sync period:

$ ./lttng-iolog analysis-20150115-120942/ --timerange [12:18:50.162776739,12:18:51.157522361]
Log of all I/O system calls
Begin               End                  Name    Duration (usec)      Size  Proc     PID    Filename
[12:18:50.162776739,12:18:51.157522361]  sync         994745.622      N/A   sync     22791  None (fd=None)
[12:18:50.432904513,12:18:50.432912248]  open              7.735      N/A   apache2  31517  /dev/urandom (fd=13)
[12:18:50.432912958,12:18:50.432923604]  read             10.646   32.0 B   apache2  31517  /dev/urandom (fd=13)
[12:18:50.432950815,12:18:50.870648568]  open         437697.753      N/A   apache2  31517  /var/lib/php5/sess_0ifir2hangm8ggaljdphl9o5b5 (fd=13)
[12:18:50.870685082,12:18:50.870689642]  open              4.560      N/A   apache2  31517  /var/www/html/inc/header.html (fd=14)
[12:18:50.870696308,12:18:50.870698097]  read              1.789    6.0 B   apache2  31517  /var/www/html/inc/header.html (fd=14)
[12:18:50.870785224,12:18:50.870788336]  open              3.112      N/A   apache2  31517  /var/www/html/tinyib.db (fd=14)
[12:18:50.870795212,12:18:50.870796527]  read              1.315  100.0 B   apache2  31517  /var/www/html/tinyib.db (fd=14)
[12:18:50.870866830,12:18:50.870867961]  read              1.131   1.0 KB   apache2  31517  /var/www/html/tinyib.db (fd=14)
...

Conclusion

Through this example, we identified an unusual latency and got to the root cause: someone (or a script) launched the sync command, which committed all the dirty buffers in the page cache to disk. This caused a high I/O load (for our old desktop hard drive). At the same time, our client tried to access the index page. When the client connected, PHP tried to open a temporary session file on disk, which stalled for 437 ms because of the ongoing disk activity.

One particularity of this workflow compared to traditional debugging practices is that we recorded the trace only once and then investigated it through various tools and options. So we definitely understood this exact latency, not a look-alike. For recurring problems it does not make much difference, but if you have ever investigated a weird case that happens only once a month, you know the value of this approach. Moreover, reproducing this exact example would have been really difficult, considering that it depends on the state of the page cache when the sync is started.

If you want to reproduce this example, the trace is available here. Feel free to experiment with the other LTTng analyses scripts, all the setup instructions are there.

Finally, we appreciate any feedback, suggestions or contributions, as well as new use cases and analysis ideas, so experiment with it and help us make great low-level performance debugging/monitoring tools.

Next time: an IRQ analysis.
