Amazon's Elastic Compute Cloud (EC2) command-line tools are useful building blocks for creating more complex shell scripts. They allow you to start and stop instances, query their status, add tags, and manage storage and IP addresses, among other things. They have one big disadvantage: they take a long time to run. For instance, running ec2-describe-instances for six instances takes 19 seconds on an m1.small AWS Linux instance. One answer given is that this is caused by JVM startup overhead. I found that hard to believe, because on the same machine a Java "hello world" program executes in 120ms, and running ec2-describe-instances --help takes just 321ms. So I set out to investigate, and this, using multiple tracing tools and techniques, is what I found.
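Timings like these are easy to reproduce with a simple shell stopwatch. Here is a minimal sketch using GNU date's nanosecond format (the %N specifier is a GNU extension); sleep 0.1 stands in for the command being measured:

```shell
# Measure a command's elapsed wall-clock time in milliseconds.
start=$(date +%s%N)           # nanoseconds since the epoch (GNU date)
sleep 0.1                     # stand-in for the command under test
end=$(date +%s%N)
echo "elapsed: $(( (end - start) / 1000000 )) ms"
```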

By running the ec2-describe-instances shell script using the shell's -x flag I found that ec2-describe-instances calls ec2-cmd.

$ bash -x `which ec2-describe-instances`
+ __ZIP_PREFIX__EC2_HOME=/opt/aws/apitools/ec2
+ __RPM_PREFIX__EC2_HOME=/usr/local/aes/cmdline
+ /opt/aws/apitools/ec2/bin/ec2-cmd DescribeInstances

$ bash -x /opt/aws/apitools/ec2/bin/ec2-cmd DescribeInstances
[...]
+ /usr/lib/jvm/java/bin/java -classpath :/opt/aws/apitools/ec2/lib/activation-1.1.jar:[...] com.amazon.aes.webservices.client.cmd.DescribeInstances

Using the same technique I found that ec2-cmd runs Java on the class com.amazon.aes.webservices.client.cmd.DescribeInstances. From then on I invoked Java directly to avoid the noise caused by the shell scripts; their time overhead was negligible anyway.

I then examined the interaction between the command and the AWS web services. I did this by running tcpdump and strace in parallel and comparing the logs' timestamps. The following command starts logging packets, runs the Java program under strace, and then stops the packet logging. On an otherwise idle machine this gives a very clear packet trace.

( sudo tcpdump -n >tcpdump.out & ) ; \
strace -f -o strace.out -T -tt \
/usr/lib/jvm/java/bin/java \
-classpath :/opt/aws/apitools/ec2/lib/activation-1.1.jar:[...] \
com.amazon.aes.webservices.client.cmd.DescribeInstances ; \
sudo killall tcpdump

18:51:33.053700 IP 10.39.93.124.36632 > 172.16.0.23.domain: 10259+ A? ...
18:51:33.053985 IP 10.39.93.124.36632 > 172.16.0.23.domain: 17669+ AAAA? ...
18:51:33.175941 IP 172.16.0.23.domain > 10.39.93.124.36632: 10259 1/0/0 A (51)
18:51:33.209711 IP 172.16.0.23.domain > 10.39.93.124.36632: 17669 0/1/0 (113)
18:51:33.211305 IP 10.39.93.124.57985 > 172.16.0.23.domain: 64553+ PTR? ...

30775 18:51:00.087534 execve("/usr/lib/jvm/java/bin/java", ["/usr/lib/jvm/java/bin/java", "-XX:-UsePerfData", "-classpath", "/opt/aws/apitools/ec2/lib/activa"..., "com.amazon.aes.webservices.clien"...], [/* 31 vars */]) = 0 <0.000417>
30775 18:51:00.088415 brk(0) = 0x1e9d000 <0.000019>
30775 18:51:00.088576 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f915f52a000 <0.000021>
30775 18:51:00.088679 readlink("/proc/self/exe", "/media/ephemeral0/opt/jdk1.7.0_04/jre/bin/java", 4096) = 46 <0.000045>

Ignoring some ssh interactions, the first packets came out at 18:51:33, a full 33 seconds after the program had started running. Therefore, I had to concentrate on Java's startup performance and could ignore the AWS interactions. In retrospect, this was a warning sign I should have heeded.

I then ran strace with the -c flag, which produces a cumulative count of the system calls and the time spent in them. The results were interesting.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.34   18.562341        4428      4192       571 futex
  1.07    0.201412           1    178548           gettimeofday
  0.27    0.051676          47      1094           mprotect
  0.12    0.023182           4      6428      5687 stat
  0.11    0.020168         160       126           madvise
  0.06    0.011662           1     11292           lseek
  0.01    0.002735           0     13208           read
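A summary table like this can also be post-processed mechanically. The following sketch, assuming the table's body has been saved to strace-c.txt (a hypothetical file name), picks out the syscall with the largest cumulative time:

```shell
# Find the syscall with the largest cumulative time in a saved
# `strace -c` summary; data lines start with a numeric %-time field.
grep -E '^ *[0-9]' strace-c.txt |
  sort -k2 -rn |                 # sort by the seconds column
  head -1 |
  awk '{print $NF, $2}'          # syscall name and its total seconds
```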

Reasoning that futex simply reflected the synchronization between threads, I decided to try to eliminate gettimeofday, because of a reference I found regarding its cost on virtualized machines. After some searching and reading the command's invocation code I found I could eliminate the gettimeofday calls by setting an environment variable.

export EC2_JVM_ARGS=-XX:-UsePerfData

This reduced the clutter in the strace file, but didn't improve the performance.

I then tested how long it took to open and read all the files, by writing a command sequence that got the file names from the trace log and concatenated their contents.

fgrep ' open(' strace.out | grep -v ENOENT | sed 's/",.*//;s/.*"//' | egrep -v '^/(proc|dev)' | xargs cat | wc -c

The total size of the output was 25MB, and the command ran swiftly. So simply reading the files was not what was costing time.
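The extraction pipeline itself can be sanity-checked on a fabricated excerpt; the three lines below are made up, and only the successful, non-/proc open should survive the filters:

```shell
# A fabricated three-line strace excerpt: one successful open,
# one failed open (ENOENT), and one /proc entry.
cat > sample.out <<'EOF'
1234 open("/opt/aws/apitools/ec2/lib/activation-1.1.jar", O_RDONLY) = 3
1234 open("/opt/aws/apitools/ec2/lib/missing.jar", O_RDONLY) = -1 ENOENT (No such file or directory)
1234 open("/proc/self/stat", O_RDONLY) = 4
EOF
# The same filters as in the pipeline above, minus the final cat/wc.
fgrep ' open(' sample.out | grep -v ENOENT |
  sed 's/",.*//;s/.*"//' | egrep -v '^/(proc|dev)'
```

This prints only /opt/aws/apitools/ec2/lib/activation-1.1.jar, confirming that the filters keep exactly the files that were successfully opened.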

Another thing that struck me in the trace file was thousands of failed attempts to open class files in the current directory.

stat("/home/dds/org/codehaus/xfire/fault/XFireFault.class", 0x7f82cb39ffd0) = -1 ENOENT (No such file or directory)
stat("/home/dds/org/codehaus/xfire/XFireException.class", 0x7f82cb39e280) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/ImageAttribute.class", 0x7f82cb39ffd0) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/KernelAttribute.class", 0x7f82cb39ffd0) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/ImageValueAttribute.class", 0x7f82cb39e280) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/RamdiskAttribute.class", 0x7f82cb39ffd0) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/BlockDeviceMappingAttribute.class", 0x7f82cb39ffd0) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/ImageMapAttribute.class", 0x7f82cb39e280) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/ProductCodesAttribute.class", 0x7f82cb39ffd0) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/ImageListAttribute.class", 0x7f82cb39e280) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/LaunchPermissionAttribute.class", 0x7f82cb39ffd0) = -1 ENOENT (No such file or directory)
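Failures like these can be tallied mechanically. A sketch that counts ENOENT stat calls per leading directory, run here on a fabricated two-line sample rather than the real strace.out:

```shell
# Count failed stat() calls grouped by their first two path components.
cat > sample-stat.out <<'EOF'
stat("/home/dds/org/codehaus/xfire/fault/XFireFault.class", 0x7f82cb39ffd0) = -1 ENOENT (No such file or directory)
stat("/home/dds/com/amazon/aes/webservices/client/ImageAttribute.class", 0x7f82cb39e280) = -1 ENOENT (No such file or directory)
EOF
grep 'stat(' sample-stat.out | grep ENOENT |
  sed 's/.*stat("//;s/".*//' |   # keep just the path
  cut -d/ -f1-3 |                # first two components, e.g. /home/dds
  sort | uniq -c | sort -rn
```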

I traced the cause to the way the invocation shell script built the class path, by concatenating all the jar files in the library directory.

for jar in "${LIBDIR}"/*.jar ; do
    CP="${CP}:${jar}"
done

This sequence results in the class path, stored in the CP variable, starting with a colon, which is taken to mean "search the current directory". I fixed that problem by adding the following line after the for loop:

CP=$(echo $CP | sed 's/://')

This seemed to improve the command's execution time, but only by about a couple of seconds.
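The effect of that loop and the fix can be seen in isolation with fabricated jar names; before the sed the class path starts with a colon (an empty entry, meaning the current directory), and after it the empty entry is gone:

```shell
# Rebuild the class path the way the wrapper script does.
CP=""
for jar in /opt/aws/lib/a.jar /opt/aws/lib/b.jar; do
    CP="${CP}:${jar}"
done
echo "before: $CP"             # note the leading colon
CP=$(echo $CP | sed 's/://')   # strip the first (leading) colon
echo "after:  $CP"
```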

While examining the strace files I found many long read/seek sequences, like the following.

13190 15:07:29.981602 lseek(3, 56113794, SEEK_SET) = 56113794 <0.000034>
13190 15:07:29.981713 read(3, "PK\3\4\n\0\0\0\0\0\345)p@\216a\310\346\f\3\0\0\f\3\0\0001\0\0\0", 30) = 30 <0.000034>
13190 15:07:29.981848 lseek(3, 56113873, SEEK_SET) = 56113873 <0.000034>
13190 15:07:29.981953 read(3, "\312\376\272\276\0\0\0001\0%\n\0\4\0\26\n\0\27\0\30\7\0\31\7\0\32\1\0\6"..., 780) = 780 <0.000036>
13190 15:07:29.986162 lseek(3, 56112935, SEEK_SET) = 56112935 <0.000040>
13190 15:07:29.986288 read(3, "PK\3\4\n\0\0\0\0\0\345)p@\261\311$\200\f\3\0\0\f\3\0\0001\0\0\0", 30) = 30 <0.000033>
13190 15:07:29.986410 lseek(3, 56113014, SEEK_SET) = 56113014 <0.000031>
13190 15:07:29.986507 read(3, "\312\376\272\276\0\0\0001\0%\n\0\4\0\26\n\0\27\0\30\7\0\31\7\0\32\1\0\6"..., 780) = 780 <0.000043>
13190 15:07:29.990264 lseek(3, 56112080, SEEK_SET) = 56112080 <0.000036>
13190 15:07:29.990379 read(3, "PK\3\4\n\0\0\0\0\0\345)p@\32\17|\317\10\3\0\0\10\3\0\0001\0\0\0", 30) = 30 <0.000042>
13190 15:07:29.990515 lseek(3, 56112159, SEEK_SET) = 56112159 <0.000034>
13190 15:07:29.990626 read(3, "\312\376\272\276\0\0\0001\0%\n\0\4\0\26\n\0\27\0\30\7\0\31\7\0\32\1\0\6"..., 776) = 776 <0.000040>
13190 15:07:29.994931 lseek(3, 56111220, SEEK_SET) = 56111220 <0.000047>
13190 15:07:29.995065 read(3, "PK\3\4\n\0\0\0\0\0\345)p@$\333\211\31\f\3\0\0\f\3\0\0002\0\0\0", 30) = 30 <0.000044>
13190 15:07:29.995204 lseek(3, 56111300, SEEK_SET) = 56111300 <0.000035>
13190 15:07:29.995316 read(3, "\312\376\272\276\0\0\0001\0%\n\0\4\0\26\n\0\27\0\30\7\0\31\7\0\32\1\0\6"..., 780) = 780 <0.000043>
13190 15:07:29.999541 lseek(3, 56110361, SEEK_SET) = 56110361 <0.000043>
13190 15:07:29.999667 read(3, "PK\3\4\n\0\0\0\0\0\345)p@2u\340\335\v\3\0\0\v\3\0\0002\0\0\0", 30) = 30 <0.000047>
13190 15:07:29.999809 lseek(3, 56110441, SEEK_SET) = 56110441 <0.000037>

This got me thinking about the performance implications of these calls, and whether the seeks were somehow interacting with Amazon's Simple Storage Service (S3), which was used as the machine's backing storage. Searching for data on AWS's performance, I came across an excellent technical report written by Simson L. Garfinkel of Harvard's Computer Science Group. There he mentioned that S3 performance depended on the size of the objects stored. Although the operating system's file system and buffer cache should isolate the seek system calls from S3, I decided to test whether using local storage would improve the performance. I therefore modified the instance's startup command to attach some local storage to the machine

ec2-run-instances $AMI -b /dev/sdb=ephemeral0 ...

and moved most of the files Java was opening onto it.

tar cf - /opt/aws | sudo tar -C /media/ephemeral0/ -xf -
sudo ln -s /media/ephemeral0/opt/ /opt
tar cf - /usr/lib/jvm | sudo tar -C /media/ephemeral0/ -xf -
sudo ln -s /media/ephemeral0/usr/lib/jvm/ /usr/lib/jvm
tar cf - /usr/share | sudo tar -C /media/ephemeral0/ -xf -
sudo ln -s /media/ephemeral0/usr/share/ share

However, I didn't see any performance benefit.

I then considered whether all those seek/read calls were an inefficiency of the OpenJDK that comes with Amazon's AWS Linux. Could Sun's (now Oracle's) Java runtime be more efficient? Based on instructions by Dmitry Leskov I downloaded and ran Oracle's Java 1.7, but the performance appeared to deteriorate a bit. Another dead end.

In the end it turned out that all the solutions I tried produced negligible or (in the case of Oracle's JVM) negative results:

removing thousands of calls to gettimeofday,

optimizing Java's class path,

using local storage,

using Oracle's JVM.

Apparently, AWS command-line tools are indeed slow due to Java's long startup time. Dynamic class loading ensures that running small programs or even an AWS command with the -help flag isn't slow, but when the command needs to do some actual work and thus loads all the required class files, it takes many seconds to run.
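One way to see this effect directly is HotSpot's -verbose:class flag, which prints one [Loaded ...] line per class the JVM loads; comparing the line counts of a --help run against a real run shows the difference. A sketch that counts such a saved log (the two fabricated lines below stand in for the thousands a real run produces):

```shell
# Count classes loaded during a JVM run, from saved -verbose:class output.
cat > classes.log <<'EOF'
[Loaded java.lang.Object from /usr/lib/jvm/java/jre/lib/rt.jar]
[Loaded java.lang.String from /usr/lib/jvm/java/jre/lib/rt.jar]
EOF
grep -c '^\[Loaded ' classes.log
```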

So what can one do? Here are some things that do work.

Use a faster multi-core machine: an m1.large instance runs the tools twice as fast as an m1.small one.

Make the command work harder for you. For instance, pass multiple instance ids to ec2-terminate-instances, or multiple tags to ec2-create-tags, instead of calling each command multiple times.
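Batching like this can be scripted with xargs, which packs as many arguments as possible into each invocation. In the sketch below echo stands in for ec2-terminate-instances, and the instance ids and the ids.txt file are fabricated:

```shell
# Feed a list of instance ids to a single command invocation.
printf 'i-11111111\ni-22222222\ni-33333333\n' > ids.txt
xargs echo ec2-terminate-instances < ids.txt
```

This prints ec2-terminate-instances i-11111111 i-22222222 i-33333333: one invocation covering all three ids, rather than three separate JVM startups.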

Use the API from another language. As an example, you can obtain your machine's instance id in just 10ms by using the following shell command.

curl -s http://169.254.169.254/2011-01-01/meta-data/instance-id

If you're looking for the answer to why the AWS command-line tools are slow, I hope this blog entry has saved you the expense of running down a few blind alleys. Otherwise, I hope you found out how you can use tracing tools to investigate performance problems.