Docker for Mac Named Volume Speed Penalty

September 21, 2016

Summary

Based on some tests I did today, importing a 10MB gzip-compressed MySQL database dump, it seems that using a named volume is over twelve times slower than using a bind mount on Docker for Mac version 1.12.1. When a named volume is used, the data is stored in a virtual hard disk, Docker.qcow2. Besides the significant speed penalty, this has, in my view, two further disadvantages: a higher risk of data loss, since everything resides in a single file, and more hassle when restoring individual files from backup into the virtual hard disk.

Introduction

Yesterday I experimented a bit with a Docker MySQL container. I decided to check out a named volume first:

$ docker run --detach --name mysql-db --env MYSQL_ROOT_PASSWORD=S3CR3T \
    --volume mysql-data:/var/lib/mysql mysql

Somehow I was expecting a mysql-data directory to show up on my Mac mini, still running "El Capitan". So, I used find to hunt down this directory in ~/Library. Nothing was found. Next, I inspected the mysql-db container:

$ docker inspect mysql-db
...
    "Mounts": [
        {
            "Name": "mysql-data",
            "Source": "/var/lib/docker/volumes/mysql-data/_data",
            "Destination": "/var/lib/mysql",
            "Driver": "local",
            "Mode": "z",
            "RW": true,
            "Propagation": "rprivate"
        }
    ],
...

But sudo ls /var/lib/ didn't show a docker folder. Then I realised that since "Docker for Mac" uses a virtual machine, the named volume must have been created on a virtual drive. So I used screen to connect to the virtual machine as follows:

screen \
    ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty

Note: if you don't see a prompt, press Enter.

I logged in as root, which requires no password. Running ls -al with the path reported by docker inspect did give the expected result:

moby:~# ls -al /var/lib/docker/volumes/mysql-data/_data/
total 188452
drwxr-xr-x    5 999      ping          4096 Sep 20 23:25 .
drwxr-xr-x    3 root     root          4096 Sep 20 23:25 ..
-rw-r-----    1 999      ping            56 Sep 20 23:25 auto.cnf
-rw-r-----    1 999      ping          1329 Sep 20 23:25 ib_buffer_pool
-rw-r-----    1 999      ping      50331648 Sep 20 23:25 ib_logfile0
-rw-r-----    1 999      ping      50331648 Sep 20 23:25 ib_logfile1
-rw-r-----    1 999      ping      79691776 Sep 20 23:25 ibdata1
-rw-r-----    1 999      ping      12582912 Sep 20 23:25 ibtmp1
drwxr-x---    2 999      ping          4096 Sep 20 23:25 mysql
drwxr-x---    2 999      ping          4096 Sep 20 23:25 performance_schema
drwxr-x---    2 999      ping         12288 Sep 20 23:25 sys

Note: you can quit the screen session by pressing Ctrl+A followed by Ctrl+\ and answering "Really quit and kill all your windows" with "y"; see the last part of Getting Started with Docker for Mac.

The virtual hard disk is /dev/vda2, and the actual data is stored in the file Docker.qcow2 on the host computer, in my case a Mac mini:

$ cd ~
$ ls -hl Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/
total 2251328
-rw-r--r--  1 john  staff   1.1G Sep 20 18:13 Docker.qcow2
:
:

I don't consider it safe to store database-related files all together in a virtual disk that will be backed up as a single large file. So I decided to check out a bind mount: mounting a directory on OS X onto a path in the MySQL container. This way I would have access to all the database-related files directly on OS X, which would make recovery from backup much easier in case of a mishap. But which method was faster? As I wanted to import a very large database dump, 1.2G compressed using gzip -9, speed was very important to me. Since I already had created a container with a named volume using:

$ docker run --detach --name mysql-db --env MYSQL_ROOT_PASSWORD=S3CR3T \
    --volume mysql-data:/var/lib/mysql mysql

I decided to first test the speed of storing data on a virtual hard disk. After a few attempts, which gave me the impression that this method would be slow, I decided to run the import under time to measure how long it would take.

$ time (gzip -dc big-data-dump.sql.gz |\
    docker exec --interactive mysql-db \
    bash -c 'mysql -uroot -p$MYSQL_ROOT_PASSWORD')
mysql: [Warning] Using a password on the command line interface can be insecure.

Note that I use bash so I can reference the environment variable $MYSQL_ROOT_PASSWORD, which is set to the actual password inside the container, instead of exposing the actual password on my own command line. If there is an easier way to do this, let me know.

This morning, the process was still running. A quick check using du -hs showed that about 24% was done. I canceled the process; it was taking way too long, having run for close to 14 hours. I had imported a smaller, but still large, version of this database before in a Parallels virtual machine, so I knew that this was excessive. Hence, I decided to first do some tests with a much smaller database dump.
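A quick back-of-the-envelope projection, assuming the import rate stayed constant (which it may well not have), shows why I gave up:

```python
# Projection: about 24% of the import finished in roughly 14 hours.
# At a constant rate, the full import takes hours_done / fraction_done.
hours_done = 14
fraction_done = 0.24
projected_total_hours = hours_done / fraction_done
print(round(projected_total_hours, 1))  # 58.3, i.e. well over two days
```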

Edit: there seems to be an issue with the large database dump, as I have problems importing it using a bind mount as well.

Named Volume versus Bind Mount

I run a local version of MediaWiki to keep notes and links. Its database dump, compressed using gzip -9, is just 10M. I ran four tests using a named volume; the results are given below. Because I was not sure whether VirtualBox might have something to do with the slowdown, I quit that program before I ran the last test, but this made no significant difference compared to the previous three runs.

Before each test I quit Docker and deleted Docker.qcow2 using:

$ rm Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/\
Docker.qcow2

Then I restarted Docker, created the container, and ran the import as follows:

$ docker run --detach --name mysql-db --env MYSQL_ROOT_PASSWORD=S3CR3T \
    --volume mysql-data:/var/lib/mysql mysql
$ time (gzip -dc wikidb-20160921-100357.sql.gz | docker exec --interactive \
    mysql-db bash -c 'mysql -uroot -p$MYSQL_ROOT_PASSWORD')

Results, formatted horizontally:

+-----------+----------+----------+
| real      | user     | sys      |
+-----------+----------+----------+
| 4m23.581s | 0m0.239s | 0m0.132s |
| 4m15.187s | 0m0.242s | 0m0.129s |
| 4m03.262s | 0m0.243s | 0m0.129s |
| 4m12.767s | 0m0.240s | 0m0.126s |
+-----------+----------+----------+

The average "real" over 4 runs is 4m13.699s.

Next, I tested using a bind mount. Again, before each test I quit Docker and deleted Docker.qcow2 using:

$ rm Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/\
Docker.qcow2

Then I restarted Docker, created the container, and ran the import as follows:

$ mkdir -p ~/Docker/volumes/mysql-data/
$ docker run --detach --name mysql-db --env MYSQL_ROOT_PASSWORD=S3CR3T \
    --volume /Users/john/Docker/volumes/mysql-data/:/var/lib/mysql mysql
$ time (gzip -dc wikidb-20160921-100357.sql.gz | docker exec --interactive \
    mysql-db bash -c 'mysql -uroot -p$MYSQL_ROOT_PASSWORD')
$ rm -rf ~/Docker/volumes/mysql-data/

Results, formatted horizontally:

+-----------+----------+----------+
| real      | user     | sys      |
+-----------+----------+----------+
| 0m21.033s | 0m0.242s | 0m0.133s |
| 0m21.281s | 0m0.236s | 0m0.132s |
| 0m19.348s | 0m0.235s | 0m0.128s |
| 0m20.143s | 0m0.238s | 0m0.128s |
+-----------+----------+----------+

The average "real" over 4 runs is 20.451s. In other words, using a bind mount instead of a named volume is 12.4 times faster.
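The averages and the speed-up factor can be reproduced with a few lines of Python, using the "real" times from the two tables above:

```python
def to_seconds(t):
    """Convert a time(1) 'real' value like '4m23.581s' to seconds."""
    minutes, seconds = t.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

named_volume = ["4m23.581s", "4m15.187s", "4m03.262s", "4m12.767s"]
bind_mount = ["0m21.033s", "0m21.281s", "0m19.348s", "0m20.143s"]

avg_named = sum(map(to_seconds, named_volume)) / len(named_volume)
avg_bind = sum(map(to_seconds, bind_mount)) / len(bind_mount)

print(round(avg_named, 3))             # 253.699 seconds, i.e. 4m13.699s
print(round(avg_bind, 3))              # 20.451 seconds
print(round(avg_named / avg_bind, 1))  # 12.4
```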

Out of curiosity I also ran four tests inside an Ubuntu 15.10 installation running under VirtualBox, repeating the following two lines:

$ mysql -uroot --password=S3CR3T -e 'DROP DATABASE wikidb'
$ time (gzip -dc wikidb-20160921-100357.sql.gz | mysql -uroot --password=S3CR3T)

+-----------+----------+----------+
| real      | user     | sys      |
+-----------+----------+----------+
| 0m48.117s | 0m1.196s | 0m0.220s |
| 0m40.243s | 0m1.228s | 0m0.160s |
| 0m39.176s | 0m1.208s | 0m0.136s |
| 0m42.999s | 0m1.172s | 0m0.328s |
+-----------+----------+----------+

As the configuration of MySQL might be different, I can't conclude that VirtualBox is about twice as slow compared to Docker for Mac using a bind mount. Moreover, I had migrated this virtual machine from Parallels, which might have resulted in a less than optimal virtual disk layout.

Conclusion

The current version of Docker for Mac, version 1.12.1 (build: 12133), is over 12 times slower when importing a MySQL database dump into a named volume compared to a bind mount.

Discussion

It's not clear to me why a virtual disk is being used for storing data. Not only do named and anonymous volumes use this disk, it is also used for storing images, etc. As this virtual disk has a limit of 60G, which can be much lower than the space available on the host operating system, this can confuse users: they might get an "out of space" error message while they have more than enough space left on the host's hard drive. And because all data is stored in a monolithic file, there is quite a risk of accidental data loss. Of course, this also makes recovery of specific files from backup cumbersome.