Ansible is a great tool for configuration management, but a common complaint is that it’s not as fast as other tools like Salt, Chef or Puppet. This comes down to design: Ansible doesn’t rely on an agent listening on each host (although it can) and instead uses a deployment methodology based on SSH. This post isn’t about the pros and cons of each tool, but rather about ways to improve upon Ansible’s default configuration. Ansible ships with very conservative default values. This is smart in my opinion because it offers greater compatibility out of the box. Here I highlight some safe adjustments that can be made to the default configuration for improved performance (speed!).

Real World Playbook Test

For this test I’m using a real-world playbook that I use in my homelab when provisioning a new CentOS VM. It configures some basic things (hostname, ssh keys, etc), installs common packages/utilities and tunes some OS configurations.



The VM I’m running the playbook from is a CentOS 7 VM on an ESXi 6.5 host. The playbook will be running against 12 target VMs, all on the same VM network. The Ansible VM has 4 vCPUs and 8 GB of RAM.

Before tuning Ansible, we’ll need to gather some metrics about how each playbook run performs. Fortunately, Ansible v2.0 and higher ships with two built-in callbacks that can be enabled: timer and profile_tasks. The timer callback outputs the total playbook run time, similar to running the time command in front of an ansible-playbook command. The second, and in my opinion the more interesting of the two, is profile_tasks. This callback displays a nice summary of each task and how long it took to execute. To enable these callbacks, edit (or create) an ansible.cfg file. You can check to see if you already have an Ansible config file by running:

    $ ansible --version
    ansible 2.5.3
      config file = /home/directory/ansible/ansible.cfg

This tells you the location of the configuration file that Ansible uses and the version. If you don’t see a config file listed you can create one in the directory where your playbooks will be run.

We’re going to add the following line to the config file under the [defaults] subsection:

    [defaults]
    callback_whitelist = timer, profile_tasks

I’m running the following playbook command:

    ansible-playbook init_centos.yml -e @group_vars/vault.yml --limit vms

Here’s the output from the playbook run with the default configuration:

    Friday 08 June 2018 16:04:29 -0400 (0:00:16.486) 0:02:04.805 ***
    =================================================================
    TASK | Install packages ---------------------------------- 20.73s
    TASK | Start filebeat and enable service ----------------- 16.49s
    TASK | Install filebeat ----------------------------------- 6.15s
    Gathering Facts ------------------------------------------- 5.76s
    TASK | Install rpms for Spacewalk / RHN ------------------- 5.15s
    checkmk : TASK | Copy Checkmk Agent Listener -------------- 2.81s
    checkmk : TASK | Copy Checkmk Agent ----------------------- 2.52s
    TASK | Copy Influxdata repo (for Telegraf) ---------------- 2.22s
    TASK | Ensure mount directories exist --------------------- 2.21s
    TASK | Copy Telegraf config ------------------------------- 2.18s
    TASK | Copy filebeat config template ---------------------- 2.17s
    TASK | Copy user ssh/config ------------------------------ 2.16s
    TASK | Update /etc/services file -------------------------- 2.14s
    TASK | Set /etc/hostname ---------------------------------- 2.13s
    TASK | Disable SELinux (Centos 7) ------------------------- 2.10s
    TASK | Copy ssh keys -------------------------------------- 2.09s
    TASK | Install prowl -------------------------------------- 2.09s
    TASK | Copy .bash_logout for user ------------------------- 2.08s
    TASK | Copy .bashrc for user ------------------------------ 2.08s
    TASK | Copy iTerm2 bash shell integration for user -------- 2.07s
    Playbook run took 0 days, 0 hours, 2 minutes, 4 seconds

The important line here is the last one: “Playbook run took … 2 minutes, 4 seconds”. That’s 124 seconds. Not terrible, but if you’re deploying to a large number of machines (say 50 or 100) those minutes can quickly add up.

Let’s start making some configuration tweaks and see if we can speed things up.

Enable SSH Pipelining

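To enable SSH pipelining, add this to your ansible.cfg file. Note that the Ansible documentation lists this setting under the [ssh_connection] heading rather than [defaults]:

    [ssh_connection]
    pipelining = True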

From the Ansible manual: Enabling pipelining reduces the number of SSH operations required to execute a module on the remote server, by executing many ansible modules without actual file transfer.

Let’s run the same playbook again but with this configuration option set and see what happens:

    Friday 08 June 2018 16:07:19 -0400 (0:00:23.585) 0:01:56.055 ***
    =================================================================
    TASK | Start filebeat and enable service ----------------- 23.58s
    TASK | Install packages ---------------------------------- 16.75s
    TASK | Install filebeat ----------------------------------- 6.17s
    Gathering Facts ------------------------------------------- 5.50s
    TASK | Install rpms for Spacewalk / RHN ------------------- 4.61s
    checkmk : TASK | Copy Checkmk Agent Listener -------------- 2.33s
    checkmk : TASK | Copy Checkmk Agent ----------------------- 2.26s
    TASK | Set /etc/hostname ---------------------------------- 1.91s
    TASK | Copy Influxdata repo (for Telegraf) ---------------- 1.90s
    TASK | Copy ssh/config for user --------------------------- 1.88s
    TASK | Copy ssh keys for user ----------------------------- 1.87s
    TASK | Copy .bash_logout for user ------------------------- 1.83s
    TASK | Copy Telegraf config ------------------------------- 1.82s
    TASK | Update /etc/services file -------------------------- 1.82s
    TASK | Copy .bashrc for user ----------------------------- 1.82s
    TASK | Copy Telegraf environment default ------------------ 1.81s
    TASK | Copy .bashrc for root ------------------------------ 1.80s
    TASK | Install prowl -------------------------------------- 1.79s
    TASK | Install prowl API key ------------------------------ 1.77s
    TASK | Copy .bash_logout for root ------------------------- 1.77s
    Playbook run took 0 days, 0 hours, 1 minutes, 55 seconds

Here we can see that the playbook run completed 9 seconds faster (115 seconds instead of 124). Not bad. Let’s see if we can tweak it some more.

Reduce poll interval to 5s

The default poll interval is set to 15 seconds. This is how often Ansible checks on the status of a running task to decide whether it can proceed (the documentation describes it in terms of asynchronous tasks). Let’s set it to 5 seconds and see what happens. Add or edit this line in the ansible.cfg file, under the [defaults] heading:

    poll_interval = 5

Let’s run it:

    Friday 08 June 2018 16:10:09 -0400 (0:00:13.277) 0:01:46.888 ***
    =================================================================
    TASK | Install packages ---------------------------------- 18.66s
    TASK | Start filebeat and enable service ----------------- 13.28s
    TASK | Install filebeat ----------------------------------- 5.61s
    Gathering Facts ------------------------------------------- 5.50s
    TASK | Install rpms for Spacewalk / RHN ------------------- 4.77s
    checkmk : TASK | Copy Checkmk Agent Listener -------------- 2.33s
    checkmk : TASK | Copy Checkmk Agent ----------------------- 2.19s
    TASK | Copy filebeat config template ---------------------- 2.01s
    TASK | Copy ssh/config for user -------------------------- 1.87s
    TASK | Copy Telegraf environment default ------------------ 1.86s
    TASK | Set /etc/hostname ---------------------------------- 1.86s
    TASK | Copy ssh keys for user ---------------------------- 1.84s
    TASK | Copy Influxdata repo (for Telegraf) ---------------- 1.84s
    TASK | Copy .bash_logout for root ------------------------- 1.80s
    TASK | Update /etc/services file -------------------------- 1.79s
    TASK | Install prowlnotify -------------------------------- 1.77s
    TASK | Copy .bashrc for root ------------------------------ 1.77s
    TASK | Copy .bash_logout for user ------------------------- 1.76s
    TASK | Copy .bashrc for user ------------------------------ 1.76s
    TASK | Copy sudoers file --------------------------------- 1.76s
    Playbook run took 0 days, 0 hours, 1 minutes, 46 seconds

It took 106 seconds to run the playbook that time. That’s 18 seconds faster than what we started with. Nice.

Let’s try another tweak and see if we can’t do even better.

Increase forks to 25

For my use case I’m increasing the number of simultaneous forks to 25 from the default value of 5. Again, Ansible ships with pretty sane defaults. We don’t want sane, we want fast. Let’s see how this does:
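The post doesn’t show the config line for this change. A minimal sketch, assuming it goes in the same ansible.cfg used for the earlier tweaks (forks is a standard key in the [defaults] section):

```ini
[defaults]
# Number of parallel worker processes Ansible uses to talk to hosts.
# 25 is the value chosen in this post; the shipped default is 5.
forks = 25
```

The same value can also be passed for a single run with ansible-playbook -f 25. The timings that follow are from the run with 25 forks.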

    Friday 08 June 2018 16:12:10 -0400 (0:00:17.528) 0:01:25.858 ***
    =================================================================
    TASK | Start filebeat and enable service ----------------- 17.53s
    TASK | Install packages ---------------------------------- 10.29s
    TASK | Install filebeat ----------------------------------- 8.51s
    Gathering Facts ------------------------------------------- 3.77s
    TASK | Install rpms for Spacewalk / RHN ------------------- 3.01s
    checkmk : TASK | Copy Checkmk Agent Listener -------------- 1.59s
    TASK | Disable SELinux (Centos 7) ------------------------- 1.40s
    TASK | Update /etc/services file -------------------------- 1.39s
    TASK | Install prowlnotify -------------------------------- 1.39s
    checkmk : TASK | Copy Checkmk Agent ----------------------- 1.36s
    TASK | Copy sudoers file --------------------------------- 1.30s
    TASK | Copy Telegraf config ------------------------------- 1.30s
    TASK | Install treesize in /usr/local/bin ----------------- 1.29s
    TASK | Copy .bash_logout for user ------------------------- 1.27s
    TASK | Copy .bashrc for root ------------------------------ 1.27s
    TASK | Install prowl -------------------------------------- 1.25s
    TASK | Copy .bash_logout for root ------------------------- 1.23s
    TASK | Copy ssh/config for user --------------------------- 1.23s
    TASK | Copy filebeat config template ---------------------- 1.22s
    TASK | Copy Telegraf environment default ------------------ 1.22s
    Playbook run took 0 days, 0 hours, 1 minutes, 25 seconds

Very nice. Now we’re at 85 seconds. Remember, I’m running the exact same playbook just with new configuration values (options). This is very good but I think there’s more we can do.

Enable fact_caching

By enabling this value we’re telling Ansible to keep the facts it gathers in a local file. You can also point it at a Redis cache. See the documentation for details.

Fact gathering is what happens when Ansible prints “Gathering Facts” for your target hosts; fact caching saves those results so they can be reused. If we don’t change our targets’ hardware (or virtual hardware) very often, this can be very helpful. Enable it by adding this to your ansible.cfg file under the [defaults] heading:

    fact_caching = jsonfile
    fact_caching_connection = /tmp/.ansible_fact_cache

What happens when we run it now?

    Friday 08 June 2018 17:25:14 -0400 (0:00:03.000) 0:01:15.530 ***
    =================================================================
    TASK | Install packages ---------------------------------- 17.33s
    TASK | Install filebeat ----------------------------------- 4.46s
    TASK | Install rpms for Spacewalk / RHN ------------------- 3.87s
    Gathering Facts ------------------------------------------- 3.82s
    TASK | Start filebeat and enable service ------------------ 3.00s
    checkmk : TASK | Copy Checkmk Agent Listener -------------- 2.34s
    checkmk : TASK | Copy Checkmk Agent ----------------------- 1.47s
    TASK | Install prowl -------------------------------------- 1.40s
    TASK | Update /etc/services file -------------------------- 1.38s
    TASK | Install prowlnotify -------------------------------- 1.33s
    TASK | Set /etc/hostname ---------------------------------- 1.33s
    TASK | Ensure mount directories exist --------------------- 1.28s
    TASK | Copy iTerm2 bash shell integration for user -------- 1.27s
    TASK | Copy Telegraf environment default ------------------ 1.25s
    TASK | Copy Influxdata repo (for Telegraf) ---------------- 1.24s
    TASK | Copy .bash_logout for user ------------------------- 1.23s
    TASK | Copy .bashrc for user ------------------------------ 1.22s
    TASK | Copy .bashrc for root ------------------------------ 1.21s
    TASK | Disable SELinux (Centos 7) ------------------------- 1.20s
    checkmk : TASK | Create Checkmk Agent Unit ---------------- 1.20s
    Playbook run took 0 days, 0 hours, 1 minutes, 15 seconds

75 seconds. Very nice. These tweaks have made a huge difference.
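One thing worth noticing in that last output: “Gathering Facts” still took 3.82s. With the default gathering policy, Ansible gathers facts on every play even when a cache is configured. The sketch below is an assumption on my part and was not part of the measured runs: it pairs the cache with gathering = smart so hosts that already have cached facts are skipped, and sets an explicit expiry with fact_caching_timeout.

```ini
[defaults]
# Assumption, not used in the timed runs above: only gather facts for
# hosts that do not already have facts available in the cache.
gathering = smart

# Same cache settings as in the post.
fact_caching = jsonfile
fact_caching_connection = /tmp/.ansible_fact_cache

# Assumption: how long cached facts stay valid, in seconds (86400 = one day).
fact_caching_timeout = 86400
```

If a target’s hardware or OS changes before the cache expires, deleting its file under /tmp/.ansible_fact_cache forces a fresh gather on the next run.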

Let’s recap

We’ve reduced our playbook run time from 2 minutes and 4 seconds down to 1 minute and 15 seconds (124 seconds -> 75 seconds). That’s 49 seconds saved, or roughly 40% less time, to run the exact same playbook with just a few configuration tweaks.

By adding or editing a handful of configuration values we cut well over a third off our playbook run time. Now, these results aren’t going to be the same for everyone, every playbook or every environment. Many factors affect Ansible performance, and run-to-run variation on individual tasks (package installs, service starts) accounts for some of the differences seen above.

It’s clear, however, that modifying the defaults as we did here results in significant performance gains and can save you time on deployments.
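For reference, here is a sketch of what a single ansible.cfg with all of the settings from this post might look like. The values are the ones used above; the only assumption is the placement of pipelining under [ssh_connection], which is where the Ansible documentation lists it.

```ini
[defaults]
# Print total run time and a per-task timing summary after each run.
callback_whitelist = timer, profile_tasks

# Check on running tasks every 5 seconds instead of the default 15.
poll_interval = 5

# Work on up to 25 hosts in parallel instead of the default 5.
forks = 25

# Keep gathered facts in local JSON files.
fact_caching = jsonfile
fact_caching_connection = /tmp/.ansible_fact_cache

[ssh_connection]
# Assumption: section placement follows the Ansible documentation.
# Reduces the number of SSH operations needed to run each module.
pipelining = True
```

Running ansible --version afterwards confirms which config file Ansible is actually picking up.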

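Summary of the runs above (each run keeps the settings from the previous one):

    Default configuration ......... 124 seconds
    + pipelining = True ........... 115 seconds
    + poll_interval = 5 ........... 106 seconds
    + forks = 25 ..................  85 seconds
    + fact_caching = jsonfile .....  75 seconds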