This article summarises how we benchmark two or more versions of our software to determine which one runs faster. It may offer some insights for your own benchmarking efforts, and we welcome further suggestions.

This article collects and shares the insights we have accumulated from benchmarking Freecell Solver, a CPU- and RAM-intensive application written in C. It is written from the point of view of benchmarking on a modern GNU/Linux installation.

Points

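Use the “sudo_renice” script

sudo_renice is a shell wrapper around nice and ionice that runs a command with the highest CPU scheduling priority (nice -n -20) and in the real-time I/O scheduling class at its highest priority level (ionice -c1 -n0). Both settings need root privileges, so the wrapper raises them through sudo and then switches back to the invoking user to run the command itself. This way the command runs faster and the results show less variance. It is reproduced here:

    #!/bin/bash
    sudo nice -n-20 ionice -c1 -n0 sudo -u "$USER" "$@"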

Use the same machine and same operating system installation for both timings

It is important to run both the "before" and "after" versions of the benchmark on the same physical computer, with the same system installation and in similar conditions, one after the other.

Make sure the benchmarked process is practically the only thing running

We noticed that running X with a resource-heavy desktop environment such as KDE Plasma can slow down the program and skew the results. It is therefore a good idea to stop X and work from a virtual console or a remote shell such as ssh. Use a process monitor such as htop to make sure that nothing else is consuming CPU or RAM, such as system services, daemons, or stale processes that were not killed.
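As a quick pre-flight check to complement htop, a script can refuse to start while the machine is visibly busy. The following is a minimal sketch that compares the 1-minute load average from /proc/loadavg with a threshold. The function name system_is_idle and the 0.10 default are hypothetical choices, and the load average is only a rough proxy for "nothing else is running".

```bash
#!/bin/bash

# Succeed if the 1-minute load average is below the given threshold.
# The second argument (default: /proc/loadavg) exists only so that the
# check can be tried against a sample file.
system_is_idle()
{
    local threshold="${1:-0.10}"
    local loadavg_file="${2:-/proc/loadavg}"
    # The first field of /proc/loadavg is the 1-minute load average.
    awk -v max="$threshold" '
        NR == 1 { ok = ($1 < max) }
        END { exit !ok }
    ' "$loadavg_file"
}
```

Calling system_is_idle 0.10 before each run and aborting when it fails catches the most obvious interference. It does not show which process is responsible, so htop is still needed for that.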

Make sure the system is not overheated

We noticed that once the computer overheats, the CPU is throttled and performance decreases. Check that this is not the case with the "sensors" command from lm_sensors or with PowerTOP, and if needed let the machine cool down by waiting a little, for example with the UNIX sleep command.
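The waiting step can be automated. The sketch below polls a Linux sysfs thermal zone, which reports the temperature in millidegrees Celsius, and sleeps until the reading falls to a limit. The function name wait_until_cool, the thermal_zone0 path, and the 70 degree default are assumptions; which zone corresponds to the CPU differs between machines, so check it against the output of sensors first.

```bash
#!/bin/bash

# Block until the temperature read from a sysfs thermal zone is at or
# below max_celsius. The kernel reports millidegrees Celsius.
# Usage: wait_until_cool [MAX_CELSIUS] [TEMP_FILE] [POLL_SECONDS]
wait_until_cool()
{
    local max_celsius="${1:-70}"
    local temp_file="${2:-/sys/class/thermal/thermal_zone0/temp}"
    local interval="${3:-5}"
    local millidegrees
    while true; do
        # Give up if the zone cannot be read (for example, a wrong path).
        millidegrees="$(cat "$temp_file")" || return 1
        if [ "$millidegrees" -le "$((max_celsius * 1000))" ]; then
            return 0
        fi
        sleep "$interval"
    done
}
```

Running wait_until_cool before each timed run gives every run a similar thermal starting point, at the cost of a longer overall benchmark.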

Run each benchmarked process several times

Keep track of the results, and compare the versions on several statistics (minimum, average, and so on) to see which is generally faster. Also see some previous discussion of this on the Linux-IL mailing list.
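Repeating the runs and summarising them is easy to script. The following is a minimal sketch, not the tooling we actually use: time_runs and summarise_timings are hypothetical helper names, and the timing relies on the %N (nanoseconds) format of GNU date.

```bash
#!/bin/bash

# Run a command several times and print the wall-clock time of each run
# in seconds, one per line. The command's standard output is discarded.
# Usage: time_runs RUNS COMMAND [ARGS...]
time_runs()
{
    local runs="$1"
    shift
    local i start end
    for ((i = 0; i < runs; i++)); do
        start="$(date +%s%N)"
        "$@" > /dev/null
        end="$(date +%s%N)"
        awk -v ns="$((end - start))" 'BEGIN { printf "%.6f\n", ns / 1e9 }'
    done
}

# Read one timing per line from standard input and print the number of
# runs, the minimum, the mean, and the maximum. Fails on empty input.
summarise_timings()
{
    awk '
        NR == 1 { min = max = $1 }
        { sum += $1 }
        $1 < min { min = $1 }
        $1 > max { max = $1 }
        END {
            if (NR == 0) { exit 1 }
            printf "runs=%d min=%.3f mean=%.3f max=%.3f\n", NR, min, sum / NR, max
        }
    '
}
```

A typical invocation would be time_runs 10 ./solver-before | summarise_timings, repeated for the other version, where ./solver-before is a placeholder for the binary being measured. Printing several statistics side by side makes it easier to tell a real speed-up from run-to-run noise.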

Compile flags

You should build both versions using compiler flags for maximal performance, such as -O3, -march=native, -flto, -fwhole-program, and possibly -fomit-frame-pointer. Profile-guided optimization may prove useful as well.
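To make sure both versions are built identically, the flags can be kept in one place. The sketch below assumes GCC; the OPT_FLAGS array and the build_optimised function are hypothetical names, and the compiler can be overridden through the CC environment variable.

```bash
#!/bin/bash

# GCC flags taken from the text above. -fwhole-program tells GCC that
# what it is compiling is the entire program. Note that the GCC manual
# advises against combining it with -flto; the list here simply
# mirrors the text.
OPT_FLAGS=(-O3 -march=native -flto -fwhole-program -fomit-frame-pointer)

# Build the given C sources into one executable with the flags above.
# Usage: build_optimised OUTPUT SOURCE...
build_optimised()
{
    local output="$1"
    shift
    "${CC:-gcc}" "${OPT_FLAGS[@]}" -o "$output" "$@"
}
```

For profile-guided optimisation with GCC, the usual sequence is to build once with -fprofile-generate added, run the resulting binary on a representative workload, and then rebuild with -fprofile-use. Because -march=native targets the build machine's CPU, binaries built this way should be benchmarked on the machine that compiled them.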