Presentation of CPU Governor

Modern x86 processors provide two mechanisms to reduce power consumption: CPU idle states and CPU frequency scaling. Useful on a laptop, they can also help on physical servers with a light workload.

However, they do not apply to virtual machines.

CPU Idle States

In the x86 architecture, several CPU states, called C-states, have been defined, allowing systems to save power by reducing CPU functionality. These C-states are broadly similar across processors but the exact details may vary.

C0: The processor is working as usual.

C1: The processor doesn’t execute any instruction but can start working again without any delay.

C2: The processor is stopped but keeps the complete state of its registers and caches. A delay is necessary for the processor to start working again.

C3: The processor is sleeping and doesn’t keep its caches. A longer delay than in C2 state is needed for the processor to start working again.
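On Linux, the idle states that the kernel exposes, along with their wake-up delays, can usually be read from sysfs. The following is a minimal sketch, assuming the cpuidle directory of CPU 0 is present; the CPUIDLE_DIR variable and the list_idle_states function are illustrative names, not part of any tool.

```shell
#!/bin/sh
# Directory holding the idle states of CPU 0 (assumed standard cpuidle layout);
# it can be overridden to point at another CPU.
CPUIDLE_DIR="${CPUIDLE_DIR:-/sys/devices/system/cpu/cpu0/cpuidle}"

# Print each idle state name with its exit latency in microseconds.
list_idle_states() {
    for d in "$CPUIDLE_DIR"/state[0-9]*; do
        # Skip silently when no cpuidle state is exposed (e.g. some virtual machines).
        [ -r "$d/name" ] || continue
        printf '%s: exit latency %s us\n' "$(cat "$d/name")" "$(cat "$d/latency")"
    done
}

list_idle_states
```

Deeper states normally report a larger exit latency, which matches the increasing wake-up delay described for C1 to C3.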

CPU Freq

CPU Freq, or CPU speed scaling, is a way to reduce power consumption by adjusting the clock speed of the processor.

Five CPU Freq governors are available in RHEL 7:

performance: with this static governor you get the highest possible clock frequency but without any power saving benefit. It is best suited for heavy workloads when the CPU is almost never idle.

powersave: here your CPU gets the lowest possible clock frequency but at the cost of the lowest CPU performance. One serious drawback of this static governor happens when the system experiences unexpected high loads: in this case it can consume more power than other governors with higher clock frequencies. For this reason, it is not recommended to use it except when overheating is a problem.

ondemand: this dynamic governor sets the maximum clock frequency when system load is high and the minimum clock frequency when idle, but without any intermediate state. This allows the system to adjust power consumption according to system load but at the expense of latency. If switches between idle and heavy workloads happen too often, performance and power saving can suffer. Otherwise, it is one of the best options.

userspace: with this governor, any process running as root can set the clock frequency.

conservative: this dynamic governor is very similar to the ondemand governor except it adjusts the clock frequency more gradually. Instead of choosing between maximum and minimum clock frequencies, it selects the clock frequency according to usage. This more granular approach provides significant power saving but at the expense of an even greater latency than the ondemand governor.

Installation Procedure

Install the kernel-tools package to get access to the cpupower command:

# yum install -y kernel-tools

Note: The kernel-tools package and the cpupower command are not strictly needed to manipulate CPU governors. They only provide a convenient interface. All operations can be done using the /sys/devices/system/cpu/ path and the echo command.
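As an illustration of the sysfs approach mentioned in the note, here is a minimal sketch. It assumes the usual cpufreq layout, with one scaling_governor file per CPU; the CPU_SYSFS variable and the two function names are made up for this example.

```shell
#!/bin/sh
# Base of the CPU sysfs tree (assumed standard location); overridable for dry runs.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# Print the current governor of every CPU that exposes a cpufreq directory.
show_governors() {
    for f in "$CPU_SYSFS"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue
        cpu=${f#"$CPU_SYSFS"/}   # strip the base directory
        cpu=${cpu%%/*}           # keep only the "cpuN" component
        printf '%s: %s\n' "$cpu" "$(cat "$f")"
    done
}

# Write the governor given as first argument to every CPU (needs root on a real system).
set_governor() {
    for f in "$CPU_SYSFS"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -w "$f" ] || continue
        echo "$1" > "$f"
    done
}

show_governors
```

Calling set_governor performance as root and then show_governors should report the new governor for each CPU, provided the governor is available on the system.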

Basic Operations

To get the list of the various idle states supported by the CPU number 0 (available idle states are given for various types of servers), type:

# cpupower idle-info
CPUidle driver: intel_idle
CPUidle governor: menu
analyzing CPU 0:
...
Available idle states: POLL C1E-ATM C2-ATM C4-ATM C6-ATM (Atom CPU N2800)
Available idle states: POLL C1-IVB C1E-IVB C3-IVB C6-IVB (Xeon CPU E3-1245 V2)
Available idle states: POLL C1-NHM C1E-NHM C3-NHM C6-NHM (Core i5 CPU M 430)
Available idle states: POLL C1-HSW C1E-HSW C3-HSW C6-HSW C7s-HSW C8-HSW C9-HSW C10-HSW (Celeron 2961Y)
...

To get the same information for all the CPUs on a server, type:

# cpupower -c all idle-info

Note: The -c option can be replaced with --cpu.

To get the list of the available governors for the CPU number 0, type:

# cpupower frequency-info -g
analyzing CPU 0:
  available cpufreq governors: conservative userspace powersave ondemand performance

Note: The -g option can be replaced with --governors.

To get all the details about the available CPU frequencies for the CPU number 0, type:

# cpupower frequency-info
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 10.0 us
  hardware limits: 1.20 GHz - 2.27 GHz
  available frequency steps: 2.27 GHz, 2.27 GHz, 2.13 GHz, 2.00 GHz, 1.87 GHz, 1.73 GHz, 1.60 GHz, 1.47 GHz, 1.33 GHz, 1.20 GHz
  available cpufreq governors: conservative userspace powersave ondemand performance
  current policy: frequency should be within 1.20 GHz and 2.27 GHz.
                  The governor "conservative" may decide which speed to use within this range.
  current CPU frequency: 1.20 GHz (asserted by call to hardware)
  boost state support:
    Supported: yes
    Active: yes
    1900 MHz max turbo 2 active cores
    1900 MHz max turbo 1 active cores

Note1: Without the -c option, only the information about the CPU number 0 is displayed.

Note2: The governor conservative is the current configuration.

To change the governor to performance for all the CPUs, type:

# cpupower frequency-set -g performance
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3

Note1: Without the -c option, all the CPUs are affected.

Note2: To change the governor only for the CPUs number 0, 1 and 2, type:

# cpupower -c 0-2 frequency-set -g performance

Note3: The -g option can be replaced with --governor.

To check that the change has been applied, type:

# cpupower -c all frequency-info
...
  current policy: frequency should be within 1.20 GHz and 2.27 GHz.
                  The governor "performance" may decide which speed to use within this range.
...
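When a server has many CPUs, reading the full output for each of them becomes tedious. The sketch below filters the policy output down to the governor names. It assumes the output contains a line of the form The governor "name" may decide, as in the samples shown here; other cpupower versions may format this differently. The governor_from_policy function is a helper invented for this example.

```shell
#!/bin/sh
# Read "cpupower frequency-info -p" output on stdin and print only the governor names.
governor_from_policy() {
    sed -n 's/.*The governor "\([^"]*\)".*/\1/p'
}

# Usage on a real system (requires cpupower):
#   cpupower -c all frequency-info -p | governor_from_policy | sort -u
# A single line of output means that all the CPUs share the same governor.
```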

To specify the governor userspace for the CPU number 1 with a CPU frequency of 1.2GHz, type:

# cpupower -c 1 frequency-set -f 1.2GHz
Setting cpu: 1
# cpupower -c 1 frequency-info -p
analyzing CPU 1:
  current policy: frequency should be within 1.20 GHz and 2.27 GHz.
                  The governor "userspace" may decide which speed to use within this range.
# cpupower -c 1 frequency-info -f
analyzing CPU 1:
  current CPU frequency: 1199000 (asserted by call to kernel)
# cpupower -c 1 frequency-info --hwlimits
analyzing CPU 1:
  hardware limits: 1.20 GHz - 2.27 GHz

Note: Only the governor userspace allows frequencies to be set.

Standard Operations

None of the changes made with the previous commands persist after a reboot. The standard way to set the CPU governor persistently is through the tuned daemon and its governor directive (alternatively, using rc.local only works if the tuned service is disabled).

If we look at the throughput-performance tuned profile (/usr/lib/tuned/throughput-performance/tuned.conf), we can see:

...
[cpu]
governor=performance
energy_perf_bias=performance
min_perf_pct=100
...

Note: The energy_perf_bias directive allows software on supported Intel processors to more actively contribute to determining the balance between optimum performance and saving power.

Therefore, if you want to define a specific configuration, create a new tuned profile with the tuned inheritance mechanism in the /etc/tuned directory (see the tuned tutorial):

[main]
include=throughput-performance

[cpu]
...
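The creation of such a profile can be sketched as a small shell function. This is only an illustration: the create_cpu_profile function, the my-cpu-profile name and the choice of the conservative governor are assumptions made for the example, not values taken from the guide.

```shell
#!/bin/sh
# Create a tuned profile that inherits from throughput-performance
# and overrides only the CPU governor.
# Arguments: base directory, profile name, governor.
create_cpu_profile() {
    base=$1
    name=$2
    governor=$3
    mkdir -p "$base/$name" || return 1
    cat > "$base/$name/tuned.conf" <<EOF
[main]
include=throughput-performance

[cpu]
governor=$governor
EOF
}

# Usage on a real system, as root (the profile name is hypothetical):
#   create_cpu_profile /etc/tuned my-cpu-profile conservative
#   tuned-adm profile my-cpu-profile
#   tuned-adm active
```

Because the profile inherits from throughput-performance, only the directives that differ need to be listed in the [cpu] section.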

Behaviour Analysis

To get a better understanding of the way your system behaves, install the powertop package:

# yum install -y powertop

Then, run the powertop command:

# powertop
PowerTOP 2.3    Overview   Idle stats   Frequency stats   Device stats   Tunables

Summary: 528.4 wakeups/second, 0.0 GPU ops/seconds, 0.0 VFS ops/sec and 4.1% CPU use

Usage         Events/s    Category     Description
2.2 ms/s      177.5       Interrupt    PS/2 Touchpad / Keyboard / Mouse
445.2 us/s    44.4        Timer        tick_sched_timer
3.4 ms/s      39.2        Interrupt    [27] nvkm
318.1 us/s    29.2        Timer        hrtimer_wakeup
137.7 us/s    19.4        Process      [rcu_sched]
...

By pressing the Tab key, you get access to different kinds of information (more details are available in the powertop manual):

the Overview tab displays a general state of the system.

the Idle stats tab presents the CPUs and GPUs currently loaded in the system in relationship with their C-states.

the Frequency stats tab shows the P-states of a system in relationship with the idle state.

the Device stats tab presents the list of devices in the system that consume the most power.

the Tunables tab lists the devices that are present on the system. You can tune the system to be power friendly by toggling each item from bad to good.

Source: RHEL 7 Power Management Guide.

Additional Resources

To go further, you can explore the following articles about: