Introduction

Today, the vast majority of us rely on mobile devices every day. Over 95% of Americans own cellphones, nearly three quarters own desktop or laptop computers, roughly 50% own tablets, and one-in-five adults are “smartphone only” web surfers, indicating that they no longer have home broadband service, according to a recent study by the Pew Research Center. Our reliance on mobile devices makes their efficient power consumption ever more critical.

On Intel® Architecture-based platforms, Advanced Configuration and Power Interface (ACPI) S3 and S4 system states are often implemented to save energy when the system is not being used. However, bringing the system back to the active state can take from hundreds of milliseconds to tens of seconds. Because of this latency, newer Intel® System-On-Chip (SoC) releases introduced S0ix, which is a new set of sub-states for the ACPI S0 active state.

Using S0ix, the platform can achieve significant energy savings, similar to using S3, which can lead to longer battery life and less power consumption for mobile devices. When using S0ix, users will also experience lower latency than using S3 for an “instant on” experience in scenarios such as Audio Wake on Voice and Integrated Sensor Hub background sensing use cases. This paper provides a brief introduction of S0ix, how it works on Linux*, and how to debug S0ix-related issues.

This paper also describes how to:

Check if S0ix is supported on an Intel® SoC.

Perform basic debug if S0ix is not running properly.

Report issues if the user’s platform has a failure for the S0ix state.

What is S0ix

S0ix-states represent the residency in the Intel® SoC idle standby power states. The S0ix states shut off parts of the SoC when they are not in use, while still maintaining optimal performance. These states are triggered when specific conditions within the SoC have been achieved, for example, when certain components are in low power states.

From an ACPI-compatible-OS point of view, S0ix is an idle condition while still in “S0 active” state.

However, S0ix is not totally transparent to the OS. In order to enter the S0ix state, there are specific platform-dependent conditions the OS must meet. For this purpose, the ACPI 6.2 Specification introduced a Low Power S0 Idle Capable Flag in the Fixed ACPI Description Table (FADT). For x86 systems, this flag informs the OS whether an Intel® SoC has S0ix support or not.

From the hardware perspective, the SLP_S0# signal indicates that the system has entered the deepest S0ix state.

How to identify platform support for S0ix

To determine whether the platform supports S0ix, check the LOW_POWER_S0_IDLE_CAPABLE flag in the ACPI Fixed ACPI Description Table (FADT). The flag informs OSPM that the platform is able to achieve power savings in S0. When the “Low Power S0 Idle” bit is set to “1”, the system supports S0ix. When the bit is set to “0”, S0ix is disabled on the system. Users can run the following shell script as root to check:

#!/bin/bash
cd /var/tmp/
acpidump -b
iasl -d *.dat
lp=$(grep "Low Power S0 Idle" /var/tmp/facp.dsl | awk '{print $(NF)}')
if [ "$lp" -eq 1 ]; then
    echo "Low Power S0 Idle is" $lp
    echo "The system supports S0ix!"
else
    echo "Low Power S0 Idle is" $lp
    echo "The system does not support S0ix!"
fi

If the script returns that S0ix is not supported on the system, then users can check if the “Low Power S0 Idle Capability” option is available in the BIOS setup. If yes, users can enable it to let the kernel recognize the Low Power S0 Idle Capability. If no, the S0ix is NOT supported on the platform, and the steps in this paper cannot change that.
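As a lighter-weight alternative to disassembling the tables, the flag can also be read from a raw FADT image. The following is a minimal sketch, not a definitive implementation. It assumes that the 32-bit Flags field sits at byte offset 112 of the FADT and that LOW_POWER_S0_IDLE_CAPABLE is bit 21, and that the kernel exposes the raw table at /sys/firmware/acpi/tables/FACP. The helper name fadt_low_power_s0_idle is hypothetical.

```shell
#!/bin/sh
# Sketch: test the LOW_POWER_S0_IDLE_CAPABLE bit in a raw FADT image.
# Assumptions: the 32-bit Flags field is at byte offset 112 and the flag
# is bit 21. fadt_low_power_s0_idle is a hypothetical helper name.
fadt_low_power_s0_idle() {
    table=$1
    # Read the four Flags bytes one by one so host endianness does not matter.
    set -- $(od -An -tu1 -j112 -N4 "$table")
    [ $# -eq 4 ] || return 2          # table too short or unreadable
    flags=$(( $1 | ($2 << 8) | ($3 << 16) | ($4 << 24) ))
    echo $(( (flags >> 21) & 1 ))     # prints 1 if the flag is set, else 0
}

# Usage (as root): fadt_low_power_s0_idle /sys/firmware/acpi/tables/FACP
```

A printed value of 1 corresponds to the "supports S0ix" branch of the script above.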

S0ix in Linux

In Linux, S0ix can be achieved in two ways:

Suspend-to-idle S0ix (also called S2Idle S0ix) uses the Linux system power management framework to put all devices into a low power state and then idles all processors.

To enable, set the default system sleep state to s2idle with the command:

~$ echo s2idle | sudo tee /sys/power/mem_sleep

Note: Do this only once per boot. If the kernel determines S0ix support exists, it may be the default already. Refer to mem_sleep for more details.

Then trigger suspend to idle with the command:

~$ echo mem | sudo tee /sys/power/state

This scenario does not require any special tuning.

Opportunistic S0ix (also called runtime S0ix) uses the Linux runtime power management framework to put all the idle devices into a low power state, so that S0ix can be achieved if there is no user activity. Note that runtime power management for some devices may not be enabled by default and must be explicitly tuned by users.
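Before changing the default sleep state for suspend-to-idle, it can help to see what the kernel has already selected. The sketch below assumes that /sys/power/mem_sleep lists the available variants on one line and marks the selected one with square brackets (for example "s2idle [deep]"). The helper name selected_mem_sleep is hypothetical.

```shell
#!/bin/sh
# Sketch: report which suspend variant "mem" currently maps to.
# Assumption: the selected variant is enclosed in square brackets,
# for example "s2idle [deep]". selected_mem_sleep is a hypothetical name.
selected_mem_sleep() {
    # Default to the real sysfs node; another path can be passed for testing.
    file=${1:-/sys/power/mem_sleep}
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$file"
}

# Usage:
#   [ "$(selected_mem_sleep)" = "s2idle" ] || echo "s2idle is not the default"
```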

Suspend-to-idle S0ix

The Linux system power management framework provides interfaces to enter different system sleep states. The interfaces are widely used on Intel® SoCs to reach different ACPI sleep states. For example, S4 is entered when doing Linux suspend to disk, and S3 is entered when doing Linux suspend to memory.

Suspend to idle, also referred to as S2idle, is a generic, pure-software, lightweight variant of system suspend. It saves more energy than runtime idle by freezing user space, suspending timekeeping, and putting all I/O devices into low-power states (possibly lower-power than those available in the working state). This enables the processors to spend time in their deepest idle states while the system is suspended. S2idle is always supported if the “CONFIG_SUSPEND” kernel configuration option is set.

The following figure shows the S2idle workflow in Linux (S0ix via the Linux system PM framework). The box at the bottom shows the state where all processors are idle and waiting for a wakeup interrupt. On Intel® SoCs with low power S0 idle capability running a kernel that supports S0ix, the device-suspend callbacks and other platform-specific kernel hooks satisfy the S0ix constraints before the processors are idled. As a result, the platform is actually in the S0ix state while the system is in the S2idle state.

Reaching S0ix residency using the Linux S2idle framework is straightforward. Enter one of the following commands:

~$ echo s2idle | sudo tee /sys/power/mem_sleep && echo mem | sudo tee /sys/power/state

or

~$ echo freeze | sudo tee /sys/power/state

Opportunistic S0ix

Opportunistic S0ix means the system can reach S0ix residency automatically when the system is not in use. This is achieved by using the Linux runtime power management framework.

In theory, opportunistic S0ix is entered automatically, when the system is idle and devices have been put into low power state via the runtime PM framework. However, in practice, users must perform the following steps to reach S0ix at runtime:

Turn on auto power control for all the PCI devices with the command:

~$ sudo powertop --auto-tune

Turn off system display (for non-panel self refresh display):

~$ export DISPLAY=":0.0"
~$ xset dpms force off

Unplug USB devices.

How to verify S0ix is working

Before verifying S2idle or Opportunistic S0ix residency, users can check if S2idle PC10 is available using the sysfs file or turbostat tool, which verifies CPU package readiness for the S0ix entry.

~$ cat /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us

or

~$ sudo turbostat --show Pk%pc10

PC10 is entered only when the Pk%pc10 column shows a non-zero value during the S2idle state, as shown in the following example:

Pk%pc10
83.27
83.27

S2idle S0ix Entry

In Linux, there are several methods to verify that a user’s system has entered the S0ix state.

For the common case, there are two methods:

1. Use the Linux OS sysfs interface:

~$ cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
70114822

The S0ix state is entered only when the low_power_idle_system_residency_us counter increases during the S2idle low power state.

2. Use an upstream tool, such as turbostat:

~$ sudo turbostat --show SYS%LPI sh -c 'echo freeze > /sys/power/state'

The S0ix state is entered only when the SYS%LPI column shows a non-zero value after manually waking up from S2idle.

Alternatively, use RTC to automatically wake S2idle up with the following command:

~$ sudo turbostat --show SYS%LPI rtcwake -m freeze -s 15
rtcwake: wakeup from "freeze" using /dev/rtc0 at Sat Jun 30 04:00:38 2018
16.281752 sec
SYS%LPI
84.39
84.39
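The counter comparison from the first method (read the residency counter, suspend, read it again) can be sketched as a small helper. This is only an illustration: counter_delta is a hypothetical name, and the caller supplies both the counter file and the suspend command.

```shell
#!/bin/sh
# Sketch: run a command and report how far a residency counter advanced.
# counter_delta is a hypothetical helper; the counter file and the command
# are supplied by the caller.
counter_delta() {
    counter_file=$1
    shift
    before=$(cat "$counter_file") || return 1
    "$@" >/dev/null 2>&1              # for example: rtcwake -m freeze -s 15
    after=$(cat "$counter_file") || return 1
    echo $(( after - before ))
}

# Usage (as root):
#   counter_delta /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us \
#       rtcwake -m freeze -s 15
```

A result greater than 0 means that S0ix residency accumulated while the command ran.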

For special cases, because of different Linux S0ix sysfs debug interfaces for different platforms, there are different solutions for the Intel® Atom™ platform and the Intel® Core™ platform.

On the Intel® Atom™ platform, check the Intel_telemetry sysfs file using the command:

~$ cat /sys/kernel/debug/telemetry/s0ix_residency_usec
13788081

The S0ix state is entered only when the s0ix_residency_usec value, which is read from the SoC counter, increases during the S2idle state.

Note: Make sure the following kernel options are configured for the telemetry sysfs interface support:

CONFIG_INTEL_PMC_IPC=y
CONFIG_INTEL_PMC_CORE=y
CONFIG_INTEL_IPS=m

On the Intel® Core™ platform, check the PMC debug sysfs file using the command:

~$ cat /sys/kernel/debug/pmc_core/slp_s0_residency_usec
40629300

The S0ix state is entered only when slp_s0_residency_usec counter increases during the S2idle state.

S2idle S0ix Wake Up

For S2idle S0ix wake up, all the basic S0 wake events are relevant. Events include (but are not limited to):

Connectivity: WLAN, voice

User Events: Power button, lid open/close

Device Insertion/Removal: USB connect/disconnect, SD card

Timer: RTC alarm

Input Devices: USB keyboard, PS2 keyboard, BT keyboard, touchpad

According to Linux kernel documentation, device capability for issuing wakeup events is a hardware matter, and the kernel is responsible for keeping track of it. However, whether a wakeup-capable device should issue wakeup events is a policy decision. This is managed in the user space through a sysfs attribute, the “power/wakeup” file. User space can write the "enabled" or "disabled" strings to indicate whether the device is supposed to signal system wakeup.

The initial value in the “power/wakeup” file is "disabled" for most devices. The major exceptions are power buttons, keyboards, and Ethernet adapters whose WoL (wake-on-LAN) feature has been set up with ethtool.

For example, a user can enable a wakeup event for USB device insertion and removal using the commands:

~$ echo enabled | sudo tee /sys/bus/usb/devices/usb1/power/wakeup
~$ echo enabled | sudo tee /sys/bus/usb/devices/usb2/power/wakeup
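To review the current wakeup policy before changing it, the “power/wakeup” attribute of each device can be listed. The sketch below walks one bus directory and prints each device name with its setting. list_wakeup is a hypothetical helper name, and the default directory is simply the USB bus path used in the example above.

```shell
#!/bin/sh
# Sketch: print each device's power/wakeup setting under a bus directory.
# list_wakeup is a hypothetical helper name.
list_wakeup() {
    root=${1:-/sys/bus/usb/devices}
    for f in "$root"/*/power/wakeup; do
        [ -r "$f" ] || continue            # skip devices without the attribute
        dev=${f%/power/wakeup}
        printf '%s %s\n' "${dev##*/}" "$(cat "$f")"
    done
}

# Usage: list_wakeup /sys/bus/usb/devices
```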

Opportunistic S0ix

For a non-PSR (Panel Self-Refresh) platform, before checking opportunistic S0ix, a user must enable runtime power management for all PCI devices using the powertop tool, turn off the display, and unplug the USB devices. Refer to the following example:

Turn on all the PCI devices auto power control with the command:

NOTE: auto-tune is for opportunistic idle only.

~$ sudo powertop --auto-tune

Turn off system display:

~$ export DISPLAY=":0.0"
~$ xset dpms force off

Unplug USB devices. Wait for 5 seconds, and verify the S0ix residency using either turbostat:

~$ sudo turbostat --show SYS%LPI

or using sysfs interface:

~$ cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
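When using the sysfs interface, the raw counter is in microseconds, so two readings taken a known interval apart can be converted into a residency percentage. The following sketch shows the arithmetic only; residency_percent is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: convert two readings of a microsecond residency counter into a
# percentage of the elapsed interval. residency_percent is a hypothetical name.
residency_percent() {
    before_us=$1
    after_us=$2
    interval_s=$3
    awk -v b="$before_us" -v a="$after_us" -v s="$interval_s" \
        'BEGIN { printf "%.2f\n", (a - b) / (s * 1000000) * 100 }'
}

# Usage:
#   f=/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
#   b=$(cat "$f"); sleep 10; a=$(cat "$f")
#   residency_percent "$b" "$a" 10
```

For example, a counter that advances by 8,500,000 microseconds over a 10-second interval corresponds to 85.00% residency.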

Troubleshooting S0ix in Linux

S0ix has certain platform-specific constraints. In some cases, the platform supports S0ix but S0ix residency cannot be established. This section describes some Best Known Methods (BKMs) to help users analyze the failure and provide the necessary debug logs before reporting bugs.

Check BIOS Setting

Most commercial BIOSes do not expose this setting, but some can be configured to enable or disable S0ix capability. In this case, the “Low Power S0 Idle Capability” option must be enabled in the BIOS setup.

Typically, S0ix BIOS setup configuration settings can be shared between Windows* OS and Linux* OS. You may need additional BIOS option settings if a special ingredient device does not support S0ix entry.

Check Linux grub program

Ubuntu* 17.10 and older versions failed to execute grub on Intel® SoCs when the ACPI FADT Low Power S0 Idle bit was set. This was a known grub issue, and the root cause was determined to be the PIT/8254 clock timer: after the Low Power Idle Capability option is enabled in BIOS setup, the PIT/8254 clock timer is disabled, which prevents grub from executing. To resolve this issue, patch the grub code as described at this link: http://lists.gnu.org/archive/html/grub-devel/2017-09/msg00019.html, or upgrade the OS to a newer version.

Check PC10 residency failure

If you cannot verify PC10 or see only low PC10 residency (< 50%), try the following:

Check for any known BIOS issue limiting deep PCx residency entry by searching for a turbostat debug log with “pkg-cstate-limit” keyword:

~$ sudo turbostat --debug 2> tmp.log
~$ grep "pkg-cstate-limit" tmp.log

Check whether GFX DC9 is requested and exited in the dmesg log, with the drm.debug=0xe kernel parameter appended.

Limit polling for GFX by appending the drm_kms_helper.poll=0 kernel parameter.

Check whether any device has a Latency Tolerance Reporting (LTR) issue. If so, use the PMC debug driver interface "/sys/kernel/debug/pmc_core/ltr_ignore" to ignore the failing device’s LTR. (The PMC debug interface solution is applicable to the Intel® Core™ platform.) Refer to the following sample code:

#!/bin/bash
counter=0
until [ $counter -gt 32 ]
do
    echo $counter > /sys/kernel/debug/pmc_core/ltr_ignore
    echo "LTR ignore for" $counter
    rtcwake -m freeze -s 10
    residency=$(cat /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us)
    echo "residency is" $residency
    if [ $residency -gt 0 ]; then
        echo "Residency is non zero!"
        break
    fi
    ((counter++))
    sleep 2
done

Check CPU C10 residency

After booting into Linux OS, verify that the intel_idle driver is present using the sysfs interface:

~$ sudo cat /sys/devices/system/cpu/cpuidle/current_driver

intel_idle

Next, check whether CPU C10 residency is observed. Either powertop or turbostat upstream utilities can be used. The following example uses turbostat and gets C10% column residency percentage:

~$ sudo turbostat --show sysfs --quiet sleep 10
10.010065 sec
POLL  C1  C1E  C3  C6   C7s  C8   C9  C10  POLL%  C1%   C1E%  C3%   C6%   C7s%  C8%   C9%   C10%
0     0   18   12  159  1    419  0   828  0.00   0.00  0.05  0.00  0.11  0.00  2.40  0.00  97.30
0     0   0    0   0    0    18   0   44   0.00   0.00  0.00  0.00  0.00  0.00  2.25  0.00  97.72
0     0   0    3   4    0    98   0   125  0.00   0.00  0.00  0.00  0.03  0.00  4.43  0.00  95.42
0     0   9    8   145  1    143  0   348  0.00   0.00  0.32  0.01  1.24  0.01  8.64  0.00  88.69
0     0   0    0   2    0    24   0   93   0.00   0.00  0.00  0.00  0.01  0.00  8.45  0.00  91.36
0     0   0    0   3    0    24   0   17   0.00   0.00  0.00  0.00  0.02  0.00  0.85  0.00  99.10
0     0   0    0   2    0    8    0   40   0.00   0.00  0.00  0.00  0.02  0.00  0.27  0.00  99.69
0     0   9    1   1    0    19   0   15   0.00   0.00  0.28  0.00  0.00  0.00  0.74  0.00  98.95
0     0   0    0   0    0    17   0   12   0.00   0.00  0.00  0.00  0.00  0.00  0.67  0.00  99.31
0     0   0    0   1    0    15   0   14   0.00   0.00  0.00  0.00  0.00  0.00  0.59  0.00  99.37
0     0   0    0   0    0    20   0   32   0.00   0.00  0.00  0.00  0.00  0.00  0.64  0.00  99.32
0     0   0    0   0    0    15   0   20   0.00   0.00  0.00  0.00  0.00  0.00  0.58  0.00  99.40
0     0   0    0   1    0    18   0   68   0.00   0.00  0.00  0.00  0.01  0.00  0.69  0.00  99.24
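When this check is scripted, the C10% value has to be picked out of the table by column name, because the set of columns differs between platforms. The sketch below assumes that the output contains a header row with the column name, followed by a first data row holding the summary. turbostat_column is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: pull one named column from the first data row of turbostat output.
# Assumption: a header row containing the column name is followed by the
# summary row. turbostat_column is a hypothetical helper name.
turbostat_column() {
    awk -v name="$1" '
        col == 0 {                      # still looking for the header row
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            next
        }
        { print $col; exit }            # first row after the header
    '
}

# Usage:
#   sudo turbostat --show sysfs --quiet sleep 10 2>&1 | turbostat_column 'C10%'
```

Applied to the sample output above, this prints 97.30, the C10% value of the first data row.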

Check GFX Render C6 residency

Render C6 (RC6) is a key requirement to reach PC10.

First, make sure to download and install the latest DMC firmware that supports graphics low-power idle states. This firmware provides the capability to save and restore display registers across low-power states independently from the OS or kernel. DMC firmware is available here: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git.

There are several ways to check for RC6 residency: powertop, turbostat tool, and sysfs interface. The following examples use turbostat and sysfs file:

~$ sudo turbostat --Summary --show GFX%rc6 sleep 10
10.002439 sec
GFX%rc6
100.01

~$ cat /sys/class/drm/card0/power/rc6_residency_ms
8902938

If you have issues with RC6 residency after DMC firmware is installed, refer to the Bugzilla tracker here: https://bugs.freedesktop.org/

Check GFX DMC firmware load status

Verify that the latest GFX DMC firmware is loaded using the command:

~$ cat /sys/kernel/debug/dri/0/i915_dmc_info
fw loaded: yes
path: i915/kbl_dmc_ver1_04.bin
version: 1.4
program base: 0x09004040
ssp base: 0x00002fc0
htp: 0x00b40068

If the latest firmware is not installed, the kernel dmesg log reports an error, which may be similar to the following examples:

[ 2.702834] [drm:intel_csr_ucode_init [i915]] Loading i915/kbl_dmc_ver1_04.bin
[ 2.702852] i915 0000:00:02.0: Direct firmware load for i915/kbl_dmc_ver1_04.bin failed with error -2
[ 2.702854] i915 0000:00:02.0: Failed to load DMC firmware i915/kbl_dmc_ver1_04.bin. Disabling runtime power management.
[ 2.702856] i915 0000:00:02.0: DMC firmware homepage: https://01.org/linuxgraphics/downloads/firmware

To fix this issue, download a newer version of DMC firmware from: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
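The firmware load check can be scripted by looking for the "fw loaded: yes" line shown in the sample i915_dmc_info output. This is a minimal sketch that assumes the report keeps that line format; dmc_loaded is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed only if the i915 debugfs report says the DMC firmware loaded.
# Assumption: the report contains a "fw loaded: yes" line, as in the sample
# output in this section. dmc_loaded is a hypothetical helper name.
dmc_loaded() {
    file=${1:-/sys/kernel/debug/dri/0/i915_dmc_info}
    grep -q '^fw loaded: yes' "$file"
}

# Usage (as root):
#   dmc_loaded || echo "DMC firmware not loaded; check dmesg for i915 errors"
```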

Check PCH IPs PG status

For an Intel® SoC to achieve its lowest power platform idle state, it must meet a set of hardware preconditions. These are called constraints, which are generally related to individual device power state (e.g. D3). For each platform device in a low power S0 idle system, the platform idle state constraints are specified in terms of minimum D-state or Device-specific state. If the D-state is described as a constraint, the constraint is met if the device transitions to either the described D-state or to a deeper D-state.

In Linux, the sysfs debug interface /sys/power/pm_debug_messages file controls the printing of debug messages from the system suspend infrastructure to the kernel log. Writing a "1" to this file enables the debug messages and writing a "0" (default) to this file disables them.

For S2idle path S0ix, if you can get PC10 residency, but cannot get SLP_S0 residency, check whether all low power idle S0 constraints meet requirements during S2idle state.

Enable pm_debug_messages using the command:

~$ echo 1 | sudo tee /sys/power/pm_debug_messages

Then put the system into S2idle with the command:

~$ echo freeze | sudo tee /sys/power/state

Wait for 10 to 15 seconds, wake the system from S2idle, and check the dmesg log. Search for the keyword “LPI” to find the power state of each low power S0 idle constraint for the S0ix-qualified PCH IP devices.

If the dmesg log does not provide useful debug info, you can use the power management controller (PMC) debug sysfs for further investigation. For details, refer to Using Power Management Controller Drivers to Debug Low Power Platform States.

For opportunistic path S0ix, first do opportunistic S0ix tuning by running the “powertop --auto-tune” command. Next, use the PMC debug sysfs interface to get the PCH IPs PG status during runtime idle by:

~$ cat /sys/kernel/debug/pmc_core/pch_ip_power_gating_status

If you find that one or more PCH IP ingredients are not power-gated, disable them in BIOS setup as workaround to see if that affects SLP_S0 residency.

If you do not have a good PMC debug log to compare for the new Intel SoC, check for any driver errors or fail messages from dmesg log. For example, if an audio-related error is detected, try to eliminate the audio error by disabling HD-Audio support from BIOS setup, boot into Linux kernel, and double-check the S0ix residency.

If you are trying to file a bug in Linux kernel bugzilla, remember to attach the PMC log for future analysis.

Known Linux issues and emerging improvements

SATA DEVSLP

Up through Linux-4.17, SATA fails to assert DEVSLP by default and, as a result, prevents the system from getting into S0ix. This default can be changed with powertop --auto-tune, or manually by changing “Bad” items to “Good” in the powertop “Tunables” tab.

Display OFF needed for S0ix

Currently, display must be OFF for opportunistic idle to reach PC10. However a Panel Self-Refresh (PSR) configuration may be able to reach PC10, even when the panel is on. As of Linux-4.17, this feature is in development.

Thunderbolt™ port

As of Linux-4.17, when devices are connected to a Thunderbolt™ port, PC10 may be reached, but currently S0ix cannot be reached.

TLP (Linux advanced power management)

The intent of the upstream Linux kernel is that it works “out of the box” for most users. However, many distributions employ TLP, which can help or hurt power management, depending on how it is used.

Tuned

Red Hat* Linux is known for using tuned, which can override many power management policy settings.

NVMe PCI D3

Putting NVMe devices into the PCI D3hot low-power state through the standard PCI power management interface may not be sufficient for achieving S0ix (SLP_S0) residency on multiple platforms. For that reason, Linux 5.3 will handle NVMe devices in a special way in its suspend-to-idle flow, which is based on the so-called host-managed power state control.

However, that still may not be sufficient to achieve SLP_S0 residency if PCIe ASPM is not enabled for NVMe devices or if the ASPM policy is not sufficiently aggressive. For example, in order to achieve SLP_S0 residency on the XPS13 9380, the ASPM policy needs to be "default" or "powersupersave" (via /sys/module/pcie_aspm/parameters/policy).

Therefore, on systems with NVMe that cannot get SLP_S0 residency, it is recommended to (a) run Linux 5.3 (when it is out, or newer) and (b) set the ASPM policy to "default" (BIOS-provided settings) or "powersupersave". Note that on some systems, TLP configuration may need to be updated to prevent it from changing the ASPM policy when going from AC to battery power and the other way around.
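The ASPM policy check recommended above can be sketched as follows. The sketch assumes that the policy file lists all policies on one line and marks the active one with square brackets, for example "[default] performance powersave powersupersave". aspm_policy_ok is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check that the active PCIe ASPM policy is "default" or
# "powersupersave". Assumption: the active policy is enclosed in square
# brackets in the policy file. aspm_policy_ok is a hypothetical helper name.
aspm_policy_ok() {
    file=${1:-/sys/module/pcie_aspm/parameters/policy}
    active=$(sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$file")
    case $active in
        default|powersupersave) return 0 ;;
        *) echo "ASPM policy is '$active'" >&2; return 1 ;;
    esac
}

# Usage: aspm_policy_ok || echo "consider switching the ASPM policy"
```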

How to validate S0ix in a test lab

For S0ix in Linux*, we used the following test case scenarios to validate:

S2idle S0ix

S2idle S0ix entry and exit latency should be < 2 seconds, as measured by the sleepgraph tool

S2idle S0ix exit with different wake-up source events: verify that the events are enabled and that they work

S2idle S0ix residency should be > 95%, and should remain > 95% across repeated experiments, as measured by turbostat

S2idle S0ix entry endurance testing to verify robustness: 2,000 iterations

Power consumption measurement during S2idle S0ix

Opportunistic S0ix

Check all the PCI devices runtime power management state

Opportunistic S0ix entry and exit
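The S2idle endurance scenario in the list above can be automated with a loop that suspends repeatedly and counts the cycles in which the residency counter did not advance. This is a rough sketch rather than the procedure actually used in the test lab; endurance_failures is a hypothetical helper name, and the suspend command and counter file are supplied by the caller.

```shell
#!/bin/sh
# Sketch: repeat a suspend command and count cycles in which the residency
# counter failed to advance. endurance_failures is a hypothetical helper name.
endurance_failures() {
    iterations=$1
    counter_file=$2
    shift 2
    failures=0
    i=0
    while [ "$i" -lt "$iterations" ]; do
        before=$(cat "$counter_file")
        "$@" >/dev/null 2>&1          # for example: rtcwake -m freeze -s 10
        after=$(cat "$counter_file")
        [ "$after" -gt "$before" ] || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Usage (as root):
#   endurance_failures 2000 \
#       /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us \
#       rtcwake -m freeze -s 10
```

A printed value of 0 means every cycle accumulated S0ix residency.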

Summary

This document describes the Intel® SoC low-power S0 idle status called S0ix, which can be used to save system energy and optimize system performance. Users can expect their PC to have low power idle capability and a longer battery life compared to the legacy S3 state. Users can also experience the responsive entry and exit of S0ix compared to the traditional S3 power model, which enables users to quickly resume and go back to work.

For users and testers, we recommend using the basic troubleshooting methods in this document to triage potential S0ix blockers. For further issues that need debug and resolution, please file a bug in the Linux kernel Bugzilla with the necessary debug log attached: https://bugzilla.kernel.org/

For kernel and driver developers, debugging S0ix failures is complicated and spans hardware, firmware, and software. Contributions that improve the S0ix debug tools and help customers use S0ix are welcome.

References