FOSDEM09: "Aggressive" Linux power management

LWN.net needs you! Without subscribers, LWN would simply not exist. Please consider signing up for a subscription and helping to keep LWN publishing

At FOSDEM 2009 (Free and Open Source Software Developers' European Meeting) in Brussels, there were a number of interesting talks about the state of power management in Linux. Matthew Garrett from Red Hat talked at length about aggressive power management for graphics hardware. People tend to forget that graphics hardware is more than a processor: it is not just the GPU that draws power, the graphics card's memory, outputs, and, of course, the displays themselves all draw power as well. Until now, most of the work on power management has focused on the GPU, but if you want really good power management, you have to attack the problem on all these fronts. And that's what Garrett is doing at Red Hat and shared in his FOSDEM presentation.

The power consumption of the GPU can be decreased by two techniques: clock gating or reclocking. Clock gating means that different bits of the chip are disconnected from the clock when not in use, and thus less power is drawn. However, this functionality has to be hardwired in the chip design and it must be supported in the graphics driver. And that's where Linux is still lagging behind, according to Garrett: "For a long time Linux graphics support has focused on getting a picture. We can go further now, but we just need the documentation to adapt the drivers." Clock gating has no negative effect whatsoever on the performance of the GPU.

Reclocking is another story: when the GPU is running at a frequency of 600 MHz and you reclock/underclock it to 100 MHz, this results in a massive reduction in power usage, but it also means that the performance is reduced accordingly. Garrett cited a difference of 5 W if clock gating and reclocking is combined on Radeon graphics hardware.

The second component that can be optimized is memory: each memory access draws power. So what can we do about power consumption of memory? Read less often (which is essentially reclocking) or read less memory. Reducing the memory clock can save you again around 5 W, but it introduces visual artifacts on the screen if reclocking happens while the screen is being scanned. The other interesting route (read less memory) comes down to compressing the framebuffer. Most recent Intel graphics chipsets implement this by a run length encoding (RLE) of the screen content on a line-by-line basis. Garrett notes that this means your desktop background can make a difference in battery life: vertical gradients compress very well using this scheme, but horizontal gradients do not.

Another interesting consequence of the memory component is that periodic screen updates are really bad for power consumption. According to Garrett, moving the mouse cursor around has an instantaneous increase of power consumption by 15 W. A blinking cursor draws 2 W, and also the display of seconds or a blinking colon in the system tray clock draws unnecessary power. Garrett adds philosophically: "The whole point of a blinking cursor is attracting your attention. But when you're typing, your attention is already going to your text input, and when you're not typing, it doesn't need your attention. So is it really needed to blink the cursor?"

The third component where power management can make a difference are the outputs. Just powering off an unneeded output port saves around 0.5 W. If you know for sure that you don't need the external output on your laptop, you can safely turn it off and gain a bit of battery time. However, if you need to connect an external monitor or video projector afterward, you will first need to power on the output port explicitly. It all comes down to a tradeoff between functionality and power consumption.

The last (but not least) component of graphics hardware is the displays. This is another place were reclocking can save some watts. For example, the LVDS (Low-voltage differential signaling) link to a laptop's LCD screen uses power at each clock transition. Reducing the refresh rate reduces the power consumption. While CRT screens begin to flicker if the refresh rate is too low, TFT's don't have this problem. According to Garrett, most TFT screens can be driven at 30 Hz, but then they tend to display visual artifacts. Garrett only recommends this LVDS reclocking when the screen is idle, which saves around 0.5 W. If the screen becomes active again, the system should return to a normal refresh rate of 60 Hz. Another solution is DPMS (Display Power Management Signaling): just turn off the screen when it's idle. Even a screensaver drawing a black screen draws power, while DPMS really turns off the output.

So what's the current state of this "aggressive power management"? Dynamic clock gating is implemented in most recent graphic cards. Future developments will implement even more aggressive dynamic state management: graphics hardware will power on functionality when the system needs it and power it off when it's not used. Graphics drivers and the operating system should control this without irritating the user. Garrett stresses that power management has to be as invisible as possible, otherwise the user will not be happy and stop caring about "green" computing. Garrett is now working on the Radeon code to get some prototype functionality. As it stands now, the combination of dynamic GPU and memory reclocking can save 10 to 15 W, and LVDS reclocking can save 0.5 W. For a laptop, this doesn't make a huge difference, but it is still a significant increase in battery life.

Power management in Nokia's next Maemo device

In the embedded track of FOSDEM, Peter De Schrijver of Nokia gave an insightful but very technical talk about advanced power management for OMAP3. This integrated chip platform made by Texas Instruments is based on an ARM Cortex A8 processor and has a GPU, DSP (digital signal processor) and ISP (image signal processor). Because the chip is targeted at mobile devices, some advanced power management functionality is built-in: the chip is divided in different voltage domains, and in each module the interface clock and functional clock can be turned off independently.

Nokia used an OMAP1 chip in the N770 internet tablet, and an OMAP2 chip in the N800 and N810 internet tablets. The devices use Nokia's Maemo platform, based on Debian GNU/Linux. Last year Nokia executive Ari Jaaksi revealed that their next Maemo device would use an OMAP3 chip. De Schrijver talked about the power management architecture of OMAP3, but also about the Linux support Nokia is developing for this functionality.

Power management on the OMAP3 can be subdivided into two types. On the one hand, there is active power management. It's essentially the same principle as reclocking in graphics hardware: with a lower clock frequency, the chip is running on a lower voltage, resulting in less power consumption. With dynamic voltage frequency scaling this can be handled automatically. In Linux, the frequency scaling of the CPU is implemented in the cpufreq driver, while for the core (the interconnects between different blocks of the chip and some peripherals) there is a new API call for drivers, named set_min_bus_tput() , which sets the minimum bus throughput needed by a device.

On the other hand, when the chip is idle, there are solutions such as clock control, which can be implemented in software (by a driver) or hardware (an auto idle function). Moreover, clocks of different modules of the chip can be turned off selectively: if the interface clock is off, the core can sleep; if the functional clock is off, the module can sleep. The implementation of clock control in the OMAP3 chip is done in the clock framework of the linux-omap kernel, and Nokia is adding the patches to linux-arm now.

The OMAP3 chip knows four power states per domain: "on", "inactive", "retention" and "off". In the "inactive" state, the chip works at normal voltage but the clocks are stopped, while in the "retention" state the chip works at a lower voltage. This means that the "inactive" state uses more power than the "retention" state, but has a lower wakeup latency. The shared resource framework (SRF) that determines the power state for each domain of the chip is implemented by Texas Instruments and is hidden from the driver programmer by an API. This API has to be implemented by the power management framework and has to be used by the drivers. The API documentation is not yet released, but De Schrijver said this will be added into the kernel Documentation directory soon.

The "off" mode has some challenges: while the power management framework can handle saving and restoring the state of the cpu, memory controller and other components, each driver has to handle its module. This means: reinitialize the module when the functional clock is enabled and save and restore the context and shadow registers in memory.

In his talk, De Schrijver also gave a status update of the work. The "retention" state works. Basic "off" mode works on most development boards; drivers are being adapted for "off" mode now and will be ready at the end of February. All this code is being merged in the linux-arm kernel tree, but eventually it will be merged in the mainline kernel. According to De Schrijver, all these power management techniques will be used in the next Nokia Maemo device: the long-awaited successor of the N810.