010. Triggers to Explore

Now that we know how to spot them, how do we go about actually triggering them? Almost any action could potentially leak EMI, but some actions are more likely to than others. Typically, actions that draw or shift around large amounts of power are the most detectable when oscillated. Actions that shift a sub-component's clock speed, or that suddenly have to move a lot of data around, can also be fruitful.

Let's look at a real target and see what we can find in terms of side channels that are useful for exfiltrating data. For the target, we ordered a run-of-the-mill Dell Precision 3430 workstation without a wireless chipset, with 8 GB of 2666 MHz DDR4 and a Radeon Pro WX 3100, running Ubuntu 18.04 LTS with a 5.1.11 kernel.

Let’s try a simple test, writing lots of things to memory:

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Evict the cache line holding ptr so the next write has to reach DRAM. */
    static inline void clflush(void *ptr) {
        asm volatile ("clflush (%0)" :: "r"(ptr));
    }

    /* Read the CPU timestamp counter. */
    static inline uint64_t qpc(void) {
        unsigned long a, d;
        asm volatile ("rdtsc" : "=a" (a), "=d" (d));
        return a | ((uint64_t)d << 32);
    }

    /* Average number of timestamp ticks in 100 ms, measured over 10 sleeps. */
    uint64_t approx_100ms(void) {
        uint64_t sum = 0;
        for (uint64_t i = 0; i < 10; i++) {
            uint64_t start = qpc();
            usleep(100 * 1000); // microseconds
            sum += qpc() - start;
        }
        return sum / 10;
    }

    int main(void) {
        uint64_t ticks_per_100ms = approx_100ms();
        volatile uint64_t *addr = (volatile uint64_t*)malloc(sizeof(uint64_t));
        if (addr == NULL) {
            return 1;
        }
        *addr = 0; // start from a defined value

        while (1) {
            // on time
            uint64_t target_end_time = qpc() + (ticks_per_100ms * 5); // end 500ms from now
            while (qpc() < target_end_time) {
                *addr += 1 ^ *addr;     // write to memory
                clflush((void*)addr);   // flush cache lines
            }

            // off time
            target_end_time = qpc() + (ticks_per_100ms * 3); // end 300ms from now
            while (qpc() < target_end_time);
        }
    }

This program repeatedly writes to the same memory address and flushes the cache line, in bursts of 500 ms separated by 300 ms of idle time. If we look through the spectrum with our antenna, we'll see signs of that on/off pattern in various places, especially around the cellular ranges.

This signal itself is very quiet and sits at a frequency of over 1000 MHz, so it has limited penetration potential. You won't be exfiltrating data more than a few feet with this one. However, it has been demonstrated that when combined with wide vector instructions it can be turned into a useful tool.

Let's take a look at a more interesting example. Graphics cards these days can draw a lot of power, but they aim to do so efficiently by scaling power draw with performance requirements. This behavior is typically invisible to the end user, apart from maybe the sound of a fan spinning up. The facilities for adjusting the relevant power mode thresholds vary from vendor to vendor, and as the machine we ordered has an AMD (formerly ATI) GPU, we will focus on that.

The amdgpu driver conveniently exposes its power management interfaces through sysfs. On my machine there is also an integrated Intel GPU, so the card shows up as /sys/class/drm/card1, and its device subfolder contains all the power management control files.
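Since the card number depends on what else is in the machine, it can help to look the card up by driver rather than hard-coding card1. The following is a minimal sketch, assuming the usual sysfs layout in which each card's device/driver entry is a symlink ending in the driver name. The function name find_cards_by_driver and the DRM_ROOT override are our own inventions for illustration, not part of any tool.

```shell
#!/bin/bash
# List DRM cards bound to a given kernel driver (default: amdgpu).
# DRM_ROOT is a hypothetical override so the function can be pointed at a
# scratch directory; on a real system it defaults to /sys/class/drm.
find_cards_by_driver() {
    driver=${1:-amdgpu}
    root=${DRM_ROOT:-/sys/class/drm}
    for card in "$root"/card[0-9]*; do
        # Skip connector entries such as card1-DP-1.
        case ${card##*/} in *-*) continue ;; esac
        # device/driver is a symlink whose last path component is the driver name.
        link=$(readlink "$card/device/driver") || continue
        [ "${link##*/}" = "$driver" ] && echo "$card"
    done
    return 0
}

# Prints one path per matching card, or nothing if there is none.
find_cards_by_driver amdgpu
```

On the target described here this would be expected to print /sys/class/drm/card1, which can then be used as the base for the DEVICE path in the scripts that follow.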

    Precision-Tower-3430:/sys/class/drm/card1$ ls device/pp* -l
    -r--r--r-- 1 root root 4096 Feb 24 09:17 device/pp_cur_state
    -rw-r--r-- 1 root root 4096 Jan 29 13:08 device/pp_dpm_mclk
    -rw-r--r-- 1 root root 4096 Feb 24 09:17 device/pp_dpm_pcie
    -rw-r--r-- 1 root root 4096 Feb 24 09:52 device/pp_dpm_sclk
    -rw-r--r-- 1 root root 4096 Feb 24 09:17 device/pp_force_state
    -rw-r--r-- 1 root root 4096 Feb 24 09:17 device/pp_mclk_od
    -r--r--r-- 1 root root 4096 Feb 24 09:17 device/pp_num_states
    -rw-r--r-- 1 root root 4096 Feb 24 09:17 device/pp_od_clk_voltage
    -rw-r--r-- 1 root root 4096 Jan 29 12:30 device/pp_power_profile_mode
    -rw-r--r-- 1 root root 4096 Feb 24 09:17 device/pp_sclk_od
    -rw-r--r-- 1 root root 4096 Feb 24 09:17 device/pp_table

The pp_dpm_sclk file lists the shader clock levels available to the power management system and shows which of them are currently enabled.

    $ cat device/pp_dpm_sclk
    0: 214Mhz
    1: 734Mhz
    2: 921Mhz
    3: 1018Mhz
    4: 1098Mhz
    5: 1147Mhz
    6: 1183Mhz
    7: 1219Mhz
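If later scripts need the clock for a given level, the listing can be parsed rather than copied by hand. This is a rough sketch that assumes every line has the "level: NNNMhz" shape shown above; the function names are ours. As far as I know the driver can also mark the active level with a trailing "*", so the pattern tolerates trailing text, but treat that as an assumption.

```shell
#!/bin/bash
# Parse pp_dpm_sclk-style text on stdin into "level mhz" pairs.
parse_sclk_levels() {
    # Lines look like "0: 214Mhz"; anything after "Mhz" (for example a
    # marker on the active level) is ignored.
    sed -n 's/^\([0-9][0-9]*\): *\([0-9][0-9]*\)Mhz.*$/\1 \2/p'
}

# Print the clock in MHz for one level; prints nothing if the level is absent.
sclk_mhz_for_level() {
    parse_sclk_levels | awk -v want="$1" '$1 == want { print $2 }'
}

# Sample input based on the listing above, with a "*" added to one line to
# exercise that case. On the target this would instead be:
#   parse_sclk_levels < /sys/class/drm/card1/device/pp_dpm_sclk
printf '0: 214Mhz\n1: 734Mhz *\n2: 921Mhz\n' | parse_sclk_levels
```

Run as is, the sample prints the pairs "0 214", "1 734" and "2 921", one per line.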

Let's try a test where we put the GPU under some load with glmark2 and shift between the two lowest power states, which correspond to the 734 MHz and 214 MHz clocks.

    #!/bin/bash
    DEVICE=/sys/class/drm/card1/device

    glmark2 -b :duration=10000 &   # Create some GPU load.

    # Set performance control to manual.
    echo "manual" > $DEVICE/power_dpm_force_performance_level

    while true
    do
        echo "HIGH"
        echo "1" > $DEVICE/pp_dpm_sclk
        sleep 0.5
        echo "LOW"
        echo "0" > $DEVICE/pp_dpm_sclk
        sleep 0.5
    done

When the 214 MHz clock is enabled, we can absolutely pick it up at multiples of 214 MHz, with 428 MHz being the loudest for our configuration.
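Because the tone is only there while level 0 is selected, the same toggle can be driven by data instead of a fixed HIGH/LOW loop, which amounts to simple on-off keying. The sketch below is hypothetical: the function names, the bit-to-level mapping (1 means tone present) and the 0.5 second dwell are our assumptions, and it presumes the GPU load and manual performance level from the script above are already in place.

```shell
#!/bin/bash
# Turn a string of bits into "<level> <seconds>" steps, one per line.
# Assumed mapping: 1 -> level 0 (214 MHz tone present),
#                  0 -> level 1 (734 MHz, tone absent).
ook_schedule() {
    bits=$1
    dwell=${2:-0.5}
    while [ -n "$bits" ]; do
        rest=${bits#?}        # everything after the first character
        bit=${bits%"$rest"}   # the first character
        bits=$rest
        case $bit in
            1) echo "0 $dwell" ;;
            0) echo "1 $dwell" ;;
            *) echo "ook_schedule: bad bit '$bit'" >&2; return 1 ;;
        esac
    done
}

# Play a schedule from stdin against a device directory by writing each
# level to pp_dpm_sclk and then sleeping for the dwell time.
ook_transmit() {
    device=$1
    while read -r level dwell; do
        echo "$level" > "$device/pp_dpm_sclk"
        sleep "$dwell"
    done
}

# Dry run: print the steps for the bits 1011 without touching hardware.
# To transmit on the target (as root), pipe the schedule into ook_transmit:
#   ook_schedule 1011 | ook_transmit /sys/class/drm/card1/device
ook_schedule 1011
```

Keeping the schedule separate from the sysfs writes means the encoding can be checked on any machine before it is run against the card.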

This is a great carrier! It is very loud over background and sits at a nice low frequency of 428 MHz, which allows for great signal penetration. In my tests I was able to pick this particular signal up from over 50 ft away through a wall. This gives us the ability to on-off key messages one bit at a time, but that is quite slow and we can do much better. The amdgpu driver also lets you configure the actual clock values themselves in 1 MHz increments. So let's write a script to do just that and step through the lowest 5 frequencies, dwelling on each one for half a second:

    #!/bin/bash
    glmark2 -b :duration=10000 &   # Create some GPU load.

    DEVICE=/sys/class/drm/card1/device
    echo "manual" > $DEVICE/power_dpm_force_performance_level

    while true
    do
        # Step level 0 through 214..218 MHz, half a second per frequency.
        for MHZ in 214 215 216 217 218
        do
            echo "s 0 $MHZ 700" > $DEVICE/pp_od_clk_voltage   # set level 0 shader clock
            echo c > $DEVICE/pp_od_clk_voltage                # commit the change
            echo 0 > $DEVICE/pp_dpm_sclk                      # select level 0
            sleep 0.5
        done
    done

By shifting this shader clock 1 MHz at a time at the lower limit, we can see the side channel also start to jump around in the frequency domain:





Not only can we control the duration of a transmission to encode data, but now we can start to form an alphabet using a technique called sequential multiple frequency-shift keying (MFSK) to encode a lot more data per transmission! We can even vary the rate at which we shift from frequency to frequency to pack in additional data.
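To make the alphabet idea concrete, here is a small sketch that maps symbols to the five clock values used in the stepping script. The symbol-to-frequency assignment, the function names and the choice to print a plan rather than write to sysfs are all our assumptions. It also assumes the second harmonic stays the loudest as the clock moves, which was only observed above for 214 MHz. With five tones each symbol carries log2(5), or roughly 2.3, bits.

```shell
#!/bin/bash
# Map one symbol (0-4) to a shader clock in MHz: 214 MHz base, 1 MHz spacing.
mfsk_symbol_to_mhz() {
    case $1 in
        [0-4]) echo $((214 + $1)) ;;
        *) echo "mfsk_symbol_to_mhz: symbol out of range: $1" >&2; return 1 ;;
    esac
}

# For each symbol in a string such as "042", print the pp_od_clk_voltage
# line that would select it and the second harmonic to listen for.
mfsk_plan() {
    symbols=$1
    while [ -n "$symbols" ]; do
        rest=${symbols#?}         # everything after the first character
        sym=${symbols%"$rest"}    # the first character
        symbols=$rest
        mhz=$(mfsk_symbol_to_mhz "$sym") || return 1
        echo "s 0 $mhz 700 -> listen near $((mhz * 2)) MHz"
    done
}

# Dry run for the symbols 0, 4, 2; nothing is written to the GPU.
mfsk_plan 042
```

Each printed "s 0 ..." string is in the form the stepping script writes to pp_od_clk_voltage, so a transmitter would write it, commit with "c", select level 0 and dwell before moving to the next symbol.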

So far we have only looked at whether a signal is present at a given frequency, not at what data, if any, is encoded in that narrow signal. If we further demodulate this signal using the familiar amplitude modulation (AM) and frequency modulation (FM) techniques, we can gain even more information.

For instance, let’s pin the transmission to a fixed frequency and cycle through some different GPU workloads:





You can clearly hear a correlation between different scenes and this may be the highest throughput way to encode data onto this particular side channel. If we look at the frequency domain of some of these transmissions we can clearly see a delineation:

You may think that decoding this type of modulation by hand is difficult, and indeed it is. Luckily, computers are more than capable of solving those hard problems for you by applying machine learning models to decode those types of signals into a more usable form. Performing this type of demodulation requires relatively good signal strength. While I was able to pick up the presence of the carrier from over 50 feet away, demodulating it with AM and FM left me with mostly static.