When a suspected al-Qaeda member was arrested in Berlin in May of 2011, he was found with a memory card with a password-protected folder—and the files within it were hidden. But, as the German newspaper Die Zeit reports, computer forensics experts from the German Federal Criminal Police (BKA) claim to have eventually uncovered its contents—what appeared to be a pornographic video called "KickAss."

Within that video, they discovered 141 separate text files, containing what officials claim are documents detailing al-Qaeda operations and plans for future operations—among them, three entitled "Future Works," "Lessons Learned," and "Report on Operations."

So just how does one store a terrorist’s home study library in a pirated porn video file? In this case the files had been hidden (unencrypted) within the video file through a well-known approach for concealing messages in plain sight: steganography.

Invisible ink

It has long been suspected that al-Qaeda has used steganography to hide its secrets, everything from maps and photos of potential targets to how-to manuals. Steganography (derived from the Greek for "concealed writing") is the practice of concealing a message from casual observers: the content is there in the open, and often unencrypted. The term can be applied to everything from old-school secret-agent tricks (e.g., messages written in invisible ink, microdots) to messages concealed in carefully manipulated network traffic, such as in Krzysztof Szczypiorski’s Hidden Communication System for Corrupted Networks (HICCUPS).

But in its most common modern digital form, steganography conceals plain text or whole files within an image, audio, or video file. The approach has been a favored way to pass messages through public discussion sites and bulletin boards, and was used by Anna Chapman and her bumbling ring of Russian spies to pass messages online.

In its simplest form, digital steganography can be accomplished (albeit poorly) just by opening up a JPEG file in a text editor and appending the text at the end of the content. The image will still render correctly in an image viewer or Web browser, though the file will be larger—and the "hidden" content would quickly become obvious upon analysis. Most steganography is done more subtly, using software tools that tweak the bits of the media file the message is concealed in.
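The append-to-the-end trick is simple enough to sketch in a few lines of Python. This is only an illustration: the function names are made up for this example, the "cover" is a stand-in byte string rather than a decodable picture, and recovery assumes the payload itself never contains the JPEG end-of-image marker.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def append_payload(jpeg_bytes: bytes, payload: bytes) -> bytes:
    """Tack the payload onto the end of a JPEG; viewers stop reading at EOI."""
    if not (jpeg_bytes.startswith(SOI) and jpeg_bytes.endswith(EOI)):
        raise ValueError("expected data that starts with SOI and ends with EOI")
    return jpeg_bytes + payload


def extract_trailing(data: bytes) -> bytes:
    """Return whatever follows the last EOI marker (empty if nothing does).

    Assumption: the payload does not contain the bytes FF D9 itself.
    """
    end = data.rfind(EOI)
    return b"" if end == -1 else data[end + len(EOI):]


# Stand-in cover: correct markers, dummy body (not a real image).
cover = SOI + b"\x00" * 16 + EOI
stego = append_payload(cover, b"meet at noon")
print(len(cover), len(stego), extract_trailing(stego))
```

The same `extract_trailing` check is also why this method is so easy to catch: anything after the end-of-image marker is an immediate red flag.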

The least significant bit

One approach used by steganography software is least significant bit substitution. To encode data into a "cover" file—the media file that will hide the secret message—the software will break up each byte of data to be hidden into individual binary bits. The values of those bits are then substituted for the least significant bit (the 0 or 1 that has the least impact on the value of a byte) in a sequence of bytes in the cover file. So, for example, the binary code for the letter "S"—01010011—could be hidden in eight bytes of data in an image file.

Before the steganography is applied, data that looks like this:

01110011 01110101 01110000 01100101 01110010 01101000 01100001 01110000

would be changed to this:

01110010 01110101 01110000 01100101 01110010 01101000 01100001 01110001

As you can see, in this case, only two out of eight bytes were actually changed (the first and the last), because the other six already ended in the required bit. But even so, the subtle shifts in values could create a significant amount of "noise" in the target image, movie or audio file if they were packed closely together.
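For readers who want to try the substitution themselves, here is a minimal Python sketch applied to the eight cover bytes from the example above. The function names are invented for this illustration, and the bits are written most-significant-first into consecutive bytes; real tools may order and place them differently.

```python
def bits_msb_first(payload: bytes):
    """Yield each payload bit, most significant bit of each byte first."""
    for byte in payload:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def embed_lsb(cover: bytes, payload: bytes) -> bytes:
    """Overwrite the lowest bit of consecutive cover bytes with payload bits."""
    bits = list(bits_msb_first(payload))
    if len(bits) > len(cover):
        raise ValueError("cover needs at least 8 bytes per payload byte")
    out = bytearray(cover)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the lowest bit, then set it
    return bytes(out)


def extract_lsb(stego: bytes, n_bytes: int) -> bytes:
    """Rebuild n_bytes of payload from the lowest bits of the stego data."""
    if n_bytes * 8 > len(stego):
        raise ValueError("stego data is too short for that many bytes")
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for byte in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (byte & 1)
        out.append(value)
    return bytes(out)


cover = bytes(int(b, 2) for b in
              "01110011 01110101 01110000 01100101 "
              "01110010 01101000 01100001 01110000".split())
stego = embed_lsb(cover, b"S")
print(" ".join(f"{b:08b}" for b in stego))
print(extract_lsb(stego, 1))
```

The recipient needs to know only how many bytes to read back; nothing else in the file marks where the message ends.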

To spread changes out further and prevent easy detection, a steganography tool can use a set sequence for the changes—modifying every fifth byte, or eighth byte, for example. The bit substitutions could also be driven by a pseudorandom number generator or some other algorithm that varies how many bytes are skipped between changes. The only thing limiting the size of the increment between modified bits is the size of the cover file—the bigger the file relative to the message being concealed, the more easily the data can be hidden without obvious distortion.
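Both spreading strategies can be sketched in Python. Everything here is an assumption for illustration: the helper names are invented, the shared "key" is just a seed for Python's built-in generator (which is not cryptographically secure), and sender and receiver are assumed to run the same Python version so the seeded sequence matches.

```python
import random


def to_bits(payload: bytes) -> list:
    """Payload bits, most significant bit of each byte first."""
    return [(byte >> s) & 1 for byte in payload for s in range(7, -1, -1)]


def from_bits(bits: list) -> bytes:
    """Pack a list of bits (a multiple of 8 long) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def stride_positions(n_bits: int, stride: int) -> list:
    """Fixed sequence: touch every `stride`-th byte of the cover."""
    return [i * stride for i in range(n_bits)]


def keyed_positions(cover_len: int, n_bits: int, key: int) -> list:
    """Pseudorandom sequence: distinct positions derived from a shared key."""
    return random.Random(key).sample(range(cover_len), n_bits)


def embed_at(cover: bytes, payload: bytes, positions: list) -> bytes:
    bits = to_bits(payload)
    if len(positions) != len(bits) or max(positions, default=-1) >= len(cover):
        raise ValueError("positions must match the payload and fit the cover")
    out = bytearray(cover)
    for pos, bit in zip(positions, bits):
        out[pos] = (out[pos] & 0xFE) | bit
    return bytes(out)


def extract_at(stego: bytes, positions: list) -> bytes:
    return from_bits([stego[pos] & 1 for pos in positions])


cover = bytes(range(256)) * 4          # stand-in for real media bytes
secret = b"hi"
keyed = keyed_positions(len(cover), len(secret) * 8, key=2011)
stego = embed_at(cover, secret, keyed)
print(extract_at(stego, keyed))
```

With the keyed variant, someone who has the file but not the key does not know which bytes to read, which is the point of varying the gaps.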

Hiding in the cosine

Another way to hide data in images is to alter specific properties of pixels, rather than just changing arbitrary bits. The color of each pixel is determined by three component values. In images that use the RGB "colorspace," each number represents the intensity of a particular color channel (how much red, green, or blue is added); in YCbCr-based images, there are two color components (blue-difference and red-difference chroma) and a "luminance" component (which determines the brightness of the pixel). The GPL, cross-platform SilentEye steganography tool, for example, manipulates the least significant bit of the luminance data for pixels within an image. It also provides password protection for content, and built-in AES encryption, to keep those files secure even if someone can ferret them out of your porn collection.
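To make the luminance idea concrete, here is a rough Python sketch. It is not SilentEye's code: the function names are invented, the conversion uses the common full-range JFIF constants, and the pixels are assumed to be stored as YCbCr, since converting back to RGB can round the hidden bit away.

```python
def rgb_to_ycbcr(rgb):
    """Convert one (R, G, B) pixel to (Y, Cb, Cr) using JFIF-style constants."""
    r, g, b = rgb
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return tuple(max(0, min(255, round(v))) for v in (y, cb, cr))


def embed_in_luma(pixels, bits):
    """Hide one bit per pixel in the lowest bit of Y; Cb and Cr are untouched."""
    if len(bits) > len(pixels):
        raise ValueError("not enough pixels for the message")
    out = list(pixels)
    for i, bit in enumerate(bits):
        y, cb, cr = out[i]
        out[i] = ((y & 0xFE) | bit, cb, cr)
    return out


def extract_from_luma(pixels, n_bits):
    """Read the lowest Y bit of the first n_bits pixels."""
    return [y & 1 for y, _, _ in pixels[:n_bits]]


rgb_pixels = [(200, 120, 40), (10, 10, 10), (255, 255, 255), (0, 128, 255)]
ycc_pixels = [rgb_to_ycbcr(p) for p in rgb_pixels]
stego = embed_in_luma(ycc_pixels, [1, 0, 1, 1])
print(extract_from_luma(stego, 4))
```

Each pixel's brightness moves by at most one step out of 256, and its color components do not move at all.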

These approaches to steganography are great if you’re not concerned about anyone mangling the content you’re encoding. But if you’re not in control of what form the cover content will take when it arrives (for example, if some intervening software decreases the quality level of a JPEG file), all that hard work the steganography tool did was for naught. That’s where more sophisticated tools come into play: tools that use variations in patterns in the data rather than in the data itself. For example, the discrete cosine transform (DCT) coefficients used for JPEG compression can be manipulated in such a way as to encode data into an area of an image where it can survive compression, resizing, and even cropping.
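A toy version of coefficient-based hiding fits in a short Python sketch. It is an illustration under stated assumptions, not how any particular tool works: the names, the chosen coefficient (2, 3), and the step size of 16 are arbitrary, and it only demonstrates that the bit survives rounding back to whole pixel values, not recompression or cropping. Tools that target JPEG typically work on the quantized coefficients stored in the file rather than recomputing them from pixels.

```python
import math

N = 8  # JPEG processes images in 8x8 blocks


def _alpha(k):
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(x, u):
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))


def dct2(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (list of 8 rows of 8 values)."""
    return [[_alpha(u) * _alpha(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2: rebuild pixel values from the 64 coefficients."""
    return [[sum(_alpha(u) * _alpha(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def embed_bit(block, bit, pos=(2, 3), step=16):
    """Force the parity of one coarsely quantized coefficient to equal `bit`."""
    coeffs = dct2(block)
    u, v = pos
    q = round(coeffs[u][v] / step)
    if q % 2 != bit:
        q += 1
    coeffs[u][v] = q * step
    # Round back to whole pixel values, as a saved image would be.
    return [[max(0, min(255, round(p))) for p in row] for row in idct2(coeffs)]


def extract_bit(block, pos=(2, 3), step=16):
    u, v = pos
    return round(dct2(block)[u][v] / step) % 2


# Smooth mid-gray test block, far from 0 and 255 so nothing is clipped.
block = [[100 + 2 * x + 3 * y for y in range(N)] for x in range(N)]
stego_block = embed_bit(block, 1)
print(extract_bit(stego_block))
```

The large step is the design choice doing the work here: small disturbances to the pixels nudge the coefficient only slightly, so it still rounds to the same multiple of the step and the hidden parity is preserved.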

The same sort of manipulation can be used in audio as well, where random changes are much more obvious to human ears than the "noise" of least-significant bit substitution is to the eye. By making subtle manipulations to the waveform of audio, steganography tools can insert data in places where it is least obvious.

Finding bits in the haystack

The task of trying to find steganography data within images is a continuous arms race. With data being concealed in increasingly sophisticated ways, the task of catching a file containing hidden data is like searching for a needle in a haystack—first, you have to know what the needle looks like.

One approach increasingly used to try to detect steganography data is the use of application "fingerprint" data—artifacts and patterns in files that show they’ve been manipulated by steganography tools. Backbone Security, for instance, has a steganography fingerprint database that contains identifying information for 1,050 digital steganography apps. And that database is being integrated into real-time scanners that can sit at the edge of a network and watch for signs of manipulated image files that may contain stolen data or other sensitive information.
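In spirit, fingerprint scanning is pattern matching against a catalog of known artifacts. The Python sketch below is purely illustrative: the two "signatures" are invented for this example and do not describe any real steganography tool or any commercial database, and the only structural check is for data after a JPEG's end-of-image marker.

```python
# Hypothetical fingerprints, made up for this sketch; not real tool artifacts.
SIGNATURES = {
    "example-tool-a": b"\x00STEGA\x01",
    "example-tool-b": b"HIDDENv2",
}

JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def scan(data: bytes) -> list:
    """Return a list of reasons this file looks like it was manipulated."""
    findings = [f"signature:{name}"
                for name, marker in SIGNATURES.items() if marker in data]
    # Structural artifact: bytes left over after the last end-of-image marker.
    end = data.rfind(JPEG_EOI)
    if data.startswith(JPEG_SOI) and end != -1 and end + len(JPEG_EOI) < len(data):
        findings.append("trailing-data-after-jpeg-eoi")
    return findings


clean = JPEG_SOI + b"\x10" * 32 + JPEG_EOI   # stand-in, not a real image
print(scan(clean))
print(scan(clean + b"secret"))
```

A real-time scanner would run checks like these on files as they cross the network boundary, which is only practical because matching known patterns is cheap compared with analyzing every image statistically.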

Of course, every now and then, law enforcement gets lucky, and an obvious target in the form of a porn video lands in their laps.