The ability to analyze a firmware image and extract data from it is extremely useful. It can allow you to analyze an embedded device for bugs, vulnerabilities, or GPL violations without ever having access to the device.

In this tutorial, we’ll be examining the firmware update file for the Linksys WAG120N with the intent of finding and extracting the kernel and file system from the firmware image. The firmware image used is for the WAG120N hardware version 1.0, firmware version 1.00.16 (ETSI) Annex B, released on 08/16/2010 and is currently available for download from the Linksys Web site.

The first thing to do with a firmware image is to run the Linux file utility against it to make sure it isn’t a standard archive or compressed file. You don’t want to sit down and start analyzing a firmware image only to realize later that it’s just a ZIP file:

OK, it’s nothing known to the file utility. Next, let’s do a hex dump and run strings on it:

Taking a look at the strings output, we see references to the U-Boot boot loader and the Linux kernel. This is encouraging, as it suggests that this device does in fact run Linux, and U-Boot is a very common and well documented boot loader:

However, taking a quick look at the hexdump doesn’t immediately reveal anything interesting:

So let’s run binwalk against the firmware image to see what it can identify for us. There are a lot of false positive matches (these will be addressed in the up-coming 0.3.0 release!), but there are a few results that stand out:

Binwalk has found two uImage headers (which is the header format used by U-Boot), each of which is immediately followed by an LZMA compressed file.

Binwalk breaks out most of the information contained in these uImage headers, including their descriptions: ‘u-boot image’ and ‘MIPS Linux-2.4.31’. It also shows the reported compression type of ‘lzma’. Since each uImage header is followed by LZMA compressed data, this information appears to be legitimate.

The LZMA files can be extracted with dd and then decompressed with the lzma utility. Don’t worry about specifying a size limit when running dd; any trailing garbage will be ignored by lzma during decompression:

We are now left with the decompressed files ‘uboot’ and ‘kernel’. Running strings against them confirms that they are in fact the U-Boot and Linux kernel images:

We’ve got the kernel and the boot loader images, now all that’s left is finding and extracting the file system. Since binwalk didn’t find any file systems that looked legitimate, we’re going to have to do some digging of our own.

Let’s run strings against the extracted Linux kernel and grep the output for any file system references; this might give us a hint as to what file system(s) we should be looking for:

Ah! SquashFS is a very common embedded file system. Although binwalk has several SquashFS signatures, it is not uncommon to find variations of the ‘sqsh’ magic string (which indicates the beginning of a SquashFS image), so what we may be looking for here is a non-standard SquashFS signature inside the firmware file.

So how do we find an unknown signature inside a 4MB binary file?

Different sections inside of firmware images are often aligned to a certain size. This often means that there will have to be some padding between sections, as the size of each section will almost certainly not fall exactly on this alignment boundary.

An easy way to find these padded sections is to search for lines in our hexdump output that start with an asterisk (‘*’). When hexdump sees the same bytes repeated many times, it simply replaces those bytes with an asterisk to indicate that the last line was repeated many times. A good place to start looking for a file system inside a firmware image is immediately after these padded sections of data, as the start of the file system will likely need to fall on one of these aligned boundaries.

There are a couple interesting sections that contain the string ‘sErCoMm’. This could be something, but given the small size of some of these sections and the fact that they don’t appear to have anything to do with SquashFS, it is unlikely:

There are some other sections as well, but again, these are very small, much too small to be a file system:

Then we come across this section, which has the string ‘sqlz’ :

The standard SquashFS image starts with ‘sqsh’, but we’ve already seen that the firmware developers have used LZMA compression elsewhere in this image. Also, most firmware that uses SquashFS tends to use LZMA compression instead of the standard zlib compression. So this signature could be a modified SquashFS signature that is a concatination of ‘sq’ (SQuashfs) and ‘lz’ (LZma). Let’s extract it with dd and take a look:

Of course, ‘sqlz’ is not a standard signature, so the file utility still doesn’t recognize our extracted data. Let’s try editing the ‘sqlz’ string to read ‘sqsh’:

Running file against our modified SquashFS image gives us much better results:

This definitely looks like a valid SquashFS image! But due to the LZMA compression and the older SquashFS version (2.1), you won’t be able to extract any files from it using the standard SquashFS tools. However, using the unsquashfs-2.1 utility included in Jeremy Collake’s firmware mod kit works perfectly:

Now that we know this works, we should go ahead and add this new signature to binwalk so that it will identify the ‘sqlz’ magic string in the future. Adding this new signature is as easy as opening binwalk’s magic file (/etc/binwalk/magic), copy/pasting the ‘sqsh’ signature and changing the ‘sqsh’ to ‘sqlz’:

Re-running binwalk against the original firmware image, we see that it now correctly identifies the SquashFS entry:

And there you have it. We successfully identified and extracted the boot loader, kernel and file system from this firmware image, plus we have a new SquashFS signature to boot!