Build a Python app for parsing shared memory dumps

Use the struct utility to extract system data for analysis

Memory dumps reveal the recorded state of working memory at a specific point in operation. They are an important tool in system administration because they provide "forensic" evidence of the system's condition.

Before you begin

For the instructions, code samples, and code download (see Download, below) in this article, I used Python version 2.4, which you can download from the Python site (see Related topics below). You may get different results with other versions.

Before you start, make sure you are familiar with the following:

The /dev/shm implementation of traditional shared memory

Viewing shared memory data dumps manually on a Linux system

Certain dependencies (Linux file open, read, write, close concepts; using the file descriptor and the modes that the file can be opened in; basic Python structure concepts)

GNU/Linux, in general

Understanding shared memory dumps in Linux

/dev/shm is an implementation of the traditional shared memory concept. It is a widely used and accepted means of passing data between programs. In /dev/shm, one program or daemon creates a memory portion that other processes (at relevant permission levels) can access. This is a quick and easy method of sharing data between processes.

Each program creates its own file; in my examples, I use the file name devmem located at /dev/shm/devmem.

Viewing the shared memory dump manually on Linux

You cannot view shared memory files (commonly referred to as shm files) with the cat utility that is generally used for file display in Linux, because shm files are in a binary format. They look like a chunk of garbled characters if you try to view them with generic file-viewing methods. I use the hexdump utility to read the mem files and view them in a readable format; other utilities are available for this purpose.

For this article, the usage pattern for hexdump looks like this:

hexdump <optional switches> /dev/shm/devmem

For the supported switches, see Related topics for a link to more information on hexdump.

Defining the scenario

The scenario we'll work with is a network sniffer that analyzes the packets received by the host and stores the data in a shared mem file, /dev/shm/devmem. This data contains information about the packet received.

The file looks generally like this:

The memory file storage is /dev/shm/devmem

The devmem file format contains:

4 bytes of source address (who sent the packet)

4 bytes of destination address (where the packet is going)

2 bytes of source port (the port on the source that the packet used)

2 bytes of destination port (the port on the destination that the packet will use)

2 bytes of protocol (the protocol that the packet is a part of)

4 bytes of time (the time stamp at which the packet was seen by the network sniffer)

1 record length = sum of the devmem specs (that is, 18 bytes)

The maximum size of the memory file is 1 KB (1024 bytes), so it can hold 56 complete records (1024 / 18 = 56, rounded down)
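The record arithmetic above can be checked with struct itself. This is a minimal sketch in Python 3 syntax; the format string and the constant names are illustrative assumptions, not part of the sniffer. The leading = asks struct for standard sizes with no padding, so the result does not depend on the platform.

```python
import struct

# Hypothetical layout for one devmem record, in the order listed above:
# source address, destination address, source port, destination port,
# protocol, time stamp. '=' selects standard sizes with no padding.
RECORD_FORMAT = '=4s4sHHH4s'
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)
MAX_FILE_SIZE = 1024

records_per_file = MAX_FILE_SIZE // RECORD_SIZE   # complete records only
leftover_bytes = MAX_FILE_SIZE % RECORD_SIZE      # bytes that do not fill a record

print(RECORD_SIZE)        # 18
print(records_per_file)   # 56
print(leftover_bytes)     # 16
```

The 16 leftover bytes are why a loop over the file should stop after 56 records rather than read until the end of the file.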

If you hexdump and display the file manually on a Linux terminal, it will look something like this:

Listing 1. Displaying a dump file

# hexdump /dev/shm/devmem
0000000 0004 0000 0400 0000 fc64 0a00 00fb e000
0000010 14e9 14e9 0011 0000 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0800
0000030 1668 0000 0000 0000 0032 0000 0000 0000
0000040 0000 0000 0001 e000 0000 0000 0002 0000
0000050 0000 0000 0000 0000 0000 0000 0000 0000
0000060 0000 0000 0000 0800 0100 0000 0000 0000
0000070 0008 0000 0000 0000 fc64 0a00 fd64 0a00
0000080 2328 03ea 0006 0000 0000 0000 0000 0000
0000090 0000 0000 0000 0000 0000 0000 0000 0800
00000a0 7700 0001 0000 0000 0040 0000 0000 0000
00000b0 fd64 0a00 fc64 0a00 03ea 2328 0006 0000
00000c0 0000 0000 0000 0000 0000 0000 0000 0000
00000d0 0000 0000 0000 0800 0a00 0000 0000 0000
00000e0 0040 0000 0000 0000 fc64 0a00 fd64 0a00
00000f0 2328 03ec 0006 0000 0000 0000 0000 0000
0000100 0000 0000 0000 0000 0000 0000 0000 0800
0000110 7700 0001 0000 0000 0040 0000 0000 0000

Let's look at the steps involved in parsing the file.

Parsing the dump file

The steps to understanding the data in a memory dump file (identifying the format, parsing, and reading the file) are relatively simple:

1. Open the file.

2. Read the bytes with the file descriptor.

3. Convert data to a readable string format when necessary.

4. Verify whether the buffer that has been read is intact and whether it has any truncations or errors.

5. Unpack the data from the buffer.

6. Extract the information.

7. Print the data.

8. Build a loop to do steps 1 to 7 on each record in a shared data dump. (You don't want to do it manually, do you?)

Let's go through the process flow in more detail.

Open the file

To open a shared memory file, use the general form fd = open(fileName, mode). Here fd is the file object that open returns; this article refers to it loosely as the file descriptor. For this example, use the following:

fileName: /dev/shm/devmem

mode: rb (read only in binary mode)

Listing 2. Opening a shared memory file

fd = open('/dev/shm/devmem', 'rb')

Read the bytes

To read the bytes using the file descriptor obtained in the previous function call, I use the following code. It reads the indicated number of bytes from the file parameter passed:

Listing 3. Reading bytes from a shared memory file

def ReadBytes(fd, noBytes):
    ''' Read the open file and return the number of bytes specified '''
    data = fd.read(noBytes)
    return data

buffer = ReadBytes(fd, 18)
# Pass the open file (fd from Listing 2) and the number of bytes
# Number of bytes is 18 since in the example scenario each record
# is of length 18

Here, reading the bytes is not enough to extract the necessary information; the read returns the raw bytes as a string buffer. The buffer needs to be parsed and converted to an understandable format.
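The open and read steps can be tried without a running sniffer. The following is a minimal sketch in Python 3 syntax that assumes a temporary file as a stand-in for /dev/shm/devmem; the helper name read_bytes and the 20 sample bytes are made up for illustration. It also shows that read can return fewer bytes than requested, which is why the buffer is verified in a later step.

```python
import os
import tempfile

def read_bytes(fd, no_bytes):
    """Return up to no_bytes bytes from the open binary file fd."""
    return fd.read(no_bytes)

# Stand-in for /dev/shm/devmem: a temporary file holding 20 sample bytes.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(bytes(range(20)))
tmp.close()

with open(tmp.name, 'rb') as fd:
    first = read_bytes(fd, 18)    # a full 18-byte record
    second = read_bytes(fd, 18)   # only 2 bytes remain in the file
    third = read_bytes(fd, 18)    # end of file: an empty result

os.remove(tmp.name)

print(len(first), len(second), len(third))   # 18 2 0
```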

Convert the data

The Python struct module can be used to handle binary data stored in files or received from network connections, among other sources. It has two broad functions: pack and unpack.

The job of pack is to return a string containing the values v1, v2, ... packed according to the given format. The arguments must exactly match the values required by the format.

The role of unpack is to unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format: len(string) must equal calcsize(fmt).
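A short round trip makes the pack and unpack rules concrete. This sketch uses Python 3 syntax; the format string and the two sample values are arbitrary choices for illustration.

```python
import struct

fmt = '<HH'   # two unsigned 16-bit values, little endian

packed = struct.pack(fmt, 9000, 1002)
print(len(packed) == struct.calcsize(fmt))   # True

values = struct.unpack(fmt, packed)
print(values)                                # (9000, 1002)

# The result is a tuple even for a single item.
single = struct.unpack('<H', packed[:2])
print(single)                                # (9000,)

# A buffer of the wrong length is rejected.
try:
    struct.unpack(fmt, packed[:3])
except struct.error as exc:
    print('unpack failed:', exc)
```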

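The acceptable formats are listed here with their standard sizes. The standard sizes apply when the format string starts with =, <, >, or !; without a prefix, struct uses the platform's native sizes and alignment, so l and L can be 8 bytes on 64-bit Linux.

1-byte formats: b for signed char, B for unsigned char

2-byte formats: h for short integer, H for unsigned short

4-byte formats: l for long, L for unsigned long

8-byte formats: q for long long, Q for unsigned long long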



For the other formats supported for packing and unpacking the buffer bytes, refer to the Python literature listed in Related topics.
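The sizes of these format characters can be confirmed with calcsize. The sketch below, in Python 3 syntax, prints the standard size (format prefixed with =) next to the native size (no prefix) for each character. The native column depends on the machine it runs on, so no output is shown for it.

```python
import struct

# Standard size (with '=') versus native size (no prefix) per format character.
for code in 'bBhHlLqQ':
    standard = struct.calcsize('=' + code)
    native = struct.calcsize(code)
    print(code, standard, native)

# Two ways to describe an 18-byte record with standard sizes.
print(struct.calcsize('=llllh'))   # 18
print(struct.calcsize('=QQh'))     # 18
```

If the native size of l differs from 4 on your system, a format such as llllh without a prefix will not add up to 18 bytes there.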

Verify the buffer

To verify that the buffer of 18 bytes that has been read is intact and that it does not have any truncations or errors, you can use the calcsize function to check that the buffer length is still 18 bytes, as expected. The Python assert statement serves this purpose.

Listing 4. Verifying that the buffer is correct

import struct
assert len(buffer) == struct.calcsize('=llllh')
# 4 l's is 4*4 bytes = 16 bytes and h is 2 bytes, so that is 18 bytes
# ('=' selects standard sizes, so the result is the same on every platform)
# we could use '=QQh', which is 2*8 + 2 = 18 bytes, as well
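Because assert statements are skipped when Python runs with the -O option, a long-running tool may prefer an explicit check. This is a minimal sketch in Python 3 syntax; check_record is a hypothetical helper, and the = prefix in the format selects standard sizes so that calcsize returns 18 on any platform.

```python
import struct

RECORD_FORMAT = '=llllh'                       # 4*4 + 2 = 18 bytes
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

def check_record(buffer):
    """Return buffer unchanged, or raise ValueError if it is not one record long."""
    if len(buffer) != RECORD_SIZE:
        raise ValueError('expected %d bytes, got %d' % (RECORD_SIZE, len(buffer)))
    return buffer

check_record(b'\x00' * 18)        # passes silently

try:
    check_record(b'\x00' * 10)    # simulates a truncated read
except ValueError as exc:
    message = str(exc)
    print(message)                # expected 18 bytes, got 10
```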

Unpack the data

Now that you have verified the buffer is indeed 18 bytes, you can unpack your data from the buffer. struct provides a helpful unpack_from function (available in Python 2.5 and later) that takes the format, the buffer, and the offset in the buffer at which to start reading:

struct.unpack_from(fmt, buffer[, offset=0])
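The effect of the offset argument is easiest to see on a buffer whose bytes equal their own positions. This sketch uses Python 3 syntax and made-up sample data, not a real devmem record.

```python
import struct

# Sample buffer: byte values 0 through 17, so each value equals its offset.
buffer = bytes(range(18))

one_byte = struct.unpack_from('B', buffer, 3)      # the single byte at offset 3
four_bytes = struct.unpack_from('4B', buffer, 4)   # four bytes starting at offset 4
word = struct.unpack_from('<H', buffer, 8)         # bytes 08 09 read as little endian

print(one_byte)     # (3,)
print(four_bytes)   # (4, 5, 6, 7)
print(word)         # (2312,)
```

Unlike unpack, unpack_from accepts a buffer that is longer than the format requires; it only needs enough bytes after the offset.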

Extract the details

In our scenario, these are the details we want to extract:

Listing 5. Details to be extracted

sourceAddress = (struct.unpack_from('B', buffer, 0),
                 struct.unpack_from('B', buffer, 1),
                 struct.unpack_from('B', buffer, 2),
                 struct.unpack_from('B', buffer, 3))
destinationAddress = (struct.unpack_from('B', buffer, 4),
                      struct.unpack_from('B', buffer, 5),
                      struct.unpack_from('B', buffer, 6),
                      struct.unpack_from('B', buffer, 7))
sourcePort = (struct.unpack_from('B', buffer, 8),
              struct.unpack_from('B', buffer, 9))
destinationPort = (struct.unpack_from('B', buffer, 10),
                   struct.unpack_from('B', buffer, 11))
protocolUsed = (struct.unpack_from('B', buffer, 12),
                struct.unpack_from('B', buffer, 13))
timeStamp = (struct.unpack_from('B', buffer, 14),
             struct.unpack_from('B', buffer, 15),
             struct.unpack_from('B', buffer, 16),
             struct.unpack_from('B', buffer, 17))

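Note: Depending on the platform and whether the mem structure is big endian or little endian, you may need to swap the order in which bytes are read. For multi-byte formats such as H, you can state the byte order explicitly by starting the format string with < (little endian) or > (big endian).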
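The byte-order point can be illustrated by unpacking the same two bytes both ways. This sketch uses Python 3 syntax; the two sample bytes are chosen for illustration and are not tied to a particular field in the dump.

```python
import struct
import sys

raw = b'\x23\x28'   # two sample bytes

little = struct.unpack('<H', raw)[0]   # 0x2823
big = struct.unpack('>H', raw)[0]      # 0x2328

print(little)          # 10275
print(big)             # 9000
print(sys.byteorder)   # byte order of the machine running this code
```

With no prefix, struct uses the byte order reported by sys.byteorder, which is why the same dump can decode differently on different machines.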

Print the output

Now that you have the unpacked values from the binary buffer that you read, you can use the standard print commands to get the output necessary.

Listing 6. Printing the details

print "sourceAddress =", (struct.unpack_from('B', buffer, 0),
                          struct.unpack_from('B', buffer, 1),
                          struct.unpack_from('B', buffer, 2),
                          struct.unpack_from('B', buffer, 3))
print "destinationAddress = ", (struct.unpack_from('B', buffer, 4),
                                struct.unpack_from('B', buffer, 5),
                                struct.unpack_from('B', buffer, 6),
                                struct.unpack_from('B', buffer, 7))
print "sourcePort = ", (struct.unpack_from('H', buffer, 8))
print "destinationPort = ", (struct.unpack_from('H', buffer, 10))
print "protocolUsed = ", (struct.unpack_from('H', buffer, 12))
print "timeStamp = ", (struct.unpack_from('B', buffer, 14),
                       struct.unpack_from('B', buffer, 15),
                       struct.unpack_from('B', buffer, 16),
                       struct.unpack_from('B', buffer, 17))

The expected output from Listing 6 should be in this format:

Listing 7. Output from printing

sourceAddress = ((192,), (168,), (10,), (102,))
destinationAddress = ((207,), (168,), (1,), (103,))
sourcePort = (11299,)
destinationPort = (11555,)
protocolUsed = (256,)
timeStamp = ((1,), (12,), (0,), (1,))
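The nested tuples in this output can be turned into the familiar dotted form with a little extra formatting. The sketch below uses Python 3 syntax; dotted_quad is a hypothetical helper, and it assumes the four address bytes are stored in the order in which they are normally written. The sample buffer holds only the two address fields from the output above.

```python
import struct

# Sample bytes for the two address fields only (offsets 0 to 7 of a record).
buffer = bytes([192, 168, 10, 102, 207, 168, 1, 103])

def dotted_quad(buffer, offset):
    """Format the four bytes starting at offset as a dotted address string."""
    octets = struct.unpack_from('4B', buffer, offset)
    return '.'.join(str(octet) for octet in octets)

source = dotted_quad(buffer, 0)
destination = dotted_quad(buffer, 4)

print(source)        # 192.168.10.102
print(destination)   # 207.168.1.103
```

Reading all four bytes with one 4B format also avoids the tuple-of-tuples shape that four separate B reads produce.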

Automate the process for all the records

Now, to read and print all the records from the entire shared memory file, create a loop:

Listing 8. Creating a loop to read and print all records

import struct

fd = open('/dev/shm/devmem', 'rb')
for element in range(0, 56):
    # loop 56 times, since we know the file size and
    # the record length: 1024/18 = 56 records
    buffer = ReadBytes(fd, 18)
    assert len(buffer) == struct.calcsize('=llllh')
    sourceAddress = (struct.unpack_from('B', buffer, 0),
                     struct.unpack_from('B', buffer, 1),
                     struct.unpack_from('B', buffer, 2),
                     struct.unpack_from('B', buffer, 3))
    destinationAddress = (struct.unpack_from('B', buffer, 4),
                          struct.unpack_from('B', buffer, 5),
                          struct.unpack_from('B', buffer, 6),
                          struct.unpack_from('B', buffer, 7))
    sourcePort = struct.unpack_from('H', buffer, 8)
    destinationPort = struct.unpack_from('H', buffer, 10)
    protocolUsed = struct.unpack_from('H', buffer, 12)
    timeStamp = (struct.unpack_from('B', buffer, 14),
                 struct.unpack_from('B', buffer, 15),
                 struct.unpack_from('B', buffer, 16),
                 struct.unpack_from('B', buffer, 17))
    print "sourceAddress = ", sourceAddress
    print "destinationAddress = ", destinationAddress
    print "sourcePort = ", sourcePort
    print "destinationPort = ", destinationPort
    print "protocolUsed = ", protocolUsed
    print "timeStamp = ", timeStamp
fd.close()
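For comparison, the same loop can be written more compactly with a precompiled struct.Struct object that unpacks a whole record in one call. This is a sketch under stated assumptions, in Python 3 syntax: the little-endian layout, the parse_records helper, and the in-memory sample data (an io.BytesIO object standing in for the open devmem file) are all illustrative and are not taken from the sniffer.

```python
import io
import struct

# Assumed layout: 4 address bytes, 4 address bytes, three unsigned shorts,
# 4 time stamp bytes; little endian, standard sizes, 18 bytes in total.
RECORD = struct.Struct('<4B4BHHH4B')

def parse_records(fd):
    """Yield one dict per complete record read from the open binary file fd."""
    while True:
        buffer = fd.read(RECORD.size)
        if len(buffer) < RECORD.size:   # end of data or a truncated tail
            break
        fields = RECORD.unpack(buffer)
        yield {
            'sourceAddress': fields[0:4],
            'destinationAddress': fields[4:8],
            'sourcePort': fields[8],
            'destinationPort': fields[9],
            'protocolUsed': fields[10],
            'timeStamp': fields[11:15],
        }

# Two made-up records followed by 5 stray bytes that do not form a record.
one = RECORD.pack(192, 168, 10, 102, 207, 168, 1, 103, 11299, 11555, 256, 1, 12, 0, 1)
sample = one * 2 + b'\x00' * 5

records = list(parse_records(io.BytesIO(sample)))
print(len(records))   # 2
for record in records:
    print(record['sourceAddress'], record['sourcePort'])
```

Stopping on a short read, rather than counting to 56, lets the same function handle files of any length.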

That's all there is to it! We parsed a known format of binary mem dump in Linux, and used the Python struct module to read the binary data dump and display it in a readable format.

Downloadable resources

Related topics