Hal Pomeranz, Deer Run Associates

Many years ago, I started this series of blog posts documenting the internals of the EXT4 file system. One item I never got around to was documenting how directories were structured in EXT. Some recent research has caused me to dive back into this topic, and given me an excuse to add additional detail to this EXT4 series.

If you go back and read earlier posts in this series, you will note that the EXT inode does not store file names. Directories are the only place in traditional Unix file systems where file name information is kept. In EXT, and the classic Unix file systems it is evolved from, directories are simply special files that associate file names with inode numbers.

Furthermore, in the simplest case, EXT directories are just sequential lists of file entries. The entries aren't even sorted. For the most part, directory entries in EXT are simply added to the directory file in the order files are created in the directory.

Let's create a small directory as an example:

$ <strong>cd /tmp</strong> $ <strong>mkdir testing</strong> $ <strong>cd testing</strong> /tmp/testing $ <strong>touch this is a simple directory</strong> $ <strong>ls</strong> a directory is simple this

When I list the directory with "ls", the file names are displayed in alphabetical order. But when I look at things in my trusty hex editor, you can see the directory entries in their actual order:

I have also used highlighting to show the fields in several directory entries. Every directory must start with the "." and ".." entries. These links point to the directory itself (".") and the parent directory ("..").

Each directory entry contains five fields:

Byte 0-3: Inode number 4-5: Total entry length 6 : File name length 7 : File type 8- : File name

Directory entries are variable length, and the second field tracks the total length of the entry. Entries must be aligned on four byte boundary. So in the case of the "." entry, the total entry length is 12 bytes, even though the entry could fit into 9 bytes. Field three is the length of the file name- one byte in this case. The extra three bytes in the directory entry just contain nulls.

In classic Unix file systems, the file name length field was two bytes. But since Linux doesn't allow file names longer than 255 characters, the EXT developers decided to use only one byte for the file name length and to use the second byte to hold the file type. While the file type is also stored in the inode, having this information in the directory entry is more efficient for operations like "ls -F" where the file type is displayed along with the file name.

File type is a numeric field defined as follows:

o: Unknown 1: Regular file 2: Directory 3: Character special device 4: Block special device 5: FIFO (named pipe) 6: Socket 7: Symlink

In the case of the "." and ".." links, the file type is "2", which is "directory". This is the only case where Unix file systems allow hard links to directories.

Finally, note the entry length field of the final file entry in the directory. The entry size is 0x0FB4, or 4020 bytes. This consumes all remaining bytes to the end of the block. The last directory entry is always aligned to the end of the block. Directory entries may not cross block boundaries.

Deleting Files

Now let's observe what happens when I delete the file "simple" from our example directory:

This is the standard behavior when files are deleted in classic Unix file systems. The entry before the deleted file simply grows to consume the "deleted" entry. But the entry for the deleted file is otherwise unchanged and can be carved for.

However, this unused "slack" space from deleted directory entries can be reused when new files are added to the directory. For example, here's what happens when I add a file named "new" to our example directory:

Large Directories

One of the consistent criticisms of classic Unix file systems has been that as directories get large, performance suffers. If directories are nothing more than unsorted lists of files, searching for the entry you want means sequentially parsing a large number of directory entries. As the directory grows, the average search time increases linearly.

More modern file systems solve this issue by organizing directory entries in some sort of searchable data structure. For example, NTFS uses B-trees. Starting in EXT3, developers created a hashed tree system, dubbed "htree", for organizing directory entries. This is now standard in EXT4 and can be seen when the directory grows larger than a single block.

Here's the first block of my /usr/share/doc directory:

Things start getting really interesting after the first 24 bytes used by the "." and ".." entries. The rest of the block is used by the dx_root structure that defines the root of the htree. Technically, the "." and ".." entries are part of dx_root as well and the data structure consumes the entire first block, but the interesting fields start 24 bytes into the block:

Byte 0-23 : "." and ".." entries 24-27: Reserved (zero) 28 : Hashing algorithm used 29 : Size of dx_entry records (normally 8) 30 : Depth of tree 31 : Flags (unused, normally 0) 32-33: Max dx_entry records possible 34-35: Actual number of dx_entry records to follow 36-39: Relative block number for "zero hash" rest : dx_entry records

After four null bytes at offset 24-27, a single byte specifies the hash algorithm used by the htree. The byte codes are documented here, but 0x01 seems to be the standard, which is a hashing algorithm based on MD4.

Next comes the size of the dx_entry records, which are used to index the various blocks in the htree. These records are always 8 bytes long and will be described further below.

Bytes 32-33 document the maximum number of dx_entry records that can be stuffed into this block after the initial fields of the dx_root structure. The dx_entry records start 40 bytes into the block, so you might assume that this max value is the block size of 4096 bytes, minus the 40 bytes of dx_root fields, divided by the 8 byte size of dx_entry records. That would get you a max value of 507. The actual value is 0x01FC or 508 because the "zero hash" entry in bytes 36-39 counts as an extra dx_entry record.

Given that a single dx_root block can index over 500 htree blocks, and that those blocks can contain hundreds of file name entries, it is rare for an htree to ever need more than a single level. So in practice, the "depth of tree" byte at offset 30 is always 0x00, indicating a flat tree. The specification does allow for a nested tree, but I've never seen one in a real world application.

Bytes 34-35 are the actual number of dx_entry records to follow, again counting the "zero hash" record in bytes 36-39 as one of the dx_entry records. Each dx_entry record is a four byte hash value followed by a four byte relative block offset from the beginning of the directory file. For clarity, let's take a look at the first several dx_entry records from this example in tabular form:

Hash value Block offset "zero hash" 1 0x0F3FFEA2 16 0X1D8171F2 8 0X2C3E5760 15 0X39989908 4 ...

The dx_entry records are a lookup table sorted by hash value. The initial "zero hash" entry means that all files whose names hash to values less than 0x0F3FFEA2 can be found in block number 1 of the directory file. File names with a hash value greater than or equal to 0x0F3FFEA2 but less than 0x1D8171F2 can be found in block 16 of the file, and so on. The dx_entry records are sorted by hash value so that the EXT file system code can do binary search to find the appropriate block offset more quickly.

After the last dx_entry record, the rest of the block is slack space. Typically, this slack space contains directory entries from when the directory was small enough to fit into a single block. For example, right after the end of the last dx_entry record, you can see most of an original entry for a subdirectory (file type 0x02) named "alsa-utils". Then there's an entry for another subdirectory named "anacron" at inode 0x000C14D7 (791767). Typically, you will also find live entries for these directories in whatever htree block they got hashed into. But it's possible that these directories were later deleted, and that these entries in the directory slack may be the only record of their existence.

The rest of the directory file is "leaf blocks" of the htree. These blocks are full of normal directory entries and are simply read sequentially. The htree format was specifically designed for backwards compatibility, so that older code could still do a normal sequential search through the directory entries.

If you're thinking about carving for deleted directory files, be aware that directory files larger than one block are usually fragmented. The EXT block allocation algorithm prefers to put files into the same block group with their parent directory. By the time a directory grows larger than a single block, the nearby blocks have all been consumed.

Hal Pomeranz is an independent Digital Forensic Analyst and Expert Witness. He thinks that any day spent looking at a hex editor is a good day.