After an extended delay, the Linux file system fsck testing results can now be presented. The test plan has changed slightly from our kickoff article. We will review it at the beginning of this article, followed by the actual results. Henry Newman will review the results and write some observations in the next article in this series. As always, we welcome reader feedback and comments.

FSCK Testing Plan

It has been a while since we started the project to test fsck (file system check) times on Linux file systems. The lengthy delay in obtaining the results was due to a lack of hardware for testing. The original vendor could not spare the hardware. A number of other vendors were contacted, and for various reasons none of them could provide the needed hardware for many months, if at all. In the end, Henry used his diplomatic skills to save the day, persuading Data Direct Networks to help us out. Paul Carl and Randy Kreiser from DDN contacted me and agreed to provide remote access to the hardware (thank you, DDN!).

Paul used a DDN SFA10K-X with 590 SAS disks, each 450GB and 15,000 rpm. From these disks he created a number of RAID-6 pools using an 8+2 configuration (8 data disks and 2 parity disks) and a 128KB chunk size. Each pool is a LUN that is 3.6TB in size before formatting. The LUNs were presented to the server as disk devices /dev/sdb1, /dev/sdc1, /dev/sdd1, ..., /dev/sdx1, for a total of 23 LUNs of 3.6TB each, or 82.8TB in total (raw).
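The storage arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the per-LUN size assumes that each 8+2 pool contributes the capacity of its 8 data disks, in decimal units (1TB = 1,000GB).

```python
import string

DISK_GB = 450      # capacity of one SAS disk, from the article
DATA_DISKS = 8     # data disks in each 8+2 RAID-6 pool
NUM_LUNS = 23      # LUNs presented to the server


def lun_size_tb(disk_gb=DISK_GB, data_disks=DATA_DISKS):
    """Size of one 8+2 pool in decimal TB, before formatting.

    Assumption: only the data disks count toward capacity.
    """
    return disk_gb * data_disks / 1000


def lun_devices(count=NUM_LUNS, first="b"):
    """Return /dev/sdb1, /dev/sdc1, ... for `count` LUNs (naming as in the article)."""
    start = string.ascii_lowercase.index(first)
    letters = string.ascii_lowercase[start:start + count]
    return [f"/dev/sd{letter}1" for letter in letters]


if __name__ == "__main__":
    per_lun = lun_size_tb()
    devices = lun_devices()
    print(f"per-LUN size: {per_lun:.1f} TB")
    print(f"raw total:    {per_lun * NUM_LUNS:.1f} TB")
    print(f"{len(devices)} devices, {devices[0]} through {devices[-1]}")
```

Under these assumptions the per-LUN size and the raw total agree with the 3.6TB and 82.8TB figures quoted above, and the device range b through x does contain 23 names.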
The LUNs were combined using mdadm and RAID-0 to create a RAID-60 configuration using the following command:

mdadm --create /dev/md1 --chunk=1024 --level=0 --raid-devices=23 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 /dev/sdn1 /dev/sdo1 /dev/sdp1 /dev/sdq1 /dev/sdr1 /dev/sds1 /dev/sdt1 /dev/sdu1 /dev/sdv1 /dev/sdw1 /dev/sdx1

The result was a file system of about 72TB according to "df -h", or 76,982,232,064 1KB blocks according to "cat /proc/partitions".

A second set of tests was run on storage that used only 12 of the 23 LUNs. The mdadm command was:

mdadm --create /dev/md1 --chunk=1024 --level=0 --raid-devices=12 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1

The resulting file system for this configuration is about 38TB according to "df -h".

The server used in the study is a dual-socket Intel Xeon system with Nehalem processors (E5520) running at 2.27 GHz with an 8MB L3 cache. The server has a total of 24GB of memory and was connected to the storage through a QLogic Fibre Channel FC8 card and an FC switch. The server ran CentOS 5.7 (2.6.18-274 kernel). The stock configuration was used throughout the testing except for one component: the e2fsprogs package was upgraded to version 1.42, which allows ext4 file systems larger than 16TB to be created. This allows the fsck performance of xfs and ext4 to be contrasted.

The file systems were built the way many system admins will build them: using the defaults. The commands for building the file systems are:

XFS: /sbin/mkfs.xfs -f /dev/md1

EXT4: /sbin/mke2fs -t ext4 -F /dev/md1

Mounting the file systems involved a little more tuning. In the case of XFS, I used the tuning options as stated by Dell: rw,noatime,attr2,nobarrier,inode64,noquota. In the case of ext4, the mount options used were defaults,data=writeback,noatime,barrier=0,journal_checksum. The journal checksum was turned on within ext4 since I like this added behavior.
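As a cross-check on the size reported earlier for the 23-LUN array: /proc/partitions lists sizes in 1KiB blocks, so the quoted figure can be converted to the binary units that "df -h" uses. The helper name below is hypothetical, and the sketch does nothing beyond the unit conversion.

```python
KIB = 1024
TIB = 1024 ** 4


def blocks_to_tib(blocks_1k):
    """Convert a /proc/partitions block count (1 KiB blocks) to TiB."""
    return blocks_1k * KIB / TIB


if __name__ == "__main__":
    md1_blocks = 76_982_232_064   # figure quoted in the article for /dev/md1
    print(f"/dev/md1: {blocks_to_tib(md1_blocks):.1f} TiB")
```

Read this way, the figure comes out just under 72 TiB, which is consistent with the roughly 72TB that "df -h" reported.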

Filling the File System

One of the keys to the testing is how the file system is filled. This can be a very time-consuming process because you must create all of the files in some sort of order or fashion. For this testing, fs_mark was used. Ric Wheeler at Red Hat has been using it for testing file systems at very large scales (over 1 billion files). In this article, fs_mark is not used to test the file system itself; rather, it is used to fill the file system in a specific fashion. It uses one or more base directories and creates a specified number of subdirectories underneath them, which are then filled with files. You might think of this as a single level of subdirectories.

It is much more complicated to create specific subdirectory depths and numbers of files, since that configuration depends on the specific users and situation. You could also use some sort of random approach in the hope that a random distribution approximates a real-world situation. It is virtually impossible to build a representative file system tree that fits most general situations, so the single-level directory tree used here should be viewed as one extreme of file system layouts.

One of the nice features of fs_mark is that it is threaded: each thread produces its own unique directory structure, with a single layer of subdirectories underneath a base directory, each containing a fixed number of files. Fs_mark also allows you to specify the number of files per thread so that you can control the total number of files. Although the server has eight total cores, running eight threads (one per core) resulted in the OS swapping. When the number of threads was reduced to three, the server did not swap, and the file creation rate was much faster than with eight threads and swapping. Using three threads causes a complication, however, because it is an odd number.
The original file counts are not evenly divisible by three, so it was impossible to give each thread an integer number of files. The number of files per thread was therefore set to a round number that brings the totals close to the original targets of 100,000,000, 50,000,000, and 10,000,000 files. The totals chosen were 105,000,000, 51,000,000, and 10,200,000 files. The goal for all fs_mark commands was to create the specified number of files while filling about 50 percent of the file system. The following fs_mark command lines were used to fill the 72TB file system:

./fs_mark -s 400000 -L 1 -S 0 -n 35000000 -D 35000 -N 1000 -t 3 -k -d /mnt/test

./fs_mark -s 800000 -L 1 -S 0 -n 17000000 -D 17000 -N 1000 -t 3 -k -d /mnt/test

./fs_mark -s 4000000 -L 1 -S 0 -n 3400000 -D 3400 -N 1000 -t 3 -k -d /mnt/test

The commands for filling the 38TB file systems were:

./fs_mark -s 200000 -L 1 -S 0 -n 35000000 -D 35000 -N 1000 -t 3 -k -d /mnt/test

./fs_mark -s 400000 -L 1 -S 0 -n 17000000 -D 17000 -N 1000 -t 3 -k -d /mnt/test

./fs_mark -s 2000000 -L 1 -S 0 -n 3400000 -D 3400 -N 1000 -t 3 -k -d /mnt/test

Notice that the number of files per directory is a constant (-N 1000, or 1,000 files). After the file system was filled using fs_mark, it was unmounted, and the file system check was run on the device. In the case of xfs, the command was:

/sbin/xfs_repair -v /dev/md1

For ext4, the file system check was:

/sbin/e2fsck -pfFt /dev/md1

Notice that the device /dev/md1 was the target in both cases.
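The fs_mark parameters used to fill the file systems can be sanity-checked against the stated goals (the target file counts and roughly 50 percent fill). The sketch below is illustrative only: the class and function names are invented here, it assumes -s is the file size in bytes and -n is the file count per thread, and it ignores file system overhead and block rounding. The fill fraction treats the "df -h" sizes as binary units (TiB).

```python
from dataclasses import dataclass

TIB = 1024 ** 4


@dataclass
class FillRun:
    """Subset of the fs_mark options from the article's command lines."""
    size_bytes: int          # -s, assumed to be bytes per file
    files_per_thread: int    # -n, assumed to be per thread
    dirs: int                # -D, number of subdirectories
    files_per_dir: int = 1000  # -N
    threads: int = 3           # -t

    def total_files(self):
        return self.files_per_thread * self.threads

    def total_bytes(self):
        return self.total_files() * self.size_bytes

    def dirs_cover_files(self):
        # In every command, -D times -N equals the -n value.
        return self.dirs * self.files_per_dir == self.files_per_thread


def fill_fraction(run, fs_size_tib):
    """Fraction of a file system of `fs_size_tib` TiB taken up by file data."""
    return run.total_bytes() / (fs_size_tib * TIB)


# Parameters copied from the six fs_mark command lines.
FILL_72 = [FillRun(400_000, 35_000_000, 35_000),
           FillRun(800_000, 17_000_000, 17_000),
           FillRun(4_000_000, 3_400_000, 3_400)]
FILL_38 = [FillRun(200_000, 35_000_000, 35_000),
           FillRun(400_000, 17_000_000, 17_000),
           FillRun(2_000_000, 3_400_000, 3_400)]

if __name__ == "__main__":
    for fs_tib, runs in ((72, FILL_72), (38, FILL_38)):
        for run in runs:
            print(f"{fs_tib}TB fs: {run.total_files():>11,} files, "
                  f"{fill_fraction(run, fs_tib):.0%} full, "
                  f"-D x -N matches -n: {run.dirs_cover_files()}")
```

With these assumptions, all six runs land close to half full, and the three file counts match the totals of 105,000,000, 51,000,000, and 10,200,000 files.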