As a systems administrator, sooner or later you’ll have to solve a “file system full” error in a hurry. Here are a few commands and hints to help you get out of it quickly on a UNIX-like system.



df command

The df (disk free) command reports how much of each file system is used. You can combine it with the sort command to list the fullest file systems last:

kattoo@roadrunner ~/Downloads $ df | sort -n -k 5,5
Filesystem           1K-blocks      Used Available Use% Mounted on
shm                    2029696         0   2029696   0% /dev/shm
/dev/sda6              1968588      3232   1865356   1% /tmp
udev                     10240       228     10012   3% /dev
/dev/sda12           367628876  28899532 338729344   8% /files
/dev/sda7              1989820    246300   1743520  13% /var
/dev/sda9              1999932    260996   1738936  14% /usr/portage
/dev/sda5               244176     52264    191912  22% /
/dev/sda8              9990188   3161340   6828848  32% /usr
/dev/sda11            99947736  32910756  67036980  33% /files/vservers
/dev/sda10             3989912   1699372   2290540  43% /usr/portage/distfiles

The sort arguments are:

-n : sort numerically

-k 5,5 : sort by the 5th field only

du command

Once you’ve found which file system is full, you need to quickly identify the biggest directories on it.

You can use the du (disk usage) command, which reports the space used by a directory and its sub-directories. I usually run something like:

kattoo@roadrunner /usr/portage $ du -sk * | sort -n | tail -20
5711    x11-misc
6056    app-misc
6224    dev-ruby
6320    app-dicts
7219    profiles
7428    net-analyzer
7686    app-text
8112    media-plugins
8777    kde-base
8883    media-libs
8943    sys-apps
9764    dev-util
9986    media-sound
10955   dev-libs
11295   dev-python
11479   dev-java
11645   net-misc
17553   dev-perl
112556  metadata
1705976 distfiles

This shows the 20 largest files or directories. By default, du displays the size of each directory and sub-directory; the -s switch makes it display only the grand total for each argument.

Depending on the results, you may want to repeat this command in the sub-directories until you’ve narrowed things down enough to find the culprit.
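Alternatively, du can report every sub-directory in a single pass, so you don’t have to descend level by level; sorting that output surfaces the largest directory at any depth. A quick sketch (the /var path is just an example):

```shell
# Report the size of every directory under /var (in KB), largest last.
# Warnings about unreadable directories are discarded.
du -k /var 2>/dev/null | sort -n | tail -20
```

This prints more entries than the -s variant (parents appear alongside their children), but it often points straight at the deepest offender.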

find command

The file system full error is more often than not caused by a faulty script or program whose logs are running wild because of a bug. This usually leads to one huge file somewhere, hogging all the available space on the file system.

An easy way to spot it is the find command. The following snippet, for example, searches for files bigger than 10 MB in the current directory and its sub-directories and prints the 10 largest:

kattoo@roadrunner /usr/portage $ find . -type f -size +10000000c -exec ls -l \{} \; | sort -n -k 5,5 | tail -10
-rw-rw-r-- 1 portage portage  23416703 Oct  1 10:07 ./distfiles/samba-3.0.37.tar.gz
-rw-rw-r-- 1 portage portage  28749072 Jan 18  2008 ./distfiles/extremetuxracer-0.4.tar.gz
-rw-rw-r-- 1 portage portage  46905557 Oct 16 17:20 ./distfiles/firefox-3.5.4.source.tar.bz2
-rw-rw-r-- 1 portage portage  46914620 Dec  2 05:32 ./distfiles/firefox-3.5.6.source.tar.bz2
-rw-rw-r-- 1 portage portage  50380241 Oct  2 11:47 ./distfiles/VirtualBox-3.0.8-53138-Linux_amd64.run
-rw-rw-r-- 1 portage portage  50595281 Nov 17 10:44 ./distfiles/VirtualBox-3.0.12-54655-Linux_amd64.run
-rw-rw-r-- 1 portage portage  59368714 Aug  8 02:10 ./distfiles/gcc-4.3.4.tar.bz2
-rw-rw-r-- 1 portage portage  61494822 Sep 10 00:34 ./distfiles/linux-2.6.31.tar.bz2
-rw-rw-r-- 1 portage portage 125384668 Oct  1 13:58 ./distfiles/qt-x11-opensource-src-4.5.3.tar.gz
-rw-rw-r-- 1 portage portage 314942420 Jun 18  2008 ./distfiles/sauerbraten_2008_06_17_ctf_edition_linux.tar.bz2
kattoo@roadrunner /usr/portage $

I also like to filter out already-compressed files, to collect the biggest files that could still be compressed to save some space, with something like:

find . -type f -size +10000000c \! -name "*.Z" \! -name "*.gz" \! -name "*.bz2" -exec ls -l \{} \;

Tip: set this up as an alias by adding this to your .bashrc (if you’re using bash; otherwise check your shell’s documentation):

alias bigfiles='find . -type f -size +10000000c \! -name "*.Z" \! -name "*.gz" \! -name "*.bz2" -exec ls -l \{} \; | sort -n -k 5,5 | tail -10'

This gives you the 10 largest not-yet-compressed files in a single handy command.

Caveat

“Argument list too long” error:

If you are working in a directory holding a very large number of files, you might get an “Argument list too long” (or, on some systems, “Too many arguments”) error when using a star expansion (as in the du -sk * example above).

This is not actually the du command complaining: the shell expands the star into an argument list that exceeds the system’s limit (ARG_MAX), so the command cannot even be started. When this happens, you are usually better off using the find command as explained above.

Another possibility is to pipe the output of find to the xargs command. Basically, xargs reads everything on its standard input and passes it as arguments to the specified command.
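For example, here is a sketch of the earlier du run rewritten with find and xargs, which sidesteps the argument-list limit (the -print0/-0 pair and -mindepth are GNU/BSD extensions that keep odd file names intact):

```shell
# Feed the directory entries to du through xargs instead of a shell glob:
find . -mindepth 1 -maxdepth 1 -print0 | xargs -0 du -sk | sort -n | tail -20
```

Because xargs splits the input into as many du invocations as needed, this works no matter how many entries the directory holds.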

Deleting / compressing a file which is still open

Also beware of this: if you delete or compress a file that is still held open by a process, the space it uses won’t be freed until the process actually closes the file (I’ll explain why in a different post, as this is an interesting topic on its own 🙂 ).

If you deleted or compressed the file (so the big file disappeared or was replaced by a compressed version) but the space doesn’t get freed (which you can check with df), you can bet that a process is still holding the file open. You can spot this with tools like lsof or fuser. These tools vary greatly between Unix variants; on IBM’s AIX, for example, fuser has a handy -d option which spots files with a link count of 0 and reports the PIDs of the attached processes.
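On Linux you can even watch the phenomenon from a plain shell, since /proc exposes every open file descriptor (this sketch assumes a Linux /proc; lsof +L1 gives the same information system-wide):

```shell
# Create a file, keep it open on file descriptor 3, then delete it:
tmp=$(mktemp)
exec 3>"$tmp"
echo "still taking up space" >&3
rm "$tmp"

# The name is gone, but /proc still shows the open, deleted file:
ls -l "/proc/$$/fd/3"   # the link target ends with "(deleted)" on Linux

# Only closing the descriptor actually frees the space:
exec 3>&-
```

The same thing happens when a daemon keeps a log file open that you delete by hand, which is why restarting (or signalling) the process is often the real fix.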

Better check those tools’ man pages before you run into this situation!

A word to the wise

These recipes will help you find a way out when you are already facing a file system full problem. The best, of course, is to avoid it in the first place… The following ideas are a good start:

Set alerts to get an early warning and deal with the problem before applications start crashing. Tools like Nagios are your friends here, but home-made scripts running from cron and sending emails might be enough.
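A minimal sketch of such a home-made check, runnable from cron (the 90% threshold and the admin address are assumptions; uncomment the mail line on a box with a working MTA):

```shell
#!/bin/sh
# Warn about file systems above a usage threshold; meant to be run from cron.
THRESHOLD=90
df -P | awk -v t="$THRESHOLD" 'NR > 1 {
    use = $5; sub(/%/, "", use)           # strip the trailing % from Use%
    if (use + 0 >= t + 0) print $6 " is " $5 " full"
}' | while read -r line; do
    echo "$line"
    # echo "$line" | mail -s "disk space alert" admin@example.com
done
```

The -P flag forces POSIX output (one line per file system), which keeps the awk field numbers stable even for long device names.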

Check the trends: you can use tools like Cacti to graph the space usage of your file systems over time. This lets you anticipate when you’ll need to add more disks, and whether your log rotation and/or file archiving policies are adequate.
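If a full Cacti setup is overkill, even a one-liner from cron can collect the trend data (the log path is an assumption; graph the resulting CSV with whatever tool you like):

```shell
# Append one "date,mount,use%" line per file system to a CSV, e.g. daily from cron.
LOG=/var/log/df-trend.csv   # assumed path -- pick one your cron user can write to
df -P | awk -v d="$(date +%F)" 'NR > 1 {
    use = $5; sub(/%/, "", use)
    print d "," $6 "," use
}' >> "$LOG"
```

A few weeks of these lines are enough to see whether a file system is drifting toward full or just spiking.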

More ideas ? Tips to share ? Hit the comments !