A quick intro to the Unix “find” utility

One of the most powerful Unix command-line utilities is “find” — but it also has a huge number of options, and most of the documentation I’ve read on “find” is hard to follow and understand. That’s a shame, because once you understand what “find” does and how it works, you can accomplish quite a bit. I hope that this post will show you some of the basics of “find”, so that you can take advantage of it in your day-to-day work.

The basic idea is that “find” looks through a directory (and all of its subdirectories), applying one or more filters when deciding which files are interesting, and executing one or more actions on matching files.

So, what can you do with “find”?

Move any backup log older than 30 days to /tmp/

Find all of the MP4 files larger than 100MB

Find all of the documents with either “doc” or “docx” extensions anywhere in your home directory

In a directory of text files, find those containing the phrase “budget” which have not been touched in the last 30 days

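(In these examples, I’m going to use the GNU version of find, which is standard on Linux machines and available for the Mac via Homebrew, in the “findutils” package. Note that if you use Homebrew on the Mac, then GNU “find” will be installed as “gfind” by default. Use the --with-default-names option to “brew install” if you want to avoid this prefix. Newer Homebrew releases have dropped this option; there, “brew info findutils” describes how to put the unprefixed commands on your PATH.)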

Note: There is a big difference between “find” and “locate”, which are often confused for one another:

“find” looks for files according to a number of criteria, and performs an action on the files matching those criteria. The search takes place when you run the program.

“locate” searches a database (typically created with the “updatedb” command) for filenames matching a pattern, and returns those filenames.

So if you know that you have a file named “important.txt” somewhere on your system, then you probably want to use “locate” — assuming, of course, you have been updating your filename database on a regular basis, typically via “cron”.

If you don’t remember the name of the file, but do remember that you modified it in the last 14 days, and that it contains the phrase “very important”, then you can use “find”.

For example, let’s say that I just want to find all of the files in the current directory and all of its subdirectories. I can say:

find . -print

This means: Look at all files and directories in the current directory (.) and contained within its subdirectories, and then print them.

Now, in GNU find, both of these arguments are optional; you can just say

find

but I don’t recommend doing so, if only because it’s a bit ambiguous. Moreover, the longer version emphasizes that “find” looks through a directory, filters through the results (although we don’t have any filters here), and then executes something (in this case, “print”). The filters and actions are specified using command-line arguments; thus, we say “-print” if we want to print the name of the file. Note that it’s not “--print” (i.e., with two “-” characters before “print”), which we might expect.

Also notice that the result includes everything, including directories and special Unix files (e.g., device files). If you want to only look at regular files, then you can specify the “-type” filter. For example, the following command shows all regular files (i.e., not subdirectories, symbolic links, or the like) under the current directory:

find . -type f -print # find regular files

What if you want to find directories? Then instead of using “-type f”, specify “-type d”:

find . -type d -print # find directories
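To make the difference between the “-type” values concrete, here is a minimal sketch that builds a throwaway directory and counts what each test matches. The file names (“notes.txt”, “sub”, “link-to-notes”) are made up for the demo, and I’m assuming a system with “mktemp”:

```shell
# Build a throwaway tree: one regular file, one subdirectory, one symlink.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/notes.txt"
ln -s notes.txt "$demo/link-to-notes"

# Count what each -type test matches. Note that "." itself is a directory,
# so it is included in the -type d count.
files=$(cd "$demo" && find . -type f -print | wc -l)
dirs=$(cd "$demo" && find . -type d -print | wc -l)
links=$(cd "$demo" && find . -type l -print | wc -l)

echo "regular files: $files, directories: $dirs, symlinks: $links"
rm -rf "$demo"
```

Notice that the symbolic link is neither a regular file nor a directory as far as “find” is concerned; it only shows up with “-type l”.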

What if I only want to find files that match a certain pattern? Then I can filter using the “-name” test and the shell’s standard wildcard characters (such as “*” and “?”). For example, let’s say I want to find all of the files that end with “.txt”. I can then say:

find . -type f -name "*.txt" -print

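The above applies two tests: only regular files (i.e., not directories or the like) whose names match the pattern “*.txt” will be printed. The quotes around the pattern matter, because they keep the shell from expanding “*.txt” itself before “find” ever sees it.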
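The quotes around “*.txt” in that command matter more than they might seem. Here is a small sketch, using made-up file names in a temporary directory, of what can go wrong when you leave them off. It assumes exactly one matching file sits in the starting directory; with several matches, the unquoted form would typically fail with an error instead:

```shell
# One .txt file at the top level, one in a subdirectory.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/top.txt" "$demo/sub/deep.txt"

# Quoted: find receives the literal pattern *.txt and applies it at every level.
quoted=$(cd "$demo" && find . -type f -name "*.txt" -print | sort)

# Unquoted: the shell expands *.txt to top.txt before find runs, so find
# only looks for files named exactly "top.txt" and misses sub/deep.txt.
unquoted=$(cd "$demo" && find . -type f -name *.txt -print | sort)

printf 'quoted:\n%s\n\nunquoted:\n%s\n' "$quoted" "$unquoted"
rm -rf "$demo"
```

Single or double quotes both work here; the point is simply that “find”, not the shell, should be the one interpreting the wildcard.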

What if I want to find files that end with “.txt” or “.text”? In such cases, it might be easiest to use the “or” operator, written as “-o”, that combines two tests. For example:

find . -type f \( -name '*.txt' -o -name '*.text' \) -print

The “-o” operator (for logical “or”; and yes, there is also an “-a” operator for logical “and”) succeeds if either of its tests succeeds. Because “-a” (which “find” applies implicitly between adjacent tests) binds more tightly than “-o”, the two “-name” tests must be grouped inside parentheses; otherwise “-type f” would apply only to the first pattern and “-print” only to the second. Since parentheses in the Unix shell have their own uses, we need to preface them with backslashes, to keep the shell from interpreting them. But wait: if the “\(” and “\)” are touching the arguments, then you’ll get hard-to-understand errors. So make sure that “\(” and “\)” are surrounded by whitespace, if you want to avoid trouble.
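If you’re curious what actually happens when the parentheses are left out, the following sketch compares the two forms on a tiny, made-up directory (one “.txt” file, one “.text” file, and a directory whose name happens to end in “.text”). The names are purely illustrative:

```shell
demo=$(mktemp -d)
touch "$demo/one.txt" "$demo/two.text"
mkdir "$demo/three.text"          # a directory, not a regular file

# Grouped: -type f applies to both patterns, and -print to both.
grouped=$(cd "$demo" && \
  find . -type f \( -name '*.txt' -o -name '*.text' \) -print | sort)

# Ungrouped: parsed as (-type f -a -name '*.txt') -o (-name '*.text' -a -print),
# so one.txt is never printed, and the directory three.text slips through.
ungrouped=$(cd "$demo" && \
  find . -type f -name '*.txt' -o -name '*.text' -print | sort)

printf 'grouped:\n%s\n\nungrouped:\n%s\n' "$grouped" "$ungrouped"
rm -rf "$demo"
```

The ungrouped command doesn’t produce an error, which is exactly why this mistake is easy to miss: it just quietly gives you the wrong list.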

Let’s say that I want to find old files on my system. Unix filesystems keep track of three different timestamps for each file:

ctime (change time): when was the file’s content or metadata (e.g., permissions or ownership) last changed? Despite the name, this is not the creation time.

mtime (modification time): when was the file’s content last modified?

atime (access time): when was the file last accessed/read?

Let’s say that I want to find files in the current directory (and below) that were last accessed 7 days ago. I can say:

find . -atime 7 -print

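The “atime” is measured in whole 24-hour periods, counting back from the moment you run “find”. (In GNU find, you can put the “-daystart” option before the test to measure from the beginning of today instead.) Any fractional part is discarded, so “-atime 7” means, “last accessed at least 7*24 hours ago, but less than 8*24 hours ago.”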

But wait a second — when was the last time you wanted to find files that were accessed exactly 7 days ago? It’s far more likely that you want to find files that were last accessed less than 7 days ago. In order to do that, you need to preface the number with a “-” sign:

find . -atime -7 -print

By contrast, if you want to find all of those files that were accessed more than 7 days ago, you’ll want to preface the number with a “+” sign. Because “find” counts only whole 24-hour periods, “+7” really means “at least 8 full days ago”:

find . -atime +7 -print

And of course, if you want to find files that were accessed more than 2 days ago, but less than 9 days ago, you can say:

find . -atime +2 -atime -9 -print

Depending on your needs, it might well be better to use “mtime” rather than “atime”. I’m often interested in finding files I changed recently, rather than those I read recently. The same rules apply; here’s how I would find all of those files that I last modified more than 2 days ago but less than 9 days ago:

find . -mtime +2 -mtime -9 -print

Notice that I’m able to combine two rules (i.e., two “atime” or “mtime” rules) without using “-a” to join them together with a logical “and”. That’s because “find” assumes “-a” between any two adjacent tests.
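Since the whole-day counting can be surprising, here is a rough sketch that creates files with known ages and checks which “-mtime” tests match them. It assumes GNU “touch” (for the “-d” option with relative dates), and the file names are invented to encode each file’s age:

```shell
demo=$(mktemp -d)
touch "$demo/age-0d.txt"                        # modified just now
touch -d "5 days ago"    "$demo/age-5d.txt"
touch -d "180 hours ago" "$demo/age-7.5d.txt"   # 7.5 days old
touch -d "20 days ago"   "$demo/age-20d.txt"

# More than 2 and fewer than 9 whole days old.
window=$(cd "$demo" && find . -type f -mtime +2 -mtime -9 -print | sort)

# Exactly 7 whole days: the 7.5-day-old file counts as 7, since the
# fraction is dropped.
exactly_7=$(cd "$demo" && find . -type f -mtime 7 -print)

# More than 7 whole days: the 7.5-day-old file does NOT match here.
older=$(cd "$demo" && find . -type f -mtime +7 -print)

printf 'window:\n%s\n\nexactly 7:\n%s\n\nolder than 7:\n%s\n' \
  "$window" "$exactly_7" "$older"
rm -rf "$demo"
```

The 7.5-day-old file is the interesting one: it matches “-mtime 7” but not “-mtime +7”, which is worth keeping in mind when you write cleanup rules.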

Another useful thing to look for is big files. What files, for example, are bigger than 2 GB? I can say the following:

find . -size +2G -print

(I believe that this “-size” option only works this way on GNU find. Other versions might well require that you specify the file size in blocks. It has been a while since I used non-GNU versions.)

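Look familiar? That’s right; the “+” means “greater than”, and the “G” suffix means that the number is in gigabytes (GNU find uses units of 1,073,741,824 bytes, i.e., GiB). You can use a bunch of suffixes to the number, to indicate just how big the file should be. As you might have guessed, you can say “-2M” to mean “less than 2 MB”, which on a modern computer is just about everything, to be honest. One catch: GNU find rounds each file’s size up to the next whole unit before comparing, so “-2M” really matches files of up to 1 MiB, and “-1M” matches only empty files.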

We can also combine these, just as we did with “atime” and “mtime”: What files are bigger than 500 MB and smaller than 5 GB?

find . -size +500M -size -5G -print
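GNU find rounds sizes up to whole units before comparing, which affects what “+” and “-” match. The following sketch shows this on a few small files, so that it runs quickly. It assumes GNU find and the GNU “truncate” command (which creates files of an exact size without writing real data); the file names are hypothetical:

```shell
demo=$(mktemp -d)
truncate -s 0  "$demo/empty.bin"       # 0 bytes
truncate -s 10 "$demo/tiny.bin"        # 10 bytes, rounds up to 1 MiB unit
truncate -s 1M "$demo/one-mib.bin"     # exactly 1 MiB
truncate -s 3M "$demo/three-mib.bin"   # exactly 3 MiB

# "-size -1M" matches only files that round up to 0 units: empty files.
under_1m=$(cd "$demo" && find . -type f -size -1M -print | sort)

# "-size -2M" matches files that round up to at most 1 unit.
under_2m=$(cd "$demo" && find . -type f -size -2M -print | sort)

# Combining two -size tests gives a range: more than 1 and fewer than 5 units.
mid_range=$(cd "$demo" && find . -type f -size +1M -size -5M -print | sort)

printf 'under 1M:\n%s\n\nunder 2M:\n%s\n\nbetween 1M and 5M:\n%s\n' \
  "$under_1m" "$under_2m" "$mid_range"
rm -rf "$demo"
```

If you need byte-exact comparisons, the “c” suffix (for bytes) avoids the rounding, e.g. “-size -1048576c”.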

We can combine these filters with others. What files are bigger than 500 MB and smaller than 5 GB, and were last accessed no more than 30 days ago?

find . -size +500M -size -5G -atime -30 -print

You can imagine using this sort of command (with “-atime +30” rather than “-atime -30”) to find large, unused files, such as old videos that you had forgotten are on your filesystem. Indeed, what if I’m only interested in finding MP4 files that are larger than 500 MB, smaller than 5 GB, and accessed in the last 30 days? I can add another condition:

find . -size +500M -size -5G -atime -30 -name "*.mp4" -print

There are lots of other filters you can apply, and GNU find is especially full of them. There are alternative ways to specify dates. You can search for particular types of special files. You can search for certain permissions. And so forth. But the ones I’ve shown you are the ones I’ve used most often.

But the tests are only the first part of using “find”: Once you’ve gotten a list of files, what can you do with them?

So far, we’ve seen a single action, namely “-print”. There are a few others that you might find useful.

The first is “-ls”, which lists each matching file in the same format as the Unix “ls -dils” command (showing size, permissions, and more):

find . -size +500M -size -5G -atime -30 -name "*.mp4" -ls

The above will not only print the filename (like “-print”), but will also show lots of other information about the files we’ve found. What if you want to write this list to a file? Then just use the “-fls” option, and give it a filename:

find . -size +500M -size -5G -atime -30 -name "*.mp4" -fls big-movies.txt

It’s pretty common to want to delete files. So you can use the “-delete” action to do so. Warning: Running a program that automatically deletes files can be very dangerous. I almost never do this, because I’m always so worried that something will go wrong. Here’s how I can remove all of the compressed backup files under my Linux /var/log directory that are more than 21 days old, assuming that I’ve already changed into that directory:

find . -name '*.gz' -mtime +21 -delete -print

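Note that you can have more than one action; in this case, my first action was “-delete”, and my second was “-print”. Order matters: “find” evaluates its expression from left to right, so “-delete” must come after the tests. If you put it first, “find” will delete everything under the starting directory.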
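Given how nervous “-delete” makes me, one habit that helps is a dry run: run the command with only “-print” first, check the list, and only then swap in “-delete”. Here’s a minimal sketch of that workflow on a throwaway directory. It assumes GNU “touch -d”, and the log file names are made up:

```shell
demo=$(mktemp -d)
touch -d "40 days ago" "$demo/old.log.gz"
touch -d "2 days ago"  "$demo/recent.log.gz"
touch -d "40 days ago" "$demo/old-notes.txt"   # old, but not a .gz file

# Step 1: dry run. Same tests as the real command, but only -print.
would_delete=$(cd "$demo" && find . -type f -name '*.gz' -mtime +21 -print)
echo "would delete: $would_delete"

# Step 2: once the list looks right, rerun with -delete as the LAST item.
(cd "$demo" && find . -type f -name '*.gz' -mtime +21 -delete)

# Check what survived.
remaining=$(cd "$demo" && find . -type f -print | sort)
printf 'remaining:\n%s\n' "$remaining"
rm -rf "$demo"
```

I’ve also added “-type f” here, so that a directory whose name happens to end in “.gz” is never a candidate for deletion.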

It’s pretty common for me to want to search through an entire directory for a file that contains particular text. In other words, I want to run the “grep” utility on each file. I can do that by using the all-purpose “-exec” action. The basic idea is as follows: You hand “-exec” a command, and the command is then ended with \; (yes, backslash + semicolon). In between, you can write whatever Unix command you want, including options. The current filename can be put into the command with the special formula {} (i.e., empty curly braces). For example, I can say:

find . -name "*.txt" -exec grep Reuven {} \;

The above will show every line containing my name, from every “.txt” file under the current directory. (Of course, a regular expression can be far more complex than this; if you aren’t familiar with grep or regexps, you can take my free “regular expressions crash course.”) But the output only shows the lines we would get from “grep”, which (by default) doesn’t show the name of the current file if you’re running it one file at a time. For this reason, we would be wise to include the “-H” option:

find . -name "*.txt" -exec grep -H Reuven {} \;
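Ending the command with \; starts one “grep” process per file. “find” also accepts “+” in place of \;, which hands many filenames to a single invocation of the command and is usually faster on large trees. The sketch below compares the two forms, and adds grep’s “-l” option for when you only want the names of matching files. The two sample files are invented for the demo:

```shell
demo=$(mktemp -d)
printf 'Hello from Reuven\n'   > "$demo/a.txt"
printf 'Nothing to see here\n' > "$demo/b.txt"

# One grep process per file; -H forces the filename prefix.
per_file=$(cd "$demo" && find . -name "*.txt" -exec grep -H Reuven {} \;)

# One grep process for as many files as fit on a command line.
batched=$(cd "$demo" && find . -name "*.txt" -exec grep -H Reuven {} +)

# Only the names of the files that contain a match.
names=$(cd "$demo" && find . -name "*.txt" -exec grep -l Reuven {} +)

printf 'per file: %s\nbatched:  %s\nnames:    %s\n' \
  "$per_file" "$batched" "$names"
rm -rf "$demo"
```

With “+”, the {} has to be the last thing before the “+”, so this form suits commands like “grep” that take their filenames at the end.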

While “grep” is the most common command that I run via “-exec”, you can use any program you want, including programs that you’ve written. In this way, you can really make “find” work for you, and execute custom code for each file that fits your criteria. Combine “find” with “cron”, and you have an easy way to identify files that need your attention, or that should be removed, or that you’ve been looking for and otherwise cannot find.
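With tests and actions in hand, we can come back to the four tasks I listed at the start of this post. What follows is one possible way to write each of them, not the only one. To keep the sketch safe to run, it works on a temporary directory that stands in for your home directory, and moves files into a second temporary directory rather than /tmp/. The file names and the “backup-*.log” pattern are assumptions of mine, and the setup relies on GNU “touch -d” and “truncate”:

```shell
demo=$(mktemp -d)      # stand-in for the directory being searched
archive=$(mktemp -d)   # stand-in for /tmp/

touch -d "45 days ago" "$demo/backup-old.log"
touch "$demo/backup-new.log"
truncate -s 150M "$demo/big.mp4"
truncate -s 1M   "$demo/small.mp4"
touch "$demo/report.doc" "$demo/report.docx" "$demo/report.pdf"
printf 'Q3 budget draft\n' > "$demo/stale.txt"
touch -d "45 days ago" "$demo/stale.txt"
printf 'Q4 budget draft\n' > "$demo/fresh.txt"

# 1. Move any backup log older than 30 days.
(cd "$demo" && \
  find . -type f -name 'backup-*.log' -mtime +30 -exec mv {} "$archive"/ \;)
moved=$(ls "$archive")

# 2. Find all of the MP4 files larger than 100 MB.
big_videos=$(cd "$demo" && find . -type f -name '*.mp4' -size +100M -print)

# 3. Find documents with either a "doc" or a "docx" extension.
documents=$(cd "$demo" && \
  find . -type f \( -name '*.doc' -o -name '*.docx' \) -print | sort)

# 4. Text files containing "budget" that haven't been modified in 30 days.
stale_budget=$(cd "$demo" && \
  find . -type f -name '*.txt' -mtime +30 -exec grep -l budget {} +)

printf 'moved: %s\nbig videos: %s\ndocuments: %s\nstale budget files: %s\n' \
  "$moved" "$big_videos" "$(echo $documents)" "$stale_budget"
rm -rf "$demo" "$archive"
```

For the fourth task I’ve read “not touched” as “not modified”, and so used “-mtime”; if you care about reads rather than writes, “-atime” would be the test to use instead.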

If there’s one drawback to “find”, it’s that the search happens in real time. There is no database through which it runs, which means that if you’re going through a very large directory structure, you might discover that “find” takes quite a while.

And that’s about it! If you’re like me, then you’ll find (no pun intended) that these use cases cover most of what you need with the “find” utility. The documentation is extremely long, but only because “find” has many other tests and actions that you can mix and match in a variety of ways.