The Unix philosophy in a nutshell

To understand anyone (or anything), one must strive to first understand their (or its) underlying philosophy; to begin to understand Linux is to begin to understand the Unix philosophy. Here, we shall not attempt to delve into every minute detail; rather, an overall understanding of the essentials of the Unix philosophy is our goal. Also, when we use the term Unix, we very much also mean Linux!

The way that software (particularly, tools) is designed, built, and maintained on Unix slowly evolved into what might even be called a pattern that stuck: the Unix design philosophy. At its heart, here are the pillars of the Unix philosophy, design, and architecture:

Everything is a process; if it's not a process, it's a file

One tool to do one task

Three standard I/O channels

Combine tools seamlessly

Plain text preferred

CLI, not GUI

Modular, designed to be repurposed by others

Provide the mechanism, not the policy

Let's examine these pillars a little more closely, shall we?

Everything is a process; if it's not a process, it's a file

A process is an instance of a program in execution. A file is an object on the filesystem; besides a regular file with plain text or binary content, it could also be a directory, a symbolic link, a device-special file, a named pipe, or a (Unix-domain) socket.

The Unix design philosophy abstracts peripheral devices (such as the keyboard, monitor, mouse, a sensor, and touchscreen) as files, which it calls device files. By doing this, Unix allows the application programmer to conveniently ignore the details and just treat (peripheral) devices as though they are ordinary disk files.

The kernel provides a layer to handle this very abstraction: it's called the Virtual Filesystem Switch (VFS). So, with this in place, the application developer can open a device file and perform I/O (reads and writes) upon it, all using the usual APIs provided (relax, these APIs will be covered in a subsequent chapter).

In fact, every process inherits three open files on creation:

Standard input (stdin: fd 0): The keyboard device, by default

Standard output (stdout: fd 1): The monitor (or terminal) device, by default

Standard error (stderr: fd 2): The monitor (or terminal) device, by default

Note: fd is the common abbreviation, especially in code, for file descriptor; it's an integer value that refers to the open file in question.

Also, note that we mention it's a certain device by default; this implies the defaults can be changed. Indeed, this is a key part of the design: changing the standard input, output, or error channels is called redirection, and by using the familiar <, > and 2> shell operators, these file channels are redirected to other files or devices.

On Unix, there exists a class of programs called filters.

Note: A filter is a program that reads from its standard input, possibly modifies the input, and writes the filtered result to its standard output.

Filters on Unix are very common utilities, such as cat, wc, sort, grep, perl, head, and tail.

Filters allow Unix to easily sidestep design and code complexity. How? Let's take the sort filter as a quick example. Okay, we'll need some data to sort. Let's say we have the following file:

    $ cat fruit.txt
    orange
    banana
    apple
    pear
    grape
    pineapple
    lemon
    cherry
    papaya
    mango
    $

Now we consider four scenarios of using sort; based on the parameter(s) we pass, we are actually performing explicit or implicit input-, output-, and/or error-redirection!

Scenario 1: Sort a file alphabetically (one parameter, input implicitly taken from the file):

    $ sort fruit.txt
    apple
    banana
    cherry
    grape
    lemon
    mango
    orange
    papaya
    pear
    pineapple
    $

All right! Hang on a second, though. If sort is a filter (and it is), it should read from its stdin (the keyboard) and write to its stdout (the terminal). It is indeed writing to the terminal device, but it's reading from a file, fruit.txt. This is deliberate; if a filename parameter is provided, the sort program opens that file and reads its input from it instead of from standard input. Also, note that sort fruit.txt produces the same output as sort < fruit.txt; in the second form, it is the shell that opens the file and connects it to sort's stdin.
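The two forms discussed in Scenario 1 can be compared directly. What follows is only a minimal sketch, assuming a POSIX shell; the scratch directory and the variable names (workdir, by_arg, by_stdin) are illustrative and not part of the original text, and the file holds just a few of the fruit names.

```shell
#!/bin/sh
# Illustrative sketch: workdir, by_arg and by_stdin are hypothetical names.
workdir=$(mktemp -d)
printf '%s\n' orange banana apple pear grape > "$workdir/fruit.txt"

# Form 1: sort itself opens the file named on its command line.
by_arg=$(sort "$workdir/fruit.txt")

# Form 2: the shell opens the file and attaches it to sort's stdin (fd 0);
# sort never sees a filename at all.
by_stdin=$(sort < "$workdir/fruit.txt")

if [ "$by_arg" = "$by_stdin" ]; then
    echo "both forms produce the same sorted output"
fi

rm -rf "$workdir"
```

The mechanism differs (who opens the file), but the filter's output does not, which is exactly why a filter can be written without caring where its input comes from.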
Scenario 2: Sort any given input alphabetically (no parameters, input and output from and to stdin/stdout):

    $ sort
    mango
    apple
    pear
    ^D
    apple
    mango
    pear
    $

Once you type sort and press the Enter key, the sort process comes alive and just waits. Why? It's waiting for you, the user, to type something. Why? Recall, every process by default reads its input from standard input (stdin): the keyboard device! So, we type in some fruit names. When we're done, we press Ctrl + D. This is the default key sequence that signifies end-of-file (EOF), or in cases such as this, end-of-input. Voila! The input is sorted and written. To where? To the sort process's stdout (the terminal device), hence we see it.

Scenario 3: Sort any given input alphabetically and save the output to a file (explicit output redirection):

    $ sort > sorted.fruit.txt
    mango
    apple
    pear
    ^D
    $

Similar to Scenario 2, we type in some fruit names and then Ctrl + D to tell sort we're done. This time, though, note that the output is redirected (via the > meta-character) to the sorted.fruit.txt file! So, as expected, the file now holds the sorted output:

    $ cat sorted.fruit.txt
    apple
    mango
    pear
    $

Scenario 4: Sort a file alphabetically, save the output to a file, and discard any errors (explicit input-, output-, and error-redirection):

    $ sort < fruit.txt > sorted.fruit.txt 2> /dev/null
    $

As in the preceding scenario, the sorted output ends up in sorted.fruit.txt; this time, though, the input comes from fruit.txt rather than the keyboard, and any error output is redirected as well. Here, we redirect the error output (recall that file descriptor 2 refers to stderr) to the /dev/null special device file; /dev/null is a device file whose job is to act as a sink (a black hole). Anything written to the null device just disappears forever! (Who said there isn't magic on Unix?) Also, its complement is /dev/zero; the zero device is a source, an infinite source of zeros. Reading from it returns NUL bytes (byte value 0, the first ASCII character, not the digit character '0'); it has no end-of-file!
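To make the separate output and error channels, and the two special devices, a little more concrete, here is a small sketch. It assumes a POSIX shell with the standard dd, od, and tr utilities; the emit_both function, the scratch directory, and the file names are hypothetical helpers introduced only for illustration.

```shell
#!/bin/sh
# Illustrative sketch; emit_both, workdir and the file names are hypothetical.
workdir=$(mktemp -d)

# A tiny command that writes one line to stdout (fd 1) and one to stderr (fd 2).
emit_both() {
    echo "regular output"
    echo "error output" >&2
}

# Send the two channels to two different files.
emit_both > "$workdir/out.txt" 2> "$workdir/err.txt"
out_text=$(cat "$workdir/out.txt")
err_text=$(cat "$workdir/err.txt")

# Discard stderr: anything written to /dev/null simply vanishes.
only_stdout=$(emit_both 2> /dev/null)

# Read exactly 4 bytes from /dev/zero and show them in hex:
# they are NUL bytes (00), not the digit character '0' (hex 30).
zero_hex=$(dd if=/dev/zero bs=1 count=4 2>/dev/null | od -An -tx1 | tr -d ' \n')

echo "stdout file: $out_text"
echo "stderr file: $err_text"
echo "bytes read from /dev/zero: $zero_hex"

rm -rf "$workdir"
```

Because /dev/zero never reports end-of-file, the sketch has to ask dd for a fixed number of bytes; an unbounded read would never finish.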

One tool to do one task

In the Unix design, one tries to avoid creating a Swiss Army knife; instead, one creates a tool for a very specific, designated purpose and for that one purpose only. No ifs, no buts; no cruft, no clutter. This is design simplicity at its best.

"Simplicity is the ultimate sophistication." - Leonardo da Vinci

Take a common example: when working on the Linux CLI (command-line interface), you would like to figure out which of your locally mounted filesystems has the most available (disk) space.

We can get the list of locally mounted filesystems by using an appropriate switch (just df would do as well):

    $ df --local
    Filesystem     1K-blocks    Used Available Use% Mounted on
    rootfs          20640636 1155492  18436728   6% /
    udev               10240       0     10240   0% /dev
    tmpfs              51444     160     51284   1% /run
    tmpfs               5120       0      5120   0% /run/lock
    tmpfs             102880       0    102880   0% /run/shm
    $

To sort the output, one would need to first save it to a file; one could use a temporary file for this purpose, tmp, and then sort it, using the sort utility, of course. Finally, we delete the offending temporary file. (Yes, there's a better way, piping; refer to the Combine tools seamlessly section.)

Note that the available space is the fourth column, so we sort accordingly:

    $ df --local > tmp
    $ sort -k4nr tmp
    rootfs          20640636 1155484  18436736   6% /
    tmpfs             102880       0    102880   0% /run/shm
    tmpfs              51444     160     51284   1% /run
    udev               10240       0     10240   0% /dev
    tmpfs               5120       0      5120   0% /run/lock
    Filesystem     1K-blocks    Used Available Use% Mounted on
    $

Whoops! The output includes the heading line. Let's first use the versatile sed utility (a powerful non-interactive editor tool) to eliminate the first line, the header, from the output of df:

    $ df --local > tmp
    $ sed --in-place '1d' tmp
    $ sort -k4nr tmp
    rootfs          20640636 1155484  18436736   6% /
    tmpfs             102880       0    102880   0% /run/shm
    tmpfs              51444     160     51284   1% /run
    udev               10240       0     10240   0% /dev
    tmpfs               5120       0      5120   0% /run/lock
    $ rm -f tmp

So what?
The point is, on Unix, there is no one utility to list mounted filesystems and sort them by available space simultaneously. Instead, there is a utility to list mounted filesystems: df. It does a great job of it, with option switches to choose from. (How does one know which options? Learn to use the man pages; they're extremely useful.) There is a utility to sort text: sort. Again, it's the last word in sorting text, with plenty of option switches to choose from for pretty much every conceivable sort one might require.

Note: The Linux man pages: man is short for manual; on a Terminal window, type man man to get help on using man. Notice the manual is divided into 9 sections. For example, to get the manual page on the stat system call, type man 2 stat, as all system calls are in section 2 of the manual. The convention used is cmd(section) or API(section); thus, we refer to it as stat(2).

As expected, we obtain the results. So what exactly is the point? It's this: we used three utilities, not one: df, to list the mounted filesystems (and their related metadata); sed, to eliminate the header line; and sort, to sort whatever input it's given (in any conceivable manner).

df can query and list mounted filesystems, but it cannot sort them. sort can sort text; it cannot list mounted filesystems.

Think about that for a moment. Combine them all, and you get more than the sum of the parts! Unix tools typically do one task and they do it to its logical conclusion; no one does it better!

Note: Having said this, I would like to point out, a tiny bit sheepishly, the highly renowned tool Busybox. Busybox (http://busybox.net) is billed as The Swiss Army Knife of Embedded Linux. It is indeed a very versatile tool; it has its place in the embedded Linux ecosystem, precisely because it would be too expensive on an embedded box to have separate binary executables for each and every utility (and it would consume more RAM). Busybox solves this problem by having a single binary executable (along with symbolic links to it from each of its applets, such as ls, ps, df, and sort).

So, outside the embedded scenario and all the resource limitations it implies, do follow the One tool to do one task rule!
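The single-executable-plus-symbolic-links arrangement mentioned in the Busybox note can be mimicked with a toy shell script. This is only a rough analogy under stated assumptions: it is not how Busybox itself is written, and the names multicall, hello, and shout are invented purely for illustration. The idea shown is simply that a program can choose its behaviour from the name it was invoked as.

```shell
#!/bin/sh
# Illustrative sketch of the multi-call idea; NOT Busybox's actual implementation.
# The names multicall, hello and shout are hypothetical.
workdir=$(mktemp -d)

cat > "$workdir/multicall" <<'EOF'
#!/bin/sh
# Choose the behaviour from the name this script was invoked as.
case "$(basename "$0")" in
    hello) echo "hello, world" ;;
    shout) echo "HELLO, WORLD" ;;
    *)     echo "unknown applet: $(basename "$0")" >&2; exit 1 ;;
esac
EOF

# One file on disk, several names: each symbolic link acts as a separate tool.
ln -s multicall "$workdir/hello"
ln -s multicall "$workdir/shout"

# Run via sh so the sketch also works where the scratch directory is mounted noexec.
hello_out=$(sh "$workdir/hello")
shout_out=$(sh "$workdir/shout")
echo "$hello_out"
echo "$shout_out"

rm -rf "$workdir"
```

Only one copy of the code exists on disk, yet the user still sees separate, single-purpose commands, which is how the embedded use case stays faithful to the spirit of the rule.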

Three standard I/O channels

Several popular Unix tools (technically, filters) are, again, deliberately designed to read their input from a standard file descriptor called standard input (stdin), possibly modify it, and write their resultant output to a standard file descriptor called standard output (stdout). Any error output can be written to a separate error channel called standard error (stderr).

In conjunction with the shell's redirection operators (> for output redirection, < for input redirection, and 2> for stderr redirection), and even more importantly with piping (refer to the Combine tools seamlessly section), this enables a program designer to simplify the design considerably. There's no need to hardcode (or even softcode, for that matter) input and output sources or sinks. It just works, as expected.

Let's review a couple of quick examples to illustrate this important point.

Word count

How many lines of source code are there in the C netcat.c source file I downloaded? (Here, we use a small part of the popular open source netcat utility code base.) We use the wc utility. Before we go further, what's wc? Word count (wc) is a filter: it reads input from stdin, counts the number of lines, words, and characters in the input stream, and writes this result to its stdout. Further, as a convenience, one can pass filenames as parameters to it; passing the -l option switch has wc print only the number of lines:

    $ wc -l src/netcat.c
    618 src/netcat.c
    $

Here, the input is a filename passed as a parameter to wc.

Interestingly, we should by now realize that if we do not pass it any parameters, wc will read its input from stdin, which by default is the keyboard device. For example:

    $ wc -l
    hey, a small
    quick test
    of reading from stdin
    by wc!
    ^D
    4
    $

Yes, we typed in 4 lines to stdin; thus the result is 4, written to stdout (the terminal device by default).
Here is the beauty of it:

    $ wc -l < src/netcat.c > num
    $ cat num
    618
    $

As we can see, wc is a great example of a Unix filter.

cat

Unix, and of course Linux, users quickly become familiar with the daily-use cat utility. At first glance, all cat does is spit out the contents of a file to the terminal.

For example, say we have two plain text files, myfile1.txt and myfile2.txt:

    $ cat myfile1.txt
    Hello, Linux System Programming, World.
    $ cat myfile2.txt
    Okey dokey, bye now.
    $

Okay. Now check this out:

    $ cat myfile1.txt myfile2.txt
    Hello, Linux System Programming, World.
    Okey dokey, bye now.
    $

Instead of needing to run cat twice, we ran it just once, by passing the two filenames to it as parameters.

In theory, one can pass any number of parameters to cat: it will use them all, one by one!

Not just that, one can use shell wildcards too (* and ?; in reality, the shell will first expand the wildcards, and pass on the resultant path names to the program being invoked as parameters):

    $ cat myfile?.txt
    Hello, Linux System Programming, World.
    Okey dokey, bye now.
    $

This, in fact, illustrates another key point: accepting any number of parameters, or none, is considered the right way to design a program. Of course, there are exceptions to every rule: some programs demand mandatory parameters.

Wait, there's more. cat, too, is an excellent example of a Unix filter (recall: a filter is a program that reads from its standard input, possibly modifies its input in some manner, and writes the result to its standard output).

So, quick quiz: if we just run cat with no parameters, what would happen? Well, let's try it out and see:

    $ cat
    hello,
    hello,
    oh cool
    oh cool
    it reads from stdin,
    it reads from stdin,
    and echoes whatever it reads to stdout!
    and echoes whatever it reads to stdout!
    ok bye
    ok bye
    ^D
    $

Wow, look at that: cat blocks (waits) at its stdin, the user types in a string and presses the Enter key, and cat responds by copying its stdin to its stdout. No surprise there, as that's the job of cat in a nutshell!

One realizes the following:

cat fname produces the same output as cat < fname

cat > fname creates or overwrites the fname file

There's no reason we can't use cat to append several files together:

    $ cat fname1 fname2 fname3 > final_fname
    $

There's no reason this must be done with only plain text files; one can join together binary files too. In fact, that's what the utility does: it concatenates files. Thus its name which, as is the norm on Unix, is highly abbreviated: from concatenate to just cat. Again, clean and elegant, the Unix way.

Note: cat shunts out file contents to stdout, in order. What if one wants to display a file's contents in reverse order (last line first)? Use the Unix tac utility; yes, that's cat spelled backward! Also, FYI, we saw that cat can be used to efficiently join files. Guess what: the split(1) utility can be used to break a file up into pieces.
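The "any number of parameters, or none" convention is easy to adopt in one's own tools. The following is a minimal sketch, assuming a POSIX shell; the upcase function and the file names are hypothetical and serve only to illustrate the idea. It leans on cat to do the input handling: with no arguments cat reads stdin, and with arguments it reads each named file in turn.

```shell
#!/bin/sh
# Illustrative sketch; the upcase function and the file names are hypothetical.
# A tiny filter: input comes from the named files, or from stdin if none are given.
upcase() {
    cat "$@" | tr '[:lower:]' '[:upper:]'
}

workdir=$(mktemp -d)
echo "hello" > "$workdir/a.txt"
echo "bye now" > "$workdir/b.txt"

# No parameters: the filter reads its standard input.
from_stdin=$(echo "via stdin" | upcase)

# Two parameters: the filter reads both files, in order.
from_files=$(upcase "$workdir/a.txt" "$workdir/b.txt")

echo "$from_stdin"
echo "$from_files"

rm -rf "$workdir"
```

Neither the input source nor the output sink is hardcoded anywhere in upcase; the caller decides both, using parameters, redirection, or a pipe.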

Combine tools seamlessly

We just saw that common Unix utilities are often designed as filters, giving them the ability to read from their standard input and write to their standard output. This concept is elegantly extended to seamlessly combine multiple utilities, using an IPC mechanism called a pipe.

Also, we recall that the Unix philosophy embraces the do one task only design. What if we have one program that does task A and another that does task B and we want to combine them? Ah, that's exactly what pipes do! Refer to the following code:

    prg_does_taskA | prg_does_taskB

Note: A pipe essentially is redirection performed twice: the output of the left-hand program becomes the input to the right-hand program. Of course, this implies that the program on the left must write to stdout, and the program on the right must read from stdin.

An example: sort the list of mounted filesystems by space available (in reverse order).

As we have already discussed this example in the One tool to do one task section, we shall not repeat the same information.

Option 1: Using a temporary file (refer to the One tool to do one task section):

    $ df --local > tmp
    $ sed --in-place '1d' tmp
    $ sort -k4nr tmp
    rootfs          20640636 1155484  18436736   6% /
    tmpfs             102880       0    102880   0% /run/shm
    tmpfs              51444     160     51284   1% /run
    udev               10240       0     10240   0% /dev
    tmpfs               5120       0      5120   0% /run/lock
    $ rm -f tmp

Option 2: Using pipes, clean and elegant:

    $ df --local | sed '1d' | sort -k4nr
    rootfs          20640636 1155492  18436728   6% /
    tmpfs             102880       0    102880   0% /run/shm
    tmpfs              51444     160     51284   1% /run
    udev               10240       0     10240   0% /dev
    tmpfs               5120       0      5120   0% /run/lock
    $

Not only is this elegant, it is also far superior performance-wise, as writing to memory (the pipe is a memory object) is much faster than writing to disk.

One can extend this notion and combine multiple tools over multiple pipes; in effect, one can build a super tool from several regular tools by combining them.
As an example: display the three processes taking the most (physical) memory; only display their PID, virtual size (VSZ), resident set size (RSS) (RSS is a fairly accurate measure of physical memory usage), and the name:

    $ ps au | sed '1d' | awk '{printf("%6d %10d %10d %-32s\n", $2, $5, $6, $11)}' | sort -k3n | tail -n3
     10746    3219556     665252 /usr/lib64/firefox/firefox
     10840    3444456    1105088 /usr/lib64/firefox/firefox
      1465    5119800    1354280 /usr/bin/gnome-shell
    $

Here, we've combined five utilities, ps, sed, awk, sort, and tail, over four pipes. Nice!

Another example: display the process, not including daemons, taking up the most memory (RSS):

    ps aux | awk '{if ($7 != "?") print $0}' | sort -k6n | tail -n1

Note: A daemon is a system background process; we'll cover this concept in Daemon Process here: https://www.packtpub.com/sites/default/files/downloads/Daemon_Processes.pdf.

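Since df and ps report live system state, their output differs from machine to machine. The sketch below replays the same kind of pipeline on canned data so that the result is reproducible anywhere. It assumes a POSIX shell; list_fs is a hypothetical stand-in for df that simply prints the listing shown earlier in the text, and the roomiest variable is likewise just an illustrative name.

```shell
#!/bin/sh
# Illustrative sketch with canned data (copied from the df listing in the text).
# list_fs is a hypothetical stand-in for df, so the result is reproducible.
list_fs() {
    cat <<'EOF'
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 20640636 1155492 18436728 6% /
udev 10240 0 10240 0% /dev
tmpfs 51444 160 51284 1% /run
tmpfs 5120 0 5120 0% /run/lock
tmpfs 102880 0 102880 0% /run/shm
EOF
}

# Five small tools over four pipes:
#   sed  drops the header line,
#   sort orders numerically (descending) on column 4, the available space,
#   head keeps the top entry,
#   awk  prints only its mount point (column 6).
roomiest=$(list_fs | sed '1d' | sort -k4nr | head -n 1 | awk '{print $6}')
echo "mount point with the most available space: $roomiest"
```

No temporary file is created at any stage; each tool's stdout feeds the next tool's stdin directly.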
Plain text preferred

Unix programs are generally designed to work with text, as it's a universal interface. Of course, there are several utilities that do indeed operate on binary objects (such as object and executable files); we aren't referring to them here. The point is this: Unix programs are designed to work on text as it simplifies the design and architecture of the program.

A common example: an application, on startup, parses a configuration file. The configuration file could be formatted as a binary blob. On the other hand, having it as a plain text file renders it easily readable (invaluable!) and therefore easier to understand and maintain. One might argue that parsing binary would be faster. Perhaps to some extent this is so, but consider the following:

With modern hardware, the difference is probably not significant

A standardized plain text format (such as XML) would have optimized code to parse it, yielding both benefits

Remember, simplicity is key!
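To illustrate how little machinery a plain text configuration file needs, here is a minimal sketch assuming a POSIX shell and a simple key=value layout. The file name app.conf, its keys, and the cfg_get helper are all hypothetical; real applications define their own formats.

```shell
#!/bin/sh
# Illustrative sketch; app.conf, its keys and the cfg_get helper are hypothetical.
workdir=$(mktemp -d)
cat > "$workdir/app.conf" <<'EOF'
# comment lines and blank lines are ignored

port=8080
log_level=debug
EOF

# Print the value stored for a key in a key=value file.
# $1: path to the config file, $2: key to look up
cfg_get() {
    grep -v '^[[:space:]]*#' "$1" | grep "^$2=" | head -n 1 | cut -d '=' -f 2-
}

port=$(cfg_get "$workdir/app.conf" port)
log_level=$(cfg_get "$workdir/app.conf" log_level)
echo "port=$port log_level=$log_level"

rm -rf "$workdir"
```

The same file can be read by a person, edited with any text editor, and inspected with ordinary filters such as grep and cut, none of which would be possible with a binary blob.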

CLI, not GUI

The Unix OS, and all its applications, utilities, and tools, were always built to be used from a command-line interface (CLI), typically the shell. From the 1980s onward, the need for a Graphical User Interface (GUI) became apparent.

Robert Scheifler of MIT, considered the chief design architect behind the X Window System, built an exceedingly clean and elegant architecture, a key component of which is this: the GUI forms a layer (well, actually, several layers) above the OS, providing libraries for GUI clients, that is, applications.

Note: The GUI was never designed to be intrinsic to applications or the OS; it's always optional.

This architecture still holds up today. Having said that, especially on embedded Linux, performance considerations are driving the adoption of newer architectures, such as the frame buffer and Wayland. Also, though Android, which uses the Linux kernel, necessitates a GUI for the end user, the system developer's interface to Android, ADB, is a CLI.

A huge number of production embedded and server Linux systems run purely on the CLI. The GUI is almost like an add-on feature, for the end user's ease of operation.

Note: Wherever appropriate, design your tools to work in the CLI environment; adapting them to a GUI at a later point is then straightforward. Cleanly and carefully separating the business logic of the project or product from its GUI is a key to good design.

Modular, designed to be repurposed by others

From its very early days, the Unix OS was deliberately designed and coded with the tacit assumption that multiple programmers would work on the system. Thus, the culture of writing clean, elegant, and understandable code, to be read and worked upon by other competent programmers, was ingrained.

Later, with the advent of the Unix wars, proprietary and legal concerns overrode this sharing model. Interestingly, history shows that the Unixes were fading in relevance and industry use, until the timely advent of none other than the Linux OS: an open source ecosystem at its very best! Today, the Linux OS is widely acknowledged as the most successful project to come out of the GNU/GPL free software ecosystem. Ironic indeed!