Writing Simple Unix Scripts

Phil Jeffrey, Mar 2007, v0.1

Preamble

C Shells and Bourne Again Shells

Which shell one should use is more a matter of dogma than anything else. Csh and tcsh are probably the more common ones in crystallography because the enhancements in tcsh were more widely available than bash at the time when we were all still working on SGIs (one of the machines that led the shift away from VAX/VMS as the usual operating system for crystallography). Program suites often come with two different startup or configuration files, one for each of the two (sh, csh) shell syntaxes; however, when there is only one config file version it is more likely to be written in C shell than in Bourne shell. There's a difference between using shells for command line execution and using them to write elaborate scripts. Perl and Tcl (and perhaps even Python) are better suited to such scripts. Sh/Bash zealots like to point out C shell limitations in the widely disseminated page "C shell programming considered harmful", but frankly there are better scripting languages for most of the more advanced system administration functions in Unix.

Consequently, learn Perl or Tcl (or even C) if you want to do cute things with scripts, and Keep It Simple in tcsh or bash.

You can also RTFM tcsh. There are any number of shell guides and introductions if you just Google for them.

Hash Pling Slash (#!/)

#!/bin/csh -f

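This first line tells the system to execute the script as input to the program listed after the #! characters. The -f flag tells csh not to read your .cshrc file, so the script starts faster and does not depend on your personal settings.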

A very simple script would look like:

#!/bin/csh -f
#
# this is a comment
#
echo "hello world"

Simple Csh Syntax

# signifies a comment

the first word on the line is assumed to be an executable program or script or equivalent built-in command - the program can have its path explicitly specified (/bin/ls or ./hackit), or implicitly specified (ls). In the latter case the shell consults the path variable to find the first match to the program "ls". This is less secure so it is better to specify the path explicitly.

"string" or 'string' designates a character string - in the double-quote case "$test" is converted to the value of the variable "test" if it exists; if it does not, the shell will throw an error. The value of '$test' is in fact just $test - the shell does not do variable substitution inside single quote marks.

C shell variables can be set by the syntax "set name = value" and referenced by $name.

Simple constructions similar to the C programming language like: if (test) then ... endif and while (test) ... end also work. Note that, unlike if, the while construction does not take a "then".

If you're writing shell scripts I assume you already know about redirection, but to reiterate:

> filename - redirects standard output to the filename - it will overwrite existing files, not append.

>& filename - redirects standard output and standard error to the filename

>> filename and >>& filename - append to, rather than overwrite, the file.

< filename - redirects standard input from the file to the program

<< word - see below

| - connects the standard output of the prior command/program to the standard input of the subsequent command/program (i.e. a pipe).

|& - same thing as | but also connects the standard error of the prior command to the standard input of the subsequent one
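A short sketch that exercises most of these operators in one place is given here. All of the file names are invented, and "sh -c" is used only as a stand-in for a program that writes to both standard output and standard error.

```csh
#!/bin/csh -f
# Sketch of the redirection operators. All file names are made up.
echo "zinc"   >  redirect_demo.txt               # create or overwrite the file
echo "copper" >> redirect_demo.txt               # append a second line
sort < redirect_demo.txt > redirect_sorted.txt   # stdin from one file, stdout to another
# sh -c stands in for a program that writes to both stdout and stderr
sh -c 'echo to-stdout; echo to-stderr 1>&2' >& redirect_both.log
sort redirect_demo.txt | head -1                 # pipe: prints the first line after sorting
```

After running it, redirect_both.log should hold both the to-stdout and the to-stderr lines, and the final pipe should print "copper".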

The csh/tcsh feature that we are most concerned with is how to get my data into my program. Specifically you want to get the shell to shove a series of lines into the program being executed rather than interpret them in shell syntax. One very tedious way to achieve this is to do:

echo "first line of program input" > instructions.dat
echo "second line of program input" >> instructions.dat
echo "third line of program input" >> instructions.dat
program_name < instructions.dat

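A much tidier way is to use the << word construction, which feeds the lines that follow straight into the program:

program_name << EOF-prog
first line of program input
second line of program input
third line of program input
EOF-prog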

<< word - reads the C Shell input up to a line that is identical to word. word is not subjected to variable, file name or command substitution, and each input line is compared to word before any substitutions are done on the input line. Unless a quoting \, ", ', or ` appears in word, variable and command substitution is performed on the intervening lines, allowing \ to quote $, \ and `. Commands that are substituted have all blanks, tabs, and newlines preserved, except for the final newline which is dropped. The resultant text is placed in an anonymous temporary file that is given to the command as its standard input.
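The substitution rule matters in practice, because any $ in your program input will be interpreted by the shell unless you escape it. A minimal sketch, with cat standing in for a real program and heredoc_demo.txt as an invented file name:

```csh
#!/bin/csh -f
# Sketch: the shell substitutes $HOME inside the here-document,
# while \$HOME is passed through to the program literally.
# cat stands in for a real program; the file name is illustrative.
cat << EOF-demo > heredoc_demo.txt
home directory: $HOME
literal text: \$HOME
EOF-demo
cat heredoc_demo.txt
```

The first line of the output should contain your actual home directory, the second the four characters HOME preceded by a dollar sign.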

I use this construction all the time in CCP4 scripts:

#!/bin/csh -f
#
# run SHELXC
#
/usr/local/shelx/macosx/shelxc se1 << EOF
HREM se1rm.sca
PEAK se1pk.sca
INFL se1in.sca
LREM se1lo.sca
CELL 35.104 63.496 76.458 90. 90. 90.
SPAG P212121
FIND 8
NTRY 50
EOF
#

Running the Script

First make the script executable, with either

chmod +x my_script.csh

or

chmod a+x my_script.csh

If you just type "my_script.csh" the shell may or may not find the script. This is because any command that you type in the shell that isn't an absolute or relative path (e.g. /bin/ls, ./myprog, ../myotherprog) is first looked for as a shell built-in (echo is one such command), then the path variable is searched from left to right for a location containing the command "my_script.csh". Do an "echo $path" to see the contents of your path - it's the list of directories that the shell searches for programs. If your path does not contain "." then it will not find such a command in the current directory, and it may find a completely different file called my_script.csh elsewhere in the path and execute that instead! So while typing "ls" is likely to execute the program /bin/ls it is not guaranteed to do so. For reasons of security and sanity it is best to give the path explicitly, either as the absolute path

/Users/phil/Structures/examples/my_script.csh

or as the relative path

./my_script.csh

If you insist on typing just the script name, you can add the current directory to the end of your path:

set path = ($path .)

Getting Into Trouble With More Advanced Shell Syntax

You could create a simple disk space monitoring script:

#!/bin/csh -f
#
#
while (1)
    sleep 60
    df -kl
end

You can also simplify laborious tasks, such as calculating the Mean Fractional Isomorphous Difference (MFID) between all possible pairs of MTZ (.mtz) files containing single datasets. Note that the csh loop keyword is foreach, not for:

#!/bin/csh -f
#
echo "" > mfid.log
foreach file1 (*.mtz)
    foreach file2 (*.mtz)
        echo "Using $file1 and $file2" >> mfid.log
        ./mfid.csh $file1 $file2 >> mfid.log
    end
end

Now, how do you get mfid.csh to accept the filenames as arguments? The shell allows this via the special variables $0, $1, $2 etc., where $0 is the name of the script itself, $1 is the first argument, $2 the second, and so on:

#!/bin/csh -f
#
\rm mfid_merged.mtz
#
cad HKLIN1 $1 HKLIN2 $2 HKLOUT mfid_merged.mtz << eof-cad
RESOLUTION OVERALL 50.0 6.
SYMMETRY P6122
TITLE merge two data files together
LABIN FILE 1 E1=F E2=SIGF
CTYPE FILE 1 E1=F E2=Q
LABOUT FILE 1 E1=FN1 E2=SIGFN1
LABIN FILE 2 E1=F E2=SIGF
CTYPE FILE 2 E1=F E2=Q
LABOUT FILE 2 E1=FN2 E2=SIGFN2
END
eof-cad

More examples could go here if I felt that they would do more good than harm.

Tests for file name existence:

if (-e $1) echo "File $1 exists"
if (! -e $1) echo "File $1 does not exist"

Tests for numeric values:

if ($a > $b) echo "A ($a) is more than B ($b)"
if ($a == 9) echo "A is equal to 9"
if ($a <= 9) echo "A is less than or equal to 9"

String comparisons:

if ("$q" == "yes") echo "Answer is yes"

If you get to here you are way beyond the point where you should have read the C Shell Field Guide.

Mathematics