There is definitely an art to writing scripts that don’t suck. It isn’t enough to just get the job done. What matters is that your script does the right thing: with its messages, with its errors, and over time (as in years after it was written).

1. Always use absolute paths for everything.

You cannot assume what your environment will be. You can’t. If your script executes via cron it will get a stripped-down environment that is probably missing the variables you depend on. PATH will be minimal, and HOME may not be what you expect. The working directory will probably not be what you expect, either, so don’t write to files in the directory you’re in without thinking about it. Especially if you’re running as root. As root you’ll actually have permission to write files almost anywhere, and all you’ll do is clutter up some filesystem and make me go looking for the problem later.

Usually when a developer complains to me that their script isn’t writing output a simple search (find or locate) will show their output in some completely unexpected location. Like the root filesystem. Oops.
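
Here’s a rough sketch of how the top of a cron-friendly shell script might look. The directory and file names are made up for illustration (and live under /tmp so the sketch is harmless to run); substitute your own.

```shell
#!/bin/sh
# Set PATH ourselves instead of trusting whatever cron provides.
PATH=/usr/local/bin:/usr/bin:/bin
export PATH

# Hypothetical locations, always written as absolute paths.
APP_DIR=/tmp/myapp-example
OUTPUT_FILE="$APP_DIR/report.txt"

mkdir -p "$APP_DIR" || exit 1

# Change to a known directory so a stray relative path lands somewhere predictable.
cd "$APP_DIR" || exit 1

date > "$OUTPUT_FILE"
```

Nothing here depends on where the script was started from, which is the whole point.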

2. Fix the environment in the script, not the default environment.

Think of OS defaults as an electronic Switzerland. There are so many competing interests that an OS has to satisfy that the defaults are generic and neutral. If your script needs environment variables, like database settings (ORACLE_HOME), put them in your script. Don’t ask your sysadmin to change the system defaults.

The reason I hate changing defaults is mainly because it’s inevitable that another script, running on the same machine, will want different defaults. Besides, if you want the defaults changed you’re probably making assumptions in the script, and I don’t like that.

3. If you don’t want to hard-code, use a configuration file.

You have four scripts that need to have the same environment? The same variables? Use a configuration file and read it in at the beginning of each.

This also gives you a chance to do cool things, like detecting the host you’re running on (/bin/hostname) and setting the variables properly for development, test, or production. That’ll make moving between environments easier.

While you’re at it, name the configuration files, and the variables inside, intelligently. Names like “config” and “config.pl” work nicely. Names like “appwebprd” are slightly more confusing.
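
A minimal sketch of the idea, assuming a shell-syntax configuration file. The file location, the host name patterns (prod*, test*), the APP_ENV variable, and the ORACLE_HOME value are all invented for the example. The file is written out by the sketch itself only so that it runs on its own; normally it would already exist.

```shell
#!/bin/sh
# Hypothetical location of the shared configuration file.
CONFIG_FILE=/tmp/myapp-example/config

# Created here only to keep the sketch self-contained.
mkdir -p /tmp/myapp-example || exit 1
cat > "$CONFIG_FILE" <<'EOF'
# Shared settings, read by every script in the suite.
case "$(/bin/hostname 2>/dev/null)" in
    prod*) APP_ENV=production  ;;
    test*) APP_ENV=test        ;;
    *)     APP_ENV=development ;;
esac
ORACLE_HOME=/opt/oracle/example   # example value only
export APP_ENV ORACLE_HOME
EOF

# At the top of each script: read the settings, or stop rather than guess.
if [ -r "$CONFIG_FILE" ]; then
    . "$CONFIG_FILE"
else
    exit 1
fi
```

Each of the four scripts then starts with the same few lines, and moving to a new environment means editing one file.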

4. Write errors to syslog.

The UNIX gods gave us a system logging service. It has its problems but it is well understood, and the logs it writes are usually rotated and handled properly by default. Many monitoring systems also watch the system’s log files, like /var/log/messages, and so things you send there will get handled.

Check out the ‘logger’ command if you want to do this from a shell script.
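
Something like the following wrapper is one way to use it. The tag name and the LOGGER variable are my own inventions for the sketch, and the path to logger differs between systems, so treat that as an assumption too. The -t (tag) and -p (facility.priority) options are standard logger options.

```shell
#!/bin/sh
# Path to logger varies by system; override LOGGER if yours lives elsewhere.
LOGGER=${LOGGER:-/usr/bin/logger}

# Hypothetical tag that will show up next to each message in the log.
SCRIPT_NAME=nightly-export

log_error() {
    # -t sets the tag, -p sets facility.priority (here: user.err).
    "$LOGGER" -t "$SCRIPT_NAME" -p user.err "$*"
}

# Example call:
#   log_error "cannot write /var/tmp/export.csv"
```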

5. Write nothing to stdout or stderr unless you are debugging.

I don’t want to see the output of your script on the console of my server. Why? Because when I’m trying to work at the console of the server (fixing a problem) your script will write all over my terminal session. Icky! If I am not there then nobody sees the output. What good is that?

This also goes for programs you call in your script. Don’t let them write crap to stdout or stderr, either.
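
One way to keep the programs you call quiet without throwing their output away is to capture it and only use it if something fails. This is a sketch; run_quiet and LAST_ERROR are names I made up.

```shell
#!/bin/sh
# Run a command with nothing reaching the terminal. On failure, keep the
# command's output in LAST_ERROR so it can be reported properly later.
run_quiet() {
    output=$("$@" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        LAST_ERROR="$* failed with status $status: $output"
    fi
    return "$status"
}

# Example call:
#   run_quiet /bin/cp /some/source /some/destination || handle_the_error
```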

6. Don’t do error handling in loops.

Don’t send error email from inside a loop. Don’t write to syslog from inside a loop. Set a flag for the error and handle it once, at the end. If you want to do things differently while you’re developing, fine, but in production your error loops will fill mail spools and logs when something goes wrong. When that happens I suddenly have three or four problems, not just one.
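
As a sketch of the flag approach: collect the failures inside the loop, then report once after it. The paths in the list are placeholders chosen so that two of them fail.

```shell
#!/bin/sh
failures=0
failed=""

# Hypothetical work list; "/" is readable, the other two do not exist.
for f in / /nonexistent-example-a /nonexistent-example-b; do
    if ! [ -r "$f" ]; then
        # Only remember the problem here; do not mail or log yet.
        failures=$((failures + 1))
        failed="$failed $f"
    fi
done

# Handle the error once, after the loop.
if [ "$failures" -gt 0 ]; then
    summary="$failures item(s) unreadable:$failed"
    # This is where a single syslog message or a single email would go.
fi
```

However many items fail, the admin gets one message with the full list.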

7. Have a debug mode that is not the standard operating mode of your script.

I’m not talking about for your development environment, either. In production, if your script is malfunctioning, I’d like to be able to run it in debug mode and get useful output to see where the problem lies.
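
A simple way to get there in a shell script is an environment variable that switches the extra output on. DEBUG and the debug function are assumptions of this sketch, not a standard.

```shell
#!/bin/sh
# Off by default; run as "DEBUG=1 /path/to/script" to turn it on.
DEBUG=${DEBUG:-0}

debug() {
    # Print to stderr in debug mode, stay silent otherwise.
    if [ "$DEBUG" = 1 ]; then
        echo "debug: $*" >&2
    fi
}

debug "starting export, target=/var/tmp/example"
```

In normal operation the script stays silent, so this doesn’t conflict with keeping stdout and stderr clean.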

8. Throw useful errors when you choose to.

Filesystem full? Permission problem? Tell me where you were writing so I can fix it. Can’t connect to something? Tell me what it is you’re connecting to, hostname or IP and port.

Installation scripts are notorious for useless filesystem errors. I also noticed that Red Hat’s up2date script gives you information like “Requires 200 MB additional space” without telling you where it needs it. It’s only experience that tells me that it’s complaining about /var/spool, and a less experienced admin isn’t going to know that.
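
For the filesystem case, a useful error might be built like this. It’s only a sketch: write_report and ERROR are hypothetical names, and what you do with the message (syslog, email) is up to you.

```shell
#!/bin/sh
# Try to write a file; on failure, record exactly where we were writing.
write_report() {
    target=$1
    if { echo "report body" > "$target"; } 2>/dev/null; then
        return 0
    fi
    ERROR="cannot write $target (check permissions and free space in $(dirname "$target"))"
    return 1
}

# Example call:
#   write_report /var/tmp/example/report.txt || handle "$ERROR"
```

The same goes for network errors: put the hostname or IP and the port in the message.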

9. Become a daemon properly.

If your script is meant to run in the background, the “production” mode of the script should just put itself fully in the background, also known as “daemon” mode. Scripts that need to be explicitly backgrounded need more care and feeding, and if you do the right thing and add the few lines of code to become a daemon you score points with your admins.

This is also an opportunity for doing interesting things for debugging. Add a “foreground” mode that also turns on debugging, and you’ve dealt with two problems at once.

10. Write a PID file.

It’s so much nicer to be able to “kill `cat /var/run/yourprogram.pid`” than it is to cobble some killall or pkill command together. This should also serve as your lock file, so that two copies of the program don’t run at the same time unless you mean them to.

This also means it should delete the lock file when the script dies, so you’ll have to add a little bit of signal handling code. There are countless examples of this out there, all findable with Google.
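
Here is one possible shape for it in a shell script. The function names are mine, and the PID file sits in /tmp only because /var/run usually needs root. The sketch does not deal with a stale PID file left behind by a crashed copy; that needs an extra check.

```shell
#!/bin/sh
# Hypothetical location; a real daemon would more likely use /var/run.
PIDFILE=/tmp/myapp-example.pid

acquire_lock() {
    # With noclobber (set -C), ">" fails if the file already exists, so
    # writing the PID file and taking the lock happen in one step.
    ( set -C; echo "$$" > "$PIDFILE" ) 2>/dev/null
}

release_lock() {
    rm -f "$PIDFILE"
}

# Typical use near the top of the script:
#   acquire_lock || exit 1
#   trap release_lock EXIT
#   trap 'exit 1' HUP INT TERM
```

The EXIT trap removes the file on the way out, and the second trap turns the usual signals into a normal exit so that the cleanup runs for them as well.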

11. Create temporary files in /tmp.

Many OSes have programs that clean /tmp automatically, so if your script leaves stuff lying around in /tmp the stuff will get cleaned up. Besides, that’s what /tmp is for. You might want to make this a variable in your configuration file, just so it’s easy to change someday.

12. Make a unique temporary file every time.

What if two copies of the program start? What if the file doesn’t get deleted properly? Use a function like mkstemp() (which, unlike the older mktemp(), creates the file for you rather than just picking a name) or a program like /bin/mktemp.
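
In a shell script that could look like the sketch below. TMP_BASE and the myscript prefix are placeholders, and /bin/mktemp may live elsewhere on your system.

```shell
#!/bin/sh
# Could come from the configuration file instead of being set here.
TMP_BASE=/tmp

# mktemp replaces the XXXXXX with random characters and creates the file
# itself, so two copies of the script never end up sharing a name.
TEMPFILE=$(/bin/mktemp "$TMP_BASE/myscript.XXXXXX") || exit 1

echo "scratch data" > "$TEMPFILE"
```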

13. Absolutely know what happens if a variable is empty.

This is especially important if you are deleting things. What happens if mktemp fails and all you have is an empty variable? I wasn’t thinking and wrote something like this recently:

TEMPDIR=`/bin/mktemp -d`

…

rm -f $TEMPDIR/*

rmdir $TEMPDIR

Yeah… mktemp failed and my rm statement became “rm -f /*”. Great. And it was all because I was paranoid about using “rm -rf” in a script.

14. Do the right thing at system shutdown, and don’t require special shutdown procedures.

I alluded to this in #10 with the PID file cleanup. I absolutely hate rc.shutdown scripts because inevitably the script fails and then the system won’t shut down when I need it to. I also hate adding things to the rc.d directories because it’s one more thing to deal with. Generally I put scripts that need to start at boot in rc.local, and then let them die at system shutdown.

This means that if you’re a script and I’m your admin you need to catch a TERM signal, at least, and quickly do whatever you need to do before you die.

What am I missing here? Anything? These are all the annoyances I can recall from the past month or so, but maybe the developers I support have a limited repertoire of shenanigans to pull. :-)
