At the company where this pilot fish works, the online telephone directory is rebuilt every night using data that comes from both the Human Resources database and Microsoft Active Directory.

And that works fine — mostly. “Several times each year, a few thousand employees would be missing from the directory the next morning,” fish reports. “Then the programmer would run the batch jobs manually and, within an hour or so, the directory would be back to normal.”

Each time that happens, fish asks for the root cause of the problem. And every time, he's told it's a fluke, or a glitch in Active Directory.

Fish isn't happy with that answer, and he keeps asking. And asking. For, literally, a couple of years.

Finally, after hearing fish insist for the umpteenth time that there must be an identifiable root cause, the programmer notices that the two batch jobs required to build the directory are timed to run about 15 minutes apart — first the one that extracts data, then the one that actually creates the company phonebook.

It turns out that on nights when the server is especially busy, the first batch job takes more than 15 minutes to finish. But the second job starts on schedule anyway, and sometimes the directory rebuild finishes before the first job has extracted all of the source information.

Fish has his root cause — and a fix.

“Adding a dependency rule to the scheduling system for the second batch job resolved the problem permanently,” says fish.
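The failure is a classic timing race: a fixed clock offset stands in for "the extract job is done," and that assumption only holds on quiet nights. Here is a minimal Python sketch of the race and the dependency fix. It rests on loudly simplified assumptions: the extract job starts at minute 0 and pulls rows at a steady rate, and the build job copies whatever has been extracted at the moment it starts. The names `Night`, `time_based` and `dependency_based`, and the sample numbers, are hypothetical and are not taken from the fish's actual scheduler.

```python
from dataclasses import dataclass


@dataclass
class Night:
    extract_minutes: float  # hypothetical: how long job 1 (extract) takes tonight
    total_rows: int         # hypothetical: employee records in HR + Active Directory


def rows_seen_by_build(night: Night, build_start: float) -> int:
    """Rows available when job 2 (build) starts.

    Assumes job 1 starts at minute 0 and extracts at a steady rate,
    and job 2 snapshots whatever exists at its start time.
    """
    if build_start >= night.extract_minutes:
        return night.total_rows
    return int(night.total_rows * build_start / night.extract_minutes)


def time_based(night: Night, offset: float = 15.0) -> int:
    # Job 2 fires at a fixed clock offset, whether or not job 1 has finished.
    return rows_seen_by_build(night, offset)


def dependency_based(night: Night) -> int:
    # Job 2 fires only once job 1 reports completion (the dependency rule).
    return rows_seen_by_build(night, night.extract_minutes)


if __name__ == "__main__":
    quiet = Night(extract_minutes=10, total_rows=20_000)
    busy = Night(extract_minutes=20, total_rows=20_000)
    for label, night in [("quiet", quiet), ("busy", busy)]:
        missing_timed = night.total_rows - time_based(night)
        missing_dep = night.total_rows - dependency_based(night)
        print(f"{label}: missing with timer={missing_timed}, with dependency={missing_dep}")
```

On the made-up "busy" night, the timer-driven build sees only 15,000 of 20,000 rows, so 5,000 employees vanish from the phonebook. The dependency-driven build always waits for the full extract, however long it takes.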

“Proving once again that there is no such thing as a glitch or a fluke.”

Sharky happens to know there is such a thing as a fluke. So don't leave me floundering: send me your true tale of IT life at sharky@computerworld.com.