1 second per second is harder than it sounds

If you've never had problems keeping your clock synced, you just haven't run enough machines yet. Once you scale beyond a certain point, it becomes obvious that elementary timekeeping is no small feat, even with help from tools like ntpd.

First of all, to be clear, I'm not talking about the hardware clock. That really only matters when you read from it, and you generally do that once: at boot. Instead, I'm talking about the system clock -- the thing which gives values to programs which call time() or gettimeofday() or clock_gettime() or any of the other variants. It's maintained by the kernel, and I'm assuming Linux here.
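To make "system clock" concrete, here's a minimal sketch of reading it from Python on Linux. The helper name read_system_clock is mine, purely for illustration; it uses the standard time module, which wraps the same clock_gettime() call mentioned above.

```python
import time

def read_system_clock():
    """Return the system (wall) clock as seconds since the Unix epoch."""
    # CLOCK_REALTIME is the kernel-maintained system clock discussed here,
    # not the hardware clock that gets read at boot.
    return time.clock_gettime(time.CLOCK_REALTIME)

if __name__ == "__main__":
    # Both calls read the same underlying clock, so they should agree
    # to within the time it takes to make the two calls.
    print(f"time.time():                   {time.time():.6f}")
    print(f"clock_gettime(CLOCK_REALTIME): {read_system_clock():.6f}")
```

Whatever state your clock is in (S0 through S5 below), this is the number your programs get handed.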

What sort of situations can you have with system clock timekeeping? Here are a few I have encountered.

S0: your clock consistently ticks at 1 second per second, and is synced to the correct time, as best you can tell.

S1: your clock consistently ticks at 1 second per second, but is a little offset from the correct time: it's ahead or behind. It might say 13:01:02 when the actual time is 13:01:05, which puts it three seconds behind.

S2: your clock ticks at 1 second per second, but is wildly offset from correct time -- over 1000 seconds, or much much more. It might be set to tomorrow, next week, next year, or the year 1970.

S3: your clock consistently ticks slightly fast or slightly slow, delivering something other than 1 second per second. It's always the same amount, though.

S4: your clock consistently ticks really fast or really slow, gaining or losing whole milliseconds (or more!) every second, but again it's always the same amount.

S5: your clock ticks fast, slow, normally, and everything else in-between, and changes constantly. Sometimes it gets more than 1 second per second, other times it gets less. Even then, the degree by which it runs too quickly or too slowly changes unpredictably.

You will eventually see all of these if you have to tend a big enough fleet. What can you do about them? That all depends on what tools you have available.

S0 is the ideal state. If, somehow, you have a clock which stays here by itself, you have an anomaly -- nothing is truly perfect when it comes to timekeeping. Still, ntpd will be quite happy to keep it there.

S1 is more likely. ntpd will adjust (slew) the system clock to make it tick slightly more quickly or slowly to "burn off" the offset. These are tiny little adjustments, on the order of parts per million. If it works, your clock will try to approach S0 but will probably be more like S3.
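To get a feel for how slow "parts per million" really is, here's a rough back-of-the-envelope sketch. It assumes the clock is slewed at a constant rate, with 500 ppm as the ceiling; real ntpd runs a feedback loop and usually slews more gently than that, so treat the result as a lower bound. The function name is made up for this example.

```python
def slew_duration_s(offset_s, slew_ppm=500.0):
    """Seconds of wall time needed to burn off offset_s at a constant slew.

    Assumption: the slew rate stays pinned at slew_ppm the whole time,
    which is the best case, not what ntpd actually does.
    """
    if slew_ppm <= 0:
        raise ValueError("slew_ppm must be positive")
    # At N ppm, the clock gains or loses N microseconds per second.
    return abs(offset_s) * 1_000_000 / slew_ppm

if __name__ == "__main__":
    for offset in (0.010, 0.100):
        needed = slew_duration_s(offset)
        print(f"{offset * 1000:.0f} ms offset: at least {needed:.0f} s of slewing")
```

Even a 100 millisecond offset takes at least 200 seconds to remove this way, which is why slewing is only practical for small errors.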

S2 happens far too often. By default, at startup, ntpd is willing to step the clock to fix an offset of up to 1000 seconds. It's a bit like picking up the needle on a record player and dropping it back down: you get a harsh result, but you'll probably be closer to where you want to be. If you start up ntpd and the system clock is out in the middle of nowhere, beyond that limit, it'll refuse to fix things.

It's easy to make S2 happen by mistake. Just set your hardware clock to something insane, then reboot and start ntpd by itself without doing anything else before that. It'll see the offset and give up. Messing up the sense of whether you use local time or UTC in your hardware clock is a great way to set this chain of events in motion.

The usual workaround is to make ntpdate run at startup just after reading the hardware clock and just before starting ntpd. ntpdate will adjust the clock from anything to anything else. Of course, if it fails, your startup will continue, and ntpd will start up, and it'll bomb. If you're reading this at a point when ntpdate has actually been retired, then just think "ntpd -q" instead.

If you live in this world, ntpd's "-g" will let it step more than 1000 seconds for that one-time adjustment. Use it at your own risk.
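The startup behavior described above boils down to a small decision, sketched here. This is a simplification, not ntpd's actual code: the 1000 second limit comes from the discussion above, the 128 ms step threshold is ntpd's documented default which I'm bringing in myself, and the function and its allow_big_step flag (standing in for "-g") are hypothetical names.

```python
STEP_THRESHOLD_S = 0.128    # assumption: ntpd's default step threshold
PANIC_THRESHOLD_S = 1000.0  # the 1000 second limit described above

def startup_action(offset_s, allow_big_step=False):
    """Roughly what ntpd does with an initial offset at startup.

    allow_big_step models the "-g" option: permit one large step.
    """
    magnitude = abs(offset_s)
    if magnitude > PANIC_THRESHOLD_S and not allow_big_step:
        return "give up"   # S2: too far gone, refuse to fix it
    if magnitude > STEP_THRESHOLD_S:
        return "step"      # drop the needle somewhere closer
    return "slew"          # small enough to burn off gradually

if __name__ == "__main__":
    for offset in (0.05, 3.0, 86400.0):
        print(f"{offset:>9} s off -> {startup_action(offset)}")
    print(f"  86400.0 s off with -g -> {startup_action(86400.0, allow_big_step=True)}")
```

The clock that's a day off is the one that bites you: without the override, nothing gets fixed at all.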

S3 is probably where you wind up on most systems. Your clock's pace will always vary a tiny little bit for different reasons (manufacturing variance, temperature changes, cosmic rays, the phase of the moon, and whatever you want), but ntpd will correct for it. It'll fix an error of up to 500 parts per million in either direction.

S4 is when we start getting into the realm of things which are increasingly difficult to fix with just a *clickity click* at the keyboard. This is when you have a system which has a substantial offset in the number of microseconds it gets per kernel "tick". The usual value I see on healthy machines is 10000. Run "adjtimex -p" to see what yours is using right now.

By unhealthy, I mean machines whose tick value is dozens or hundreds off from 10000. They might be at 9800 or even less, so on every tick where the rest of the world gets 10000 microseconds, they only get 9800. That's a clock which is running slowly.
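Here's the arithmetic behind that, as a small sketch. It assumes the nominal tick value of 10000 microseconds mentioned above; the function names are just for illustration.

```python
NOMINAL_TICK_US = 10000  # microseconds per tick on a healthy machine

def tick_rate_error_ppm(tick_us, nominal_us=NOMINAL_TICK_US):
    """How far a tick value is from nominal, in parts per million."""
    return (tick_us - nominal_us) * 1_000_000 / nominal_us

def seconds_per_day(ppm):
    """Seconds gained (positive) or lost (negative) per day at a given ppm."""
    return ppm * 86400 / 1_000_000

if __name__ == "__main__":
    ppm = tick_rate_error_ppm(9800)
    print(f"tick 9800 is {ppm:.0f} ppm from nominal")
    print(f"that is {seconds_per_day(ppm):.0f} seconds per day")
```

A tick value of 9800 works out to -20000 ppm, or 1728 seconds (nearly half an hour) of difference every day.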

ntpd will not correct for this. Clock adjustment on Linux comes in two flavors: macro, in the form of the "tick" setting, and micro, in the form of the "freq" setting. ntpd will only adjust up to 500 ppm, which is a "freq" value of +/- 32768000 (65536 * 500). On a machine with USER_HZ=100, one unit of "tick" is 1 microsecond out of 10000, or 100 ppm, so that range is about 5 ticks either way. Outside of that, it won't touch it.
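The unit conversions are easy to get wrong, so here's a short sketch of them. It assumes USER_HZ=100 and a nominal tick of 10000 microseconds, and the helper names are mine rather than anything from ntpd or the kernel.

```python
FREQ_UNITS_PER_PPM = 65536  # "freq" is ppm with a 16-bit fractional part
NTPD_MAX_PPM = 500          # the most ntpd will correct, in either direction
PPM_PER_TICK_UNIT = 100     # 1 microsecond out of 10000, assuming USER_HZ=100

def ppm_to_freq(ppm):
    """Convert parts per million to the kernel's scaled "freq" units."""
    return round(ppm * FREQ_UNITS_PER_PPM)

def freq_to_ppm(freq):
    """Convert the kernel's scaled "freq" units back to parts per million."""
    return freq / FREQ_UNITS_PER_PPM

def within_ntpd_range(ppm):
    """True if an error of this size is something ntpd can correct itself."""
    return abs(ppm) <= NTPD_MAX_PPM

if __name__ == "__main__":
    print(f"500 ppm as freq: {ppm_to_freq(NTPD_MAX_PPM)}")
    print(f"500 ppm in tick units: {NTPD_MAX_PPM / PPM_PER_TICK_UNIT:.0f}")
    print(f"-20000 ppm fixable by ntpd alone? {within_ntpd_range(-20000)}")
```

A machine that is 200 tick units off is 40 times beyond what the "freq" knob can reach.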

ntpd will probably interpret this as increasing jitter from its time sources, even though it's the local clock which has the problem. If you have one of these "time fixer" scripts which looks for insanity from ntpd and restarts it any time it goes out of sync, you will keep getting the "one time step at startup" behavior from ntpd, and your clock will just keep getting dragged along by this fixit script. That's a lot of needle-dropping onto the record which is your clock.

Now, let's say you're stuck with a box like this. You can use adjtimex yourself to set the tick value to about what it should be. As long as it's then within 500 ppm of what it's actually doing, ntpd will eventually figure it out, and it will start declaring itself to be in sync! This is a pretty dirty hack.
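If you go down this road, picking the tick value is just rounding. Here's a hedged sketch: it assumes you've already measured how much correction the clock needs in ppm (positive meaning it has to run faster), that USER_HZ=100 so each tick unit is worth 100 ppm, and that a larger tick value makes the clock run faster. The function name coarse_tick is made up for this example.

```python
NOMINAL_TICK_US = 10000  # microseconds per tick on a healthy machine
PPM_PER_TICK_UNIT = 100  # assuming USER_HZ=100
NTPD_MAX_PPM = 500       # what ntpd can handle on its own

def coarse_tick(correction_ppm):
    """Split a needed correction into a tick value and a leftover in ppm.

    The tick value does the macro adjustment; the leftover is what
    ntpd would still have to absorb through "freq".
    """
    steps = round(correction_ppm / PPM_PER_TICK_UNIT)
    tick = NOMINAL_TICK_US + steps
    residual_ppm = correction_ppm - steps * PPM_PER_TICK_UNIT
    return tick, residual_ppm

if __name__ == "__main__":
    tick, residual = coarse_tick(20340)
    print(f"set tick to {tick}, leaving {residual} ppm for ntpd")
    print(f"within ntpd's range: {abs(residual) <= NTPD_MAX_PPM}")
```

Since rounding to the nearest tick unit leaves at most 50 ppm, the leftover always fits inside ntpd's 500 ppm, provided the measurement was right and the error holds still.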

It's not entirely clear what the impact of doing this to your system might be. I wouldn't be surprised if programs got differing numbers of timeslices or something equally weird in such an environment.

S5 is the bottom of this pile of crazy: a clock which is so broken that it can't even decide how broken it wants to be. Sometimes it's really fast, and other times it's only slightly fast. Or maybe sometimes it's really slow, but other times it's only slightly slow. In other words, not only does it have a ridiculous variance, but even that variance varies. It jitters.

ntpd can't handle this. You also won't be able to do the S4 workaround of setting your own tick value once and letting ntpd work it out, since the resulting difference will still be too large. If this is the state of your system, I'd give up. Swap the hardware and try again on another motherboard. It's just not worth the trouble.
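One way to tell an S5 box from an S4 box is to watch how the drift rate itself moves around. The sketch below assumes you've collected offset measurements against a trusted reference while nothing was correcting the clock; the sample data and function names are invented for illustration.

```python
def drift_rates_ppm(samples):
    """Drift rate between consecutive samples, in parts per million.

    samples: list of (elapsed_seconds, offset_seconds), sorted by time.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    rates = []
    for (t0, o0), (t1, o1) in zip(samples, samples[1:]):
        rates.append((o1 - o0) / (t1 - t0) * 1_000_000)
    return rates

def drift_spread_ppm(samples):
    """Difference between the fastest and slowest observed drift rate."""
    rates = drift_rates_ppm(samples)
    return max(rates) - min(rates)

if __name__ == "__main__":
    # Hypothetical measurements: (seconds elapsed, offset in seconds).
    steady = [(0, 0.0), (100, 0.02), (200, 0.04), (300, 0.06)]
    erratic = [(0, 0.0), (100, 0.5), (200, 0.52), (300, -0.3)]
    print(f"steady spread:  {drift_spread_ppm(steady):.0f} ppm")
    print(f"erratic spread: {drift_spread_ppm(erratic):.0f} ppm")
```

A steady clock shows a spread near zero even if its rate is way off, which is the S4 case a fixed tick value can rescue. When the spread alone is bigger than 500 ppm, no single tick value will keep it inside ntpd's range.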

One thing I've deliberately omitted here is any discussion of the actual clock source used by the kernel. Whether you're talking TSC, HPET or something else, that also can change how your system behaves (and which situations you get into), but I'll have to cover that on another occasion.