You may have seen this on New Year's Eve:

Another leap second, another slew of outages. Handling time correctly is hard!https://t.co/kJepOfsKkv pic.twitter.com/Fwz2Xtpzkd — Dan Luu (@danluu) January 1, 2017

I'd heard a little about this problem, but I didn't understand how it broke code, and what to do about it. So here is an explainer.

Background

Earlier this year, the International Earth Rotation and Reference Systems Service decided to add an additional second to the year. This is due to the fact that the Earth's day-to-day rotation does not perfectly line up with a 86400 second day.

So when the clock hit 23:59:59 on New Year's Eve, it usually ticks over to 00:00:00, but this year, repeated the second 23:59:59. So if you had measurement code that looked like:

a := time.Now() // Dec 31 2016 23:59:59 and 800ms // While this operation is in progress, the leap second starts doSomeExpensiveOperation() duration := time.Since(a) // Taken at Dec 31 2016 23:59:59 (#2!) and 100ms fmt.Println(duration)

We usually assume in our code that time can't go backwards, but the duration there is -700ms ! This can cause big problems if you have code that always expects nonnegative time durations. In particular code that accepts timeouts might panic or immediately error if it gets a negative value. Cloudflare passed a negative duration to rand.Int63n , which crashed some of their servers.

But this is only a problem on leap seconds, which occur once a year at most. That's not good, but I can handle one problem a year. That would be nice, but unfortunately time on your servers can jump around quite a bit. An individual server might get a few seconds ahead or behind the rest of your servers. When it gets the correct time from the NTP server, the time on the server might jump forward or backward a few seconds. If you run any number of servers you are bound to hit this problem sooner or later.

Ok, how do I deal with it?

Good question! The Linux community implemented CLOCK_MONOTONIC as an option to clock_gettime to solve this problem. CLOCK_MONOTONIC returns an integer number of nanoseconds that only ever increases. You can use this to get more accurate (and always nonnegative) deltas than getting the system time twice and subtracting the samples.

You'll have to check whether your programming language uses CLOCK_MONOTONIC for implementing calls to get the system time. Many languages don't use it for the simple "what time is it" call, because it returns you time from a random starting point, not the epoch.

So you can't really translate between a CLOCK_MONOTONIC value and any given human date; it's only useful when you take multiple samples and compare them.

In particular

The Go standard library does not use CLOCK_MONOTONIC for calls to time.Now() . Go has a function that implements CLOCK_MONOTONIC, but it's not a public part of the standard library. You can access it via the monotime library - replace your calls to time.Now() with monotime.Now() . Note you will get back a uint64 that only makes sense if you call monotime.Now() again a little while later (on the same machine) and subtract it from the first value you got.

import "time" import "github.com/aristanetworks/goarista/monotime" func main() { a := monotime.Now() someExpensiveOperation() b := monotime.Now() // b - a will *always* be greater than 0 fmt.Println(time.Duration(b - a)) }

Porting code errata

I ported Logrole (a Twilio log viewer) to implement monotonic time. In the past it was really easy to call t := time.Now() and then, later, somewhere else in the code, call diff := time.Since(t) to get a Duration. This let you both use t as a wall clock time, and get an elapsed amount of time since t , at some unspecified point later in the codebase and in time. With CLOCK_MONOTONIC you have to separate these two use cases; you can get a delta or you can get a Time but you can't get both.

The patch was pretty messy and an indication of the problems you might have in your own codebase.

Another problem to watch out for is a uint64 underflow. If you have code like this:

if since := time.Since(aDeadline); since > 0 { return nil, errors.New("deadline exceeded") }

You might port it to something like:

now := monotime.Now() deadline := now + uint64(500*time.Millisecond) someExpensiveOperation() if monotime.Now() - deadline > 0 { return nil, errors.New("deadline exceeded") }

The problem here is you are subtracting uint64 values and if now is under the deadline, you'll underflow and get a very large uint64 value (which is greater than 0), so you'll always execute the if condition. Instead you need to avoid the subraction as well and use

if monotime.Now() > deadline {

instead.

Conclusion

It is hard to get this right, but it's probably a good practice to separate out wall clock time values from time values used for deltas and timeouts. If you don't do this from the start though you are going to run into problems. It would be nice if this was better supported in your language's standard library; ideally you'd be forced to figure out which use case you needed a time value for, and use it for that.

Liked what you read? I am available for hire.