While working on a major rewrite of the Datetime Julia library for Dates and DateTimes, one of my key goals was increased performance. And though the original implementation was already fast, Julia developers tend to be greedy, so naturally I wanted more.

One of the known bottlenecks at the time was how leap seconds were being dealt with. The DateTime type accounted for leap seconds like any other second, which naturally complicated the internals a bit. See, the internal type is just stored as a machine instant: a monotonically increasing number of milliseconds from zero.

```julia
julia> dt = datetime(2014,6,19)
2014-06-19T00:00:00 UTC

julia> int(dt)
63538819225000
```

So the tricky part comes in the algorithm to calculate this machine instant given the parts of a date (year, month, day, etc.). In a leap-seconds-don't-exist kind of world, this is trivial:

```julia
totalms = ms + 1000*(s + 60mi + 3600h + 86400*totaldays(y,m,d))
```
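To make this concrete, here's a runnable sketch using the standard Rata Die day count (my `totaldays` below is a stand-in; the library's own implementation may differ in detail):

```julia
# Days in prior months for a year shifted to start in March
# (so the leap day lands at the end of the shifted year).
const SHIFTEDMONTHDAYS = (306,337,0,31,61,92,122,153,184,214,245,275)

# Rata Die day number: days since 0000-12-31 in the proleptic Gregorian calendar.
function totaldays(y, m, d)
    z = m < 3 ? y - 1 : y
    return d + SHIFTEDMONTHDAYS[m] + 365z + fld(z,4) - fld(z,100) + fld(z,400) - 306
end

ms, s, mi, h = 0, 0, 0, 0
y, m, d = 2014, 6, 19
totalms = ms + 1000*(s + 60mi + 3600h + 86400*totaldays(y,m,d))
# 63538819200000: exactly 25000 ms (25 leap seconds) less than the
# machine instant shown earlier.
```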

The hard part is converting the above into the leap-seconds-do-exist world.

Without going into too much detail, the solution basically involves two cached arrays: one on the leap second timeline, specifying the instant each leap second occurred, and the other on a non-leap-second timeline, specifying the instant right before a leap second would occur (yes, it totally makes me think of alternate timelines).

The bottleneck boils down to the fact that every time you create or show a DateTime, you have to do a lookup into these leap second arrays to figure out how many leap seconds need to be corrected for, either when setting the machine instant or when displaying the right date parts.

The natural thing here is to employ binary search, which Base Julia provides an excellent implementation of in the searchsorted family of functions. Basically, we calculate our totalms from above, then do a binary search to figure out when the previous leap second instant occurred and how many leap seconds it represents, which need to be corrected for in our calculated instant (in this case, we can use the searchsortedlast function to get the last leap second instant before our calculated instant). Simple enough, right? End of story? Move on?
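As a concrete sketch of that lookup (with a truncated leap table; `leapcorrection` is my illustrative name, not the library's):

```julia
# Truncated table of leap second instants (in ms), sorted ascending.
const LEAPS = [62214480000000, 62230377600000, 62261913600000]

# How many ms of leap second correction apply at a given naive instant?
function leapcorrection(totalms)
    # searchsortedlast returns the index of the last entry <= totalms,
    # or 0 if totalms precedes every entry.
    return 1000 * searchsortedlast(LEAPS, totalms)
end

leapcorrection(62200000000000)  # 0: before any leap second
leapcorrection(62240000000000)  # 2000: two leap seconds have occurred
```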

But let's step back a minute and think about the nature of the problem. A major insight is that our leap second arrays are static: nothing is added or removed apart from library releases, whenever the IERS decides things are getting too out of whack with Earth's rotation. So how could we use this knowledge to improve performance? Well, if we take our searchsortedlast binary search to the extreme, it would ideally look like:

```julia
const LEAPSECONDS = [...]
function setleaps(totalms)
    if totalms < 62624707200000          # our middle value of LEAPSECONDS
        if totalms < 62388144000000      # cut 1st half in half again
            if totalms < 62230377600000  # last cut needed
                return 1                 # only one leap second has occurred
            else
                return 2                 # 2 leap seconds
            end
        else
            ...
        end
    else
        ...
    end
end
```

Basically, a hand-written searchsortedlast binary search tree using our static array values that eliminates all recursive/function call overhead and is close to the metal. But who wants to write all that out by hand? And when our leap second arrays do get updated, we'll have to manually rebalance our tree and hand-write it all over again... yuck. But wait, I seem to recall a famous PR by Steven Johnson where he used a macro for hand-unrolling array coefficients that other libraries wouldn't dream of trying to maintain. Perhaps we too can leverage Julia's meta-programming capabilities to do our dirty work.

```julia
# Recursively build binary search tree w/ known lookup values
# A is sorted array of lookup values
# R is return values for each index of A
# i.e. R[1] is returned for values < A[1], R[2] for < A[2], etc.
function searchsortedlasttree(A,R,var_name)
    l = length(A)
    mid = iseven(l) ? l>>>1 : (l>>>1)+1
    # Handle base case
    if mid == 1
        if l == 1
            return :($var_name < $(A[1]) ? $(R[1]) : $(R[2]))
        else # l == 2
            return :($var_name < $(A[1]) ? $(R[1]) :
                     $var_name < $(A[2]) ? $(R[2]) : $(R[3]))
        end
    end
    iftree = Expr(:if)
    iftree.args = Array(Any,3)
    iftree.args[1] = :($var_name < $(A[mid])) # condition
    iftree.args[2] = searchsortedlasttree(A[1:mid-1],R[1:mid],var_name)
    iftree.args[3] = searchsortedlasttree(A[mid+1:end],R[mid+1:end],var_name)
    return iftree
end

macro leapstree(a,r,var_name)
    # var_name is the variable name that will be passed in thru the parent function
    A = eval(a) # safe to eval here because 'a' is known at compile time
    R = eval(r) # same here
    ret = Expr(:block) # we'll store everything in a big block expression
    # First we manually handle before and after the ends of our leaps array
    push!(ret.args, :($var_name < $(A[1]) && return 0))
    push!(ret.args, :($var_name >= $(A[end]) && return $(endof(A)*1000)))
    # Now we call the recursive searchsortedlasttree to get the rest
    push!(ret.args, searchsortedlasttree(A[2:(endof(A)-1)],R,var_name))
    return ret
end

const SETLEAPS = [62214480000000,62230377600000,62261913600000,
    62293449600000,62324985600000,62356608000000,62388144000000,
    62419680000000,62451216000000,62498476800000,62530012800000,
    62561548800000,62624707200000,62703676800000,62766835200000,
    62798371200000,62845632000000,62877168000000,62908704000000,
    62956137600000,63003398400000,63050832000000,63271756800000,
    63366451200000,63476784000000]

function setleaps(ms)
    # @leapstree as a macro ensures everything gets expanded
    # at compile time
    @leapstree(SETLEAPS,[1000:1000:((endof(SETLEAPS)-1)*1000)],ms)
end
```

I tried to add useful comments so you can follow what's going on above, but basically we have our setleaps function that determines how many leap seconds need to be corrected given a totalms value. setleaps is built at compile time by expanding the @leapstree macro, which manually builds a binary search tree given the SETLEAPS array of leap second instants and the return values for each slot. The binary search tree is built primarily by searchsortedlasttree, which is similar to the searchsortedlast implementation, except that instead of returning values, it returns expressions. It is also built recursively to handle any depth of tree.
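If you want to play with the same trick yourself, note that the code above predates Julia 1.0 (`endof`, `Array(Any,3)`, and `eval` inside a macro are old-style). Here's a minimal modern-Julia sketch of the core idea; all names are illustrative, not from the library:

```julia
# Small static sorted array of cut points.
const CUTS = [10, 20, 30]

# Recursively build a nested-if expression equivalent to
# searchsortedlast(CUTS, x). R has one more entry than A:
# R[i] is returned when A[i-1] <= x < A[i].
function unrolledtree(A, R, var)
    isempty(A) && return R[1]
    mid = (length(A) + 1) >> 1
    return Expr(:if, :($var < $(A[mid])),
                unrolledtree(A[1:mid-1], R[1:mid], var),
                unrolledtree(A[mid+1:end], R[mid+1:end], var))
end

# Splice the generated tree into a function body at definition time.
@eval unrolledlookup(x) = $(unrolledtree(CUTS, collect(0:length(CUTS)), :x))

unrolledlookup(5)   # 0: before all cut points
unrolledlookup(20)  # 2
unrolledlookup(99)  # 3
```

The payoff is the same as in the post: the sorted array is baked into branch conditions at definition time, so each lookup is a handful of compares with no array indexing or function call overhead.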

Hmmm... seems like a bit of a hassle; does it really improve things much? Well, for one thing, the above code is robust: I've written it once and don't have to do anything more except tack the next leap second onto SETLEAPS the next time one is announced. Our manually built binary search tree will be rebalanced automatically. What about performance?

```julia
julia> @time [searchsortedlast(SETLEAPS, 63366451200000) for i = 1:10000000];
elapsed time: 0.50351136 seconds (80000048 bytes allocated)

julia> @time [setleaps(63366451200000) for i = 1:10000000];
elapsed time: 0.067922885 seconds (80000048 bytes allocated)
```

Boom! Better than 7x faster!

Anyway, I thought this was an interesting application of meta-programming in Julia, where we're basically taking static data and generating an operation on a data structure in the code itself. A recent thread in the Julia-Users forum asked about the long-term hopes of Julia achieving true C/Fortran performance (currently usually within 1.2-2x). I think what I've shown above is another strong point in the Julia vs. C/Fortran argument. Powerful language design choices like strong meta-programming functionality allow you to do things in Julia that would be almost impossible in many other languages, and that also allow for unique performance gains.

Stay tuned for more adventures in Julia land and maybe for my next post, I'll talk a little more about Julia's code introspection tools and how they can be so darn handy while developing. As a teaser, here's the generated code for our setleaps method above: