Mac OS X’s strftime(3) function has a significant problem with timezones. Intriguingly, the problem traces back to FreeBSD and tzcode, from which the code in question originates, and because of it tzcode fails to process its own companion database, tzdata! I’m going to report this bug anyway, but it also makes a good journal entry…

Let's start with tzdata. Tzdata is a public-domain timezone database used by most operating systems, including Linux, FreeBSD and Mac OS X. It is kept up to date, and it even tries to record obscure historical timezone data: for instance, you can see the solar87 through solar89 files in tzcode, which reflect a Saudi Arabian policy of using mean solar time from 1987 to 1989. (To be exact, they are not in tzdata, since they were too large and obsolete.) The ubiquity of tzdata means that an error in a tzdata file can lead to disaster.

FreeBSD libc chose to reuse tzcode to parse tzdata. It was in the public domain anyway and was written by the same person (Arthur David Olson) who founded tzdata. Glibc also used the same code, but it eventually diverged from the original. And Mac OS X, or more precisely Darwin, picked up the code and slightly modified it to conform to UNIX'03. The problem, however, is that tzcode contained a potentially wrong piece of code. On Mac OS X:

    $ TZ=Asia/Seoul date +'%Y-%m-%d %H:%M:%S %z %Z'
    2010-07-26 00:36:24 +0800 KST
    $ TZ=Asia/Tokyo date +'%Y-%m-%d %H:%M:%S %z %Z'
    2010-07-26 00:36:28 +0900 JST

You don’t need to look up the actual timezones of Korea and Japan to understand this bug. (Both countries use UTC+9, by the way.) The two commands print the same local time but different offsets, so the time and the timezone are in disagreement; in fact the %z formatter is the one that is incorrect. This is caused by localtime.c (see the FreeBSD version and the Darwin version) in tzcode, which reads:

    static void settzname(void)
    {
        /* snip */
        for (i = 0; i < sp->typecnt; ++i) {
            register const struct ttinfo * const ttisp = &sp->ttis[i];
            /* snip */
            if (i == 0 || !ttisp->tt_isdst)
                timezone = -(ttisp->tt_gmtoff);
        }
        /* snip */
    }

A compiled tzdata file is a binary file that consists of a list of (transition time, new timezone rule) pairs and a list of the actual rules. For example, if country X changes its timezone from UTC+8:30 to UTC+9 in July 2010, the transition list will have an entry (2010-07-01, NewRuleID) and the rule list will have an entry (NewRuleID, +540 minutes, “X Standard Time”). The transition list has to be sorted, but the rule list does not. The code above nevertheless assumes that the last non-daylight-saving rule in the rule list is the most recent one, and this assumption fails for the Asia/Seoul timezone. If you have MySQL, you can easily verify this with the mysql_tzinfo_to_sql command.
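To make the mismatch concrete, here is a rough sketch in C. It is not the real tzcode and not the fix that went to the tz list: `struct rule`, both function names and the sample table are made up for illustration, and the table is not the actual Asia/Seoul data. The first function mimics the loop in settzname(), the second follows the sorted transition list instead.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for tzcode's struct ttinfo. */
struct rule {
    long gmtoff;  /* offset from UTC, in seconds */
    int  isdst;   /* nonzero for a daylight-saving rule */
};

/* Mimics settzname(): walk the rule list in storage order and
 * keep the offset of the last non-DST rule seen. */
long offset_by_rule_order(const struct rule *rules, size_t nrules)
{
    long off = 0;
    assert(nrules > 0);
    for (size_t i = 0; i < nrules; ++i)
        if (i == 0 || !rules[i].isdst)
            off = rules[i].gmtoff;
    return off;
}

/* Alternative: walk the (sorted) transition list backwards and
 * return the first non-DST rule it refers to. */
long offset_by_transitions(const struct rule *rules, size_t nrules,
                           const unsigned char *types, size_t ntrans)
{
    assert(nrules > 0);
    for (size_t i = ntrans; i > 0; --i) {
        const struct rule *r = &rules[types[i - 1]];
        if (!r->isdst)
            return r->gmtoff;
    }
    return rules[0].gmtoff;  /* fallback chosen for this sketch only */
}

/* Made-up zone: the historic UTC+8:30 rule is stored last. */
static const struct rule sample_rules[] = {
    { 9 * 3600,        0 },  /* 0: UTC+9 standard time (current) */
    { 10 * 3600,       1 },  /* 1: UTC+10 daylight saving time */
    { 8 * 3600 + 1800, 0 },  /* 2: UTC+8:30 standard time (historic) */
};

/* Rule index for each transition, oldest first. */
static const unsigned char sample_types[] = { 2, 0, 1, 0 };
```

With this table, offset_by_rule_order(sample_rules, 3) yields the historic UTC+8:30 offset because that rule happens to be stored last, while offset_by_transitions(sample_rules, 3, sample_types, 4) yields UTC+9, the rule of the most recent transition.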

Tzcode originally altered the timezone variable only if the System V compatibility option was on. Many programs do not rely on this variable anyway, because its value is not dependable; a more robust but non-portable way is to read the tm_gmtoff field of the struct tm returned by localtime(3). The bug was therefore safely ignored in FreeBSD. Darwin, however, modified its strftime(3) implementation to conform to UNIX’03, and in doing so innocently replaced the tm_gmtoff field with the current timezone variable, which exposes the bug.
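For comparison, a minimal sketch of the tm_gmtoff route. It assumes a BSD, Darwin or glibc libc where struct tm has the non-standard tm_gmtoff field; the helper names local_utc_offset and format_utc_offset are mine, not part of any library.

```c
#define _DEFAULT_SOURCE  /* asks glibc to expose tm_gmtoff; harmless elsewhere */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* UTC offset in seconds that applied (or applies) at `when`,
 * taken from tm_gmtoff rather than the global `timezone`. */
long local_utc_offset(time_t when)
{
    struct tm tm;
    localtime_r(&when, &tm);
    return tm.tm_gmtoff;
}

/* Render an offset the way %z does, e.g. 32400 -> "+0900". */
void format_utc_offset(long gmtoff, char *buf, size_t size)
{
    long minutes = labs(gmtoff) / 60;
    assert(size >= 6);  /* sign + 4 digits + terminating NUL */
    snprintf(buf, size, "%c%02ld%02ld",
             gmtoff < 0 ? '-' : '+', minutes / 60, minutes % 60);
}
```

Passing local_utc_offset(time(NULL)) to format_utc_offset gives the offset for the current moment. Because the value comes from the struct tm for that particular instant, it does not depend on which rule happens to be stored last in the zone file.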

This weird bug seems to have been first identified through the Ruby spec tests, since Ruby 1.8 used the system strftime(3) routine. Personally I believed the bug was Apple’s fault, and was a bit shocked to find that it came from Olson’s code. So it is a snowball bug: if you fail to catch it at first glance, it keeps growing until it hits someone else in the back.

Postscript: So did glibc avoid this bug? It did, which shows one reason why a complete rewrite is sometimes a good thing. Well, not always.

Update (2010-07-28): A fix for this bug is now available on the tz mailing list. I’ve also reported it to Apple.