Python time-zone handling

Handling time zones is a pretty messy affair overall, but language runtimes may have even bigger problems. As a recent discussion on the Python discussion forum shows, there are considerations beyond those that an operating system or distribution needs to handle. Adding support for the IANA time zone database to the Python standard library, which would allow using names like "America/Mazatlan" to designate time zones, is more complicated than one might think—especially for a language trying to support multiple platforms.

It may come as a surprise to some that Python has no support in the standard library for getting time-zone information from the IANA database (also known as the Olson database after its founder). The datetime module in the standard library has the idea of a "time zone", but populating an instance from the database is typically done using one of two modules from the Python Package Index (PyPI): pytz or dateutil. Paul Ganssle is the maintainer of dateutil and a contributor to datetime; he has put out a draft Python Enhancement Proposal (PEP) to add IANA database support as a new standard library module.

Ganssle gave a presentation at the 2019 Python Language Summit about the problem. On February 25, he posted a draft of PEP 615 ("Support for the IANA Time Zone Database in the Standard Library"). The original posted version of the PEP can be found in the PEPs GitHub repository. The datetime.tzinfo abstract base class provides ways "to implement arbitrarily complex time zone rules", but he has observed that users want to work with three time-zone types: fixed offsets from UTC, the system time zone, and IANA time zones. The standard library supports the first type with datetime.timezone objects, and the second to a certain extent, but does not support IANA time zones at all.
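The two kinds of time zone that the standard library already covers can be sketched with datetime alone. This is only an illustration: the seven-hour offset and the variable names are arbitrary choices, not something taken from the PEP.

```python
from datetime import datetime, timedelta, timezone

# 1. Fixed offset from UTC: datetime.timezone handles this directly.
fixed = timezone(timedelta(hours=-7), name="UTC-07:00")
meeting = datetime(2020, 3, 8, 12, 0, tzinfo=fixed)
print(meeting.isoformat())                            # 2020-03-08T12:00:00-07:00
print(meeting.astimezone(timezone.utc).isoformat())   # 2020-03-08T19:00:00+00:00

# 2. System time zone, "to a certain extent": astimezone() with no
#    argument attaches the offset the system zone has at that instant,
#    as a fixed-offset timezone object rather than a full set of rules.
local_now = datetime.now().astimezone()
print(type(local_now.tzinfo).__name__)                # timezone
```

Because the second result carries only a single offset, arithmetic that crosses a daylight-saving transition will not adjust; that gap is what named IANA zones are meant to fill.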

There are some wrinkles to handling time zones, starting with the fact that they change—frequently. The IANA database is updated multiple times per year; "between 1997 and 2020, there have been between 3 and 21 releases per year, often in response to changes in time zone rules with little to no notice". Linux and macOS have packages with that information which get updated as usual, but the situation for Windows is more complicated. Beyond that, there is a question of what should happen in a running program when the time-zone information changes out from under it.

The PEP proposes adding a top-level zoneinfo standard library module with a zoneinfo.ZoneInfo class for objects corresponding to a particular time zone. A call like:

tz = zoneinfo.ZoneInfo("Australia/Brisbane")

will search for a corresponding Time Zone Information Format (TZif) file in various locations to populate the object. The zoneinfo.TZPATH list will be consulted to find the file of interest.

On Unix-like systems, that variable will be set to a list of the standard locations (e.g. /usr/share/zoneinfo, /etc/zoneinfo) where the time-zone data files are normally stored. On Windows, there is no official location for the system-wide time-zone information, so TZPATH will initially be empty. The PEP proposes that a data-only tzdata package be created for PyPI that would be maintained by the CPython core developers. That could be used on Windows systems to provide a source for the IANA database information.
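The lookup can be sketched as follows. This is written against the zoneinfo module as it later shipped in Python 3.9, so details may differ from the draft discussed here; load_zone() is a hypothetical helper introduced for the example, not part of the module.

```python
import zoneinfo
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def load_zone(key):
    """Return a ZoneInfo for key, or None if no data source has it."""
    try:
        return ZoneInfo(key)
    except ZoneInfoNotFoundError:
        return None


# Directories searched for TZif files; may be empty (e.g. on Windows).
print(zoneinfo.TZPATH)

tz = load_zone("Australia/Brisbane")
if tz is None:
    # Nothing on TZPATH and no tzdata package installed.
    print("no IANA data found; installing the tzdata package is one fix")
else:
    print(tz.key)
```

In the released module, a key that is not found on TZPATH is looked up in the tzdata package, if it is installed, before the constructor gives up.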

By default, ZoneInfo objects would effectively be singletons; a cache would be maintained so that repeated uses of the same time-zone name would return the exact same object. That is not specifically being done for efficiency reasons, but to ensure that times in the same time zone will be handled correctly. The existing datetime arithmetic operations only consider time zones to be equal if they are the same object, not just if they contain the same information. But caching also protects running programs from strange behavior if the underlying time-zone data changes. Effectively, the data will be read once, on first use, and never change again until the interpreter is restarted.

There is support for loading time zones without consulting (or changing) the cache, as well as for clearing the cache, which would effectively reload the time zone for any new ZoneInfo object. But getting updates to time zones mid-stream is problematic in its own right, Ganssle said:

In the end, always getting “the latest data” is fraught with edge cases anyway, and the fact that datetime semantics rely on object identity rather than object equality just adds to the edge cases that are possible. I will note that there is some precedent in this very area: local time information is only updated in response to a call to time.tzset(), and even that doesn’t work on Windows. The equivalent to calling time.tzset() to get updated time zone information would be calling ZoneInfo.clear_cache() to force ZoneInfo to use the updated data (or to always bypass the main constructor and use the .nocache() constructor).
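The cache behavior is easy to observe through object identity. A minimal sketch, assuming Python 3.9 or later with IANA data available; cache_demo() is a name made up for the example, and the draft's .nocache() constructor is spelled no_cache() in the module as released.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def cache_demo(key="America/New_York"):
    """Return identity checks showing how the ZoneInfo cache behaves."""
    a = ZoneInfo(key)
    b = ZoneInfo(key)             # cache hit: the very same object
    c = ZoneInfo.no_cache(key)    # bypasses the cache: a fresh object
    ZoneInfo.clear_cache()        # later constructions re-read the data
    d = ZoneInfo(key)
    return {
        "cached_same": a is b,
        "no_cache_same": a is c,
        "after_clear_same": a is d,
    }


try:
    print(cache_demo())
except ZoneInfoNotFoundError:
    print("IANA time-zone data not available")
```

With data available, only the first check is true. Objects created before clear_cache() keep whatever data they were built from, which is why a mid-run refresh can leave old and new definitions of the same zone live at once.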

But Florian Weimer was concerned that users would want those time-zone updates to automatically be incorporated, so he sees the caching behavior as problematic. "I do not think that users would want to restart their application (with a scheduled downtime) just to apply one of those updates." Ganssle acknowledged the concern, "but there are a lot of reasons to use the cache, and good reasons to believe that using the cache won’t be a problem". He went on to note that both pytz and dateutil already behave this way and he has heard no complaints. He also gave an example of surprising behavior without any caching:

>>> from datetime import *
>>> from zoneinfo import ZoneInfo
>>> dt0 = datetime(2020, 3, 8, tzinfo=ZoneInfo.nocache("America/New_York"))
>>> dt1 = dt0 + timedelta(1)
>>> dt2 = dt1.replace(tzinfo=ZoneInfo.nocache("America/New_York"))
>>> dt2 == dt1
True

Each call to ZoneInfo.nocache() will return a different object, even if the time-zone name is the same. So dt1 and dt2 have the same time-zone information, but different ZoneInfo objects. The two datetime objects compare "equal" ( == ) because they represent the same "wall time", but that does not mean that arithmetic operations will behave as one might expect:

>>> print(dt2 - dt1)
0:00:00
>>> print(dt2 - dt0)
23:00:00
>>> print(dt1 - dt0)
1 day, 0:00:00

March 8, 2020 is the day of the daylight saving time transition in the US, so adding one day (i.e. timedelta(1)) crosses that boundary. In a follow-up message, he explained more about the oddities of datetime math that are shown by the example:

This is because there’s an STD->DST transition between 2020-03-08 and 2020-03-09, so the difference in wall time is 24 hours, but the absolute elapsed time is 23 hours. [...] So dt2 - dt0 is treated as two different zones and the math is done in UTC, whereas dt1 - dt0 is treated as the same zone, and the math is done in local time. dt1 will necessarily be the same zone as dt0, because it’s the result of an arithmetical operation on dt0. dt2 is a different zone because I bypassed the cache, but if it hit the cache, the two would be the same.
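Code that needs elapsed time rather than wall time can ask for it explicitly by converting both ends to UTC before subtracting, so the result no longer depends on whether the two tzinfo objects are identical. A sketch under the same assumptions (Python 3.9 or later, IANA data available); wall_and_elapsed() is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def wall_and_elapsed(start, days=1):
    """Add whole days in wall time; return (wall, absolute) differences."""
    end = start + timedelta(days=days)   # keeps the same tzinfo object
    wall = end - start                   # same zone: local-time math
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return wall, elapsed


try:
    start = datetime(2020, 3, 8, tzinfo=ZoneInfo("America/New_York"))
    wall, elapsed = wall_and_elapsed(start)
    print(wall)      # 1 day, 0:00:00
    print(elapsed)   # 23:00:00
except ZoneInfoNotFoundError:
    print("IANA time-zone data not available")
```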

Using the pickle object-serialization mechanism on ZoneInfo objects was also discussed. The PEP originally proposed that pickling a ZoneInfo object would serialize all of the information from the object (e.g. all of the current and historical transition dates), rather than simply serializing the key (e.g. "America/New_York"). Only serializing the key could lead to problems when de-serializing the object with a different set of time-zone data (e.g. the "Asia/Qostanay" time zone was added in 2018).

But, as pytz maintainer Stuart Bishop pointed out, serializing all of the transition data is likely to lead to other, worse problems:

If I serialize ‘2022-06-05 14:00 Europe/Berlin’ today, and deserialize it in two years time after Berlin has ratified EU recommendations and abolished DST, then there are two possible results. If my application requires calendaring semantics, when deserializing I want to apply the current timezone definition, and my appointment at 2pm in Berlin is still at 2pm in Berlin. Because I need wallclock time (the time a clock hung on the wall in that location should show). If I wanted a fixed timestamp, best practice is to convert it to UTC to avoid all the potential traps, but it would also be ok to deserialize the time using the old, incorrect offset it was stored with and end up with 1pm wallclock time. The PEP specifies that datetimes get serialized with all transition data. That seems unnecessary, as the transition data is reasonably likely to be wrong when it is de-serialized, and I can’t think of any use cases where you want to continue using the wrong data.

Ganssle agreed that it makes more sense to pickle ZoneInfo objects "by reference" (i.e. by time-zone name), though providing a way to also pickle "by value" for those who need or want it would be an option. Guido van Rossum had suggested an approach where a RawZoneInfo class would underlie ZoneInfo objects. Pickling a RawZoneInfo could be done by value. Ganssle liked that idea but thought that it could always be added later if there was a need for it; dateutil.tz already gives the by-value ability, so that could be used in the interim if needed.
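Pickling by reference is the behavior that shipped in Python 3.9, and it can be checked with a round trip. A minimal sketch, assuming IANA data is available; roundtrip() is a name invented for the example.

```python
import pickle
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def roundtrip(key="Europe/Berlin"):
    """Pickle and unpickle a ZoneInfo; return (original, restored, payload)."""
    tz = ZoneInfo(key)
    payload = pickle.dumps(tz)
    restored = pickle.loads(payload)
    return tz, restored, payload


try:
    tz, restored, payload = roundtrip()
    # Only the key travels in the pickle, so unpickling goes back through
    # the primary constructor and its cache.
    print(restored is tz)
    print(len(payload), "bytes")
except ZoneInfoNotFoundError:
    print("IANA time-zone data not available")
```

Since only the key is stored, the transition rules applied after loading are whatever the deserializing system has installed, which is the calendaring behavior Bishop described.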