Calendaring events with Python

Published on 2017-04-13, under Programming, tagged with python and timezones.

2012 passed and the world did not end. Scheduling the world's next doomsday can be tricky, though, because working with dates is tricky. In this post I'll share some basic tips and gotchas I learned the hard way while building a calendaring app, which will hopefully make this task easier for you.

One time events

Because of the way we track time, a single moment happens at different times of the day for different people around the globe, and a given local date and time can never happen, or happen twice, thanks to DST or government regulations. Yes, working with dates is hard.

Dates mean nothing without the proper context, and that context is provided by what is called a timezone.

A timezone is not just an offset, because that offset can change. Instead, timezones have names and are attached to a geographical location under a certain jurisdiction, unless we are speaking of the UTC timezone, which is a very special one.

We should think of UTC as the entire world's current time, the one you compare all other dates with. It is meant to be objective in the sense that it never was and never will be affected by local time changes.
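Here is a minimal sketch of that idea, using only the standard library. The fixed offsets are an assumption made to keep the example self-contained (real timezones need the tz database, since their offsets change); the point is that the same instant read on two different wall clocks still compares equal, because aware dates are compared through UTC.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets used only for illustration; real timezones change offset over time.
MINUS_3 = timezone(timedelta(hours=-3))
PLUS_2 = timezone(timedelta(hours=2))

instant_utc = datetime(2017, 4, 5, 5, 0, tzinfo=timezone.utc)

# The same moment, as read on two different wall clocks.
at_minus_3 = instant_utc.astimezone(MINUS_3)
at_plus_2 = instant_utc.astimezone(PLUS_2)

print(at_minus_3.isoformat())  # 2017-04-05T02:00:00-03:00
print(at_plus_2.isoformat())   # 2017-04-05T07:00:00+02:00
print(at_minus_3 == at_plus_2 == instant_utc)  # True
```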

Timezone information is available in the Olson database. Since timezones change every now and then, it is vital to keep your software up to date. If that is something you cannot control (e.g. in an embedded device), be advised that the dates your system is dealing with might not be correct.

Updates can be handled differently depending on the OS, program or language you use. For example, on Linux there is the tzdata package, but some programs like the browser or Postgres contain their own copies.

Python provides naive dates by default through its datetime module, and it is someone else's responsibility to provide tz information. For example, pytz is a library that "brings the Olson tz database into Python", providing timezone classes to use with datetime objects.

    >>> import datetime
    >>> import pytz
    >>> datetime.datetime.now()
    datetime.datetime(2017, 4, 4, 10, 36, 57, 800151)
    >>> datetime.datetime.utcnow()
    datetime.datetime(2017, 4, 4, 13, 37, 1, 833276)
    >>> pytz.utc.localize(datetime.datetime.utcnow())
    datetime.datetime(2017, 4, 4, 13, 37, 44, 500463, tzinfo=<UTC>)

Some attempts have been made to provide extra help for disambiguating naive dates, but it is still highly recommended that you convert dates to UTC as soon as they enter the system and work with them that way for calculations and queries. Even though you can treat naive dates as being implicitly UTC, I would suggest still attaching the UTC tz to them. This way, all the information is there, and it becomes easier to reason about dates.
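A small sketch of what "convert on entry" could look like, using only the standard library. The helper name ensure_utc is hypothetical, and it assumes that any naive date reaching it is already expressed in UTC, which is a policy your own system has to guarantee.

```python
from datetime import datetime, timedelta, timezone

def ensure_utc(dt):
    """Return an aware datetime in UTC.

    Assumption: naive inputs are already in UTC, so they only get labelled;
    aware inputs are converted.
    """
    if dt.tzinfo is None or dt.utcoffset() is None:
        # replace() is safe here only because UTC has a single, fixed offset.
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

naive = datetime(2017, 4, 4, 13, 37)
aware = datetime(2017, 4, 4, 10, 37, tzinfo=timezone(timedelta(hours=-3)))

print(ensure_utc(naive).isoformat())  # 2017-04-04T13:37:00+00:00
print(ensure_utc(aware).isoformat())  # 2017-04-04T13:37:00+00:00
```

Both inputs end up as the same UTC instant, which is what makes later comparisons and queries straightforward.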

For example, when logging events, you may see log entries with the same date that seem to be duplicated because of DST. If you have them in UTC and in ISO format instead, there is no room left for confusion.

    '2002-10-27T01:30:00'        # no timezone attached
    '2002-10-27T01:30:00'

    '2002-10-27T01:30:00-04:00'  # with timezone attached
    '2002-10-27T01:30:00-05:00'

    '2002-10-27T05:30:00+00:00'  # in UTC, evidently there is an hour difference
    '2002-10-27T06:30:00+00:00'

Moreover, a server that is not configured to use UTC could fire cron jobs twice or skip them because of a DST switch.

When doing calculations, you can manipulate aware dates transparently, but the math is more evident to the programmer if those dates are in UTC:

    >>> buenos_aires
    datetime.datetime(2017, 4, 5, 2, 0, tzinfo=<DstTzInfo 'America/Buenos_Aires' -03-1 day, 21:00:00 STD>)
    >>> madrid
    datetime.datetime(2017, 4, 5, 6, 0, tzinfo=<DstTzInfo 'Europe/Madrid' CEST+2:00:00 DST>)
    >>> buenos_aires - madrid
    datetime.timedelta(0, 3600)  # mmm... why?
    >>> pytz.utc.normalize(buenos_aires)
    datetime.datetime(2017, 4, 5, 5, 0, tzinfo=<UTC>)
    >>> pytz.utc.normalize(madrid)
    datetime.datetime(2017, 4, 5, 4, 0, tzinfo=<UTC>)  # a one hour diff, obvi!
    >>> pytz.utc.normalize(buenos_aires) - pytz.utc.normalize(madrid)
    datetime.timedelta(0, 3600)  # yeah! same results

Moreover, event durations can be counter-intuitive if you don't keep in mind the in-between jumps of DST:

    >>> eastern = pytz.timezone('US/Eastern')
    >>> loc_dt = datetime.datetime(2002, 10, 27, 1, 30, 00)  # date occurred twice
    >>> end = eastern.localize(loc_dt, is_dst=False)  # notice the is_dst flag
    >>> end.isoformat()
    '2002-10-27T01:30:00-05:00'
    >>> start = eastern.localize(loc_dt, is_dst=True)
    >>> start.isoformat()
    '2002-10-27T01:30:00-04:00'
    >>> end - start
    datetime.timedelta(0, 3600)  # same date, time and tz, but different offset

If you are rendering this kind of event in some sort of calendar, you'll have to decide whether dates or duration determine how the event is represented in a slot. And when building these dates, the user needs to disambiguate them explicitly by providing the is_dst flag.

Also, continuing with calculations in local timezones, adding a timedelta to an aware datetime object may give you the wrong result:

    >>> # Sunday, 7 April 2002, 02:00:00 clocks were turned forward 1 hour to
    >>> # Sunday, 7 April 2002, 03:00:00 local daylight time instead
    >>> eastern = pytz.timezone('US/Eastern')
    >>> loc_dt = datetime.datetime(2002, 4, 7, 2, 0, 0)
    >>> edt_dt = eastern.localize(loc_dt)
    >>> est_dt = edt_dt + datetime.timedelta(hours=1)
    >>> edt_dt.isoformat()
    '2002-04-07T02:00:00-05:00'
    >>> est_dt.isoformat()
    '2002-04-07T03:00:00-05:00'  # mmm they have the same offset, this is odd
    >>> eastern.normalize(est_dt).isoformat()
    '2002-04-07T04:00:00-04:00'  # this is what I expected

Last, but not least, remember to never use replace() for attaching timezones. Otherwise you will very probably end up with the wrong date as a result. Use pytz's normalize() and localize() methods instead, since they use the tz table for conversions.

    >>> dt = datetime.datetime(2002, 4, 7, 2, 30)  # never existed in US/Eastern
    >>> dt.replace(tzinfo=eastern)
    datetime.datetime(2002, 4, 7, 2, 30, tzinfo=<DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>)  # what?
    >>> eastern.normalize(eastern.localize(dt))
    datetime.datetime(2002, 4, 7, 3, 30, tzinfo=<DstTzInfo 'US/Eastern' EDT-1 day, 20:00:00 DST>)  # much better

Moving on, now that we know that UTC aware dates everywhere is the way to go, there are some extra details to pay attention to:

It is a good idea to store the user's timezone, so that you are able to format dates yourself in case you don't trust the client's ability to display them correctly (most likely due to an outdated tz database on their side, or because you are sending emails).

If events are attached to a certain location, like a flight for instance, and that location changes its timezone, then we need to recalculate all scheduled dates for that location and notify users about it. Simply storing these dates in UTC is not enough.

So storing the timezone of origin as a way to get to and from UTC is important.
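To make the flight example concrete, here is a rough sketch using only the standard library. Everything named here is hypothetical: TZ_RULES is a toy stand-in for the tz database (a real application would resolve the zone name through pytz), and "Example/Airport" is not a real zone. It only illustrates why the wall-clock time plus the zone name must be kept: when the rules change, the UTC instant can be recomputed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for the tz database: zone name -> current fixed offset.
TZ_RULES = {"Example/Airport": timezone(timedelta(hours=-3))}

@dataclass
class ScheduledEvent:
    local_time: datetime  # naive wall-clock time at the location
    tz_name: str          # timezone of origin

    def utc_time(self):
        # replace() is fine with the stdlib's fixed-offset timezone objects;
        # with pytz zones you would call localize() instead.
        tz = TZ_RULES[self.tz_name]
        return self.local_time.replace(tzinfo=tz).astimezone(timezone.utc)

flight = ScheduledEvent(datetime(2017, 6, 1, 9, 0), "Example/Airport")
before = flight.utc_time()

# The jurisdiction changes its offset: the wall-clock time stays the same,
# but the UTC instant has to be recalculated.
TZ_RULES["Example/Airport"] = timezone(timedelta(hours=-4))
after = flight.utc_time()

print(before.isoformat())  # 2017-06-01T12:00:00+00:00
print(after.isoformat())   # 2017-06-01T13:00:00+00:00
```

Had only the first UTC value been stored, the flight would now appear to leave at 08:00 local time instead of 09:00.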

Recurring events

For generating a series of events you should use the dateutil.rrule module, which allows a great deal of configuration and handles corner cases like "every last day of the month".

But when it comes to creating recurring events, say every Monday at 11:00 am, the user wants those dates to always stay at 11:00 am even if there is a DST switch at some point.

The procedure is perfectly explained here and involves naive dates on purpose!

We first have to generate the occurrences regardless of the timezone settings, all at the same time of day. Because of the way this library works (basically by adding timedeltas), we need to feed it naive start and/or end dates:

    >>> from dateutil.rrule import rrule, WEEKLY, MO
    >>> start = datetime.datetime(2014, 2, 22, 11, 0)  # Feb 22
    >>> end = datetime.datetime(2014, 3, 24, 0, 0)  # March 24
    >>> dates = list(rrule(WEEKLY, dtstart=start, until=end, byweekday=(MO,)))
    >>> dates
    [datetime.datetime(2014, 2, 24, 11, 0),
     datetime.datetime(2014, 3, 3, 11, 0),
     datetime.datetime(2014, 3, 10, 11, 0),
     datetime.datetime(2014, 3, 17, 11, 0)]  # all at the same time

Now we will attach the user's timezone to these dates and normalize them to UTC. You can see that the change happens on the stored dates, but the time the user will see in their local timezone stays intact.

    >>> tz = pytz.timezone('America/Chicago')  # observes DST switch on March 9
    >>> # `dates` is the list of naive occurrences generated with rrule
    >>> localized = [tz.localize(dt) for dt in dates]
    >>> for dt in localized:
    ...     print('Central: {}; UTC: {}'.format(dt, dt.astimezone(pytz.utc)))
    ...
    Central: 2014-02-24 11:00:00-06:00; UTC: 2014-02-24 17:00:00+00:00
    Central: 2014-03-03 11:00:00-06:00; UTC: 2014-03-03 17:00:00+00:00
    Central: 2014-03-10 11:00:00-05:00; UTC: 2014-03-10 16:00:00+00:00
    Central: 2014-03-17 11:00:00-05:00; UTC: 2014-03-17 16:00:00+00:00

You should also set the is_dst flag when calling localize() if needed.

Querying events

When retrieving entries for a given period, we need to think in buckets determined by the user's localized start and end boundaries for that period.

For example, if we want today's entries for a user with an offset of UTC-3, it is important to request those dates using the UTC version of the user's 00:00 to 23:59 time span. Today is relative to the timezone the user is currently in, and using UTC's today is not an option, since it would get us entries from the user's yesterday at 21:00 to today at 20:59, potentially including or excluding incorrect results. We need to normalize the user's date range to UTC:

    UTC offset  Day start  Day end  Meaning
    -03:00      00:00      23:59    User's date range
    +00:00      00:00      23:59    UTC's date range
    -03:00      21:00      20:59    UTC's date range compared to user's
    +00:00      03:00      02:59    User's date range normalized with UTC [✓]

The following piece of code shows exactly how to query today's entries for a user. See how time.min and time.max come in handy when calculating the start and end boundaries for a date:

    from datetime import datetime, time
    from pytz import utc

    # `user_tz` (a pytz timezone) and `events` (the app's query object) are
    # assumed to be defined elsewhere; the filter expression is pseudocode.
    def today_only():
        local_now = user_tz.normalize(datetime.utcnow().replace(tzinfo=utc))
        # datetime.datetime.combine returns naive dates :(
        local_today_min = user_tz.localize(datetime.combine(local_now, time.min))
        local_today_max = user_tz.localize(datetime.combine(local_now, time.max))
        today_min = utc.normalize(local_today_min)
        today_max = utc.normalize(local_today_max)
        return events.filter(start >= today_min, end <= today_max)
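The boundary calculation can be checked in isolation with a standard-library sketch. It assumes a fixed UTC-3 offset in place of a real pytz timezone (with pytz you would keep using localize() as in the function above), and the helper name local_day_bounds_in_utc is made up for this example. The result matches the last row of the table.

```python
from datetime import datetime, time, timedelta, timezone

def local_day_bounds_in_utc(now_utc, user_tz):
    """Return the UTC start and end of the user's current local day."""
    local_now = now_utc.astimezone(user_tz)
    # combine() accepts a tzinfo argument since Python 3.6.
    local_min = datetime.combine(local_now.date(), time.min, tzinfo=user_tz)
    local_max = datetime.combine(local_now.date(), time.max, tzinfo=user_tz)
    return local_min.astimezone(timezone.utc), local_max.astimezone(timezone.utc)

user_tz = timezone(timedelta(hours=-3))  # fixed offset, for illustration only
# 01:30 UTC on April 5 is still April 4 (22:30) for this user.
now_utc = datetime(2017, 4, 5, 1, 30, tzinfo=timezone.utc)

start, end = local_day_bounds_in_utc(now_utc, user_tz)
print(start.isoformat())  # 2017-04-04T03:00:00+00:00
print(end.isoformat())    # 2017-04-05T02:59:59.999999+00:00
```

Note that UTC's own "today" would have been April 5, a different day altogether.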

For next/previous month or next/previous week you should use rrule which can give you those dates, since the math is not as simple as adding 30 days to get the next month's results (not every month has the same length) or calculating when the next week starts. Then you just have to follow the same aforementioned approach.
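To see why adding 30 days is not enough, here is a small standard-library sketch. It deliberately swaps rrule for calendar.monthrange to stay dependency-free, so take it as an illustration of the problem rather than a replacement for rrule; next_month_range is a hypothetical helper name.

```python
import calendar
from datetime import date, timedelta

def next_month_range(today):
    """Return the first and last day of the month after `today`."""
    year = today.year + (1 if today.month == 12 else 0)
    month = 1 if today.month == 12 else today.month + 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    return date(year, month, 1), date(year, month, last_day)

# Naively adding 30 days from January 31 skips February entirely.
print(date(2017, 1, 31) + timedelta(days=30))  # 2017-03-02

first, last = next_month_range(date(2017, 1, 31))
print(first, last)  # 2017-02-01 2017-02-28

first_jan, last_jan = next_month_range(date(2016, 12, 15))
print(first_jan, last_jan)  # 2017-01-01 2017-01-31
```

The resulting dates are local calendar days, so they still need to be turned into UTC boundaries in the user's timezone before querying.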

Querying past or future events, with that sole condition, is a different story. Since dates are typically stored in UTC and we care about entries before or after now, it doesn't matter which timezone the user is in: now represents the same moment in any timezone, so we don't have to translate the user's now to UTC's now.

    # Assuming event dates are stored in UTC (like postgres does).
    # These conversions might not be needed depending on the storage engine
    # or frameworks you are using.
    def past():
        now = datetime.utcnow().replace(tzinfo=utc)
        return events.filter(end <= now)

    def future():
        now = datetime.utcnow().replace(tzinfo=utc)
        return events.filter(start >= now)

Be extra careful if you are caching results affected by a date range, and make sure that the timezone is part of the cache key. This way, if the timezone changes, because the user is on a trip for instance, the cache gets invalidated automatically for that user. Otherwise, in the case of an e-commerce site for example, this week's promotions won't be correctly applied to customers.
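A cache key along those lines might look like the following sketch. The key layout and the cache_key name are assumptions for illustration, not a convention of any particular cache backend.

```python
from datetime import date

def cache_key(user_id, tz_name, local_day, prefix="events"):
    """Build a cache key that changes whenever the user's timezone changes."""
    return "{}:{}:{}:{}".format(prefix, user_id, tz_name, local_day.isoformat())

home = cache_key(42, "America/Buenos_Aires", date(2017, 4, 4))
trip = cache_key(42, "Europe/Madrid", date(2017, 4, 4))

print(home)          # events:42:America/Buenos_Aires:2017-04-04
print(home != trip)  # True
```

Because the zone name is part of the key, a user who travels simply misses the old entry and gets a fresh result computed for the new local day.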

Notification events

When it comes to scheduling events like digest emails of notifications/news, lists of pending tasks, aggregated activities, etc., you will also need to generate a series based on the user's preferences for when to receive them.

You guessed it, rrule again to the rescue! But having all future occurrences generated in advance is wasteful.

In this use case you only care about the next recurrence after now. Every now and then (e.g. every minute), a cron-like job polls all scheduled reminders that have expired, executes the task and calculates the next occurrence.

    import datetime
    import pytz
    from dateutil import rrule

    user_tz = pytz.timezone('US/Eastern')
    # The rule works on naive dates in the user's local time, so start from
    # the user's wall-clock "now" rather than from UTC's.
    now = datetime.datetime.now(user_tz).replace(tzinfo=None, microsecond=0)
    days = [rrule.MO, rrule.WE]
    rule = rrule.rrule(rrule.WEEKLY, dtstart=now, byweekday=days,
                       byhour=8, byminute=30, bysecond=0)
    # Get the first recurrence right after "now"
    next_notification = user_tz.localize(rule.after(now))

In case the user's timezone changes, remember to recalculate the next occurrence.
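The polling step itself could be sketched like this, with the standard library only. Reminder, run_due, the sender and the weekly rescheduling function are all hypothetical stand-ins: a real job would send an email and compute the next occurrence with rrule's after() in the user's timezone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Reminder:
    user_id: int
    next_run: datetime  # aware, stored in UTC

def run_due(reminders, now, send, compute_next):
    """Send every reminder whose time has come and reschedule it."""
    fired = []
    for reminder in reminders:
        if reminder.next_run <= now:
            send(reminder)
            reminder.next_run = compute_next(reminder, now)
            fired.append(reminder.user_id)
    return fired

def weekly(reminder, now):
    # Stand-in for the real recurrence rule: same time, one week later.
    return reminder.next_run + timedelta(days=7)

sent = []
now = datetime(2017, 4, 5, 12, 31, tzinfo=timezone.utc)
reminders = [
    Reminder(1, datetime(2017, 4, 5, 12, 30, tzinfo=timezone.utc)),  # expired
    Reminder(2, datetime(2017, 4, 6, 12, 30, tzinfo=timezone.utc)),  # not yet
]

print(run_due(reminders, now, sent.append, weekly))  # [1]
print(reminders[0].next_run.isoformat())  # 2017-04-12T12:30:00+00:00
```

Only one stored date per reminder is needed this way, instead of a pre-generated series of future occurrences.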

Timezones are like variables. They have a name and a value (the UTC offset) that changes over time according to rules defined in the timezone database.

Dates without timezones don't really represent any moment in particular.

Always use tz aware dates and specifically UTC aware dates inside your program, but keep a reference to a local timezone that makes sense in case you need to retrace changes.

For all this, it is vital to stay up to date with tz updates.

I hope you found this post useful. My idea was to make it a compendium of all the things related to dates that I have read about and had to work with in Python. So I suggest you read all the linked pages; they are there for a reason!