Breaking things down to Atoms

Posted by Nick Johnson | Filed under pubsubhubbub, app-engine, email, datastore, atom

Over the last few years, more and more sites have been providing RSS and Atom feeds. This has been a huge boon both for keeping up to date with content through feed readers, and for programmatically consuming data. There are, inevitably, a few holdouts, though. Notable amongst those holdouts (and particularly relevant to me at the moment) are property listing sites. Few, if any, property listing sites provide any sort of Atom or RSS feed for listings, let alone an API of any kind.

To address this, I decided to put together a service for turning other types of notification into Atom feeds. I wanted to make the service general, so it can support many different formats, but the initial target will be email, since most property sites offer email notifications. The requirements for an email-to-Atom gateway are fairly straightforward:

Incoming email should be converted to entries in an Atom feed in such a fashion as to be easily interpretable in a feed reader.

As much of the original message and metadata as possible should be preserved in the Atom feed entry.

It should be possible to access the original message if desired.

In addition, I had a few of my own requirements for the service:

Users should be able to generate new feeds with as little effort as possible, with or without authenticating against the site first.

Users should be able to choose the email alias, or accept a randomly generated one.

There should be no way to determine the URL of the Atom feed given the email address, or vice versa.

The system should be easily extensible to other types of incoming notification, such as XMPP or proprietary APIs.

The service should support Hubbub notifications for real-time service.

Naturally, my platform of choice was App Engine. Let's start by defining a sensible data model:

    import base64
    import os

    from google.appengine.ext import db
    from google.appengine.ext.db import polymodel

    # `transactional` is a decorator that is not shown in this excerpt;
    # presumably it runs the wrapped method inside a datastore transaction.


    class Feed(db.Model):
        """Represents a feed that accumulates received messages."""
        title = db.StringProperty()
        created = db.DateTimeProperty(required=True, auto_now_add=True)
        owner = db.UserProperty()

        @classmethod
        def create(cls, owner=None):
            """Creates a new, randomly named, feed."""
            feed = cls(key_name=os.urandom(8).encode('hex'), owner=owner)
            return feed


    class Alias(db.Model):
        """Represents an email alias that can accept incoming mail."""
        feed = db.ReferenceProperty(required=True)
        created = db.DateTimeProperty(required=True, auto_now_add=True)
        owner = db.UserProperty()

        @classmethod
        @transactional
        def create(cls, name, feed, owner=None):
            """Creates a new alias.

            If an alias with the provided name already exists, returns None.
            """
            alias = cls.get_by_key_name(name)
            if not alias:
                alias = cls(
                    feed=feed,
                    key_name=name,
                    owner=owner)
                alias.put()
                return alias
            else:
                return None

        @classmethod
        def create_random(cls, feed, owner=None):
            """Creates a new randomly named alias."""
            mapping = None
            # Retry until we hit a name that is not already taken.
            while not mapping:
                name = base64.b32encode(os.urandom(5)).lower()
                mapping = cls.create(name, feed, owner)
            return mapping


    class Message(polymodel.PolyModel):
        """Represents a message received for a feed."""
        created = db.DateTimeProperty(required=True, auto_now_add=True)

        # Messages are stored as children of their Feed entity.
        feed_name = property(lambda self: self.key().parent().name())
        message_id = property(lambda self: self.key().id())

Here you see our three basic entity types: Feed, which represents an Atom feed; Alias, which represents an email alias; and Message, which represents a received message (of any type). The reason we've kept Feed and Alias separate is twofold. First, it makes it possible to look up either a Feed or an Alias directly by key, rather than having to do a query. Second, it allows for later expansion to other feed types, or to multiple email aliases delivering to the same feed.
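To give a feel for the identifiers that the create and create_random methods in the model code produce, here is a minimal standalone sketch. It uses only the standard library and Python 3 spellings (binascii.hexlify rather than the Python 2 'hex' codec used in the models); the function names random_feed_name and random_alias_name are my own, purely for illustration.

```python
import base64
import binascii
import os


def random_feed_name():
    """16 hex characters from 8 random bytes.

    Equivalent in spirit to os.urandom(8).encode('hex') on Python 2.
    """
    return binascii.hexlify(os.urandom(8)).decode('ascii')


def random_alias_name():
    """8 lowercase base32 characters from 5 random bytes.

    5 bytes are 40 bits, which is exactly 8 five-bit base32 symbols,
    so the encoded value never carries '=' padding.
    """
    return base64.b32encode(os.urandom(5)).decode('ascii').lower()


feed_name = random_feed_name()
alias_name = random_alias_name()
print(feed_name, alias_name)
```

Because the two names are drawn independently, knowing one tells you nothing about the other, which is what the requirement about not being able to derive the feed URL from the email address calls for.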

Note that Feed and Alias provide convenient methods for generating new ones with random identifiers; Alias also allows you to specify a user-selected identifier. Both classes also have optional UserProperty instances for the owner; making them optional allows us to generate 'anonymous' mappings for users who don't want to log in. Finally, note that Message is a PolyModel; this will allow us to define different types of message (e.g., Email, XMPP, etc.) independently. Let's define our only subclass so far, the email message:

    from google.appengine.api import mail


    class EmailMessage(Message):
        """Represents an email message."""
        body = db.BlobProperty(required=True)

        def __init__(self, *args, **kwargs):
            super(EmailMessage, self).__init__(*args, **kwargs)
            # Parse the raw message once, whenever the entity is created or loaded.
            self.message = mail.InboundEmailMessage(self.body)

        @classmethod
        def create(cls, feed, body):
            ret = cls(parent=feed, body=body)
            ret.put()
            return ret

        title = property(lambda self: self.message.subject)
        author_email = property(lambda self: self.message.sender)

        @property
        def content_type(self):
            if list(self.message.bodies('text/html')):
                return 'html'
            else:
                return 'text'

        @property
        def content(self):
            # Prefer the first HTML body; fall back to the first plain text body.
            for content_type, body in self.message.bodies('text/html'):
                return body.decode()
            for content_type, body in self.message.bodies('text/plain'):
                return body.decode()
            return ''

        original = property(lambda self: self.body)
        original_content_type = 'message/rfc822'
        published = property(lambda self: self.created)

The EmailMessage mostly consists of a wrapper around the message body, stored in its original format. We (very carefully) override the constructor to decode the message when the entity is created or loaded, and define several properties that access subfields of the message.
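If you want to experiment with the "prefer HTML, fall back to plain text" logic outside App Engine, the following is a rough sketch using the standard library's email package rather than mail.InboundEmailMessage. The function name extract_content and the sample message are assumptions of mine; only the selection rule mirrors the content_type and content properties.

```python
import email
import email.message
import email.policy


def extract_content(raw_bytes):
    """Return (atom_type, title, body), preferring an HTML body over plain text."""
    msg = email.message_from_bytes(raw_bytes, policy=email.policy.default)
    title = str(msg['subject'] or '')
    part = msg.get_body(preferencelist=('html', 'plain'))
    if part is None:
        # No usable text body at all.
        return 'text', title, ''
    atom_type = 'html' if part.get_content_subtype() == 'html' else 'text'
    return atom_type, title, part.get_content()


# Build a hypothetical multipart/alternative notification to feed through it.
sample = email.message.EmailMessage()
sample['Subject'] = 'New listing: 2 bed flat'
sample['From'] = 'alerts@example.com'
sample['To'] = 'abc@atomify.appspotmail.com'
sample.set_content('Plain text version')
sample.add_alternative('<p>HTML version</p>', subtype='html')

atom_type, title, body = extract_content(sample.as_bytes())
print(atom_type, title)
```

The returned type is spelled 'html' or 'text' on purpose, since those are the values the Atom content element expects in its type attribute.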

Next, let's move on to the core functionality: receiving and processing email, and generating the Atom feed. First, receiving email. For that, we use App Engine's incoming email support, which requires us to register a handler for the reserved URL path /_ah/mail/(address), where (address) is the recipient address. Here's our incoming mail handler in its entirety:

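    from google.appengine.ext import webapp
    from google.appengine.ext.deferred import defer

    import models

    # delete_old_messages and send_hubbub_ping are defined elsewhere in the project.


    class EmailHandler(webapp.RequestHandler):
        def post(self, name):
            alias = models.Alias.get_by_key_name(name)
            if alias:
                # Read the feed's key without fetching the Feed entity itself.
                feed_key = models.Alias.feed.get_value_for_datastore(alias)
                message = models.EmailMessage.create(feed_key, self.request.body)
                # Delete any old messages
                defer(delete_old_messages, feed_key)
                # Schedule a hub ping
                defer(send_hubbub_ping, feed_key)
            else:
                self.error(404)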

All we do here is fetch the alias based on the user part of the email address the message was sent to, use that to obtain the key for the feed, and create a new EmailMessage with the body of the message. Then, we start a deferred task to clean up any email messages that are too old to remain in the feed, and another one to ping the PubSubHubbub hub to let it know there is new data.
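The hub ping itself is not shown in this post, but under the PubSubHubbub protocol a publisher notifies the hub with a form-encoded POST carrying hub.mode=publish and hub.url set to the feed's URL. Here is a minimal sketch of building that request; the hub address, the feed URL layout and the function name build_publish_ping are assumptions for illustration, and the sketch deliberately stops short of sending anything.

```python
from urllib.parse import parse_qs, urlencode

# Assumption: the public reference hub. Any hub advertised by the feed would do.
HUB_URL = 'https://pubsubhubbub.appspot.com/'


def build_publish_ping(feed_url):
    """Return (hub_url, headers, body) for a PubSubHubbub 'publish' ping."""
    body = urlencode([('hub.mode', 'publish'), ('hub.url', feed_url)])
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    return HUB_URL, headers, body.encode('ascii')


# Hypothetical feed URL; the real URL scheme is not shown in this post.
hub_url, headers, body = build_publish_ping(
    'https://atomify.appspot.com/feed/0123456789abcdef')
print(parse_qs(body.decode('ascii')))
```

In the real service this would run inside the deferred task, so a slow or unavailable hub never delays acceptance of the incoming email.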

Next, feed generation. Here's the handler that deals with that:

    class FeedHandler(BaseHandler):
        def get(self, feed_name):
            feed = models.Feed.get_by_key_name(feed_name)
            # Messages are children of the feed, so an ancestor query finds them.
            q = models.Message.all().ancestor(feed).order('-created')
            entries = q.fetch(10)
            self.render_template('feed.xml', {
                'feed': feed,
                'entries': entries,
                'self_url': self.request.url,
                'host_url': self.request.host_url,
                # An empty feed was last "updated" when it was created.
                'updated': max(x.created for x in entries) if entries else feed.created,
            })

This should also be fairly easy to follow. We fetch the feed based on the key name provided in the URL, then use that to fetch the 10 most recent messages posted to that feed. We then simply pass the feed, the messages, and some additional information to the template system to render the feed.
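The feed.xml template is not reproduced here, so as a rough idea of what it has to emit, here is a sketch that builds an equivalent Atom document with xml.etree.ElementTree. The dictionary layout, the entry id scheme and the function name render_feed are assumptions of mine, not the project's actual template; only the choice of elements follows the Atom format, and the 'updated' rule matches the handler above.

```python
import datetime
import xml.etree.ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'
ET.register_namespace('', ATOM_NS)


def _el(parent, tag, text=None, **attrs):
    """Append an Atom-namespaced child element and return it."""
    el = ET.SubElement(parent, '{%s}%s' % (ATOM_NS, tag), attrs)
    if text is not None:
        el.text = text
    return el


def _rfc3339(dt):
    """Format a naive datetime, assumed to be UTC, as an Atom timestamp."""
    return dt.strftime('%Y-%m-%dT%H:%M:%SZ')


def render_feed(feed, entries, self_url):
    """Render a feed dict and a list of entry dicts as an Atom document."""
    # Same rule as the handler: newest entry, or the feed's creation time.
    updated = max(e['created'] for e in entries) if entries else feed['created']
    root = ET.Element('{%s}feed' % ATOM_NS)
    _el(root, 'title', feed['title'])
    _el(root, 'id', self_url)
    _el(root, 'link', rel='self', href=self_url)
    _el(root, 'updated', _rfc3339(updated))
    for e in entries:
        entry = _el(root, 'entry')
        _el(entry, 'id', '%s#%s' % (self_url, e['id']))
        _el(entry, 'title', e['title'])
        _el(entry, 'updated', _rfc3339(e['created']))
        author = _el(entry, 'author')
        _el(author, 'name', e['author_email'])
        _el(author, 'email', e['author_email'])
        # ElementTree escapes the markup, as Atom requires for type="html".
        _el(entry, 'content', e['content'], type=e['content_type'])
    return ET.tostring(root, encoding='unicode')


feed = {'title': 'Atom feed for abc@atomify.appspotmail.com',
        'created': datetime.datetime(2010, 3, 1, 12, 0, 0)}
entries = [
    {'id': 2, 'title': 'New listing', 'author_email': 'alerts@example.com',
     'created': datetime.datetime(2010, 3, 2, 9, 30, 0),
     'content': '<p>2 bed flat</p>', 'content_type': 'html'},
    {'id': 1, 'title': 'Welcome', 'author_email': 'alerts@example.com',
     'created': datetime.datetime(2010, 3, 1, 18, 0, 0),
     'content': 'Thanks for subscribing', 'content_type': 'text'},
]
self_url = 'https://atomify.appspot.com/feed/0123456789abcdef'
xml_text = render_feed(feed, entries, self_url)
print(xml_text)
```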

The only part of the system remaining is the user interface, to allow users to create new mappings, and view existing ones (if they created them when logged in). The handler for listing mappings is straightforward, so we'll just examine the create handler:

    from google.appengine.api import users


    class CreateHandler(BaseHandler):
        def post(self):
            user = users.get_current_user()
            feed = models.Feed.create(owner=user)
            alias_name = self.request.POST.get('address')
            if alias_name:
                alias = models.Alias.create(alias_name, feed, user)
            else:
                alias = models.Alias.create_random(feed, user)
            if alias:
                # The alias name is the entity's key name; the Alias model
                # shown earlier defines no separate `name` attribute.
                feed.title = ('Atom feed for %s@atomify.appspotmail.com'
                              % alias.key().name())
                feed.put()
                self.render_template('created.html', {
                    'alias': alias,
                    'feed': feed,
                })
            else:
                self.render_template('already_exists.html', {
                    'address': alias_name,
                })

Here you can see the methods we defined in the Feed and Alias models in use. We create a new, randomly named feed first; then we create an Alias, either with a name supplied by the user, if any, or with a random name. Alias.create stores the alias itself, so once it succeeds we set the feed's title, store the feed in the datastore, and show the user a success page. If the user supplied an alias themselves, and it's already in use, we show an error page instead.
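The "claim this name unless it is already taken" behaviour is the part most worth getting right, so here is a small in-memory sketch of the same control flow. AliasRegistry and create_mapping are hypothetical stand-ins with no datastore behind them; in the real models the check-then-insert step runs inside a transaction so two simultaneous requests cannot both claim one name.

```python
import base64
import os


class AliasRegistry(object):
    """In-memory stand-in for the Alias entities; not the datastore."""

    def __init__(self):
        self._aliases = {}

    def create(self, name, feed_name):
        """Claim `name` for `feed_name`; return None if it is already taken."""
        if name in self._aliases:
            return None
        self._aliases[name] = feed_name
        return name

    def create_random(self, feed_name):
        """Retry with fresh random names until one is free."""
        alias = None
        while alias is None:
            candidate = base64.b32encode(os.urandom(5)).decode('ascii').lower()
            alias = self.create(candidate, feed_name)
        return alias


def create_mapping(registry, feed_name, requested=None):
    """Mirror the handler's branch: user-chosen name if given, else random."""
    if requested:
        return registry.create(requested, feed_name)
    return registry.create_random(feed_name)


registry = AliasRegistry()
first = create_mapping(registry, 'feed-a', 'flats')    # name is free
second = create_mapping(registry, 'feed-b', 'flats')   # name is taken
anonymous = create_mapping(registry, 'feed-c')         # random name
print(first, second, anonymous)
```

A None result is what sends the handler down the already_exists.html branch, and it can only happen for a user-supplied name, because the random path keeps retrying.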

That's the core functionality of the project; you can see it in action and try it out for yourself at atomify.appspot.com, and the full source is on GitHub. Incidentally, I wanted to call it 'atomizer', but that name is already taken, more's the pity.
