Modeling relationships in App Engine

Posted by Nick Johnson | Filed under python, app-engine, datastore, relational-modelling

One source of difficulty for people who are used to relational databases - and certain ORMs in particular - is how to handle references and relationships on App Engine. There's two basic questions here: First, what does a relationship entail, in any database system? And second, how do we use them in App Engine?

The nature of relationships

Many ORMs expose multiple 'types' of relationships as first-class entities - one-to-one, one-to-many, and many-to-many. This obscures the fact that, in reality, they are all built on the same building block, that of references. A reference is simply a field of an entity that contains the key of another entity - for example, if a Pet references an Owner, that simply means the Pet has a field that contains the key of its owner.

All relationship types simply devolve to references. A one-to-many relationship is a reference in its simplest form - each Pet has one Owner, and therefore an Owner can have multiple pets, all pointing to it. The Owner is not modified; it relies on the individual Pets naming it as the owner. One-to-one relationships are one-to-many relationships with the additional constraint that there will be only one Pet referencing each Owner; this is sometimes enforced by having each entity point to the other (using a reference in each).

Many-to-many relationships are the most complicated, but still fairly simple. There are several ways to represent such a relationship, but they all boil down to a list of pairs of references. Consider web pages: each page has many outgoing links, and many incoming links. We can represent this as a list of pairs of pages (from_url, to_url). In a relational database, this list is typically stored as a separate table, and joins are used to look up related records. We'll examine the best ways to do this on App Engine in a moment.

In summary, it often helps to discard the terminology of 'one to many' and so forth, and look at your entities from a more Object Oriented point of view. What other entities does each entity need to point to in order to model your data?

Relationships in App Engine

One to many

A one-to-many relationship is the simplest relationship to model in any system. On App Engine, this is accomplished by storing the key of the 'one' side in entities on the 'many' side. In Python, this is made simpler with the ReferenceProperty:

class Owner(db.Model): name = db.StringProperty() class Pet(db.Model): name = db.StringProperty() owner = db.ReferenceProperty(Owner)

To find a pet's owner, we simply refer to pet.owner, and App Engine automatically loads the owner entity being referred to. To find all the pets owned by an owner, we can do a query like this:

pets = Pet.all().filter('owner =', owner).fetch(100)

Even simpler, though, the ReferenceProperty automatically creates a convenient accessor on the Owner class, so we can fetch the list of pets like this:

pets = Owner.pet_set.fetch(100)

By default, the framework names this convenience method the same as the name of the property, followed by '_set', but we can change this by specifying our own name:

class Pet(db.Model): name = db.StringProperty() owner = db.ReferenceProperty(Owner, collection_name='pets') pets = owner.pets.fetch(100)

Another way we can model one to many relationships is with parent entities. Every entity in App Engine can have a parent specified when it is created; the key of the parent forms part of the key of the created entity, and cannot be modified. Here's how we'd do it with pets:

class Owner(db.Model): name = db.StringProperty() class Pet(db.Model): name = db.StringProperty() bob = Owner(name='Bob') felix = Pet(name='Felix', parent=bob) owner_of_felix = felix.parent

Note we no longer explicitly specify the relationship - it is implied by the parent entity being specified at creation time. When would you use parent entities instead of ReferenceProperties? The most important case is for transactions: On App Engine, transactions can only operate on entities in the same entity group - that is, a set of entities with the same parent. If you don't need the referencing and referenced entities to be in the same transactional domain, you probably want to use a ReferenceProperty instead. Also note that an entity may have only one immediate parent, and the parent entity's key cannot change after creation.

One to one

One to one relationships are essentially a special-case of one to many relationships. Like one to many relationships, they are usually accomplished with a ReferenceProperty on one side referring to the other entity.

Many to many

Many to many relationships are the most complex to implement in any system, and on App Engine, several solutions are available. The most obvious approach is the same as that used in relational databases: a 'join table', which contains pairs of keys from both sides of the relationship. Here it is for our pet/owner scenario:

class Owner(db.Model): name = db.StringProperty() class Pet(db.Model): name = db.StringProperty() class PetOwner(db.Model): pet = db.ReferenceProperty(Pet, collection_name='owners') owner = db.ReferenceProperty(Owner, collection_name='pets')

This is straightforward, and it's well suited if you want the relationship itself to have properties of its own - for example, when modeling links between web pages, you might want to include the link text in the relationship. Querying is a matter of retrieving the join entities, then looking up the linked entities from them. Using the reference property prefetching pattern from previous post, that looks something like this:

petowners = felix.owners.fetch(100) prefetch_refprops(owners, 'owner') owners = [x.owner for x in petowners]

Fetching entities in the other direction behaves similarly.

Another approach is to have one side of the relationship store a list of the keys of entities on the other side of the relationship. This makes the most sense when the cardinality on one side is limited (say, to a few hundred or less), when there is a natural 'owner' of the relationship, or when you want to be able to easily update the list of referenced entities in bulk. Here's an example:

class Pet(db.Model): name = db.StringProperty() class Owner(db.Model): name = db.StringProperty() pets = db.ListProperty(db.Key)

Loading the pets owned by an owner is accomplished in this fashion:

pets = db.get(bob.pets)

While finding all the owners of a pet is accomplished with a query, instead:

owners = Owner.all().filter('pets =', felix).fetch(100)

Finally, in some special cases, a hybrid approach is valuable. For a complete treatment of this, see Brett Slatkin's excellent talk, BuildingScalable, Complex Apps on App Engine.

Liked this article? Want to see more like it? Click on the 'feedback' button on the right and submit your own idea for a post, or upvote existing ones!

Disqus