API call hooks for fun and profit

Posted by Nick Johnson | Filed under tech, app-engine, cookbook, coding, hooks

API call hooks are a technique that's reasonably well documented for App Engine - there's an article about it - but only lightly used. Today we'll cover some practical uses of API call hooks.

To start, let's define a simple logging handler. This can be useful when you're seeing some odd behaviour from the datastore, especially when you're using a library that may be modifying how it works. You can also use it to log any other API, such as the URLFetch or Task Queue APIs. For this, we just need a post-call hook:

import logging

def log_api_call(service, call, request, response):
    logging.debug("Call to %s.%s", service, call)
    logging.debug(request)
    logging.debug(response)

To install this for the datastore, for example, we call this:

from google.appengine.api import apiproxy_stub_map

apiproxy_stub_map.apiproxy.GetPostCallHooks().Append(
    'log_api_call', log_api_call, 'datastore_v3')

The arguments to Append are, in order, a name for the hook, the hook function itself, and, optionally, the service you want to hook. If you leave out the last argument, the hook is installed for all API calls.
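To make the ordering and the optional service argument concrete, here's a minimal sketch that runs outside App Engine. HookList and its run() method are hypothetical stand-ins that only mimic the Append(name, function, service) call described above; they are not the SDK's own hook list.

```python
class HookList(object):
    """Hypothetical stand-in for a runtime hook list (not the SDK class)."""

    def __init__(self):
        self._hooks = []  # (name, function, service or None), in order

    def Append(self, name, function, service=None):
        # service=None means the hook applies to every API service.
        self._hooks.append((name, function, service))

    def run(self, service, call, request, response):
        # Invoke every matching hook, in the order it was appended.
        for name, function, hook_service in self._hooks:
            if hook_service is None or hook_service == service:
                function(service, call, request, response)


calls_seen = []

def record_call(service, call, request, response):
    calls_seen.append("%s.%s" % (service, call))

hooks = HookList()
hooks.Append('record_datastore', record_call, 'datastore_v3')  # datastore only
hooks.Append('record_everything', record_call)                 # all services

hooks.run('datastore_v3', 'Put', None, None)
hooks.run('urlfetch', 'Fetch', None, None)
print(calls_seen)  # ['datastore_v3.Put', 'datastore_v3.Put', 'urlfetch.Fetch']
```

The datastore call triggers both hooks, while the URLFetch call only reaches the hook installed without a service argument.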

Once a hook is installed, it remains installed for as long as the runtime is loaded - across requests, and for requests to different handlers. To make sure an API call hook is always in place, it needs to be installed during the first request to a new runtime. The easiest way to do this is to install the hook at the top level of a module (outside any function) and import that module from your request handler modules.
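Here's a minimal sketch of that pattern as a hypothetical hooks_setup module. INSTALLED_HOOKS is a plain list standing in for the runtime's hook list, so the example runs without the SDK; in a real app, install_hooks() would call apiproxy_stub_map.apiproxy.GetPostCallHooks().Append(...) instead.

```python
# hooks_setup.py (hypothetical module name)
import logging

# Stand-in for the runtime's hook list; a real app would register with
# apiproxy_stub_map.apiproxy.GetPostCallHooks().Append(...) instead.
INSTALLED_HOOKS = []


def log_api_call(service, call, request, response):
    logging.debug("Call to %s.%s", service, call)


def install_hooks():
    """Register log_api_call, skipping it if it's already registered."""
    if log_api_call not in INSTALLED_HOOKS:
        INSTALLED_HOOKS.append(log_api_call)


# Top-level code runs when the module is first imported, i.e. during the
# first request in a fresh runtime that imports this module.
install_hooks()
```

Each request handler module then only needs "import hooks_setup" near the top. Python caches imported modules, so later requests that import it again don't re-run the top-level code; the guard in install_hooks() just makes an extra explicit call harmless.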

This is all very well, but not yet particularly interesting. Let's try something more ambitious: a simple multi-tenant system, for apps that want to serve off multiple domains. When writing a multi-tenant app, it's important to segregate data owned by different users - one slip and you could leak data to the wrong domain. API hooks make it possible to automatically modify datastore operations to include the domain the request was made against.

In order to simplify things, we'll encapsulate all this in a generic 'hooking' class:

import os

from google.appengine.api import apiproxy_stub_map


class HookHandler(object):
    # Map API call names (e.g. 'Put') to hook functions; subclasses override.
    pre_call_hooks = {}
    post_call_hooks = {}

    @classmethod
    def install(cls, service):
        handler = cls(service)
        apiproxy_stub_map.apiproxy.GetPreCallHooks().Append(
            cls.__name__, handler.handle_pre_call_hook, service)
        apiproxy_stub_map.apiproxy.GetPostCallHooks().Append(
            cls.__name__, handler.handle_post_call_hook, service)

    def __init__(self, service):
        self.service = service

    def handle_pre_call_hook(self, service, call, request, response):
        assert service == self.service
        hook_func = self.pre_call_hooks.get(call)
        # Skip App Engine's built-in /_ah handlers, such as the admin console.
        if hook_func and not os.environ['PATH_INFO'].startswith('/_ah'):
            hook_func(self, service, call, request, response)

    def handle_post_call_hook(self, service, call, request, response):
        assert service == self.service
        hook_func = self.post_call_hooks.get(call)
        if hook_func and not os.environ['PATH_INFO'].startswith('/_ah'):
            hook_func(self, service, call, request, response)

Note how, in handle_pre_call_hook and handle_post_call_hook, we check whether the request path starts with '/_ah'. Without that check, our hooks would do their magic even when we're using the local admin console, which is served from /_ah/admin!
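To see the dispatch and the '/_ah' check in isolation, here's a sketch that runs without the SDK. FakeHookHandler and TaggingHandler are hypothetical, requests are plain dicts, and an environ dict is passed in to stand in for os.environ.

```python
class FakeHookHandler(object):
    """Hypothetical, SDK-free copy of HookHandler's pre-call dispatch."""
    pre_call_hooks = {}

    def __init__(self, service, environ):
        self.service = service
        self.environ = environ  # stands in for os.environ

    def handle_pre_call_hook(self, service, call, request, response):
        assert service == self.service
        hook_func = self.pre_call_hooks.get(call)
        if hook_func and not self.environ['PATH_INFO'].startswith('/_ah'):
            # Dict values are plain functions, so self is passed explicitly.
            hook_func(self, service, call, request, response)


class TaggingHandler(FakeHookHandler):
    def pre_put(self, service, call, request, response):
        request['tagged'] = True

    pre_call_hooks = {'Put': pre_put}


app_request = {}
TaggingHandler('datastore_v3', {'PATH_INFO': '/home'}).handle_pre_call_hook(
    'datastore_v3', 'Put', app_request, None)

admin_request = {}
TaggingHandler('datastore_v3', {'PATH_INFO': '/_ah/admin'}).handle_pre_call_hook(
    'datastore_v3', 'Put', admin_request, None)

print(app_request)    # {'tagged': True}
print(admin_request)  # {}
```

Storing plain functions in a class-level dict keeps the subclass declarative: each subclass just lists which calls it cares about, and the base class does the lookup and filtering.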

To use this class, we subclass it, define a set of hook functions, and provide dicts mapping API call names to those functions. There are three things we need to hook in order to isolate tenants from each other:

Add the current domain to any entities being stored.

Add an equality filter for the current domain to query and count requests.

Check the domain is correct on any get requests.
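Before looking at the hook code, here's a hypothetical in-memory model of those three rules, as a sketch only. TenantStore, CrossDomainRead and the dict-of-properties representation are assumptions made so the example runs anywhere; none of this touches the real datastore.

```python
class CrossDomainRead(Exception):
    """Raised when one tenant tries to read another tenant's entity."""


DOMAIN_PROPERTY = '_domain'


class TenantStore(object):
    """Hypothetical dict-backed store; not the App Engine datastore."""

    def __init__(self):
        self._entities = {}  # key -> dict of property values

    def put(self, domain, key, entity):
        stored = dict(entity)
        stored[DOMAIN_PROPERTY] = domain  # rule 1: stamp the current domain
        self._entities[key] = stored

    def query(self, domain, **filters):
        filters[DOMAIN_PROPERTY] = domain  # rule 2: add a domain equality filter
        return [entity for entity in self._entities.values()
                if all(entity.get(name) == value
                       for name, value in filters.items())]

    def get(self, domain, key):
        entity = self._entities.get(key)
        owner = entity.get(DOMAIN_PROPERTY) if entity else None
        if owner and owner != domain:  # rule 3: refuse cross-domain reads
            raise CrossDomainRead(
                "Domain '%s' attempted to read an object belonging to domain '%s'"
                % (domain, owner))
        return entity


store = TenantStore()
store.put('a.example.com', 1, {'title': 'A'})
store.put('b.example.com', 2, {'title': 'B'})
titles = [entity['title'] for entity in store.query('a.example.com')]
print(titles)  # ['A']

try:
    store.get('b.example.com', 1)
    blocked = False
except CrossDomainRead:
    blocked = True
print(blocked)  # True
```

Note that this model doesn't stop a tenant from overwriting another tenant's entity by reusing its key: rule 1 only stamps data, it doesn't check who owned it before.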

import os

from google.appengine.datastore import datastore_pb


class MultiTenantHookHandler(HookHandler):
    _DOMAIN_PROPERTY_NAME = '_domain'

    def domain(self):
        """Returns the domain for the current request."""
        return os.environ['HTTP_HOST']

    def set_domain_property(self, entity):
        """Sets the domain property on an entity."""
        for i in range(entity.property_size()):
            property = entity.mutable_property(i)
            if property.name() == self._DOMAIN_PROPERTY_NAME:
                # Reuse the existing property, discarding its old value.
                property.clear_value()
                break
        else:
            # The loop didn't break, so there's no domain property yet.
            property = entity.add_property()
            property.set_name(self._DOMAIN_PROPERTY_NAME)
            property.set_multiple(False)
        property.mutable_value().set_stringvalue(self.domain())

    def get_domain_property(self, entity):
        """Returns the entity's domain property value, or None if it has none."""
        if entity:
            for property in entity.property_list():
                if property.name() == self._DOMAIN_PROPERTY_NAME:
                    return property.value().stringvalue()
        return None

    def pre_put(self, service, call, request, response):
        """Add the domain property to entities before they're stored."""
        for i in range(request.entity_size()):
            entity = request.mutable_entity(i)
            self.set_domain_property(entity)

    def pre_query(self, service, call, request, response):
        """Add a filter to queries before they're executed."""
        domain_filter = request.add_filter()
        domain_filter.set_op(datastore_pb.Query_Filter.EQUAL)
        property = domain_filter.add_property()
        property.set_name(self._DOMAIN_PROPERTY_NAME)
        property.set_multiple(False)  # 'multiple' is a required Property field
        property.mutable_value().set_stringvalue(self.domain())

    def post_get(self, service, call, request, response):
        """Makes sure all fetched entities are in the appropriate domain."""
        our_domain = self.domain()
        for entity in response.entity_list():
            domain = self.get_domain_property(entity.entity())
            if domain and domain != our_domain:
                raise Exception(
                    "Domain '%s' attempted to read an object belonging to domain '%s'"
                    % (our_domain, domain))

    pre_call_hooks = {
        'Put': pre_put,
        'Query': pre_query,
        'Count': pre_query,
    }
    post_call_hooks = {
        'Get': post_get,
    }

A lot of the code here is dedicated to dealing with the structure of the request and response objects, which are Protocol Buffers. You can work out which fields are available and what they do by looking at the compiled Protocol Buffer definitions in the SDK, such as datastore_pb.py and entity_pb.py, or at the definition files. Copies of these are also available online in third-party projects that interact with App Engine, such as BDBDatastore.

Entities, when encoded in Protocol Buffers, have their properties stored as a list of key/value pairs, so to get or check values, we have to iterate over all of them, looking for the value we need - hence the loops over properties in the code above. The dict-like and object-like functionality you're used to with the datastore is part of the higher-level API. Other than the boilerplate, the methods above are fairly straightforward, carrying out the three functions we outlined earlier. Note that we can define a single pre_query method for both Query and Count requests, because they both have the same Request object - a Query Protocol Buffer.
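The find-or-append loop in set_domain_property leans on Python's for...else: the else branch runs only when the loop finishes without hitting break. Here's a sketch of the same pattern using plain [name, value] lists instead of the real entity_pb objects; that list representation is an assumption made so the example runs anywhere.

```python
DOMAIN_PROPERTY = '_domain'


def set_domain_property(properties, domain):
    """Overwrite the domain property if present, otherwise append it."""
    for prop in properties:
        if prop[0] == DOMAIN_PROPERTY:
            break
    else:
        # The loop never hit break, so no domain property exists yet.
        prop = [DOMAIN_PROPERTY, None]
        properties.append(prop)
    prop[1] = domain


def get_domain_property(properties):
    """Return the domain property's value, or None if it is absent."""
    for name, value in properties:
        if name == DOMAIN_PROPERTY:
            return value
    return None


props = [['title', 'Hello']]
set_domain_property(props, 'a.example.com')
set_domain_property(props, 'b.example.com')  # updates in place, no duplicate
print(props)  # [['title', 'Hello'], ['_domain', 'b.example.com']]
```

Because properties are a list rather than a dict, both lookups are linear scans, which is fine for the handful of properties a typical entity carries.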

Installing this hook is a simple matter of calling the install() method at import time:

MultiTenantHookHandler.install('datastore_v3')

Obviously, a complete multi-tenancy library would need a lot more functionality than we've implemented here - but this is the core of it, and all in fewer than 100 lines of code.

One more example: Suppose you're tired of the default URLFetch timeout being 5 seconds, and you want it to be the maximum of 10 seconds. Then you could do something like this:

class UrlFetchHookHandler(HookHandler):
    def pre_fetch(self, service, call, request, response):
        # Only fill in a deadline if the caller didn't specify one.
        if not request.has_deadline():
            request.set_deadline(10.0)

    pre_call_hooks = {
        'Fetch': pre_fetch,
    }

UrlFetchHookHandler.install('urlfetch')
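The only logic in that hook is "fill in a default, but respect an explicit choice". Here's a sketch of it that runs without the SDK; FakeFetchRequest is a hypothetical stand-in that only mimics the has_deadline() and set_deadline() methods used above, plus a deadline() getter for inspection.

```python
class FakeFetchRequest(object):
    """Hypothetical stand-in for a URLFetch request message."""

    def __init__(self, deadline=None):
        self._deadline = deadline

    def has_deadline(self):
        return self._deadline is not None

    def set_deadline(self, value):
        self._deadline = value

    def deadline(self):
        return self._deadline


def pre_fetch(request, default_deadline=10.0):
    """Apply a default deadline only when the caller did not set one."""
    if not request.has_deadline():
        request.set_deadline(default_deadline)


implicit = FakeFetchRequest()
explicit = FakeFetchRequest(deadline=3.0)
pre_fetch(implicit)
pre_fetch(explicit)
print(implicit.deadline())  # 10.0
print(explicit.deadline())  # 3.0
```

Because the hook checks has_deadline() first, code that deliberately asks for a shorter deadline keeps it.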

There are many more uses for API call hooks. They can be used to collect statistics, to perform validation and access checks, to accumulate data for usage-based billing, and to modify other API behaviours. If you have your own ideas on what they could be used for, let us know in the comments.
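As an example of the statistics use, here's a minimal sketch of a post-call hook body that tallies calls per service and method. It has the same signature as log_api_call; call_counts is a hypothetical module-level counter, and the loop at the bottom simulates calls that the runtime would otherwise make once the hook is registered with GetPostCallHooks().Append(...).

```python
import collections

# Hypothetical per-runtime tally of API calls, keyed by 'service.call'.
call_counts = collections.Counter()


def count_api_call(service, call, request, response):
    """Post-call hook body: count each API call by service and method."""
    call_counts['%s.%s' % (service, call)] += 1


# Simulated calls; on App Engine the runtime invokes the hook for you.
simulated = [('datastore_v3', 'Get'), ('datastore_v3', 'Get'),
             ('urlfetch', 'Fetch')]
for service, call in simulated:
    count_api_call(service, call, None, None)

print(call_counts.most_common())
# [('datastore_v3.Get', 2), ('urlfetch.Fetch', 1)]
```

Since call_counts lives in one runtime's memory, each instance keeps its own tally; app-wide numbers would have to be aggregated somewhere shared.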

Hooked yet?
