New features in 1.3.1 prerelease: Cursors

Posted by Nick Johnson | Filed under python, coding, app-engine, tech, prerelease

Recently, the App Engine team announced that they'd be pre-releasing SDKs for testing and feedback, before they go live in production. With the first prerelease, 1.3.1, a number of new features are included in the SDK. Today we'll discuss cursors - how they work, and what they're useful for.

Cursors are a feature that many people have been waiting for with bated breath. As well as making pagination easier, they also provide a way around the "1000 result limit" that many people feel (in some cases correctly) makes it harder to achieve what they want to do on App Engine.

When it comes to investigating new features, there are two really useful tools: an interactive console (such as the one at http://localhost:8080/_ah/admin/, http://shell.appspot.com/, or the remote_api console) and the source code. Many people forget that, as an open source project, the App Engine SDK code is all available, and easily browsable on code.google.com.

Our first stop is google/appengine/ext/db/__init__.py. Of interest here is the cursor() method, which starts on line 1600. As you can see, when called on a query that's already been executed (with .fetch(), .get(), etc), it constructs a datastore_pb.CompiledQuery object, fills in its fields with information from the query, and returns the encoded Protocol Buffer, wrapped in base64 for easy transport. Let's give it a try in our interactive shell:

    >>> class TestModel(db.Model): pass
    >>> db.put([TestModel() for x in range(100)])
    >>> q = TestModel.all()
    >>> [x.key().id() for x in q.fetch(5)]
    [1, 2, 3, 4, 5]
    >>> q.cursor()
    'CxoRY3Vyc29yPTUmb2Zmc2V0PTUgAAxgAA=='

About what we expected, given the source. How do we use it, though? The very next method after cursor() is with_cursor(), which, according to the docstring, will "set the start of this query to the given serialized cursor". Perfect! Let's give that a try, then:

    >>> class TestModel(db.Model): pass
    >>> q = TestModel.all()
    >>> [x.key().id() for x in q.fetch(3)]
    [1, 2, 3]
    >>> r = TestModel.all().with_cursor(q.cursor())
    >>> [x.key().id() for x in r.fetch(3)]
    [4, 5, 6]

Easy! This will work for any query at all, and makes it possible to pick up where you left off, simply by storing the cursor string, and using it in a subsequent query.
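To make the "pick up where you left off" idea concrete, here is a minimal sketch of a paging helper. It relies only on the three calls used above (with_cursor(), fetch(), and cursor()); the names fetch_page, iter_all, and make_query are my own, and FakeQuery is a hypothetical in-memory stand-in, not part of the SDK, included only so the sketch runs without a datastore.

```python
def fetch_page(query, page_size, cursor=None):
    """Fetch one page of results; return (results, cursor_for_next_page)."""
    if cursor is not None:
        query = query.with_cursor(cursor)
    results = query.fetch(page_size)
    # cursor() is only meaningful after the query has been executed.
    return results, query.cursor()


def iter_all(make_query, batch_size=100):
    """Yield every result by chaining batches together with cursors.

    make_query is a hypothetical zero-argument factory returning a fresh query.
    """
    cursor = None
    while True:
        results, cursor = fetch_page(make_query(), batch_size, cursor)
        for result in results:
            yield result
        if len(results) < batch_size:
            break  # a short batch means there is nothing left


class FakeQuery(object):
    """Stand-in for a db.Query over a list (an assumption for illustration).

    Its 'cursor' is just a list index as a string; real cursors are opaque.
    """

    def __init__(self, items):
        self._items = items
        self._start = 0
        self._end = 0

    def with_cursor(self, cursor):
        self._start = int(cursor)
        return self

    def fetch(self, limit):
        results = self._items[self._start:self._start + limit]
        self._end = self._start + len(results)
        return results

    def cursor(self):
        return str(self._end)


items = list(range(1, 11))
first_page, next_cursor = fetch_page(FakeQuery(items), 3)
second_page, _ = fetch_page(FakeQuery(items), 3, next_cursor)
```

With a real model you would pass something like TestModel.all() in place of FakeQuery(items), and store next_cursor wherever is convenient (a form field or a URL parameter, say) between requests.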

But what's in these mysterious cursor strings? Well, we already know they're constructed from datastore_pb.CompiledQuery protocol buffers. Let's write a function that'll let us peek inside one:

    import base64
    from google.appengine.datastore import datastore_pb

    def cursor_to_ascii(cursor):
        # A cursor is an encoded CompiledQuery wrapped in URL-safe base64.
        pb = datastore_pb.CompiledQuery(base64.urlsafe_b64decode(cursor))
        return pb.ToASCII()

Using it on our earlier cursor, we get:

    PrimaryScan {
      start_key: "cursor=5&offset=5"
      start_inclusive: false
    }
    keys_only: false

As you can see, the internals of a cursor store pretty much the same information you'd store if you were doing pagination yourself. With the datastore doing it for you, though, everything is much easier, and likely more efficient to boot. Finally, let's try it on a slightly more complex query, one for TestModel.all().filter("foo =", "bar"):

    PrimaryScan {
      start_key: "shell\000TestModel\000foo\000\232bar\000\200"
      start_inclusive: true
    }
    keys_only: false

Apart from the obvious difference, you'll note this seems to be a different format to the first one. That's because the first one was generated by the dev_appserver, while this one was generated on shell.appspot.com. As you can see, they use slightly different notations - but then, that shouldn't matter, since you certainly shouldn't be relying on the internal format of these data structures for anything except informational purposes!
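If you don't have the SDK on your path, you can still get a rough look at the dev_appserver cursor from earlier using nothing but the standard library. The sketch below only undoes the base64 wrapping and checks that the start key is sitting in the raw bytes; it does not parse the protocol buffer, and the helper name cursor_bytes is my own. As above, this is for curiosity only, not something to build on.

```python
import base64

# Cursor string copied from the dev_appserver session earlier in this post.
CURSOR = 'CxoRY3Vyc29yPTUmb2Zmc2V0PTUgAAxgAA=='


def cursor_bytes(cursor):
    """Undo the URL-safe base64 wrapping and return the raw encoded bytes."""
    return base64.urlsafe_b64decode(cursor)


raw = cursor_bytes(CURSOR)

# The start key is stored as a plain string inside the encoded message,
# so it shows up in the raw bytes without any protocol buffer parsing.
print(b"cursor=5&offset=5" in raw)  # True
```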

One caveat for early adopters: a near-perfect storm of minor bugs makes testing this in interactive consoles (remote_api, shell.appspot.com, and the dev_appserver console) more problematic than it should be. And, of course, cursors, like all other prerelease features, are likely to work only on the dev_appserver. But then, that's why it's called a prerelease.
