
QCon SF 2009: Max Ross, Mapping Relational Data Patterns to the App Engine Datastore

Stefan Tilkov, Nov 19, 2009

These are my unedited notes from Max Ross's talk about Mapping Relational Data Patterns to the App Engine Datastore at QCon SF 2009.

Datastore is transactional, natively partitioned, hierarchical, schema-less, based on BigTable – not a relational database

Goals: Simplify storage by simplifying development, management

Even though Datastore is based on the ridiculously scalable BigTable, you don't need to have scalability problems to benefit from it

Scale always matters - the problem is not in the second step, it's the first step

Free to get started (not only for the first 30 days), pay only for what you need

Let someone else manage upgrades, redundancy, connectivity

Let someone else handle problems

Detailed post-mortem of GAE downtime available somewhere

Scale automatically to any point on the scale curve

Trying to get people out of the business of managing their database in production

Basic entity: Kind, Entity group, key, age, + any number of properties

Datastore is schemaless - soft schema model. Much of the stuff available in the DB (constraints, type checking, schema) needs to move up to the app layer (but is usually replicated there anyway)

primary benefit of the schemaless datastore: much faster iterations

soft schemas can give you type safety despite using a simple key/value store underneath

JPA annotations provide soft schema - even though targeted at creating DB information, GAE can benefit from it

JPA annotations are a data definition language (proof: relational DB schema can be created from annotations)
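
To make the soft-schema idea concrete, here is a minimal sketch in Python rather than JPA. It is not the App Engine API: the `STORE` dict, the `put`/`get` helpers, and the `Person` class are all hypothetical stand-ins. The point is only that type checks can live in the application layer while the store underneath stays a plain key/value map.

```python
from dataclasses import dataclass, fields, asdict

# Hypothetical in-memory key/value store standing in for a schemaless datastore.
STORE = {}


@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self):
        # The "soft schema": types are enforced by the application, not the store.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


def put(key, entity):
    # The store only ever sees an untyped property dict.
    STORE[key] = asdict(entity)


def get(key, cls):
    # Re-validates on the way out, so bad raw data is caught in the app layer.
    return cls(**STORE[key])
```

Writing `STORE["Person:14"] = {"name": "Eve", "age": "unknown"}` directly succeeds, because the store has no schema; the error only surfaces when `get("Person:14", Person)` rebuilds the typed object.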

Primary keys in the datastore contain the kind and are hierarchical, e.g. /Person:13/Pet:Ernie

Analogy: Hierarchical datastore keys are similar to composite primary keys
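
A small sketch of what such a hierarchical key could look like, assuming a key is simply a path of (kind, id-or-name) pairs. The `Key` class and its methods are hypothetical and only mirror the /Person:13/Pet:Ernie notation from the notes; they are not the datastore's actual key API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    # Path of (kind, id_or_name) pairs, root first.
    path: tuple

    def child(self, kind, id_or_name):
        # A child key is the parent's path plus one more element.
        return Key(self.path + ((kind, id_or_name),))

    @property
    def kind(self):
        # The kind of the entity is the kind of the last path element.
        return self.path[-1][0]

    @property
    def root(self):
        # The first path element identifies the root entity.
        return Key(self.path[:1])

    def __str__(self):
        return "".join(f"/{kind}:{ident}" for kind, ident in self.path)


person = Key((("Person", 13),))
pet = person.child("Pet", "Ernie")
```

Like a composite primary key, the full path is needed to identify the pet; two pets named Ernie under different persons have different keys.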

Surrogate keys are harder to move - dropping is often not an option. Mapping options: 1) make the surrogate part of the key a property 2) make the surrogate key the primary key, put the rest into a property

Transactions: transactions in the Datastore apply to a single Entity Group

Entities in the same Entity Group share the same root part of the key

This makes Entity Group selection a critical design choice, with obvious effects on transactions

Too coarse hurts throughput, too fine limits usefulness of transactions

Datastore does optimistic concurrency checks at the Entity Group level
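
The following toy model illustrates optimistic concurrency at the entity-group level. It is a sketch under assumptions, not how the Datastore is implemented: the `Datastore` and `Transaction` classes, the per-group version counter, and the error type are all hypothetical. Keys are tuples of (kind, id) pairs, and the first pair identifies the entity group.

```python
class ConcurrentModificationError(Exception):
    """Raised when another transaction committed to the same entity group first."""


class Datastore:
    def __init__(self):
        self.entities = {}        # full key path -> property dict
        self.group_versions = {}  # root (kind, id) pair -> commits so far

    def begin(self, root):
        return Transaction(self, root)


class Transaction:
    def __init__(self, store, root):
        self.store = store
        self.root = root
        # Remember the group's version at start; no locks are taken.
        self.start_version = store.group_versions.get(root, 0)
        self.pending = {}

    def put(self, key, properties):
        # Writes are buffered and limited to a single entity group.
        if key[0] != self.root:
            raise ValueError("a transaction may only touch one entity group")
        self.pending[key] = dict(properties)

    def commit(self):
        # Optimistic check: fail if anyone committed to this group meanwhile.
        current = self.store.group_versions.get(self.root, 0)
        if current != self.start_version:
            raise ConcurrentModificationError(self.root)
        self.store.entities.update(self.pending)
        self.store.group_versions[self.root] = current + 1
```

In this model, two transactions started on ("Person", 13) conflict even if they write different entities in that group, while a transaction on ("Person", 14) is unaffected. That is the throughput cost of a coarse entity group.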

[Strong relationship between data modeling and transaction processing – reminds me of the old debate on EJB 2.0 pre-final entity beans and dependent objects]

Unreleased new feature: Transactional tasks can update multiple entity groups, a task in a queue can participate in a DB transaction

Example: Deferred, transactional, async balance update (eventual consistency) as well as synchronous
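
A rough sketch of the deferred balance update, assuming each account lives in its own entity group. The `accounts` dict, the `task_queue`, and both functions are hypothetical illustrations and not the task queue API; in the real feature the debit and the enqueue would commit or fail together, which this single-threaded toy only imitates.

```python
from collections import deque

# Hypothetical data: each account is its own entity group.
accounts = {"alice": 100, "bob": 0}
task_queue = deque()


def transfer(src, dst, amount):
    """Debit the source and enqueue the credit as one step."""
    if accounts[src] < amount:
        raise ValueError("insufficient funds")
    # Assumption: these two lines stand for one transaction on src's group.
    accounts[src] -= amount
    task_queue.append((dst, amount))


def run_pending_tasks():
    """Later, a worker applies each queued credit in its own transaction."""
    while task_queue:
        dst, amount = task_queue.popleft()
        accounts[dst] += amount
```

Between `transfer` and `run_pending_tasks` the money is in neither balance, which is the eventual-consistency window the example in the talk accepts.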

Two-phase commit protocol algorithm implemented at Berkeley, implemented by a Google developer (Erick Armbrust)

Relationships

Letting a framework manage relationships can simplify code for RDBMS, but especially for App Engine Datastore

Goal: make handling relationships with JPA as easy as possible

Google's JPA implementation has some sensible defaults: Ownership implies entities are placed in the same Entity Group

E.g. Person with a @OneToMany to Pet (with a back reference of @ManyToOne) makes both part of the same Entity Group

Queries

Testing set membership – requires a join table with an RDBMS, can use a multi-value property in the GAE datastore (select from User where hobbies = 'yoga')
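
The set-membership query can be imitated in a few lines of Python. This is a sketch of the semantics only, assuming an equality filter on a multi-valued property matches when any one of its values matches; the `users` data and the `query` helper are made up for illustration.

```python
# Hypothetical entities; "hobbies" is a multi-valued property.
users = [
    {"name": "Ann", "hobbies": ["yoga", "chess"]},
    {"name": "Ben", "hobbies": ["running"]},
    {"name": "Cy", "hobbies": ["yoga"]},
]


def query(entities, prop, value):
    """Equality filter that matches if any value of the property equals `value`."""
    result = []
    for entity in entities:
        stored = entity.get(prop)
        # Treat single-valued properties as a one-element list.
        values = stored if isinstance(stored, list) else [stored]
        if value in values:
            result.append(entity)
    return result


yogis = query(users, "hobbies", "yoga")
```

No separate User-to-Hobby join table is involved; the values travel with the entity itself.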

Other than that, no joins supported

Conflict: Google promises that query performance scales linearly with the size of the result set; not possible when cross products are needed to fulfill queries

Making good progress with support for a subset of joins, not released yet - nowhere near ready for production

RDBMS encourage cheap writes and expensive reads; datastore encourages expensive writes and cheap reads. Denormalization encouraged where it makes sense
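
A minimal sketch of that trade-off, with made-up `people` and `pets` dicts standing in for two kinds. The owner's name is copied into each pet at write time, so reading a pet needs no second lookup, while renaming a person has to touch every copy.

```python
# Hypothetical data: two "kinds" kept as plain dicts.
people = {13: {"name": "Max"}}
pets = {}


def add_pet(pet_name, owner_id):
    # Expensive write: duplicate the owner's name into the pet entity.
    pets[pet_name] = {
        "owner_id": owner_id,
        "owner_name": people[owner_id]["name"],
    }


def rename_person(person_id, new_name):
    # The cost of denormalization: every copy must be kept in sync.
    people[person_id]["name"] = new_name
    for pet in pets.values():
        if pet["owner_id"] == person_id:
            pet["owner_name"] = new_name


def pet_summary(pet_name):
    # Cheap read: one lookup, no join against people.
    pet = pets[pet_name]
    return f"{pet_name} (owner: {pet['owner_name']})"
```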

Obvious problems with denormalized data

Taking code somewhere else

App Engine is in general more restrictive

Suggestion: Decide early whether or not portability matters to you

Shows examples of portable code - somewhat ugly

Congratulations, you have already sharded your data model

Key takeaways

App Engine datastore simplifies persistence

JPA adds typical RDBMS features to the datastore

Important to understand how the datastore is different

Easier to move apps off than on

If portability is important, plan for it

http://gae-java-persistence.blogspot.com

Q&A

Q. Does the shown transaction example really solve the problem? A. No, not to the full extent. A lot of Google's billing software is built without multi-row transactions

Q. Is JPA a good model when starting from scratch? A. Many people like the low-level API, then start building an ORM on top of it … possibly better to start using an existing one.

Q. What kind of apps are on GAE? A. Not really known, many backend applications for iPhone apps, Facebook, … Obama virtual town hall meeting peaked at 700 req/s

Q. Export features? A. Some bulk import/export, but there should be more

Q. Caching? A. No direct support for JPA caching using memcached, but should be pluggable

Q. Is Python going to be replaced by Java? A. Absolutely not, the Java team rather has to fight to be accepted as an equal citizen

Q. Restrictions on some JDK features relevant? A. No.

Q. Staging area? A. No, not yet.

Q. JDO? A. GAE supports both, DataNucleus supports both; JPA was chosen randomly for this talk today.

Q. Can apps be run offline? A. You can run the app SDK locally, but it won't scale; the stub implementations are pluggable, though, and could be replaced.