Data Modeling in Performant Systems

I have been working on Words With Friends, a high-traffic app, for over six months. Talk about trial by fire. I never knew what scale was. Suffice it to say that I have learned a lot.

Keeping an application performant is all about finding bottlenecks and fixing them. The problem is each bottleneck you fix leads to more usage and a new bottleneck. It is a constant game of cat and mouse. Sometimes you are the cat and sometimes, well, you are not.

Most of the time, removing those bottlenecks is about moving hot data to places that can serve it faster. Disks are slow and memory is fast. Enter more memcached.

Over time, you work and work to move hot data into memory and simplify your data access to fit into memory. Key here, value there. Eventually, you get to a place where you have simplified how you access your data into simple key/value lookups.

Games get marshaled into a key named "Game:#{id}". Joins are simplified to selecting ids and caching the array of ids into a key such as "User:#{id}:active_game_ids" or "User:#{id}:over_game_ids". In turn, those arrays are turned into objects by un-marshaling the contents of "Game:#{id}", etc.
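To make that layout concrete, here is a minimal sketch in plain Ruby. A Hash stands in for memcached and Game is a hypothetical Struct rather than the real model; only the key names come from the paragraph above.

```ruby
# Sketch only: a Hash stands in for memcached, Game is a hypothetical model.
Game = Struct.new(:id, :state)
CACHE = {}

# Each game is marshaled into its own "Game:#{id}" key.
def cache_game(game)
  CACHE["Game:#{game.id}"] = Marshal.dump(game)
end

# The "join" becomes an array of ids cached under a single key.
def cache_active_game_ids(user_id, ids)
  CACHE["User:#{user_id}:active_game_ids"] = Marshal.dump(ids)
end

# Turn the id list back into objects by un-marshaling each "Game:#{id}" key.
def active_games(user_id)
  raw_ids = CACHE["User:#{user_id}:active_game_ids"]
  ids = raw_ids ? Marshal.load(raw_ids) : []
  ids.map { |id| Marshal.load(CACHE["Game:#{id}"]) }
end

cache_game(Game.new(1, 'active'))
cache_game(Game.new(2, 'active'))
cache_active_game_ids(42, [1, 2])
active_games(42).map(&:id) # => [1, 2]
```

Every read is a single key lookup, which is exactly the shape memcached is good at.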

Your data model morphs from highly relational to key/value because key/value is fast and memcached can withstand a bruising.

Do it once, and you know how to do it in the future. The problem is by the time you get to this data model, it is kind of bolted on/in to your app.

What if you could design it this way from the beginning? What if you had no option but to think through your data model in keys and values? Need your data in two different ways? Put it in two different places, etc., etc.

I have good news. Now you can.

A Little History

Not long into my tenure with WWF, we were hitting a lot of walls and there was a lot of talk about NoSQL. Mongo? Membase? Cassandra? Riak?

Which one will work best for the problem at hand? What if we could try them all really easily by just changing which place the data went to? What if we could try out more than one at once?

I sat down one weekend and started thinking about the app and realized what I just talked about above. Along the way, our data access changed from relational to key lookups. This made me think about a hash.

Hashes are so versatile, and yet, so constrained. Hashes are for reading, writing and deleting keys, just like key/value stores. I did a bit of GitHub searching and stumbled across moneta, by Yehuda Katz.

Moneta immediately struck me as brilliant. I was shocked there was no activity around it. If you only allow yourself to read, write and delete with the same API, you can make nearly any data store talk the correct language.

I fiddled with it and forked it, but in the end, it was not quite what I was looking for. I liken it to my first house. I like the house, but having lived in it for six years, I know exactly what I want out of my next house.

The folks at Newtoy (now Zynga with Friends) had mentioned that they wanted to build their own object mapper and name it ToyStore—such a great name.

In a fit of inspiration over the 4th of July weekend, I cranked out attributes and initialization, relying heavily on ActiveModel. It was really fun. I emailed the crew when the next work day came around and they were stoked.

It began to occupy some of my work-related time and Geoffrey Dagley started helping me with it. Over the next few weeks, Geof and I hammered out validations, serialization, callbacks, dirty tracking, and much more.

Everything was built on the premise that the only acceptable methods that could be used to read, write and delete data were read, write and delete.

Adapter: The Common Interface

Over time Brandon Keepers got involved and ToyStore started looking pretty legit. We switched from using Moneta as the base to something I whipped together in a few hours, Adapter.

Defining an adapter is as simple as telling it how the client reads, writes and deletes data. You also have to define a clear method, for convenience and to stick close to the Ruby hash API.

The client can be anything that you want to give a unified interface. For example, this is how you would create an adapter that stores things in a Ruby hash.

```ruby
Adapter.define(:memory) do
  def read(key)
    decode(client[key_for(key)])
  end

  def write(key, value)
    client[key_for(key)] = encode(value)
  end

  def delete(key)
    client.delete(key_for(key))
  end

  def clear
    client.clear
  end
end
```

key_for ensures that most things can work as a key. encode and decode allow one to hook some kind of serialization in, whatever you fancy, be it Marshal, JSON, or whatever you can imagine.
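Here is a rough, self-contained approximation of what those hooks buy you. SketchMemoryAdapter is a hypothetical stand-in that mirrors the memory adapter above but takes its serializer as an argument; the real Adapter gem wires up key_for, encode and decode in its own way, so treat the details (including the key_for rules) as assumptions.

```ruby
require 'json'

# Hypothetical stand-in for the memory adapter; not the Adapter gem's code.
class SketchMemoryAdapter
  attr_reader :client

  # serializer: anything responding to dump/load, e.g. Marshal or JSON.
  def initialize(client, serializer: Marshal)
    @client = client
    @serializer = serializer
  end

  def read(key)
    decode(client[key_for(key)])
  end

  def write(key, value)
    client[key_for(key)] = encode(value)
  end

  def delete(key)
    client.delete(key_for(key))
  end

  def clear
    client.clear
  end

  private

  # Assumed rules: strings pass through, symbols become strings,
  # anything else (arrays, hashes, numbers) is marshaled into a byte string.
  def key_for(key)
    case key
    when String then key
    when Symbol then key.to_s
    else Marshal.dump(key)
    end
  end

  def encode(value)
    @serializer.dump(value)
  end

  # Missing keys decode to nil instead of raising.
  def decode(raw)
    raw.nil? ? nil : @serializer.load(raw)
  end
end

store = SketchMemoryAdapter.new({}, serializer: JSON)
store.write(:user, { 'email' => 'john@example.com' })
store.read('user') # the symbol and string keys resolve to the same entry
```

Swapping Marshal for JSON changes only what lands in the client; read, write and delete do not care.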

By defining those methods, we can now get an instance of this adapter and connect it to a client. In the example above, the client is just a plain Ruby hash, but in other adapters it could be an instance of Redis, Memcached, or maybe a Riak bucket, each of which has its own adapter.

```ruby
adapter = Adapter[:memory].new({}) # sets {} as the client
adapter.write('foo', 'bar')
adapter.read('foo')         # 'bar'
adapter.delete('foo')
adapter.fetch('foo', 'bar') # returns 'bar' and sets foo to 'bar'

# [] and []= are aliased to read and write
adapter['foo'] = 'bar'
adapter['foo']              # 'bar'
```

Adapters can also be defined using a block (like above), a module, or both (module included first, then block so you can override module with block).
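The module-then-block ordering can be illustrated without the gem. SketchAdapter.define below is a hypothetical registry, not Adapter's actual implementation: it builds a class, includes the module, then class_evals the block. Methods defined by the block live on the class itself, so they take precedence over the module's and can still call super into it.

```ruby
# Hypothetical registry; the real Adapter.define may work differently.
module SketchAdapter
  @definitions = {}

  # Shared plumbing every defined adapter gets.
  class Base
    attr_reader :client

    def initialize(client)
      @client = client
    end
  end

  def self.define(name, mod = nil, &block)
    klass = Class.new(Base)
    klass.send(:include, mod) if mod  # module methods first...
    klass.class_eval(&block) if block # ...then block methods, which win lookup
    @definitions[name.to_sym] = klass
  end

  def self.[](name)
    @definitions.fetch(name.to_sym)
  end
end

module HashBacked
  def read(key)
    client[key]
  end

  def write(key, value)
    client[key] = value
  end
end

# Reuse HashBacked, but override write in the block; super reaches the module.
SketchAdapter.define(:upcasing_memory, HashBacked) do
  def write(key, value)
    super(key, value.to_s.upcase)
  end
end

adapter = SketchAdapter[:upcasing_memory].new({})
adapter.write('greeting', 'hello')
adapter.read('greeting') # => "HELLO"
```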

Adapters can also define atomic locking mechanisms; see the memcached and redis adapters for their locking implementations. The more opaque the object, the more you need to lock. Or, in the case of Riak, the adapter can handle read conflicts.
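As a rough sketch of the idea, here is a lock built on an atomic "add only if absent" operation, similar in spirit to memcached's add. SketchLockingStore and its lock method are hypothetical; a production lock would also need an expiration so a crashed process cannot hold it forever.

```ruby
# Hypothetical store; a Mutex makes each individual operation atomic.
class SketchLockingStore
  def initialize
    @data = {}
    @mutex = Mutex.new
  end

  def read(key)
    @mutex.synchronize { @data[key] }
  end

  def write(key, value)
    @mutex.synchronize { @data[key] = value }
  end

  def delete(key)
    @mutex.synchronize { @data.delete(key) }
  end

  # Writes only if the key is absent; returns true if this caller won.
  def add(key, value)
    @mutex.synchronize do
      return false if @data.key?(key)
      @data[key] = value
      true
    end
  end

  # Spin until we own "#{key}:lock", run the block, then always release.
  def lock(key, retry_interval: 0.001)
    lock_key = "#{key}:lock"
    sleep(retry_interval) until add(lock_key, true)
    begin
      yield
    ensure
      delete(lock_key)
    end
  end
end

# A read-modify-write that would lose updates without the lock.
store = SketchLockingStore.new
store.write('counter', 0)
threads = 5.times.map do
  Thread.new do
    100.times do
      store.lock('counter') { store.write('counter', store.read('counter') + 1) }
    end
  end
end
threads.each(&:join)
```

The read and the write are each atomic on their own; the lock is what makes the pair atomic, which is why opaque, whole-object values need it most.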

ToyStore: The Mapper Fixings on top of Adapter

Once your data layer speaks the adapter interface, you can tap into the real power: ToyStore.

Let's say you want to store your users in Redis. Create your class, include Toy::Store, and tell it to store in Redis.

```ruby
require 'toystore'
require 'adapter/redis'

class User
  include Toy::Store
  store :redis, Redis.new

  attribute :email, String
end
```

From there, you can go to town, defining attributes, validations, callbacks and more.

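```ruby
require 'pp'
require 'toystore'
require 'adapter/redis'

class User
  include Toy::Store
  store :redis, Redis.new

  attribute :email, String

  validates_presence_of :email
  before_save :lower_case_email

  private

  # Normalize the email before it is written to the store
  def lower_case_email
    self.email = email.downcase if email
  end
end

user = User.new
pp user.valid?       # email is blank, so the presence validation fails

user.email = 'John'
pp user.save         # before_save downcases the email to 'john'
pp user
pp User.get(user.id) # read back from Redis by id

user.destroy
pp User.get(user.id) # no longer found after destroy
```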

Change your mind? Decide that you do not want to use Redis? Fancy Riak? Simply change the store to use the riak adapter and you are rolling.

```ruby
require 'toystore'
require 'adapter/riak'

class User
  include Toy::Store
  store :riak, Riak::Client.new['users']

  attribute :email, String
end
```

Boom. You just completely changed your data store in a couple lines of code. Practical? Yes and no. Cool? Heck yeah.
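Why is the swap so cheap? Because the mapper only ever calls read, write and delete. The sketch below is a hypothetical mini-mapper (SketchStore and HashBackend are made up for illustration and are not ToyStore's API) showing that any object answering those three methods can be dropped in.

```ruby
require 'securerandom'

# Hypothetical mini-mapper: persistence goes through read, write and delete only.
module SketchStore
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # With an argument, sets the backend; without one, returns it.
    def store(adapter = nil)
      @store = adapter if adapter
      @store
    end

    def attribute(name)
      attr_accessor name
      attributes << name
    end

    def attributes
      @attributes ||= []
    end

    def get(id)
      data = store.read(id)
      data && new(data.merge('id' => id))
    end
  end

  attr_accessor :id

  def initialize(attrs = {})
    attrs.each { |name, value| public_send("#{name}=", value) }
  end

  def save
    self.id ||= SecureRandom.uuid
    data = self.class.attributes.each_with_object({}) { |name, h| h[name.to_s] = public_send(name) }
    self.class.store.write(id, data)
    true
  end

  def destroy
    self.class.store.delete(id)
  end
end

# Stand-in backend; anything answering read/write/delete would work.
class HashBackend
  def initialize
    @data = {}
  end

  def read(key)
    @data[key]
  end

  def write(key, value)
    @data[key] = value
  end

  def delete(key)
    @data.delete(key)
  end
end

class User
  include SketchStore
  store HashBackend.new # imagine this is Redis
  attribute :email
end

user = User.new('email' => 'john@example.com')
user.save
User.get(user.id).email # => "john@example.com"

User.store(HashBackend.new) # change your mind: point at a different backend
User.get(user.id)           # => nil, the new backend is empty
```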

What all does Toy::Store come with out of the box? So glad you asked.

Attributes – attribute :name, String (or some other type). Attributes can be virtual, which works just like attr_accessor but with all the power of dirty tracking, serialization, etc. They can also be abbreviated, which means :first_name could be the method you use while the attribute is stored as :fn in the data store. Save those bytes! Default values are supported, and defaults can be procs.
Typecasting – same type system as MongoMapper. One day the two will share the exact same type system in its own gem; for now it is duplicated.
Callbacks – all the usual suspects.
Dirty Tracking – save, create, update, destroy
Mass assignment security – attr_accessible and attr_protected
Proper cloning
Lists – arrays of ids. If a user has many games, User would declare list :games, which is stored in the game_ids key on the user and works just like an association.
Embedded Lists – arrays of hashes. More consistent than MongoMapper's, which will soon reap the benefits of the work on ToyStore embedded lists.
References – think belongs_to by a different (better?) name. A Post model could declare reference :creator, User to add a creator_id key and relate the creator to the post.
Identity Map – on by default. Should be thread-safe.
Read/write through caching – if you specify a cache adapter (say memcached), ToyStore will write to memcached first and read from memcached first, populating the cache if the value was not present.
Indexing – need to do lookups by email? Declare index :email, and whenever a user is saved, the user data is written to one key and the email is written to another key whose value is the user id.
Logging
Serialization (XML and JSON)
Validations
Primary key factories
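The Indexing feature boils down to writing two keys on every save. Here is a minimal sketch of that pattern, assuming a Hash-like store and a hypothetical "User:email:#{email}" key format; ToyStore's actual key names and API may differ.

```ruby
# Hypothetical sketch of "index :email"; key names are assumptions.
class IndexedUserStore
  def initialize(kv = {})
    @kv = kv # any Hash-like key/value store
  end

  def save(id, attrs)
    previous = @kv[user_key(id)]
    # If the email changed, drop the now-stale index entry.
    if previous && previous[:email] != attrs[:email]
      @kv.delete(email_key(previous[:email]))
    end
    @kv[user_key(id)] = attrs.dup      # key 1: the user data
    @kv[email_key(attrs[:email])] = id # key 2: email -> user id
  end

  # Two lookups: email -> id, then id -> user data.
  def find_by_email(email)
    id = @kv[email_key(email)]
    id && @kv[user_key(id)]
  end

  private

  def user_key(id)
    "User:#{id}"
  end

  def email_key(email)
    "User:email:#{email}"
  end
end

users = IndexedUserStore.new
users.save(1, { email: 'john@example.com' })
users.find_by_email('john@example.com') # the stored attributes for user 1
```

This is the "need your data in two different ways? put it in two different places" idea from earlier, applied automatically on save.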

It pretty much has you covered. Adapters for Redis, memcached, Riak, and Cassandra already exist. Expect a Mongo one soon; I have to make a few tweaks to Adapter first. Yep, even Mongo.

What other adapters could be created? Membase? Just start with the memcached adapter and override key_for. Git? File system? REST? MySQL?! I love it!
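A key_for override might look something like this. SketchKVAdapter and SketchNamespacedAdapter are hypothetical, and the namespace prefix plus the digest for keys over memcached's 250-byte key limit are assumptions, chosen only to show that read, write and delete stay untouched when key_for changes.

```ruby
require 'digest/sha1'

# Hypothetical base adapter; the real memcached adapter's internals may differ.
class SketchKVAdapter
  attr_reader :client

  def initialize(client)
    @client = client
  end

  def read(key)
    client[key_for(key)]
  end

  def write(key, value)
    client[key_for(key)] = value
  end

  def delete(key)
    client.delete(key_for(key))
  end

  def key_for(key)
    key.to_s
  end
end

# Only key_for changes; read/write/delete are inherited as-is.
class SketchNamespacedAdapter < SketchKVAdapter
  MAX_KEY_BYTES = 250 # memcached's key length limit

  def initialize(client, namespace)
    super(client)
    @namespace = namespace
  end

  def key_for(key)
    full = "#{@namespace}:#{super}"
    return full if full.bytesize <= MAX_KEY_BYTES
    # Too long: replace with a fixed-length digest of the full key.
    "#{@namespace}:sha1:#{Digest::SHA1.hexdigest(full)}"
  end
end

memcached_like = {}
adapter = SketchNamespacedAdapter.new(memcached_like, 'wwf')
adapter.write('Game:1', 'data')
memcached_like.keys # => ["wwf:Game:1"]
```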

The Future

The future is not picking a database and forcing all your data into it. The future (heck, now even) is the right database for the job and your application may need several of them.

All this said, in no way do I think ToyStore is going to take the world by storm. It is a different way to build applications. This way comes with great power, but great confusion as well.

Currently, each model is serialized into one key in the store, based on how the adapter does encode/decode. Eventually, I would like to add the ability to store different attributes in different keys. For example, maybe you want active_game_ids to be stored in a key by itself so you don’t have to constantly save the entire user object.

I can also see a use for being able to store an attribute not just in a different key, but in a different store entirely. Store your user objects in Riak, but active_game_ids in a Redis set. This is where it would get really powerful.
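Nothing like this exists in ToyStore yet, but the idea can be sketched by hand. SplitUserRepository is hypothetical: one Hash stands in for a Riak bucket holding the user profile, and another holds Ruby Sets standing in for Redis sets, so adding a game touches only the active_game_ids key and never rewrites the whole user.

```ruby
require 'set'

# Hypothetical split: profile in one store, active_game_ids in another.
class SplitUserRepository
  def initialize(document_store, set_store)
    @documents = document_store # stands in for, e.g., a Riak bucket
    @sets = set_store           # stands in for, e.g., Redis sets
  end

  def save_profile(id, profile)
    @documents["User:#{id}"] = profile
  end

  def profile(id)
    @documents["User:#{id}"]
  end

  # Adding or removing a game touches only the set key.
  def add_active_game(id, game_id)
    (@sets[games_key(id)] ||= Set.new) << game_id
  end

  def remove_active_game(id, game_id)
    @sets[games_key(id)]&.delete(game_id)
  end

  def active_game_ids(id)
    (@sets[games_key(id)] || Set.new).to_a
  end

  private

  def games_key(id)
    "User:#{id}:active_game_ids"
  end
end

docs = {}
sets = {}
repo = SplitUserRepository.new(docs, sets)
repo.save_profile(1, { 'email' => 'john@example.com' })
repo.add_active_game(1, 10)
repo.add_active_game(1, 11)
repo.remove_active_game(1, 10)
repo.active_game_ids(1) # => [11]
```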

At any rate, I am very excited about this project and I think it has a lot of potential. I would also like to add that MongoMapper is here to stay.

In fact, I learned from my mistakes on MongoMapper when building ToyStore and will be back-porting those lessons very soon. Expect a flurry of activity over the next little while.

Closing Thanks

Huge thanks to Newtoy (now Zynga with Friends) for allowing Geof and me to open source this. Several pieces of ToyStore were built on their dime and I really appreciate their contribution to the Ruby and Rails community!

As is typical with new projects, there are probably rough spots, and good luck finding documentation. I have included a bevy of examples, and the tests do a superb job of explaining the functionality of each method/feature.

Let me know what your thoughts are and be sure to kick the tires!

Roundup of Links