SimpleDB simply sucks

I mean, I really like the idea of Amazon's SimpleDB. Auto-indexing is great. Scalable is great. The price is great. Forget that their REST API is a joke that will have to change as soon as people start losing or corrupting their data. Why the fuck are they throwing out relational theory? The database barely even supports 1st normal form! You can't do any kind of aggregate operation -- no count, min, max, avg, grouping! There are no integrity constraints anywhere!

Take a look at the example they give:

ID Category Subcat. Name Color Size Make Model Item_01 Clothes Sweater Cathair Sweater Siamese Small, Medium, Large Item_02 Clothes Pants Designer Jeans Paisley Acid Wash 30x32, 32x32, 32x34 Item_03 Clothes Pants Sweatpants Blue, Yellow, Pink Large Item_04 Car Parts Engine Turbos Audi S4 Item_05 Car Parts Emissions 02 Sensor Audi S4 Item_06 Motorcycle Parts Bodywork Fender Eliminator Blue Yamaha R1 Item_07 Motorcycle Parts, Clothing Clothing Leather Pants Small, Medium, Large Black

Let's ignore that item_07 has the Color & Size backwards. You'll note that Color and Size are multi-valued attributes. If you look up Multi-Valued Attributes in any relational textbook, they'll usually say something like: "Abandon all hope, ye who enter here."

Lately, however, even the diehards are allowing for nested relations & complex types inside domains, so this in and of itself isn't a bad thing if you treat them as nested relations. With that interpretation, this table is intended to manage "names & options for a particular item". It is interpretable in (at least) 1NF. I can retrieve "item_01", for example, I know that this Cathair Sweater comes in Siamese and Small, Medium, and Large.

But, the danger is if you treat this as a table for, oh, say, ordering items! One needs to know if this is a Small or a Large sweater. The only way to get to 1NF is to treat {ID, Color, Size} as a compound primary key. All of your multi-valued attributes become implicitly a part of your primary key! But there are no integrity constraints, so you better make sure your code and gateway API understands that in the above ITEMs table the primary key for item_01 through item_03 and item_06 through item_07 is {ID, Color, Size} and for item_04 & item_05 it is just {ID} -- for now!



So, while it is possible to treat SimpleDB with some level of logic, beware that it's not necessarily designed to be logical.

I also am looking forward to the nightly SimpleDB extracts to MS Excel or SQLite, or Oracle or MySQL so I can, you know, query my data for trends & business analysis. On the bright side, maybe this is Amazon's way of convincing you to build your data warehouse early.



Posted by stu at December 19, 2007 04:51 AM

