Many database students rejoiced when a paper was published showing a shortcut to fifth normal form (pdf). From the abstract:

A key is simple if it consists of a single attribute. It is shown that if a relation schema is in third normal form and every key is simple, then it is in projection-join normal form (sometimes called fifth normal form), the ultimate normal form with respect to projections and joins.

What the hell is fifth normal form and why do we want it? Well, it deals with cases where we can avoid redundancy when information can be reconstructed from smaller bits of information and ... and ... and ...

OK, so that's not helping. In fact, the vast majority of explanations on the Web aren't helping, so I'll explain how to fake database normalization. I'll even avoid big words.

For those of you who know nothing about databases, this will help you tremendously. For those of you who know databases, try not to throw your coffee mug at the screen. You might break something. What I present here is so incredibly limited that it won't help for larger systems, but for smaller systems, having a partially normalized database (yeah, yeah, I know, it's like partially alive) is far better than what we actually see in the real world.

This brief introduction to normalization starts with the following:

It's common for people new to relational databases to approach the design of the tables like they would a flat, Excel-like spreadsheet: to design the tables as if they were isolated, with all the data needed for a particular functionality crammed into a single table.

I see this a lot. If you've been a programmer for any length of time and you interact with databases, you need to move on from this. I won't always go into the why of normalizing a database, but I'll give you some tips.

Years ago at one company I went on a leave and when I came back, I discovered that the company had hired a DBA at a considerable salary. I was overjoyed, until I found out that he had sweet-talked his way into the job and had no understanding of databases (he was fired after several months of agony). For example, we were creating a proposal for a company that sold pears and they had a "pear recipe" database on their Web site. Customers could log in and search for recipes by ingredient. Our "DBA" said we could import their database directly because it was already normalized. The recipes table had columns ingredient1 , ingredient2 , through ingredient8 .

If you're an experienced programmer, even if you don't know how to design a database you're probably thinking "what if you have more than 8 ingredients?" (As it turns out, ingredient8 had CSV values!).

At its simplest, our database has at least three entities (but we'll call 'em "nouns"): customers , recipes and ingredients . Congratulations! Now you know that you need at least three tables, one for each noun. You can figure out for yourself what data is in those tables.

But how do we know which ingredients go with which recipe? Because one recipe can have one or more ingredients and one ingredient can be on one or more recipes, we have a "many to many" relationship. Every "many to many" relationship gets its own table, often called a lookup table. It might look like this:

.--------------------. | recipe_ingredients | |--------------------| | recipe_id | | ingredient_id | '--------------------'

Each of the ID fields should contain a foreign key constraint against the table that ID should be in. A foreign key constraint says "this value must exist in the corresponding column in the table it refers to". That prevents you from inserting an ingredient ID which doesn't exist in the ingredients table. It also prevents you from deleting, say, a recipe or ingredient that's in the recipe_ingredients table.

But how much of each ingredient do we want? Well, maybe a recipe calls for 2 cups of sugar. '2' is the quantity, 'cups' is the measure and 'sugar' is the ingredient. We haven't modeled measures, so now we'll need a fourth table called measures containing values like "cup", "teaspoon", "gallon", "gram" and so on. And now our lookup table might look like this:

.--------------------. | recipe_ingredients | |--------------------| | recipe_id | | ingredient_id | | measure_id | | quantity | '--------------------'

We put the measure_id and ingredient in this lookup table because they're specific to this ingredient for this recipe.

So let's say that a customer can only view three free recipes before they have to buy recipes. Each purchase is an order (ooh, look another noun). So now we have an orders table, too. If we assume that one customer can have zero, one, or many orders, but each order can have only one customer, than we have a "one-to-many" relationship from customer to orders. In that case, each order should have the id of the customer it belongs to:

.--------------------. | orders | |--------------------| | order_id | | customer_id | '--------------------'

Again, the customer_id should have a foreign key constraint against the customers table. You will have other information in that table, such as the order date, but we're keeping this simple.

But how do we know which recipes a customer ordered? If you think about it, one order might have many recipes and one recipe might be on many orders, so we have a "many to many" relationship again (don't forget the foreign key constraints):

.--------------------. | order_recipes | |--------------------| | order_id | | recipe_id | '--------------------'

(Note: Some people like to name lookup tables with a _to_ in the name: orders_to_recipes , recipes_to_ingredients , and so on, just to make it clear by looking at a schema which tables are lookup tables.)

So our final database might have:

customers

orders

order_recipes

recipes

ingredients

recipe_ingredients

measures

In other words, a simple database that allows customers to order recipes might have seven tables (it will probably have more). In the "pears" database I mentioned above, we had three. It was a nightmare. The customer did not accept our bid due to our "overcomplicated" database.

Conclusion

The above is woefully inadequate, but it gives you a vast head-start over many developers who don't even understand the simple rules above. In short:

Every "noun" gets its own table.

"many to many" relationship get their own tables

"one to many" relationship require the table that "owns" another table have its id in that other table

Any time a table has an id that refers to a row in another table, use a foreign key constraint to make sure that id really exists

There is much, much more we could say on this topic, but others have said it far better than I. What I describe above is far from perfect, but it lets you skip a lot of jargon (many people who design perfectly adequate databases can't describe the first three normal forms). As you gain experience in database design, you'll want to read more about normalization, different forms of constraints, custom data types (well, not you MySQL fans), indexes, and far more than what I have listed here. However, if you can follow the above simple rules, you'll have a database design which is definitely better than most I have seen at the hands of inexperienced programmers. And as your system grows, you'll have a (vaguely) reasonable database to fix as you encounter limitations in your design.