Ten of the Biggest Mistakes Developers Make With Databases

Although fashions come and go in software development, some things stay remarkably constant. One of these is the use of databases. You may be wonderfully up-to-date with an AJAX Web interface or the latest whizbang Windows user interface, but under the covers, you're probably still pumping data in and out of a database, just as we all did a decade or more ago. That makes it all the more surprising that developers are still making the same database mistakes that date back to those good old days of Windows 95 and before. Perhaps it's just that most of us learn to use databases on the side, rather than really studying them. In any case, here are my nominations for the biggest mistakes that I see over and over again.

Choosing the Wrong Database

Not all databases are created equal — which means before you do anything with a database, you have to pick the appropriate database. Time and again I've seen Access databases groaning to bear the load of huge data sets that would have been child's play for SQL Server, or harried users trying to pay for and set up SQL Server to hold a few hundred rows of data. Broadly speaking, there are three tiers of databases in the market these days: desktop and embedded databases suitable for smaller tasks, "Express" versions of the major players that are good up to a few gigabytes of data, and the truly enterprise databases like SQL Server, Oracle, and DB2 that can handle just about anything you can throw at them. Before you do anything else, you need to make some realistic estimates about the amount of data that you'll be storing and pick the appropriate product to do the storage.

Choosing Too Many Databases

APIs such as ODBC, JDBC, and OLE DB have promoted the notion of database independence - the idea that you can write your application code in such a manner that you can plug any database at all in for data storage. Well, yes, but there are compromises. I've seen a lot of teams go down the rat hole of database independence, writing layers to translate all of their SQL statements to some lowest common denominator dialect that every conceivable database will support, and at the same time giving up on advanced features available in any particular database. The notion seems to be that some client in the future might want to switch to Oracle or DB2 or FoxPro or whatever, so it's best to be prepared now. On the contrary: when you're starting out with a new product, pick your storage engine and write to it. If your product is good, people will install the database you specify, and you won't be wasting untold man-hours supporting "just in case" scenarios that you'll probably never need.

Know Your Data

If I had a dollar for every time it turned out that some customer numbers had seven digits instead of six, or that the registrar's office really did allow students to register without a social security number due to privacy concerns and so the column has to be made nullable - well, I'd have a lot of dollars. Database design can't be done in a vacuum, away from the business rules. It's critical that you get the input of the actual users of the data, and hammer on them to find out for sure how big each column needs to be, what rules apply to it, what types of data it will hold, who can update it, and so on. Otherwise, you're setting yourself up for costly rework down the line. You'll learn to dread sentences starting with "Well, it looks fine, except..."

It's Just Like Excel, Right?

There's a tendency, especially among managers of smaller shops, to assume that any developer knows how to set up a database. Frankly, this perplexes me. You wouldn't assume that any given developer knows how to code in C# or set up a Web Service, so why is it that we're all supposed to be database pros? The end result is that too many databases are designed by people who have never even heard the term normalization, never mind developed any understanding of the various normal forms. I can't tell you how many times I've found everything shoved into one big table with hideous update anomalies and performance issues as a result. If you're in this situation yourself, and you're over your head, demand training - or start job-hunting. Efficient database design is something you need to learn, not discover by trial and error.

Third Normal Form is Not the Holy Grail

On the other hand, a little knowledge really can be a dangerous thing. I've seen databases that were normalized to death by well-meaning developers who insisted on putting everything in lookup tables. I recall one memorable instance in which "yes" and "no" were relegated to tblAnswers , where they could be referred to by an AnswerID foreign key from other tables. Yes, you need to know the normalization rules, but you also need to develop the skill to know when to stop normalizing, and when denormalization for performance actually makes sense.

Page 1 of 2