Reviewer: John M. Artz

In 1970 E. F. Codd published a paper in the Communications of the ACM [1] that introduced the relational data model and made an indelible mark on the evolution of database management systems. For the last 20 years, Codd has staunchly defended the relational data model against the pragmatic compromises of vendors who wished to call their products relational without delivering on the full promise of a relational database. In a 1985 Computerworld interview Codd announced his 12 criteria for determining whether a database management system could or should be called relational [2,3]. Since none of the products on the market met all 12 criteria, bitter debate ensued. Vendors tried to protect their turf (the database products) and Codd fought to protect his (the relational model).

Codd's premises are certainly straightforward. He wishes to keep the relational model simple and abstract: simple so the semantics of the model do not become unwieldy for casual users, and abstract so integrity can be ensured by the database management system rather than by application programmers. His rationale for these premises is also simple. Codd believes that data should be shared and accessible by a wide variety of people who may or may not be familiar with the physical peculiarities of the database. He wishes to deliver simplicity and reliability to the database user at the expense of the database product developer.

This book is a feature-by-feature description of the Relational Model Version 2 (RM/V2). The original relational model, now designated RM/V1, had 12 specific requirements as stated in the Computerworld interview. Version 2 has over 300 features, only 50 of which were implicit in RM/V1. Once the industry has absorbed the shock of this new treatise, the controversy will begin again. This time, however, the debate will include a new group: those champions of the relational model who disagree with Codd. The dissension will occur in three main areas.
First, Codd addresses his readers in a heavy-handed prescriptive tone that unequivocally declares the conditions for being fully relational, leaving no room for alternative views or further debate. Second, Codd has extended the model in certain ways that run counter to conventional wisdom, some of which are in direct conflict with more fundamental aspects of the model. Third, he has failed to extend the relational model in other ways that would almost certainly have gained wide acceptance while maintaining the integrity of the model.

Codd states that in order to be fully relational in the 1990s a database management system must meet more than 300 stringent criteria. I have three problems with this position. First, while the industry is grateful to Codd for introducing the relational model in 1970, it is not at all clear that he alone should maintain it in 1990. Many brilliant minds have addressed relational issues in the last 20 years, and many more opinions remain to be gathered. Second, many of the features are ill-defined, so it would probably not be possible to implement all of them even if a vendor were so inclined. Finally, some of the features run counter to the premises of simplicity and abstraction and should not be implemented. Instead of dictating the features as requirements, it would have been more appropriate for Codd to offer them as his opinions and as a basis for further debate. The debate will ensue regardless, but Codd's extreme position on these features puts him at great risk of losing credibility if the industry does not ultimately agree.

Another area of potential discord is the set of extensions to the model that are not widely accepted as appropriate directions. For example, Codd wishes to enforce closure on relational operators, so any relational operator must return a valid relational table.
The rationale for this is that relational operators can be composed (the results of one operation can always be fed into another operation), but the implications are far-reaching. Relational closure requires that no relational operator return a table with duplicate rows, since a table with duplicate rows is not a valid relational table. A second implication of closure is the need for a nomenclature that will ensure that columns in any derived table are uniquely named. With certain operations this nomenclature gets very complicated. In general, I think closure within the relational operators is a good idea because it will greatly increase the expressive power of a relational language. This power does not come without a cost, and I expect many different opinions on its value.

Another questionable extension is the inclusion of four-valued logic. RM/V2 allows two types of nulls: missing but applicable, and missing and not applicable. Granting that RM/V1 does not have a systematic treatment of nulls, this extension is complicated, confusing, and counterintuitive. If simplicity and abstraction were goals of RM/V2, they have been badly violated by the four-valued logic. I do not agree with this extension and do not believe many other people will either.

Since the conception of the relational data model, other useful data models have been identified, notably the deductive and object-oriented data models. Whether or not they are “data” models in any absolute sense, they both embody powerful ideas. The deductive data model allows the inclusion of inferential rules to extend the information in the database to include those facts that can be derived from existing facts. The object-oriented model allows data objects to have behavior as well as models. Both these concepts could make the relational model much more powerful without sacrificing simplicity or abstraction. Indeed, these ideas increase simplicity and abstraction.
Unfortunately, Codd appears to be in a competitive rather than a cooperative mode and criticizes the ideas of others rather than attempting to see their value.

This book has several shortcomings. While it appears to be highly systematic, with more than 300 features and 18 classes, it is really quite disorganized. The features are neither orthogonal nor independent. A feature may have implications in other features throughout the book. The discussion of a single concept may be carried from place to place throughout the book, rehashed, and even redefined; it is thus difficult for the reader to pin a concept down well enough to understand it fully. You have to read the book over and over, and then speculate as to what Codd really had in mind. Codd frequently rambles on about an idea rather than just stating it. For example, the concept of duplicate rows arises 13 times in the first chapter alone and many more times in the rest of the book. It would have been better if Codd had made his case and left it at that. I often felt that he was anticipating an argument and kept coming back to certain points in order to strengthen them.

I have very mixed feelings about this book. Some of the new features are very good ideas, and some are very bad ideas. The book is badly in need of a good technical editor to organize and clarify the ideas. The book is sometimes lofty and sometimes petty. Nonetheless, I am glad Codd wrote it. With all its shortcomings, it is a very important book. It is not the last word on relational databases, and I do not believe that it will be the standard against which relational databases are compared. I do, however, believe that it will become the standard against which opposing views are compared. Whether you agree with the contents or not, this book is absolutely must reading for anyone seriously interested in relational databases in the 1990s.