Essay: The Database Model is the Domain Model
Wednesday, August 23, 2006

Preface

To work with data on a semantic basis, it's often useful to specify general definitions of the elements a given portion of logic will work with. For example, an order system works with, among other elements, Order elements. To define how this logic works, a definition of the concept Order is practical: we can describe the functionality of the system by specifying actions on Order elements and supplying a definition of the Order element along with them.

This Order element contains other elements (values like the OrderID and ShippingDate) and has a tight connection with another element, OrderRow, which in turn also contains other elements. You can even say that Order contains a set of OrderRow elements just as it contains value elements. Is there a difference between the containment of the value OrderID and the containment of the set of OrderRow elements? The answer to this question is important for the way the concept of Order is implemented when the order system is realized in program code.

The Entity's Habitat

Looking at the relational model developed by Dr. E.F. Codd[1], Order and OrderRow should be seen as two separate elements that have a relationship, and the elements themselves each form a relation based on the values they contain, like OrderID.

However, looking at the Domain Model, pioneered by Martin Fowler[2] and others, it doesn't have to be that way. You could see the OrderRow elements in a set as a value of Order and work with Order as a unit, including the OrderRow elements.
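The two views can be sketched side by side in code. The Python below is only an illustration: the class names (OrderRecord, OrderRowRecord, OrderAggregate) and the product and quantity attributes are assumptions made for this sketch and do not come from Codd or Fowler.

```python
from dataclasses import dataclass, field
from datetime import date


# Relational view: two separate elements, related through the OrderID value.
@dataclass(frozen=True)
class OrderRecord:
    order_id: int
    shipping_date: date


@dataclass(frozen=True)
class OrderRowRecord:
    order_id: int      # relates the row to its Order by value
    row_number: int
    product: str       # hypothetical attribute
    quantity: int      # hypothetical attribute


# Domain Model view: the set of rows is treated as a value of Order.
@dataclass
class OrderAggregate:
    order_id: int
    shipping_date: date
    rows: list = field(default_factory=list)


def rows_of(order, all_rows):
    """Relational view: find the related rows by matching on OrderID."""
    return [r for r in all_rows if r.order_id == order.order_id]


order = OrderRecord(1, date(2006, 8, 23))
all_rows = [
    OrderRowRecord(1, 1, "widget", 2),
    OrderRowRecord(2, 1, "gadget", 1),
    OrderRowRecord(1, 2, "gizmo", 5),
]

related = rows_of(order, all_rows)
aggregate = OrderAggregate(order.order_id, order.shipping_date, related)
```

In the first view the rows exist independently and are found through a value they share with the order; in the second they are reachable only through the Order unit.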

What's an Entity?

In 1976, Dr. Peter Chen defined the concept of the entity for his Entity Relationship Model[3], which builds on top of Codd's Relational Model. The concept of the entity is very useful in defining what Order and OrderRow look like in a relational model. Chen defined Entity as:

Entity and Entity set. Let e denote an entity which exists in our minds. Entities are classified into different entity sets such as EMPLOYEE, PROJECT, and DEPARTMENT. There is a predicate associated with each entity set to test whether an entity belongs to it. For example, if we know an entity is in the entity set EMPLOYEE, then we know that it has the properties common to the other entities in the entity set EMPLOYEE. Among these properties is the afore-mentioned test predicate.

By using the Entity Relationship Model, we're able to define entities like Order and OrderRow and place them into a relational model, which defines our database. Using Eric Evans' definition of Entity, however, we are far away from the relational model. His definition, I think, comes down to the following: "An object that is tracked through different states or even across different implementations." The important difference between Evans' definition and Chen's definition of an entity is that Chen's definition is that of an abstract element; it exists without having state or even a physical representation. With Evans, the entity physically exists; it's an object, with state and behavior. With the abstract entity definition, we're not influenced by the context in which an entity's data is used, as the interpretation of the data of an entity is not done by the entity itself (as there is no behavior in the entity) but by external logic.

To avoid misinterpretations, we're going to use the definition of Dr. Peter Chen. We choose it because it defines the abstract term for things we run into every day, both physical and virtual items, without looking at context or contained data; the definition itself is what's important. It is therefore an ideal candidate to describe elements in relational models like Customer or Order. A physical Customer is then called an entity instance.

Where Does an Entity Live?

Every application has to deal with a phenomenon called state. State is actually a term that is too generic. Most applications have several different kinds of state, of which user state and application state are the most important. User state can be seen as the state of all objects/data stores that hold data on a per-user basis (that is, have "user scope") at a given time T. An example of user state is the contents of the Session object of a given user in an ASP.NET application at a given moment. Application state is different from user state; it can be seen as the state of all objects/data stores that hold data on an application scope basis. An example is the contents of a database shared among all users of a given Web application at a given moment. It is not wise to see the user state as a subset of the application state: when the user is in the middle of a five-step wizard, the user state holds the data of steps one and two, yet nothing in the application state has changed; that happens only after the wizard is completed.

It is very important to define where an entity instance lives: in the application state or in the user state. If the entity instance lives in the user state, it's local to the user owning the user state, and other users can't see the entity instance and therefore cannot use it. When an entity instance is created, as when an order is physically created in the aforementioned order system, it lives inside the actual application; it is part of the application state. However, during the order creation process, when the user fills in the order form, for example, the order is not actually created; a temporary set of data lives inside the user's state, which will become the order after it is finalized. We say the entity instance gets persisted when it is actually created in the application state.
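The distinction between the two kinds of state can be sketched as follows. This is a minimal illustration in Python, not ASP.NET code; ApplicationState, UserState, and the draft_order dictionary are hypothetical names chosen for the sketch.

```python
class ApplicationState:
    """Shared among all users; holds persisted entity instances."""

    def __init__(self):
        self.orders = {}
        self._next_id = 1

    def persist_order(self, data):
        order_id = self._next_id
        self._next_id += 1
        self.orders[order_id] = dict(data, order_id=order_id)
        return order_id


class UserState:
    """Per-user scope, comparable to a session; invisible to other users."""

    def __init__(self):
        self.draft_order = {}


app_state = ApplicationState()
alice = UserState()
bob = UserState()

# Alice completes two wizard steps; only her own user state changes.
alice.draft_order["customer_id"] = 42
alice.draft_order["shipping_date"] = "2006-08-30"
orders_before = len(app_state.orders)

# Finalizing the wizard persists the entity instance into application state.
new_id = app_state.persist_order(alice.draft_order)
alice.draft_order = {}
```

Until persist_order is called, nothing Bob can reach contains Alice's order; afterwards the order is part of the state shared by every user.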

You can have different types of application state: a shared, in-memory system that holds entity instances, or a database in which entity instances are stored. Most software applications dealing with entity instances use some kind of persistent storage for their entity instance data so that it survives power outages and other events that bring the computer down and wipe its memory contents. If the application uses persistent storage, the data in that storage is likely to be considered the actual application state: when the application is shut down, for example, for maintenance, the application doesn't lose any state: no order entity instance is lost, and it is still available when the application is brought back up. An entity instance in memory is therefore a mirror of the actual entity instance in the persistent storage, and application logic uses that mirror to alter the actual entity instance that lives in the persistent storage.

Mapping Classes onto Tables versus Mapping Tables onto Classes

O/R Mapping deals with the transformation between the relational model and the object model: that is, transforming objects into entity instances in the persistent storage and back. Globally, it can be defined as the following:

A field in a class in the object model is related to an attribute of an entity in the relational model and vice versa.
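That definition can be illustrated with a small mapping table. The sketch below is an assumption-laden example: the Order class, the Orders table name, and the ORDER_MAPPING structure are all hypothetical and do not reflect any particular O/R Mapper.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    shipping_date: str


# Mapping information: class field name -> entity attribute (column) name.
# Both sides are names assumed for this sketch.
ORDER_MAPPING = {
    "table": "Orders",
    "fields": {"order_id": "OrderID", "shipping_date": "ShippingDate"},
}


def to_row(obj, mapping):
    """Object model -> relational model: build an attribute/value dict."""
    return {col: getattr(obj, fld) for fld, col in mapping["fields"].items()}


def from_row(cls, row, mapping):
    """Relational model -> object model: build an object from a row."""
    return cls(**{fld: row[col] for fld, col in mapping["fields"].items()})


row = to_row(Order(7, "2006-08-23"), ORDER_MAPPING)
order = from_row(Order, row, ORDER_MAPPING)
```

The mapping is symmetrical, which is exactly why the question of which side is defined first remains open.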

A chicken-and-egg problem arises: what follows what? Do you first define the entity classes (classes representing entity definitions, like an Order class representing the Order entity) and create relational model entities with attributes using these classes, or do you define a relational model first and use that relational model when you define your entity classes?

As with almost everything, there is no clear "this is how you do it" answer to that question. "It depends" is probably the best answer that can be given. If you're following the Domain Model, it is likely you start with domains, which you use to define classes, some probably in an inheritance hierarchy. Using that class model, you simply need a relational model to store the data, which could even be one table with a primary key consisting of the object ID, a binary blob field for the object, and a couple of metadata elements describing the object. It will then be natural to map a class onto elements in the relational model after you've made sure the relational model is constructed in a way that it serves the object model best.
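The extreme case mentioned above, a single table holding serialized objects, might look like the following sketch. It uses Python's standard sqlite3 and json modules; the table and column names (Objects, ObjectID, TypeName, Payload) are assumptions for illustration, and JSON is swapped in here as one possible serialization.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Objects ("
    " ObjectID INTEGER PRIMARY KEY,"
    " TypeName TEXT NOT NULL,"   # metadata describing the object
    " Payload  BLOB NOT NULL)"   # the serialized object itself
)


def store(conn, object_id, type_name, state):
    """Serialize the object's state and store it under its object ID."""
    payload = json.dumps(state).encode("utf-8")
    conn.execute(
        "INSERT INTO Objects (ObjectID, TypeName, Payload) VALUES (?, ?, ?)",
        (object_id, type_name, payload),
    )


def load(conn, object_id):
    """Read the blob back and deserialize it."""
    type_name, payload = conn.execute(
        "SELECT TypeName, Payload FROM Objects WHERE ObjectID = ?",
        (object_id,),
    ).fetchone()
    return type_name, json.loads(payload.decode("utf-8"))


store(conn, 1, "Order", {"order_id": 1, "rows": [{"product": "widget"}]})
loaded_type, loaded_state = load(conn, 1)
```

The database cannot say anything meaningful about the contents of Payload, which shows how far this approach reduces the relational model to a place to put data.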

If you start with the relational model and you construct an E/R model, for example, it is likely you want to map an entity in your relational model onto a class. This is different from the approach of the Domain Model, for instance, because the relational model doesn't support inheritance hierarchies: you can't model a hierarchy like Person <- Employee <- Manager such that it also represents a hierarchy. It is, of course, possible to create a relational model that can be semantically interpreted as an inheritance hierarchy; however, it doesn't represent an inheritance hierarchy by definition.
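One way to build a relational model that can be read as the Person <- Employee <- Manager hierarchy is to give each type its own table and let the tables share a key. The sketch below uses Python's sqlite3 module; the column names are assumptions. Note that the database only knows about keys and foreign keys; reading the result as "a Manager is an Employee" is an interpretation we bring to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Person   (PersonID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Employee (PersonID INTEGER PRIMARY KEY
                               REFERENCES Person(PersonID),
                           Salary INTEGER NOT NULL);
    CREATE TABLE Manager  (PersonID INTEGER PRIMARY KEY
                               REFERENCES Employee(PersonID),
                           Department TEXT NOT NULL);
    """
)

conn.execute("INSERT INTO Person VALUES (1, 'Ann')")
conn.execute("INSERT INTO Employee VALUES (1, 50000)")
conn.execute("INSERT INTO Manager VALUES (1, 'Sales')")

# A Manager row without a matching Employee row violates the foreign key.
try:
    conn.execute("INSERT INTO Manager VALUES (2, 'Support')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Reassembling one "Manager object" takes a join over all three tables.
manager = conn.execute(
    "SELECT p.Name, e.Salary, m.Department"
    " FROM Manager m"
    " JOIN Employee e ON e.PersonID = m.PersonID"
    " JOIN Person p ON p.PersonID = e.PersonID"
).fetchone()
```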

This is the fundamental difference between the two approaches. Starting with classes and then working your way to the database uses the relational model and the database just as a place to store data, while starting with the relational model and working your way towards classes uses the classes as a way to work with the relational model in an OO fashion.

As we've chosen to use Chen's way of defining entities, we'll use the approach of defining the relational model first and working our way up to classes. Later on in the "The Ideal World" section we'll see how to bridge the two approaches.

Working with Data in an OO Fashion

Entity-representing classes are the developer's way to define entities in code, just as a physically implemented E/R model with tables defines the entities in the persistent storage. Using the O/R Mapping technique discussed in the previous section, the developer is able to manipulate entity instances in the persistent storage using in-memory mirrors placed in entity class instances. This is always a batch-style process, as the developer works disconnected from the persistent storage. The controlling environment is the O/R Mapper, which controls the link between entity instances in the persistent storage and the in-memory mirrors inside entity class instances.

A developer might ask the O/R Mapper to load a given set of Order instances into memory. This results in, for each Order instance in the persistent storage, a mirror inside an entity class instance. The developer is now able to manipulate each entity instance mirror through the entity class instance, to display the entity instance mirrors in a form, or to offer them as output of a service. Manipulated entity instance mirrors have to be persisted to make the changes persistent. From the developer's point of view, this looks like saving the manipulated entity instance data inside the objects to the persistent storage, like a user saves a piece of text written in a word processor to a file. The O/R Mapper performs this save action for the developer. But because we're working with mirrors, the actual action the O/R Mapper performs is updating the entity instance in the persistent storage with the changes stored in the mirror received from the developer's code.
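The load, manipulate, and save cycle described here can be sketched with a toy mapper. TinyMapper is a hypothetical stand-in written for this illustration, not the API of any real O/R Mapper, and the Orders table layout is assumed.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Order:                      # entity class; instances hold mirrors
    order_id: int
    shipping_date: str


class TinyMapper:
    """Hypothetical stand-in for an O/R Mapper."""

    def __init__(self, conn):
        self.conn = conn

    def load_orders(self):
        rows = self.conn.execute(
            "SELECT OrderID, ShippingDate FROM Orders ORDER BY OrderID"
        )
        return [Order(*row) for row in rows]

    def save(self, order):
        # Update the actual entity instance with the mirror's values.
        self.conn.execute(
            "UPDATE Orders SET ShippingDate = ? WHERE OrderID = ?",
            (order.shipping_date, order.order_id),
        )
        self.conn.commit()


def stored_shipping_date(conn, order_id):
    return conn.execute(
        "SELECT ShippingDate FROM Orders WHERE OrderID = ?", (order_id,)
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, ShippingDate TEXT)")
conn.execute("INSERT INTO Orders VALUES (1, '2006-08-23')")
conn.commit()

mapper = TinyMapper(conn)
mirror = mapper.load_orders()[0]
mirror.shipping_date = "2006-09-01"     # only the in-memory mirror changes
stored_before_save = stored_shipping_date(conn, 1)
mapper.save(mirror)                     # now the entity instance changes
stored_after_save = stored_shipping_date(conn, 1)
```

Between the assignment and the call to save, the mirror and the entity instance disagree, which is the disconnected, batch-style way of working described above.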

The relationships between the entities in the relational model are represented in code by functionality provided by the O/R Mapper. This allows the developer to traverse relationships from one entity instance to another. For example, in the Order system, loading a Customer instance into memory allows the developer to traverse to the Customer's Order entity instances by using functionality provided by the O/R Mapper, be it a collection object inside the Customer object or a new request to the O/R Mapper for Order instances related to the given Customer instance.

This way of working with entities is rather static: constructing entities at runtime through a combination of attributes from several related entities does not result in entity-representing classes, as classes have to be present at compile time. This doesn't mean the entity instances constructed at runtime through combinations of attributes (for example, through a select with an outer join) can't be loaded into memory; however, they don't represent a persistable entity, but rather a virtual entity. This extra layer of abstraction is mostly used in a read-only scenario, such as in reporting applications and read-only lists, where a combination of attributes from related entities is often required. An example of a definition for such a list is the combination of all attributes of the Order entity and the "company name" attribute from the Customer entity.
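The list definition from the example, all Order attributes plus the company name of the Customer, can be sketched as a query whose rows are loaded into plain dictionaries rather than entity classes. The schema below is assumed for illustration and uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(
    """
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Orders   (OrderID INTEGER PRIMARY KEY,
                           CustomerID INTEGER REFERENCES Customer(CustomerID),
                           ShippingDate TEXT);
    INSERT INTO Customer VALUES (1, 'Acme');
    INSERT INTO Orders VALUES (10, 1, '2006-08-23');
    INSERT INTO Orders VALUES (11, NULL, '2006-08-24');
    """
)

# Read-only list: every Order attribute plus the customer's company name.
# The outer join keeps orders that have no related customer.
order_list = [
    dict(row)
    for row in conn.execute(
        "SELECT o.OrderID, o.CustomerID, o.ShippingDate, c.CompanyName"
        " FROM Orders o LEFT OUTER JOIN Customer c"
        " ON c.CustomerID = o.CustomerID"
        " ORDER BY o.OrderID"
    )
]
```

Each dictionary is a virtual entity: it can be displayed or reported on, but there is no single entity in the relational model to persist it back to.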

To successfully work with data in an OO fashion, it is key that the functionality controlling the link between in-memory mirrors of entity instances and the physical entity instances offers enough flexibility that reporting functionality and lists of combined sets of attributes can be defined and loaded into memory without needing another application just for that more dynamic way of using data in entity instances.

Functional Research as the Application Foundation

To efficiently set up the relational model, the entity-representing classes, and the mappings between entity definitions in the relational model and those classes, it is key to reuse the results from work done early in the software development project, in the Functional Research Phase. This phase is typical for a more classical approach to software development. In this phase, the functional requirements and system functionality are determined, defined in an abstract way, and documented. Over the years, several techniques have been defined to help in this phase; one of them is NIAM[4], which was further developed by T.A. Halpin[5] into Object Role Modeling (ORM). NIAM and ORM make it easy to communicate functional research findings to the client in easy-to-understand sentences like "Customer has Order" and "Order belongs to Customer." These sentences are then used to define entities and relationships in an abstract NIAM/ORM model. Typically, a visual tool is used for this, such as Microsoft Visio.

The Importance of Functional Research Results

The advantage of modeling the research findings with techniques like NIAM or ORM is that the abstract model both documents the research findings during the functional research phase and at the same time it is the source for the relational model the application is going to work with. Using tools like Microsoft Visio, a relational model can be generated by generating an E/R model from an NIAM/ORM model, which can be used to construct a physical relational model in a database system. The metadata forming the definition of the relational model in the database system can then be used to generate classes and construct mappings.

The advantage of this is that the class hierarchy the developers work with has a theoretical basis in the research performed at the start of the project. This means that when something in the design of the application changes, such as a piece of functionality, the same path can be followed: the NIAM model changes, the relational model is adjusted with the new E/R model created from the updated NIAM model, and the classes are adjusted to comply with the new E/R model. The path can also be followed the other way around, to find the reason for code constructs the developer has to work with. For example, for code constructs that traverse relationships between entity instance objects, you only have to follow the path back from the class to the functional research results, and the theoretical basis for the code constructs is revealed. This strong connection between a theoretical basis and actual code is key to a successful, maintainable software system.

Functional Processes as Data Consumers and Location of Business Logic

As the real entity definitions live in the relational model, inside the database, and in-memory instances of entities are just mirrors of real instances of entities in the database, there is no place for behavior, or Business Logic rules, in these entities. Of course, adding behavior to the entity classes is easy. The question is whether this is logical, when entity classes represent entity definitions in the relational model. The answer depends on the category of the Business Logic you want to add to the entity as behavior. There are roughly three categories:

Attribute-oriented Business Logic
Single-entity-oriented Business Logic
Multi-entity-oriented Business Logic

Attribute-oriented Business Logic is the category that contains rules like OrderID > 0. These are very simple rules that act like constraints placed on a single entity field. Rules in this category can be enforced when an entity field is set to a value.

The category of single-entity-oriented Business Logic contains rules like ShippingDate >= OrderDate, and those also act like constraints. Rules in this category can be enforced when an entity is loaded into an entity object in memory, when it is saved into the persistent storage, or when it is tested for validity in a given context.

The multi-entity-oriented Business Logic category contains rules spanning more than one entity: for example, the rule to check if a Customer is a Gold Customer. To evaluate that rule, the logic has to consult the Order entities related to that Customer and the Order Detail entities related to these Order entities.

All three categories have dependencies on the context the entity is used in, although not all rules in a given category are context-dependent rules. Attribute-oriented Business Logic is the category with the most rules that are not bound to the context the entity is used in, and it is a good candidate to add to the entity class as behavior. Single-entity-oriented Business Logic is often not a good candidate to add to the entity class as behavior, because many of the rules in that category, which are used to make an entity valid in a given context, can and will change when the entity is used in another context. Rules in the multi-entity-oriented Business Logic category span more than one entity and therefore cannot be placed in a single entity, quite apart from the fact that they're too bound to the context in which they're used.
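The three categories differ mainly in how much data a rule needs to see, which a short sketch can make concrete. Everything named below is an assumption for illustration: the total attribute, the threshold parameter, and the reading of "Gold Customer" as a spending limit are not defined in this essay, and the Order Detail level is left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Order:
    order_id: int
    order_date: date
    shipping_date: date
    total: float = 0.0            # hypothetical attribute


@dataclass
class Customer:
    customer_id: int
    orders: list = field(default_factory=list)


def valid_order_id(value):
    """Attribute-oriented: looks at a single field value."""
    return value > 0


def valid_order(order):
    """Single-entity-oriented: compares fields of one entity."""
    return order.shipping_date >= order.order_date


def is_gold_customer(customer, threshold=1000.0):
    """Multi-entity-oriented: consults the related Order entities.

    The threshold is an assumed criterion for this sketch.
    """
    return sum(o.total for o in customer.orders) >= threshold


good = Order(1, date(2006, 8, 1), date(2006, 8, 3), total=1200.0)
bad = Order(2, date(2006, 8, 5), date(2006, 8, 2), total=50.0)
customer = Customer(1, [good, bad])
```

Only the first function could be checked inside a field setter; the last one cannot even be written without access to more than one entity.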

Pluggable Rules

To keep an entity usable as a concept that isn't bound to a given context, the problem with context-bound Business Logic rules in the attribute-oriented and single-entity-oriented categories can be solved with pluggable rules. Pluggable rules are objects that contain Business Logic rules and that are plugged into an entity object at runtime. The advantage of this is that the entity classes are not tied to the context they are used in but can be used in any context the system design asks for: just create a set of pluggable rules objects per context, or even several per entity, and depending on the context state, rules can be applied to the entity by simply setting an object reference. The processes that decide which rules objects to plug into entities are the processes maintaining the context the entity is used in: the processes representing actual business processes, which are called functional processes.
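A minimal sketch of pluggable rules follows. The two contexts (a web shop and a data import) and all class names are hypothetical; the point is only that the same entity class is validated differently depending on which rules object is referenced.

```python
from datetime import date


class OrderEntity:
    """Entity object without context-bound logic of its own."""

    def __init__(self, order_id, order_date, shipping_date):
        self.order_id = order_id
        self.order_date = order_date
        self.shipping_date = shipping_date
        self.rules = None          # plugged in at runtime by a process

    def is_valid(self):
        # Without plugged-in rules, the entity has no opinion on validity.
        return self.rules is None or self.rules.validate(self)


class WebShopOrderRules:
    """Hypothetical context: orders may not ship before they are placed."""

    def validate(self, order):
        return order.shipping_date >= order.order_date


class DataImportOrderRules:
    """Hypothetical context: legacy data only needs a positive ID."""

    def validate(self, order):
        return order.order_id > 0


order = OrderEntity(5, date(2006, 8, 10), date(2006, 8, 1))

order.rules = WebShopOrderRules()      # applying rules = setting a reference
valid_in_shop = order.is_valid()

order.rules = DataImportOrderRules()
valid_in_import = order.is_valid()
```

The entity class never changes between contexts; only the object its rules reference points at does.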

Functional Processes

In an earlier section, The Importance of Functional Research Results, the functional research phase was described, and the point was made that it is important to keep a strong link between researched functionality and the actual implementation of that functionality. Often a system has to automate certain business processes, and the functional research will describe these processes in an abstract form. To keep the link between research and implementation as tight as possible, it's a common step to model the actual implementation after the abstract business process, resulting in classes that we'll call functional processes because they're more or less data-less classes that contain only functionality.

Functional processes are the ideal candidates in which to implement multi-entity-oriented Business Logic rules. In our example of the Gold Customer, a process to upgrade a Customer to a Gold Customer can be implemented as a functional process that consumes a Customer entity object and its Order entity objects, updates some fields of the Customer entity, and persists that Customer entity after the upgrade process is complete. Furthermore, because functional processes actually perform the steps of a business process, they are also the place where a context is present in which entities are consumed and the only correct place to decide which rules to plug into an entity object at a given time T for a given context state.
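The Gold Customer upgrade could be sketched as a functional process like the one below. The names GoldCustomerUpgrader and InMemoryStore, the is_gold and total attributes, and the spending threshold are all assumptions made for this illustration; InMemoryStore merely stands in for whatever the O/R Mapper offers for persisting entities.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int
    total: float                  # hypothetical attribute


@dataclass
class Customer:
    customer_id: int
    is_gold: bool = False         # hypothetical attribute
    orders: list = field(default_factory=list)


class InMemoryStore:
    """Hypothetical stand-in for the O/R Mapper's save functionality."""

    def __init__(self):
        self.saved = []

    def save(self, entity):
        self.saved.append(entity)


class GoldCustomerUpgrader:
    """Functional process: holds no entity data, only functionality."""

    def __init__(self, store, threshold=1000.0):
        self.store = store
        self.threshold = threshold    # assumed criterion for 'Gold'

    def run(self, customer):
        # Consume the Customer and its Orders, update, then persist.
        spent = sum(order.total for order in customer.orders)
        if spent >= self.threshold and not customer.is_gold:
            customer.is_gold = True
            self.store.save(customer)
            return True
        return False


store = InMemoryStore()
process = GoldCustomerUpgrader(store)
loyal = Customer(1, orders=[Order(1, 700.0), Order(2, 400.0)])
new = Customer(2, orders=[Order(3, 20.0)])
upgraded_loyal = process.run(loyal)
upgraded_new = process.run(new)
```

The rule lives in the process, not in Customer or Order, so neither entity class needs to know what "Gold" means in this particular context.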

The Ideal World: Using NIAM/ORM for Classes and Databases

For most people, reality is not always in sync with what we expect to be an ideal world, and everyday software development is no exception to that. In the section on functional research results, metadata from the physical relational model was used to produce mappings and classes to get to the entity class definitions for the developer to work with. A more ideal approach would be to use the NIAM/ORM model to generate entity classes directly as well, avoiding a transformation from metadata to class definition. It would bring the ultimate goal, where the design of the application is as usable as the application itself, one step closer.

When that ideal world will be a reality, or even if it will be a reality, is hard to say as a lot of the factors that influence how a software project could be made a success can be found in areas outside the world of computer science. Nevertheless, it's interesting to see what can be accomplished today, with techniques developed today, like model-driven software development.

References

[1] Codd, E. F. "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM, vol. 13, no. 6, 1970.

[2] Fowler, Martin. Patterns of Enterprise Application Architecture. Boston, MA: Addison-Wesley, 2003.

[3] Chen, P. "The Entity-Relationship Model: Toward a Unified View of Data." ACM Transactions on Database Systems, vol. 1, no. 1, 1976.

[4] Halpin, T.A. and G.M. Nijssen. Conceptual Schema and Relational Database Design: A Fact-Oriented Approach. New Jersey: Prentice Hall, 1989.

[5] Halpin, Terry. Information Modeling and Relational Databases: From Conceptual Analysis to Logical Design. San Francisco: Morgan Kaufmann, 2001.

