Good Database Design Starts Here

Rules to follow when you start a new database project: usage, model selection, process and testing.

Traditionally, databases are used to store data. In large organizations, data warehouses will house data at the transactional level as well as at multiple aggregated levels. Often, stored data is used for reporting, analysis and transformation.

You often hear of ETL, short for extract, transform, load: three database functions combined into one tool to pull data out of one database and place it into another. ETL is used to manage data between individual systems and data warehouses.
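The three steps can be sketched in miniature. This is a minimal illustration, not a real ETL tool: it uses Python's `sqlite3` with in-memory databases, and the `orders` and `region_sales` schemas are hypothetical.

```python
import sqlite3

# Source (transactional) and target (warehouse) databases; in-memory for the sketch.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")

# Extract: raw order rows from the transactional system (schema is hypothetical).
src.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "east", 10.0), (2, "east", 5.0), (3, "west", 7.5)])
rows = src.execute("SELECT region, amount FROM orders").fetchall()

# Transform: aggregate order amounts per region.
totals = {}
for region, amount in rows:
    totals[region] = totals.get(region, 0.0) + amount

# Load: place the aggregated data into the warehouse table.
dst.execute("CREATE TABLE region_sales (region TEXT PRIMARY KEY, total REAL)")
dst.executemany("INSERT INTO region_sales VALUES (?, ?)", totals.items())
dst.commit()
```

In production, the same shape appears at much larger scale: the extract runs against live systems, the transform handles cleaning and reconciliation, and the load targets the warehouse.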

Uses of Databases

Data Scientists

With the advent of big data, even data scientists often use databases to access data at all levels. Data scientists often set up intermediate tables for aggregation and data cleaning. Then, they output this data for analysis. Once they finish with the data, they may load the data back into the database for reporting purposes.
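An intermediate table for cleaning and aggregation can be built in a single pass inside the database. The sketch below assumes a hypothetical `raw_readings` table with some missing values, using `sqlite3` for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Raw measurements as a data scientist might find them (hypothetical schema);
# some rows have missing values that should not reach the analysis.
con.execute("CREATE TABLE raw_readings (sensor TEXT, value REAL)")
con.executemany("INSERT INTO raw_readings VALUES (?, ?)",
                [("a", 1.0), ("a", None), ("b", 3.0), ("b", 5.0)])

# Intermediate table: cleaned and aggregated in one statement inside the database.
con.execute("""
    CREATE TABLE readings_clean AS
    SELECT sensor, AVG(value) AS avg_value
    FROM raw_readings
    WHERE value IS NOT NULL
    GROUP BY sensor
""")
clean = con.execute("SELECT sensor, avg_value FROM readings_clean ORDER BY sensor").fetchall()
```

The cleaned table can then be pulled out for analysis, and results loaded back into a reporting table later.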

Developers

It goes without saying that developers must have a good understanding of database concepts. Database skills such as SQL and performance tuning are often necessary for a developer to successfully implement systems. In these systems, transactional management is often extremely important. Managing timely data retrieval and data updates is the job of a good database designer as well as of the developer.

Clients

Clients will use the databases through Applications. These Applications are developed by either software engineers or developers to help the clients access and retrieve data. Applications that clients use will often go through systems for caching, transformation and aggregation of the data. Even reporting Applications often have multiple middle layers where the data needs to be managed. These middle layers will retrieve and update the data from the database once or twice a day for efficiency.

Clients may be diverse. Clients can be interested in transactional data, reporting data and specialized data.

Balancing the needs of different clients is critical to good database design.

Information Flow — Original Image from Jun Wu

What does a good database design look like?

A good database design will have these specific features that yield immense benefits to the end users. These end users include clients as well as technical professionals that access the databases daily.

Data Retrieval and Data Update Benchmarks — A good database design will allow Applications to hit data retrieval and data update benchmarks such as: How many seconds does it take to respond to a button click on the UI? How many seconds does it take to generate a report? How many seconds does it take to transfer data from A to B?
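A benchmark is a pass/fail threshold, not just a measurement. A minimal sketch, with a hypothetical `events` table and a hypothetical one-second target:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
con.executemany("INSERT INTO events (payload) VALUES (?)",
                [("row-%d" % i,) for i in range(10_000)])

# Time a retrieval the way a UI click or a report generation would exercise it.
start = time.perf_counter()
count = con.execute("SELECT COUNT(*) FROM events").fetchone()[0]
elapsed = time.perf_counter() - start

# Compare the measurement against an agreed target for this query.
BENCHMARK_SECONDS = 1.0  # hypothetical target
within_benchmark = elapsed < BENCHMARK_SECONDS
```

Recording these measurements over time shows whether a design change moved the Application closer to or further from its targets.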

Database is not only there to store data but also there to serve Application needs.

Integration with Middle Layer — A good database design will integrate seamlessly with the middle layers of systems. Today’s systems have middle layers that manage data to serve to Applications. These middle layers cut down on direct access to the databases and provide more timely access to the data. At the end of the day, multiple times a day or in real time, these middle layers access the database for data retrieval and data management. A good database design will work with the middle layer to optimize its most used functions. Often, a good database model is central to both integration and database optimization.
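A middle layer that refreshes once or twice a day can be as simple as a cache with a refresh interval. This is a sketch only; the `ReportCache` class, the `orders` table and the one-hour refresh interval are all hypothetical.

```python
import sqlite3
import time

class ReportCache:
    """Middle-layer sketch: serves cached query results and refreshes them
    on a schedule instead of hitting the database on every request."""

    def __init__(self, con, refresh_seconds):
        self.con = con
        self.refresh_seconds = refresh_seconds
        self._cache = None
        self._loaded_at = 0.0

    def totals(self):
        # Only go to the database when the cached copy is stale.
        if self._cache is None or time.monotonic() - self._loaded_at > self.refresh_seconds:
            self._cache = self.con.execute(
                "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
            ).fetchall()
            self._loaded_at = time.monotonic()
        return self._cache

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (region TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)", [("east", 10.0), ("west", 5.0)])

cache = ReportCache(con, refresh_seconds=3600)  # refresh at most once an hour
first = cache.totals()
second = cache.totals()  # served from the cache, no second query
```

The database designer can then focus optimization on the handful of queries the middle layer actually runs.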

Integration with Downstream Systems — A good database design will also integrate seamlessly with downstream Systems. How often do downstream systems receive data from the database (nightly, weekly, monthly or in real time)? Is there a middle layer for downstream systems to access the data from the database? Does data need to be aggregated before it can be sent down to downstream Systems? These are just some of the questions that need to be answered in the database design stage to avoid complications later on.

Simplicity and balance — A good database design is simple and balanced. I’ve seen enterprise database designs that cannot be understood by the average software engineer. To me, that simply rings alarm bells.

When the database design gets out of hand, it’s probably time to rethink the design.

Rather than using one database for all the functionalities of the given Application, it’s absolutely okay to use multiple databases with different designs to facilitate different functions within the Application. Then, a middle layer can help to manage the data from the different databases. Do your transaction-focused Applications serve up data differently than your reporting-focused Applications? Do most of your clients live downstream? Do you have a reporting Application that pretends to be a transactional Application as well?

Data Model Selection

One of the first action items for the database designer after reviewing specifications and client requirements is to select a data model that will be used by the database.

Frequently, the data model will also determine the kind of database used: MongoDB, MySQL, PostgreSQL, NoSQL, etc.

Here’s a list of popular data models:

Hierarchical Database Model

Network Model

Relational Model

Entity-relationship Model

Object Model

Star Schema

Knowing what data model to use when is intrinsic to good database design.
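For example, the Star Schema from the list above places one fact table at the center, surrounded by dimension tables. A minimal sketch in `sqlite3`, with hypothetical `fact_sales`, `dim_date` and `dim_product` tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Star schema sketch: one fact table surrounded by dimension tables.
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (
        date_id INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        amount REAL
    );
""")
con.execute("INSERT INTO dim_date VALUES (1, '2020-01-01')")
con.execute("INSERT INTO dim_product VALUES (1, 'widget')")
con.execute("INSERT INTO fact_sales VALUES (1, 1, 9.99)")

# Reporting queries join the fact table out to the dimensions they need.
row = con.execute("""
    SELECT d.day, p.name, f.amount
    FROM fact_sales f
    JOIN dim_date d ON d.date_id = f.date_id
    JOIN dim_product p ON p.product_id = f.product_id
""").fetchone()
```

This shape suits reporting workloads; a transactional Application would usually favor a more normalized Relational Model instead.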

Planning Ahead

The database designing process should involve planning ahead. Often, it starts with system requirements, specification, design and then a prototype. Key features to include during planning are: validation of each stakeholder’s data requirements, object needs, functional needs for data retrieval and aggregation, etc. This also includes the validation of each downstream stakeholder’s data requirements: data feed frequency and requirements.

Client requirements should drive most technical decisions.

Normalization

Normalization is defined as the set of methods to break down tables into their constituent parts until each table represents just one “thing”. The additive nature of summing up all the parts to make a whole is the very reason why SQL was invented. This additive nature contributes to ease of use and ease of management. Normalization also means more efficient data access, and it will impact the performance of any processes that directly access the data. Third Normal Form is considered the standard.
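The idea can be shown on a small example. Below, denormalized order rows that repeat customer details are split into a `customer` table and an `order` table, and the whole is recovered with a join; the table and column names are hypothetical, and `sqlite3` stands in for a production database.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Denormalized input: customer details repeated on every order row.
flat = [
    (1, "Ada", "ada@example.com", "book"),
    (2, "Ada", "ada@example.com", "pen"),
    (3, "Bob", "bob@example.com", "book"),
]

# Normalized: each table represents just one "thing" (a customer, an order).
con.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE "order" (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        item TEXT
    );
""")
ids = {}
for order_id, name, email, item in flat:
    if name not in ids:
        cur = con.execute("INSERT INTO customer (name, email) VALUES (?, ?)",
                          (name, email))
        ids[name] = cur.lastrowid
    con.execute('INSERT INTO "order" VALUES (?, ?, ?)', (order_id, ids[name], item))

# The whole is recovered additively with a join; each email now lives in one place.
rows = con.execute("""
    SELECT o.order_id, c.name, o.item
    FROM "order" o JOIN customer c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
```

Updating a customer's email now touches one row instead of every order that customer ever placed.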

Normalize data to make it easy to use and easy to maintain.

Naming Standards

Naming standards are extremely important for database design. You don’t want to second-guess which objects belong in which database. Consistency is key in a naming convention. Pick one convention that works for your organization, and use the same convention in all the databases you may have. Should we name a table customer_detail, customerDetail or CustomerDetail?

Balancing clarity and consistency is key to picking the right naming convention.

Documentation

It goes without saying that a large part of database design is to produce the right documentation for the database. Each object must be accurately represented in the documentation. When I see outdated documentation that does not include the most recent tables and fields, I cringe.

This probably means processes have been developed without any sign-off from the clients.

That is a big red flag in system management. Documentation for any database has to be comprehensive: details on all objects within the database, the processes that use the database, and the reasoning behind database design decisions.

Data Integrity

Data integrity is important in maintaining any database. In each of your tables, have you set up rules that will allow for data integrity, such as nullability, string length, and assignment of foreign keys? Business rules need to be applied on the Application side to check for data integrity at the GUI level. However, it’s important for the database to have checks in place for all your objects in case the Applications and the Systems fail on those checks. Trust me, failures happen all the time.
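The checks named above (nullability, string length, foreign keys) can all live in the table definitions themselves. A sketch with hypothetical `customer` and `country` tables; note that SQLite, unlike most server databases, needs foreign-key enforcement switched on per connection.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement

# Hypothetical tables carrying the integrity rules: nullability, string
# length, and a foreign key.
con.executescript("""
    CREATE TABLE country (code TEXT PRIMARY KEY);
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL CHECK (length(name) <= 50),
        country_code TEXT NOT NULL REFERENCES country(code)
    );
""")
con.execute("INSERT INTO country VALUES ('US')")
con.execute("INSERT INTO customer VALUES (1, 'Ada', 'US')")  # passes all checks

# Even if the Application skips its GUI-level validation, the database
# refuses rows that break the rules.
rejected = []
for bad_row in [(2, None, "US"), (3, "Bob", "XX")]:  # null name; unknown country
    try:
        con.execute("INSERT INTO customer VALUES (?, ?, ?)", bad_row)
    except sqlite3.IntegrityError:
        rejected.append(bad_row)
```

The database acts as the last line of defense: bad rows are rejected at write time instead of being discovered downstream.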

If these mistakes are only discovered at the management level, affecting management reports (way downstream), then these system failures can cause havoc on both the business and the IT side.

Once bad data propagates way downstream, cleaning up the mess will involve many hours from IT support.

Mechanisms for Data Access

Direct access to your database objects to retrieve data is never the best option. All client systems (applications, middleware, batch systems) should access data through stored procedures or other additional functional layers in the database. Using an intermediary functional layer to access the data makes maintenance, encapsulation, security and performance easier to manage.
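SQLite has no stored procedures, so the sketch below approximates the same idea with a module of access functions; in PostgreSQL or SQL Server these would be stored procedures instead. The `account` table and the function names are hypothetical.

```python
import sqlite3

# Functional layer sketch: callers use these functions and never touch
# the tables directly.

def open_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL)")
    return con

def create_account(con, balance):
    cur = con.execute("INSERT INTO account (balance) VALUES (?)", (balance,))
    return cur.lastrowid

def get_balance(con, account_id):
    # Single point of access: security checks, caching, or schema changes
    # happen here without touching any caller.
    row = con.execute("SELECT balance FROM account WHERE account_id = ?",
                      (account_id,)).fetchone()
    return row[0] if row else None

con = open_db()
acct = create_account(con, 100.0)
balance = get_balance(con, acct)
```

If the `account` table is later split or renamed, only these functions change; every caller keeps working.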

Isolating database objects so that functional changes are strictly maintained in the functional layers will allow for better data integrity and optimal performance.

Testing

Seasoned database professionals will tell you that testing is a must. Testing is much more than just functional testing of your stored procedures, constraints and generic functionality. A large part of testing involves the help of clients from upstream and downstream Systems or Applications.

Testing should involve:

Process Testing — testing the workflow from the client’s side.

Technical Testing — testing the functionalities inside the database for data management.

Performance Testing — testing the functionalities with data sets large and small to figure out optimal performance and benchmarks.

Downstream Testing — testing functionalities for ETL.
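The Technical and Performance items above can be sketched in plain assert style (a real project would use a test framework). The `find_customer` function, the `customer` schema and the one-second benchmark are all hypothetical.

```python
import sqlite3
import time

def setup_db(n_rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
    con.executemany("INSERT INTO customer (name) VALUES (?)",
                    [("c%d" % i,) for i in range(n_rows)])
    return con

def find_customer(con, customer_id):
    row = con.execute("SELECT name FROM customer WHERE customer_id = ?",
                      (customer_id,)).fetchone()
    return row[0] if row else None

# Technical test: correct answers on a small data set (ids start at 1).
small = setup_db(10)
technical_ok = find_customer(small, 1) == "c0" and find_customer(small, 99) is None

# Performance test: the same function against a larger data set, with a benchmark.
large = setup_db(100_000)
start = time.perf_counter()
found = find_customer(large, 99_999)
performance_ok = (time.perf_counter() - start) < 1.0  # hypothetical benchmark
```

Running both against small and large data sets, as the Performance Testing item suggests, catches functions that are correct but only fast on toy data.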

However, the real test comes when the system goes into production. There’s nothing like real data to help discover the real “bugs” that can creep up.