Strategy #2 — Sync all data in a centralized database

If you have a data warehouse, you may feel tempted to use it, but here are some details you have to pay attention to.

#2.1 Different types of data source

If the components in your architecture use different types of data sources, for instance some using PostgreSQL or MySQL and others using MongoDB, Redis, Cassandra, Neo4j…, it can be hard to sync all of them into a single centralized database because their data formats differ.

#2.2 Race conditions

The sync takes some time to occur, so when a service tries to access information that has not been synced yet, it will work with an outdated version of the data and you will start to experience seemingly random problems. The common workaround is to add some delay before requesting that information. It’s not a real solution, though, because it slows every request down and still gives no guarantee that the sync has finished.
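The stale read described above can be reproduced with a minimal sketch. The two dictionaries are hypothetical in-memory stand-ins for a service’s own database and the centralized copy, and `run_sync` is an assumed name for whatever job copies the data across; no real database or sync tool is involved.

```python
# Hypothetical in-memory stand-ins for the owning service's database and
# the centralized copy (assumptions for illustration only).
service_db = {"order-1": {"status": "pending"}}
central_db = {"order-1": {"status": "pending"}}


def run_sync():
    """Copy every record from the owning service into the central database."""
    for key, record in service_db.items():
        central_db[key] = dict(record)


# The owning service updates its record...
service_db["order-1"]["status"] = "paid"

# ...and another service reads the central copy before the next sync runs.
stale_read = central_db["order-1"]["status"]

# Only after the sync does the central copy catch up.
run_sync()
fresh_read = central_db["order-1"]["status"]

print(stale_read, fresh_read)
```

Any read that lands between the update and the sync sees the old status, and a fixed delay only narrows that window without closing it.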

#2.3 Changes in the service’s schema

When you change the schema of a service, the change will be synced, and other services that rely on the old format will break. You can minimize this problem by writing a giant integration test that runs after changes to each component, but a change in one service should not impact other services (see Componentization via Services).
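As a rough illustration of that breakage, the sketch below shows consumer code written against an old row shape failing once a renamed column is synced. The column names and the `greeting` function are made up for the example.

```python
# Hypothetical rows as they appear in the central database before and after
# the owning service splits one column into two (names are assumptions).
row_before = {"id": 1, "full_name": "Ada Lovelace"}
row_after = {"id": 1, "first_name": "Ada", "last_name": "Lovelace"}


def greeting(row):
    """Consumer code written against the old schema."""
    return "Hello, " + row["full_name"]


def consumer_survives(row):
    """Return True if the consumer can still process the synced row."""
    try:
        greeting(row)
        return True
    except KeyError:
        # The column the consumer depends on no longer exists.
        return False


print(consumer_survives(row_before), consumer_survives(row_after))
```

The consuming service did not change at all, yet it stops working as soon as the new schema reaches the central database.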

#2.4 Change the type of data source

You may need to change the type of your data source, for example from a document database to a relational one (here at GetNinjas we had a case like that). This change will impact all the services using this data.

#2.5 Dynamically calculated information

In some cases the information is not persisted but calculated before use, like the URL of a user’s avatar. This type of information lives inside the application, in places like Decorators, Presenters, and config files. Since it is not persisted in the database, it obviously will not be synced, so in this strategy you will have to duplicate the logic to arrive at the same URL.
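A small sketch of the avatar case, under assumed names: the synced row holds only what is persisted, so the consuming service has to carry its own copy of the URL logic. The host `cdn.example.com`, the path layout, and all function names are hypothetical.

```python
# Hypothetical synced row: only persisted columns reach the central database.
synced_user = {"id": 42, "avatar_file": "me.png"}


def avatar_url_in_owner(user, cdn_host="cdn.example.com"):
    """Logic living in the owning application (for example in a presenter)."""
    return f"https://{cdn_host}/avatars/{user['id']}/{user['avatar_file']}"


def avatar_url_in_consumer(user, cdn_host="cdn.example.com"):
    """Hand-made copy of the same logic inside a consuming service."""
    return f"https://{cdn_host}/avatars/{user['id']}/{user['avatar_file']}"


def avatar_url_in_owner_v2(user, cdn_host="cdn.example.com"):
    """The owner later changes its path layout; the copy is not updated."""
    return f"https://{cdn_host}/users/{user['id']}/avatar/{user['avatar_file']}"


print(avatar_url_in_owner(synced_user) == avatar_url_in_consumer(synced_user))
print(avatar_url_in_owner_v2(synced_user) == avatar_url_in_consumer(synced_user))
```

The two versions agree only until the owner changes its rule, and nothing in the sync tells the consumer that its copy is now wrong.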

#2.6 Duplication of common queries

Let’s say you have a query that returns the top five similar products and another service wants to use it. In a synced database strategy you will need to duplicate that query. The problem is obvious: when the query changes, you will need to change every copy too.

#2.7 Data format

It’s good practice to store data raw (unformatted) and format it when displaying. When you use an API to integrate systems, you can also return the data formatted, so the formatting logic is not spread across the entire architecture. In this strategy you will have to either save the data formatted or reimplement the logic.
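To make the contrast concrete, here is a minimal sketch of the API alternative mentioned above: the owning service keeps the raw value and returns a formatted one next to it. The field names, the dollar format, and both functions are assumptions for illustration.

```python
def format_price(cents):
    """Formatting rule owned by a single service (hypothetical format)."""
    return f"${cents // 100}.{cents % 100:02d}"


def product_payload(product):
    """What the owning service's API could return: raw value plus formatted."""
    return {
        "price_cents": product["price_cents"],  # raw, as stored
        "price": format_price(product["price_cents"]),  # formatted on the way out
    }


# The stored row stays raw; consumers of the API never reimplement the format.
stored_product = {"price_cents": 1999}
print(product_payload(stored_product))
```

With a synced database, a consumer would see only `price_cents` and would need its own copy of `format_price`, or the owner would have to persist the formatted string.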

Syncing databases into one big place can be a good choice for analysis purposes, but for sharing data between services, not so much.