Microservice architecture has many advantages. However, it also adds a lot of complexity to the infrastructure. Consider sharing data between two or more parts of your system.

In a monolith (a single codebase and a single deployed state), data can be shared directly between two parts of the system through function calls or by accessing the datastore directly, since a single codebase handles the entire product.

However, in a microservice architecture, the code and data reside on different machines, services, containers or URLs. Hence, there are multiple challenges:

What should be the model (design/structure) of the data being shared between services?

How should updates be communicated effectively across services?

Where should updates be made? Can any service update any data, or only the owning service (the service that is the primary owner of the data)?

How, and for how long, should data from a different service be cached?

How do we keep the system highly available, consistent, low on latency and loosely coupled at the same time?

For the rest of this document, we will consider two services: an owning service, which owns the data being shared, and a dependent service, which needs to consume that data from the owning service.

There are multiple solutions to these challenges. Each one addresses a different part of the problem, and depending on the use case, we should pick the one that works best for us.

Use a common datastore

Use the same datastore (SQL/NoSQL/graph) for all services, or at least for the services that depend on each other. (But if they are that dependent, our first question should be whether they should be separate services at all.)
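
As a rough illustration of this approach, the sketch below lets two "services" (plain functions here) share one in-memory SQLite database. The `users` table and the function names are assumptions made up for the example, not part of any real system.

```python
import sqlite3

# One datastore shared by both services (in-memory SQLite for the sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")


def owning_service_save_user(user_id, name):
    """Owning service: writes straight to the shared table."""
    db.execute(
        "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
        (user_id, name),
    )
    db.commit()


def dependent_service_greeting(user_id):
    """Dependent service: reads the same table, so it is tied to its schema."""
    row = db.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return f"Hello, {row[0]}"


owning_service_save_user(1, "Ada")
owning_service_save_user(1, "Grace")
greeting = dependent_service_greeting(1)
```

The dependent service sees the latest write immediately, but its query names the `name` column directly, so renaming that column in the owning service would break it.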

Advantages

Highly consistent: the services always see the same, most up-to-date data.

Low latency: no network trip is added across services to fetch the data.

Simple to implement: this is a major win for this method, as projects generally start from the same codebase and datastore. Small companies and projects should seriously consider it, since it saves time and money that can be put to better use elsewhere.

Disadvantages

Tight coupling between services: any change in the schema has to be communicated to each service.

A lot of checks and flags are required for backward compatibility, which becomes a huge overhead over time.

Not highly available (single point of failure): any maintenance or outage cascades right through the entire system, since everything relies on the same datastore.

Pull data -> Cache -> Sync updates

Have a separate datastore for each service. The dependent service pulls the data on demand from the owning service via an API call and then caches it. The owning service pushes updates to all the dependent services via API calls.
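
The flow can be sketched in a single process as follows. Direct method calls stand in for the API calls, and the class names, the `ttl_seconds` parameter and the `pulls` counter are all hypothetical, chosen only for illustration.

```python
import time


class OwningService:
    """Owns the data and pushes every update to registered dependents."""

    def __init__(self):
        self._data = {}
        self._dependents = []  # the owner has to track every dependent

    def register(self, dependent):
        self._dependents.append(dependent)

    def get(self, key):
        # Stands in for a GET API call made by a dependent service.
        return self._data[key]

    def update(self, key, value):
        self._data[key] = value
        for dependent in self._dependents:
            try:
                # Stands in for a push API call to the dependent.
                dependent.on_update(key, value)
            except Exception:
                # The owner opts in to ignoring a failing dependent.
                pass


class DependentService:
    """Pulls on a cache miss, then serves reads from its local cache."""

    def __init__(self, owner, ttl_seconds=60.0):
        self._owner = owner
        self._ttl = ttl_seconds
        self._cache = {}  # key -> (value, expiry time)
        self.pulls = 0    # number of fetches made to the owner

    def read(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]  # cache hit: no network trip
        self.pulls += 1      # cache miss: fetch from the owner
        value = self._owner.get(key)
        self._store(key, value)
        return value

    def on_update(self, key, value):
        self._store(key, value)

    def _store(self, key, value):
        self._cache[key] = (value, time.monotonic() + self._ttl)


owner = OwningService()
owner.update("user:1", {"name": "Ada"})    # written before anyone depends on it

dependent = DependentService(owner)
owner.register(dependent)

first = dependent.read("user:1")           # cache miss: pulled from the owner
owner.update("user:1", {"name": "Grace"})  # pushed to the dependent
second = dependent.read("user:1")          # cache hit with the pushed value
```

Note that the owner keeps a list of its dependents, which is the coupling this approach pays for.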

Advantages

Loose coupling for dependent services: they can change their own schema independently, without affecting the others.

Somewhat available: a failure of the dependent service does not affect the owning service (which can opt in to ignoring failures). The reverse is not true, though.

Low latency on cache hits.

Disadvantages

High network latency on a cache miss, as the data has to be fetched over the network.

Eventually consistent.

Tight coupling for the owning service, since it has to keep track of all dependents to communicate updates effectively.

After downtime in a dependent service, all the missed updates have to be pulled from the owning service, bombarding it with queries. If multiple services were down, this could amount to an internal denial-of-service (DoS) attack.

Always pull the data

Have a separate datastore for each service. The dependent service pulls the data from the owning service via an API call every time it needs it. The owning service never pushes updates.
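
A minimal sketch of this approach, again with a method call standing in for the API call. The `up` flag and the `OwnerUnavailable` exception are assumptions added to simulate an outage of the owning service.

```python
class OwnerUnavailable(Exception):
    """Raised when the (simulated) owning service cannot be reached."""


class OwningService:
    def __init__(self):
        self._data = {}
        self.up = True  # flip to False to simulate downtime

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Stands in for the API call made by the dependent service.
        if not self.up:
            raise OwnerUnavailable("owning service is down")
        return self._data[key]


class DependentService:
    def __init__(self, owner):
        self._owner = owner

    def read(self, key):
        # No cache: every read goes to the owning service.
        return self._owner.get(key)


owner = OwningService()
dependent = DependentService(owner)

owner.put("price:42", 100)
before = dependent.read("price:42")
owner.put("price:42", 120)
after = dependent.read("price:42")   # always the latest value

owner.up = False
try:
    dependent.read("price:42")
    failed = False
except OwnerUnavailable:
    failed = True                    # the outage reaches the dependent
```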

Advantages

Loose coupling between the two services.

Highly consistent, as up-to-date data is always pulled.

Easy to implement.

Disadvantages

High latency, since every operation requires a fresh data fetch.

Not highly available, as any disruption or downtime in the owning service effectively brings the dependent service down.

Pull data -> Subscribe to a topic for updates

Have a separate datastore for each service.

Any data required by the dependent services is fetched the first time it is needed and then cached. All updates and additions to the data are published to a channel/queue as events/messages using a message broker. The dependent services subscribe to these channels and update their copy of the data accordingly.
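
The publish/subscribe part can be sketched with a toy in-memory broker. A real deployment would use an actual message broker; the `Broker` class, the `user-updates` topic and the event shape below are assumptions for illustration only.

```python
from collections import defaultdict, deque


class Broker:
    """Toy in-memory broker: one FIFO queue per subscriber per topic."""

    def __init__(self):
        self._queues = defaultdict(list)  # topic -> subscriber queues

    def subscribe(self, topic):
        queue = deque()
        self._queues[topic].append(queue)
        return queue

    def publish(self, topic, event):
        # The publisher does not know who, if anyone, is subscribed.
        for queue in self._queues[topic]:
            queue.append(event)


class DependentService:
    def __init__(self, broker, topic):
        self._inbox = broker.subscribe(topic)
        self.state = {}  # local copy of the owner's data

    def process_pending(self):
        # Apply queued events in the order they were published.
        while self._inbox:
            event = self._inbox.popleft()
            self.state[event["key"]] = event["value"]


def owning_service_update(broker, key, value):
    """Owning service: publishes the change instead of calling dependents."""
    broker.publish("user-updates", {"key": key, "value": value})


broker = Broker()
billing = DependentService(broker, "user-updates")
search = DependentService(broker, "user-updates")  # pretend this one is down

owning_service_update(broker, "user:1", "Ada")
billing.process_pending()
owning_service_update(broker, "user:1", "Grace")
billing.process_pending()

search_before = dict(search.state)  # nothing applied while "down"
search.process_pending()            # back up: drains its queue in order
```

While `search` is not processing, its events simply wait in its queue, and it catches up once it resumes.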

Each event/message should have a correlation ID, which can be used to trace requests across services. All published events/messages should also carry timestamps denoting the last update, cache invalidation and so on.
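
One possible shape for such an event is sketched below. The field names (`updated_at`, `correlation_id`) and the rule of ignoring events older than the cached copy are assumptions for illustration; the timestamp could be used in other ways.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    key: str
    value: object
    updated_at: float  # the owner's last-update timestamp for this key
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class DependentCache:
    """Applies events, skipping any that are older than what it holds."""

    def __init__(self):
        self._entries = {}  # key -> (value, updated_at)
        self.log = []       # (correlation_id, outcome) pairs for tracing

    def apply(self, event):
        current = self._entries.get(event.key)
        if current is not None and current[1] >= event.updated_at:
            self.log.append((event.correlation_id, "ignored-stale"))
            return False
        self._entries[event.key] = (event.value, event.updated_at)
        self.log.append((event.correlation_id, "applied"))
        return True

    def get(self, key):
        return self._entries[key][0]


cache = DependentCache()
newer = Event("user:1", "Grace", updated_at=200.0, correlation_id="req-2")
older = Event("user:1", "Ada", updated_at=100.0, correlation_id="req-1")

cache.apply(newer)
cache.apply(older)  # arrives late and is ignored
current_name = cache.get("user:1")
```

The correlation ID recorded in `log` is what would let an operator follow one request across several services.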

Events/messages that cannot be processed have to be pushed to a dead letter queue.
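
A minimal sketch of that rule, assuming a simple retry limit before an event is given up on. The `consume` helper, the `max_attempts` parameter and the event shape are hypothetical; real brokers usually provide their own dead-letter configuration.

```python
from collections import deque


def consume(queue, handler, dead_letters, max_attempts=3):
    """Drain `queue`; events that keep failing go to `dead_letters`."""
    while queue:
        event = queue.popleft()
        for attempt in range(1, max_attempts + 1):
            try:
                handler(event)
                break  # processed successfully, move to the next event
            except Exception as exc:
                if attempt == max_attempts:
                    # Out of retries: park the event for later inspection.
                    dead_letters.append({"event": event, "error": repr(exc)})


processed = []


def handler(event):
    if "key" not in event:
        raise ValueError("malformed event")
    processed.append(event["key"])


queue = deque([{"key": "user:1"}, {"oops": True}, {"key": "user:2"}])
dead_letters = deque()
consume(queue, handler, dead_letters)
```

The malformed event ends up in `dead_letters` without blocking the events queued behind it.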

Advantages

Loose coupling between the services: the owning service does not need to know anything about the dependent ones. Any new dependent service can plug into the system at any time by just subscribing to a channel.

Highly available, as no direct data sync is required between services. If the dependent service is down, all the events stay in the queue, waiting. Once the service is back up, it can process the events in order.

Low latency: the owning service does not need to push updates to each dependent. Dependent services are subscribed to the channels, so their local copy is kept up to date as events arrive.

Eventually consistent, which, depending on the use case, can be a disadvantage as well.

Disadvantages

Quite complex to implement and maintain.

Requires the system to scale well, since throughput increases significantly when every update is published.

Needs a lot of monitoring & alarms on the infrastructure.

The last approach is ideally the best. However, you should weigh all the advantages and disadvantages and use the one that best suits your current scale and infrastructure.