Imagine you have a fully operational system, and you want to create an event-driven pipeline on top of it. If you had seen the future from day one and designed a decoupled system that emits events for every user operation, you would have a great starting point, as each event can reflect the data before and after its occurrence. But what if you simply don’t have the events in place?

A naive solution to this problem would be to ask every developer to go back through every feature ever developed and emit an event in exactly the right place. You can see why this is not realistic: not only would it take months of work, but that work would also be error-prone. A Sisyphean task, which would not be appreciated by developers, to say the least.

So what’s the alternative? Change data capture (CDC) comes to the rescue. CDC can be viewed as a design pattern for identifying and collecting database record changes. Every DML operation in your database is captured, which lets you seize any insert, update, or delete operation and use it for your own purposes.
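For illustration, here is a minimal sketch of what a single change event might look like, loosely modeled on the envelope format used by Debezium (a popular open-source CDC platform); the table, columns, and values are hypothetical:

```json
{
  "before": { "id": 42, "status": "PENDING" },
  "after":  { "id": 42, "status": "SHIPPED" },
  "op": "u",
  "ts_ms": 1700000000000,
  "source": { "db": "inventory", "table": "orders" }
}
```

The `op` field encodes the operation type (`c` for insert, `u` for update, `d` for delete), while `before` and `after` carry the row images, which is exactly what gives you the "data before and after" view described above.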

It’s recommended to deliver database record changes to a streaming platform of your choice, Apache Kafka in our case. You may use one of the many Kafka Connect source connectors to continuously stream data from a database into a Kafka topic. Such connectors exist for both relational and NoSQL databases; they identify record changes and, in turn, send their details to Kafka.
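As a concrete sketch, registering a Debezium MySQL source connector with Kafka Connect looks roughly like the configuration below. The hostnames, credentials, and table names are placeholders, and exact property names vary across Debezium versions, so treat this as an outline rather than a drop-in config:

```json
{
  "name": "inventory-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "cdc_user",
    "database.password": "cdc_password",
    "database.server.id": "184054",
    "topic.prefix": "app",
    "table.include.list": "inventory.orders",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-history.app"
  }
}
```

Once registered (typically by POSTing this JSON to Kafka Connect’s REST API), the connector tails the database’s change log and publishes a change event to a topic such as `app.inventory.orders` for every captured row change, with no application code modified along the way.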