The way in which we handle data and build applications is changing. Technology and development practices have evolved to a point where building systems in isolated silos is somewhat impractical, if not anachronistic. As the cliché of a cliché goes, data is the new oil, the lifeblood of an organization—however you want to analogize it, the successful handling and exploitation of data in a company is critical to its success.

In this article, we’re going to see a practical example of a powerful design pattern based around event-driven architectures and an event streaming platform. We’ll discuss why companies are identifying real-time requirements and adopting this instead of legacy approaches to integrating systems.

Part of this shift comes from the blurring of lines between analytic systems and transactional systems. In the old days, we built transactional systems on which our business ran, and then a separate team would be responsible for extracting the data and building analytics (or BI, or MIS or DSS, depending on your age) on that. It was done in an isolated manner and was typically a one-way feed. Analytics teams took the data, on which processing was done, to cleanse and enrich it before building reports.

Now, we have the beginnings of an acceptance that data is not only generated by systems but when combined with other data and insights can actually be used to power systems. It’s about enabling the integration of data across departments, teams and even companies, making it available to any application that wants to do so and building this in such a way that is flexible, scalable and maintainable in the long term. And with today’s technology, we can do so in real time, at scale.

For example, retail banks are faced with the challenge of providing a compelling customer experience to a new generation of users, whose expectations are shaped by AI-powered mobile applications from Google and Apple. In order to meet this challenge, they need to combine everything they know about their customers with state-of-the-art predictive models, and provide the results to mobile apps, web apps and physical branches. ING Bank and Royal Bank of Canada use event-driven architectures to power modern systems that do exactly that. Let’s dive into the technologies and design patterns that enable them to transform their user experience and development process at the same time.

ETL but not as you know it

First, a bit of history about ETL. It got its start back in the 1970s as more businesses were using databases for business information storage. The more data they collected, the greater the need for data integration. ETL was the method chosen to help them take the data from all these sources, transforming it, and then loading it to the desired destination. Hence the name ETL (extract, transform, load).

Once data warehouses became more common in the 80s and 90s, users could access data from a variety of different systems like minicomputers, spreadsheets, mainframes and others. Even today, the ETL process is still used by many organizations as part of their data integration efforts. But as you will see, ETL is undergoing a fascinating change.

Consider a simple example from the world of e-commerce: On a website, user reviews are tracked through a series of events. Information about these users such as their name, contact details and loyalty club status is held on a database elsewhere. There are at least three uses for this review data:

Customer operations: If a user with high loyalty club status leaves a poor review, we want to do something about it straight away to reduce the risk of them churning. We want an application that will notify us as soon as a review meeting this condition is met. By doing so immediately, we can offer customer service that is far superior than if we had waited for a batch process to run and flag the user for contact at a later date.

Operational dashboards showing a live feed of reviews, rolling aggregates for counts, median score and so on—broken down by user region, etc.

Ad hoc analytics on review data combined with other data (whether in a data lake, data warehouse, etc.): This could extend to broader data science practices and machine learning use.

All of these use cases require access to the review information along with details of the user.

One option is to store the reviews in a database, against which we then join a user table. Perhaps we drive all the requirements against the data held in this database. There are several challenges to this approach, however. Because we are coupling together three separate applications to the same schema and access patterns, it becomes more difficult to make changes to one without impacting the others. In addition, one application may start to impact the performance of another—consider a large-scale analytics workload and how this could affect the operational behavior of the customer ops alerting application.