Do you sometimes feel that you are not the first person in the world to implement sending data from Kafka to PostgreSQL, Elasticsearch, Redis, etc.? Or fetching data from these sources and putting it into Kafka? As always, when implementing such use cases on your own, you will run into difficulties and bugs that you will need to deal with. But someone has probably already implemented something similar, tested it, fixed the bugs and maybe even deployed it to a production system. So why not reuse it? We could save plenty of time and focus, for example, on business logic. Sounds great, right? This is where Kafka Connect comes into play.

In this blog post, I would like to focus mostly on showing you the practical usage of Kafka connectors and briefly present the basics needed to start working with them. First, however, we should know what Kafka Connect is:

Kafka Connect, an open-source component of Kafka, is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems. Using Kafka Connect you can use existing connector implementations for common data sources and sinks to move data into and out of Kafka.

I think this definition from the Confluent docs is pretty simple and clear, and it explains the main concepts of Kafka Connect. So let’s now see how to use it.

How to use Kafka connectors

In the following example (you can find all the source files here) we will be generating mock data, putting it into Kafka and then streaming it to Redis. To achieve that, we will use two connectors: DataGen and Kafka Connect Redis. Both are available on Confluent Hub.

The first thing we need to do is set up the environment. Here you may find the docker-compose YAML file, which lets you run everything that is needed with just a single command:

docker-compose up -d --build
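
Once the command finishes, a quick sanity check (run from the directory that contains the YAML file) is to list the services and make sure they are all up:

# Every service defined in the docker-compose file should be listed with state "Up"
docker-compose ps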

Let’s take a closer look at this YAML file. As you may notice, the first five sections are responsible for starting ZooKeeper, the Kafka broker, Redis and Confluent Control Center (a nice web-based tool for managing and monitoring Apache Kafka), and for creating the topics. However, the most interesting part for us is the Connect section.

Connect section

As you may notice, the aforementioned section is responsible for:

building a Docker image based on a Dockerfile. Our custom Docker image extends Confluent’s Kafka Connect base image (cp-kafka-connect-base) and contains the two connectors taken from Confluent Hub (a quick way to confirm that both plugins are loaded is shown right after this list).

setting the required environment variables. A detailed description of the required variables used in the Kafka Connect image can be found here.
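
Once the Connect container is up, one way to confirm that both connectors made it into the image is to ask Connect’s REST API for the installed plugins (a small sketch; it assumes the REST port is the default 8083 and is mapped to localhost):

# List the connector plugins installed in the Connect worker
curl -s http://localhost:8083/connector-plugins
# The response should include the DataGen source connector class and the Redis sink connector class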

When all containers are running, we can configure the connectors through Connect’s REST API, just by making two calls:

DataGen configuration
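
The exact configuration is in the linked source files; for reference, the call could look roughly like this (a sketch: the connector name datagen-users, the REST port 8083 and the converter settings are my assumptions, so compare them with the source files):

# Register the DataGen source connector (sketch; adjust names and ports to your setup)
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "datagen-users",
    "config": {
      "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
      "kafka.topic": "to-redis",
      "quickstart": "users",
      "key.converter": "org.apache.kafka.connect.storage.StringConverter",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter",
      "value.converter.schemas.enable": "false",
      "max.interval": "1000",
      "tasks.max": "1"
    }
  }'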

Redis Sink configuration
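
The Redis sink is registered in a similar way (again a sketch: the connector name, the redis:6379 host and the redis.* property names are assumptions based on the jcustenborder kafka-connect-redis connector, so double-check them against the documentation of the version you installed):

# Register the Redis sink connector (sketch; verify the redis.* properties for your connector version)
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "redis-sink",
    "config": {
      "connector.class": "com.github.jcustenborder.kafka.connect.redis.RedisSinkConnector",
      "topics": "to-redis",
      "redis.hosts": "redis:6379",
      "redis.database": "1",
      "key.converter": "org.apache.kafka.connect.storage.StringConverter",
      "value.converter": "org.apache.kafka.connect.storage.StringConverter",
      "tasks.max": "1"
    }
  }'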

Now it would be good to verify whether our solution is working as expected. We can do that using Confluent Control Center, which lets us monitor our Kafka broker, topics, consumers, connectors, etc. To access it, navigate to the host and port where it is running, which in our case is http://localhost:9021/.

“To-redis” topic details

Available connectors (Redis Sink, DataGen)

Redis Sink consumption details
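
If you prefer the command line, the same information is exposed by Connect’s REST API (again assuming the default port 8083 and the connector names used in the calls above):

# List the registered connectors and check the status of the Redis sink
curl -s http://localhost:8083/connectors
curl -s http://localhost:8083/connectors/redis-sink/status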

To check if new keys are available in Redis, use the following command:

docker exec my-redis sh -c "redis-cli -n 1 KEYS '*'"
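
Since DataGen keeps producing records, the number of keys should keep growing; a quick way to watch that (assuming the same container name and database) is:

# Count the keys in Redis database 1; re-running it should show a growing number
docker exec my-redis sh -c "redis-cli -n 1 DBSIZE"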

So, in a fully working example, we should be able to see new messages emerging in the "to-redis" topic and new keys (with the User_ prefix) in our Redis data store.

Should we use Kafka Connect in a real-life project?

The example above should give you a general overview of how we can use Kafka Connect in our projects. But the question is: should we do this? The answer will always be project-specific, but going through the following list of advantages and disadvantages could help you make the right decision.

Advantages:

we can stream data from Kafka topics to a target store (Sink Connectors) or stream updates from a data store to Kafka topics (Source Connectors) without writing a single line of code that we would then need to maintain

it integrates seamlessly with the rest of the Kafka streaming platform

connectors can be managed through Confluent Control Center

we may use a REST API to create, update and delete Kafka connectors

we can easily integrate with Confluent Schema Registry, which provides centralized schema management and compatibility checks

a nice and clean deployment model (for instance, if we are using Kubernetes, we can define Helm charts for our dockerized connectors)

Disadvantages:

the available connectors may have some limitations (sometimes we will need to dig deep into a connector’s documentation to check whether it will meet our requirements)

separation of commercial and open-source features is very poor

a connector’s documentation can be poor or incomplete

there is often no information about which connectors have been battle-tested in real-world applications

Conclusion

In my opinion, Kafka Connect is a great framework which helps to create a modular, loosely coupled and scalable system architecture built from components with clearly defined responsibilities. Using one of the existing connectors, we can significantly reduce implementation time and probably avoid bugs. So, following the “work smart, not hard” maxim, Kafka Connect is worth considering.