Picture by @laughayette

Over the last few years Apache Kafka has been adopted by many companies. Some claim it is one of the most popular messaging tools in the world. Kafka has many applications, ranging from simple message passing, through inter-service communication in microservice architectures, to entire stream processing platforms. Today let’s see which companies use Kafka and what their use cases for it are.

Activision

Do you know the computer game series called Call of Duty? Activision is the company that created it. In one of their presentations they show what problems they had with Kafka and how they overcame them.

Summary

Activision has over 1000 topics in their Kafka cluster and handles between 10k and 100k messages per second. Various information is sent, including gameplay stats like shooting events and death locations. Naming the topics was a challenge, but they came to the conclusion that a name should not express who produces or consumes the data, but rather should indicate the data type. Activision leverages various data formats and has its own Schema Registry, written in Python and based on Cassandra. They use message envelopes constructed with Protobuf.
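To make the naming idea concrete, here is a small sketch of a data-type-based topic naming helper. The exact scheme (segments, separators, versioning) is my assumption for illustration, not Activision’s actual format; the point is simply that the name describes the data, never its producer or consumer.

```python
# Hypothetical data-type-based topic naming, in the spirit of the
# conclusion above. The segment layout is an assumption, not
# Activision's real convention.

def topic_name(domain: str, data_type: str, version: int = 1) -> str:
    """Build a topic name that describes the data, not who sends or reads it."""
    for part in (domain, data_type):
        if not part.isidentifier():
            raise ValueError(f"invalid topic segment: {part!r}")
    return f"{domain}.{data_type}.v{version}"

# e.g. gameplay telemetry topics keyed purely by data type:
print(topic_name("gameplay", "shooting_event"))  # gameplay.shooting_event.v1
print(topic_name("gameplay", "death_location"))  # gameplay.death_location.v1
```

A convention like this survives reorganizations: if a new team starts consuming `gameplay.shooting_event.v1`, the name stays valid, whereas a name like `analytics-team-events` would not.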

Video

Slides

Tinder

Tinder, a dating app, leverages Kafka for multiple business purposes. Various processes are based on Kafka Streams. Among them you can find:

notification scheduling for onboarding users (e.g. to upload a profile photo),

analytics,

content moderation,

recommendations,

user activation,

user timezone update process,

notifications,

and others.

Tinder sends over 86B events per day, which amounts to around 40 TB of data per day (info from 2018). Kafka allowed them to save over 90% in costs compared to AWS SQS/Kinesis. For more information, take a look at their presentation from Kafka Summit 2018.
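As a quick sanity check, the two Tinder numbers imply an average event size, which we can compute directly:

```python
# Back-of-the-envelope check of the Tinder figures (2018):
# 86 billion events producing ~40 TB per day implies an average event size.
events_per_day = 86e9
bytes_per_day = 40e12  # 40 TB, decimal units

avg_event_bytes = bytes_per_day / events_per_day
print(f"~{avg_event_bytes:.0f} bytes per event")  # ~465 bytes per event
```

Roughly half a kilobyte per event is plausible for small JSON or Protobuf payloads, so the reported numbers are internally consistent.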

Watch the video at Confluent’s website.

Slides

Pinterest

Pinterest is visited by 200M+ users monthly. There are over 100B pins, and 2B+ ideas are searched every month. Kafka is leveraged for multiple processes. Every click, repin or photo enlargement results in a Kafka message. Kafka Streams is used for content indexing, recommendations and spam detection but, most importantly, also for real-time ad budget calculations.
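To illustrate what real-time budget accounting involves, here is a minimal, hypothetical sketch. In production this would be a stateful stream aggregation (e.g. a Kafka Streams topology); here we simply fold a batch of spend events into per-advertiser remaining budgets. The advertiser names and numbers are made up.

```python
# Hypothetical sketch of real-time ad budget accounting: fold spend
# events into remaining budgets, flooring at zero so an exhausted
# budget stops serving ads. Not Pinterest's actual implementation.
from collections import defaultdict

def apply_spend_events(budgets: dict, events: list) -> dict:
    """Subtract each (advertiser, cost) spend event from its budget."""
    remaining = defaultdict(float, budgets)
    for advertiser, cost in events:
        remaining[advertiser] = max(0.0, remaining[advertiser] - cost)
    return dict(remaining)

events = [("acme", 30.0), ("acme", 50.0), ("globex", 10.0), ("acme", 40.0)]
print(apply_spend_events({"acme": 100.0, "globex": 100.0}, events))
# {'acme': 0.0, 'globex': 90.0} -> "acme" overspent and is floored at zero
```

The hard part in the real system is doing this continuously and with low latency over a stream of click events, which is exactly where a stream processor earns its keep.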

Watch the video at Confluent’s website.

Slides

Uber

Uber requires a lot of real-time processing. They handle over a trillion messages per day (info from 2017!) across tens of thousands of topics. This amount results in data volumes measured in petabytes. Many processes are modeled using Kafka Streams, including such critical ones as customer–driver matching, ETA calculations and auditing.

From a technical point of view, Uber leverages their own REST Proxy, which is a fork of the Confluent one with improved performance and reliability. Kafka is used mostly in an at-least-once manner, so no data is lost. Batching capabilities are used to achieve better throughput. Data is divided into regional Kafka clusters and later replicated using their own tool called uReplicator.
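The at-least-once guarantee comes from committing offsets only after a message has been processed: a crash between processing and committing causes a redelivery (a duplicate), never a loss. Here is a small simulation of that ordering; the log contents and crash point are made up for illustration, and a real consumer would of course use a Kafka client rather than a list.

```python
# Simulation of at-least-once consumption: process first, commit the
# offset second. A crash in between loses the commit, not the message.

def consume(log, committed_offset, crash_before_commit_at=None):
    processed = []
    for offset, msg in enumerate(log[committed_offset:], start=committed_offset):
        processed.append(msg)                   # 1. process the message
        if offset == crash_before_commit_at:
            return processed, committed_offset  # crash: commit is lost
        committed_offset = offset + 1           # 2. then commit the offset
    return processed, committed_offset

log = ["a", "b", "c"]
first, offset = consume(log, 0, crash_before_commit_at=1)  # crash after processing "b"
second, offset = consume(log, offset)                      # restart from last commit
print(first + second)  # ['a', 'b', 'b', 'c'] -> "b" is duplicated, nothing is lost
```

This is why at-least-once consumers are usually paired with idempotent or deduplicating downstream processing: the duplicate of "b" must be harmless.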

Video

Slides

Netflix

Netflix runs multiple Kafka clusters together with Apache Flink for stream processing. They handle trillions of messages per day. Interestingly, Netflix has chosen to use two replicas per partition and additionally enables unclean leader election. This improves availability but can cause data loss. That is one of the reasons Netflix created its own tracing tool, Inca, which can detect lost data. It offers related metrics and validates whether pieces of the infrastructure deliver the required processing guarantees (e.g. at-least-once).
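For reference, this is roughly what such a topic configuration looks like with the stock Kafka tooling. This is a generic sketch, not Netflix’s actual tooling or topic names; `unclean.leader.election.enable` is the standard Kafka topic-level setting that trades durability for availability as described above.

```shell
# Hypothetical topic creation illustrating the availability-over-durability
# trade-off: two replicas, and an out-of-sync replica may become leader.
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic playback-events \
  --partitions 12 \
  --replication-factor 2 \
  --config unclean.leader.election.enable=true
```

With only two replicas, losing one broker leaves a single copy; unclean leader election then keeps the partition writable even if that copy is behind, which is exactly the data-loss window Inca is designed to detect.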

To get to know more about Inca, take a look at the Netflix blogpost.

LinkedIn

Apache Kafka originated at LinkedIn. It was actually created to solve their challenges with systems related to monitoring, tracing and user activity tracking. Nowadays LinkedIn handles 7 trillion messages per day, divided into 100,000 topics and 7 million partitions, stored on over 4,000 brokers. They leverage REST Proxy for non-Java clients and Schema Registry for schema management. LinkedIn maintains its own patches and releases of Kafka, so that they can get some features earlier, before they are accepted into the official releases.

The latest info about how LinkedIn leverages Kafka can be found in the blog post “How LinkedIn customizes Apache Kafka for 7 trillion messages per day” from October 2019.

Conclusions

As you can see, Kafka is used by various companies, often in business processes involving large amounts of data. It is able to scale almost linearly, handling billions or trillions of messages. Quite often you can observe that Kafka Streams or Schema Registry is used together with Kafka. If you’d like to see more of who uses Kafka, take a look at the Powered By section of the Kafka documentation.