Why is Kafka so popular?
February 15, 2021•304 words
Kafka is popular for three main reasons:
- It adds an abstraction layer between producers and consumers of data streams.
- It is highly performant and allows real-time data processing.
- It scales horizontally: adding more machines to the cluster adds capacity.
Abstraction layer between producers and consumers
Without an abstraction layer, producers would need to know where and how to send data to each consumer, or consumers would need to constantly poll producers for new data. With so many different use cases, it can be hard to settle on a format or cadence that works for all of them.
Using Kafka reduces the number of integrations required as more services come online. With 3 producers and 3 consumers, there would be 9 integration links without Kafka (O(N * M)) and only 6 with Kafka (O(N + M)).
It also means that producers and consumers don't need to be online at the same time for data to be pushed or pulled: Kafka acts as a buffer, holding messages until they are processed.
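The link counts above can be checked with a few lines of arithmetic. This is only a sketch; the function names are made up for illustration and are not part of any Kafka API.

```python
def point_to_point_links(producers: int, consumers: int) -> int:
    """Every producer integrates directly with every consumer: N * M."""
    return producers * consumers


def brokered_links(producers: int, consumers: int) -> int:
    """Every service integrates once, with the broker: N + M."""
    return producers + consumers


# Compare both topologies as the number of services grows.
for n in (3, 10, 50):
    print(n, point_to_point_links(n, n), brokered_links(n, n))
```

With 3 producers and 3 consumers this gives 9 and 6, matching the figures in the text. At 50 of each, the gap is 2500 links versus 100.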
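To make the buffering idea concrete, here is a toy model of an append-only log with a consumer offset. It is an in-memory stand-in, not the Kafka client API; `TopicLog`, `append`, and `read_from` are hypothetical names chosen for this sketch.

```python
from typing import List


class TopicLog:
    """Toy stand-in for a single partition: an append-only list of messages."""

    def __init__(self) -> None:
        self._messages: List[str] = []

    def append(self, message: str) -> int:
        """Store a message and return its offset (its position in the log)."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read_from(self, offset: int) -> List[str]:
        """Return every message at or after the given offset."""
        return self._messages[offset:]


log = TopicLog()

# The producer writes while the consumer is offline.
for event in ("signup", "login", "purchase"):
    log.append(event)

# The consumer comes online later and resumes from its last committed offset.
committed_offset = 0
backlog = log.read_from(committed_offset)
committed_offset += len(backlog)
print(backlog)
```

The producer never needs to know whether the consumer is running. The consumer only needs to remember how far it has read.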
Highly performant, real-time data processing
Event data and messages can be processed via Kafka in real time, with latencies that can stay under 10 ms. This enables powerful scenarios such as processing user activity to drive recommendation systems or alerting.
It scales horizontally
Fortune 500 companies use Kafka to process millions of messages per second. Once set up, a cluster can be scaled out to match the workload's needs. New nodes take over a share of the work from existing nodes once partitions are reassigned to them.
Kafka is not only a viable transport solution; it can also store data for a long time, provided you have the disk resources. This lets you replace an entire consuming service, have the replacement reprocess old messages, and compare its results against those of the old service to determine which is better.
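One way to picture how work is redistributed when a node joins is to split a topic into partitions and deal them out to the available workers. The sketch below uses a simple round-robin deal. It is a simplification for illustration, not Kafka's actual assignment code, and `assign_partitions` and the worker names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(partitions: int, workers: List[str]) -> Dict[str, List[int]]:
    """Spread partition numbers across workers round-robin."""
    assignment: Dict[str, List[int]] = {name: [] for name in workers}
    for partition in range(partitions):
        owner = workers[partition % len(workers)]
        assignment[owner].append(partition)
    return assignment


# Six partitions shared by two workers, then by three after one joins.
before = assign_partitions(6, ["w1", "w2"])
after = assign_partitions(6, ["w1", "w2", "w3"])
print(before)
print(after)
```

Each worker owns three partitions before the third joins and two afterwards, so the per-worker load drops without any change to the producers.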
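The replay-and-compare idea can be sketched without a broker: treat the retained messages as a list, run them through both versions of a service, and diff the outputs. The message shape and both fee rules below are invented for illustration.

```python
from typing import Callable, List, Tuple


def replay(log: List[dict], handler: Callable[[dict], int]) -> List[int]:
    """Run every retained message through a handler, oldest first."""
    return [handler(message) for message in log]


# Hypothetical retained messages (amounts in cents to avoid float rounding).
retained = [
    {"order_id": 1, "amount_cents": 2000},
    {"order_id": 2, "amount_cents": 3550},
    {"order_id": 3, "amount_cents": 1200},
]


def old_service(message: dict) -> int:
    """Existing consumer: flat 10% fee."""
    return message["amount_cents"] // 10


def new_service(message: dict) -> int:
    """Candidate replacement: 10% fee with a 200-cent minimum."""
    return max(message["amount_cents"] // 10, 200)


old_results = replay(retained, old_service)
new_results = replay(retained, new_service)

# Keep only the messages where the two services disagree.
differences: List[Tuple[int, int, int]] = [
    (message["order_id"], old, new)
    for message, old, new in zip(retained, old_results, new_results)
    if old != new
]
print(differences)
```

Here only order 3 differs (120 versus 200 cents), which is the kind of targeted comparison that retained history makes possible.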
https://www.infoq.com/news/2020/07/confluent-infinite-storage
https://www.infoq.com/articles/democratizing-stream-processing-apache-kafka-ksql/
https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/