Beyond Kafka: The Rise of Kafka-Compatible Alternatives

Sanjeev Mohan
8 min read · Sep 25, 2024


With Confluent’s purchase of WarpStream, the world of Kafka is once again at a crossroads. If Franz Kafka were alive, he might write about Apache Kafka’s Metamorphosis in 2024. It has been well over a decade since Apache Kafka was open sourced, and it shows. Sure, Kafka has evolved, but it remains hard and costly to manage. In this blog, Rob Meyer (VP of Marketing, Estuary) and Sanjeev Mohan explore the state of Kafka and its future.

Is Kafka still relevant?

Kafka is still a popular and effective choice for handling real-time data as businesses generate vast amounts of data that must be processed, stored, and analyzed in real-time to remain competitive. Its core strength — reliable, high-throughput streaming at scale — continues to serve critical real-time data needs across industries.

However, Kafka’s heavy infrastructure and operational overhead have led many people to ask: “What if we built a replacement for Kafka that supported the Kafka APIs?”

The industry has already witnessed multiple Kafka-like alternatives, and with competition comes innovation. These alternatives have already changed the messaging landscape: each Kafka implementation, and the technologies built on top of it, supports many different use cases.
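What exactly must a Kafka-compatible replacement reproduce? At its core, the contract is an append-only, offset-addressed log per partition, from which consumers read at their own pace without removing data. The toy model below (plain Python, no Kafka client involved; all names are illustrative, not any vendor’s API) sketches those semantics:

```python
class PartitionLog:
    """Toy model of one Kafka partition: an append-only, offset-addressed log."""

    def __init__(self):
        self._records = []

    def append(self, key, value):
        """Append a record and return its offset (position in the log)."""
        self._records.append((key, value))
        return len(self._records) - 1

    def read(self, offset, max_records=100):
        """Read forward from an offset. Reads are non-destructive."""
        return self._records[offset:offset + max_records]


log = PartitionLog()
log.append("user-1", "login")   # offset 0
log.append("user-1", "click")   # offset 1

# Two independent consumers track their own positions in the same log.
offset_a = 0
batch = log.read(offset_a)
offset_a += len(batch)  # consumer A is now caught up at offset 2
replay = log.read(0)    # consumer B can still replay from the beginning
```

Everything else in the ecosystem, whether brokers are written in Java, C++, or backed directly by object storage, is an implementation strategy for serving this same abstraction at scale.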

Kafka’s Origins

Let’s start with a quick recap of Kafka’s history.

Jay Kreps, Neha Narkhede, and Jun Rao originally developed Kafka at LinkedIn in 2010 to handle the growing need for reliable, scalable data streaming across its platform. It was open-sourced in 2011 and became part of the Apache Software Foundation in 2012, quickly gaining adoption for its distributed, high-throughput architecture that enabled real-time event streaming at scale. In 2014, key Kafka contributors founded Confluent, a company dedicated to developing Kafka and offering managed services.

Its design, inspired by messaging systems like Apache ActiveMQ and Oracle’s AQ, combined the best aspects of messaging queues and log storage, making it a versatile and efficient tool for real-time data processing. Over the years, Kafka evolved from a messaging system to a full-fledged streaming platform, enabling businesses to process, store, and analyze real-time data flows. Kafka Streams was introduced in 2016 for real-time data processing. Tiered Storage was added in 2020, which allowed better and cheaper scalability by decoupling storage from compute. Freight Clusters was the most recent addition in May 2024. It writes directly to cloud storage for true decoupled storage-compute.

The Rise of Kafka Alternatives

The first Kafka-like competitor was Apache Pulsar, developed at Yahoo and contributed to the Apache Software Foundation in 2016. It supported both log-based messaging (like Kafka) and queue-based messaging (like IBM MQSeries and many other traditional messaging products). Pulsar 2.0 added tiered storage, its first step in decoupling storage and compute, in 2018. Pulsar also decouples serving (reads) from storage (writes) so the two can scale independently, which is more efficient than Kafka.

In 2019, Amazon launched Amazon Managed Streaming for Apache Kafka (Amazon MSK), a managed service running genuine Apache Kafka code on AWS. Other hyperscalers followed suit.

Also in 2019, Redpanda delivered a Kafka API-compatible product that is simpler, lighter-weight, and faster. It uses a single-binary architecture with no external dependencies such as ZooKeeper, making it easier to deploy and manage and reducing operational overhead. It is also written in C++, while Kafka is written in Java and Scala. Redpanda has several other optimizations that contribute to its performance, such as its use of the Seastar framework, which pins one thread to each CPU core.

Soon after Redpanda’s founding, StreamNative and OVHcloud open-sourced Kafka-on-Pulsar (KoP) in March 2020. KoP enables Pulsar brokers to speak the Kafka protocol.

Confluent responded to these events first with Tiered Storage, which makes Kafka easier to scale by turning each broker’s message storage into something more like virtual memory backed by a remote store. It was a first step in decoupling storage and compute to make the software more cloud-native. Then, in 2023, Confluent released Kora, a cloud-native version of Kafka offered as a managed cloud service.
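The “virtual memory” analogy can be made concrete with a small sketch: the broker keeps only recent (hot) log segments on local disk and offloads sealed, older segments to cheap object storage, while reads stay transparent to the client. This is a simplified illustration of the idea, not Kafka’s actual implementation; the dict standing in for S3/GCS and all names are assumptions for the example.

```python
class TieredLog:
    """Toy sketch of tiered storage: hot segments stay on the 'broker disk',
    sealed older segments are offloaded to 'object storage' (a dict here)."""

    def __init__(self, segment_size=3, hot_segments=2):
        self.segment_size = segment_size
        self.hot_segments = hot_segments
        self.local = {}   # segment id -> records (broker's local disk)
        self.remote = {}  # segment id -> records (cheap remote object store)
        self.count = 0    # total records appended so far (the next offset)

    def append(self, record):
        seg = self.count // self.segment_size
        self.local.setdefault(seg, []).append(record)
        self.count += 1
        # Offload any sealed segment that falls outside the hot window.
        for old in [s for s in self.local if s <= seg - self.hot_segments]:
            self.remote[old] = self.local.pop(old)

    def read(self, offset):
        """Reads are transparent: fetch from local disk or the remote tier."""
        seg, pos = divmod(offset, self.segment_size)
        store = self.local if seg in self.local else self.remote
        return store[seg][pos]


tlog = TieredLog()
for i in range(8):
    tlog.append(f"event-{i}")
# The oldest segment (offsets 0-2) now lives in "object storage",
# yet tlog.read(0) still returns "event-0" with no change for the consumer.
```

Freight Clusters and WarpStream push this further by writing directly to object storage, removing the local tier (and most inter-zone networking) entirely, at the cost of higher latency.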

But it wasn’t enough. Redpanda provided another thing Confluent wasn’t: the ability to drop cloud-native Kafka into your cloud of choice, in your own account, what most people now call “bring your own cloud” (BYOC). In the BYOC model, clients maintain the data plane in their own account while the control plane runs in the vendor’s account.

It’s clearly what the market wanted out of Kafka. Redpanda is now valued at over $500M after growing revenues 300% in 2024. Confluent, which went public at a valuation of over $11 billion, now sits near a low of roughly $6.5 billion after growing revenues slightly over 25% year over year, on track to approach $1 billion in revenue within 12 months.

Kafka Transforms

Anyone who attended Confluent’s industry event, Current, would have seen the explosive growth in the number of companies offering solutions for real-time data. Not only is this ecosystem growing rapidly, but it is also transforming. Just this year, several things happened:

  • March: Confluent launched Tableflow, which streams Kafka topics into Iceberg and starts to make Confluent more directly relevant to the lakehouse and data management business.
  • May: Confluent introduced Freight Clusters, truly decoupled storage-compute that can cut the (networking) costs of Kafka by up to 90%, at the price of much higher latency.
  • May: StreamNative released Ursa, Kafka-compatible streaming with lakehouse storage.
  • September: Confluent acquired WarpStream to add a BYOC offering, and possibly to eliminate a competitor to Confluent’s Freight Clusters offering.
  • September: Estuary announced Dekaf for real-time analytics.

Kafka has transformed into an ecosystem of Kafka variants built around the Kafka API and client wire protocol, something that’s bigger than the Apache Kafka code. In fact, this ecosystem looks a lot like the PostgreSQL ecosystem, where you can choose the variant that best meets your needs.
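The practical consequence of a shared wire protocol is that client code barely changes across variants: typically only the connection details differ, while the rest of the configuration and application logic carry over. The sketch below illustrates this portability; the endpoint addresses are placeholder assumptions, and in practice the resulting dict would be handed to a Kafka client library such as confluent-kafka’s Producer.

```python
# The same producer settings work against any Kafka-compatible backend;
# only the bootstrap endpoint changes. Endpoints are illustrative placeholders.
BASE_CONFIG = {
    "acks": "all",              # wait for full acknowledgment of each write
    "compression.type": "zstd", # supported across Kafka-compatible brokers
}

VARIANT_ENDPOINTS = {
    "apache-kafka": "kafka-broker:9092",
    "redpanda": "redpanda-broker:9092",
    "warpstream": "warpstream-agent:9092",
}

def producer_config(variant):
    """Build a producer config for a given Kafka-compatible backend."""
    return {**BASE_CONFIG, "bootstrap.servers": VARIANT_ENDPOINTS[variant]}

# Swapping backends is a one-line change to the connection string:
kafka_cfg = producer_config("apache-kafka")
redpanda_cfg = producer_config("redpanda")
```

This is the same dynamic that made the Postgres wire protocol a lingua franca: applications written against the protocol gain a whole ecosystem of interchangeable backends.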

The New Kafka Ecosystem Landscape

You may be thinking, “Like Postgres? There are over 50 Postgres variants, and it’s componentized.” But the number of Kafka variants has already reached double digits, and many of them are quite extensible.

Figure 1 shows a partial, ever-growing list of variants and the technologies these Kafka providers have added on top of Kafka for stream processing, ETL, and analytics. It does not include the much larger list of event streaming platforms (ESP), streaming or analytical databases, or unified real-time platforms (URP) that also provide stream processing, analytics, and ETL. For that list, you can read Untangling the streaming landscape: the rise of the Unified Real-time Platforms. In many cases, these Kafka variants are used to feed URPs, ESPs, and streaming/analytics databases.

Please note that this diagram is meant to be representative.

First, there’s Confluent, which has three variants: open source Kafka; Kora, its cloud-native version of Kafka; and the recently acquired WarpStream for BYOC.

Then there are the outside alternatives to Kafka. Just like Kora and WarpStream, they were created because open-source Kafka is hard and costly to maintain and scale. Each company has solved some of Kafka’s problems in a different way. They’re all worth a good, long look.

  • Azure Event Hubs — Kafka and other messaging support integrated with other Microsoft technology
  • Apache Pulsar, which is Kafka-compatible, and vendors like StreamNative, Datastax, and OVHCloud
  • Redpanda — positioned as faster, cheaper, simpler Kafka
  • Upstash — serverless Redis, vector db, Kafka, and HTTP-based messaging in one

There’s also hosted Kafka, often as part of larger offerings, such as:

  • Aiven
  • Amazon Managed Streaming for Apache Kafka (Amazon MSK)
  • Datell
  • DoubleCloud
  • Google Cloud Managed Service for Apache Kafka
  • Instaclustr (NetApp)
  • Keen
  • OVHcloud
  • Temok

Any one of these vendors could do something similar to what Confluent did: release its own cloud-native version of Kafka and move customers onto it.

Finally, there are the Kafka-like technologies that offer Kafka API compatibility, mostly to connect to the Kafka API ecosystem, but also to simplify specific use cases. In short, the Kafka API has started to become a protocol anyone can use, with or without Kafka.

Innovation on top of Kafka

There have been several projects that were built on top of Kafka for stream processing (see Figure 1).

Apache Samza was one of the first. Samza’s momentum eventually slowed, but its innovations and concepts led to Kafka Connect (for connectivity) and Kafka Streams (for processing). Apache Flink has also emerged as an alternative. Debezium, first released in 2016, added change data capture (CDC) sources to Kafka.

But more recently, less Kafka-centric technologies have started to enter the Kafka ecosystem as well. Estuary Flow is a modern real-time data integration platform, but its origins go back to the early days of Kafka, even before Pulsar. Estuary’s founders first built Gazette, an open source cloud-native messaging project, in 2014. It was one of the first to decouple message logs, which it calls collections, from brokers and store them in cloud storage, a best practice only recently implemented by WarpStream and Confluent Freight Clusters. The same founders then built Estuary Flow, a real-time CDC and ETL cloud service that is also open source, on top of Gazette.

Estuary just released Dekaf, which adds 100% Kafka consumer API compatibility, making it easy for any destination to connect to Estuary as if it were a Kafka cluster. Dekaf is like decaf coffee: Kafka messaging without Kafka.

Estuary implemented the Kafka API so that it could take advantage of the existing Kafka ecosystem without the overhead of running Kafka. Flow can now more easily replace Debezium, custom code, and ELT or ETL technologies with a single data pipeline. Dekaf also allows Flow to be used as a last-mile layer between Kafka deployments and destinations, managing all the data pipeline details that matter for real-time analytics.

Several real-time analytics vendors — including Bytewax, ClickHouse, Materialize, SingleStore, StarTree, and Tinybird — are partnering with Estuary because Flow allows them to access all the different real-time and batch-based sources and features of Estuary through their existing Kafka API support.

What’s next for Kafka?

Perhaps the right question is what’s next for the Kafka API, which is really the de facto standard.

The newer vendors are commoditizing Kafka itself, making it simpler and cheaper. Confluent and other Kafka vendors will need to grow beyond Kafka. This trend isn’t new. Application integration was also built on JMS messaging, and eventually turned it into hidden plumbing. Several data integration vendors use Kafka as well. Eventually the only thing developers may see is the Kafka API.

What will happen to Confluent?

In Confluent Cloud, both the data plane and the control plane run in Confluent’s own account. Confluent’s acquisition of WarpStream gives it the latter’s state-of-the-art BYOC solution, even though Confluent was already building its own BYOC offering called Freight Clusters.

So, the question is: why acquire WarpStream? Beyond accelerating its BYOC market growth, the deal may also help Confluent enter other markets, such as managed data lakes. This is akin to how Databricks grew beyond its Apache Spark offering by adding data management. In today’s market, no company wants to be known as the provider of a single product.

Confluent is marketing Tableflow, which writes Kafka streams into Iceberg as a destination. Confluent has a big opportunity as a real-time data platform, having launched many enterprise-ready features at Current ’24. Hopefully, Confluent will become a major streaming data management and AI vendor as it looks to grow beyond Kafka.

What’s not clear is how the Kafka API will evolve. New functionality is rapidly emerging around it. Will Kafka fold some of these new features in and make them part of the de facto standard? Will Confluent continue to drive Kafka in the right direction? Will the ecosystem expand around the Kafka API, or will these new technologies create entirely new standards?

Perhaps unlike Franz Kafka’s Metamorphosis, Apache Kafka’s metamorphosis will get a sequel.


Sanjeev Mohan

Sanjeev researches the data and analytics space. Most recently, he was a research vice president at Gartner. He is now a principal at SanjMo.