Apache Kafka has come a long way from being just a scalable data ingestion layer for data lakes. Today, it’s the backbone of real-time transactional applications. In many organizations, Kafka serves as the central nervous system that connects both operational and analytical workloads. Over time, the architecture has shifted significantly, from brokers managing all storage, to Tiered Storage, and now toward a new paradigm: Diskless Kafka. In a Diskless Kafka architecture, brokers use no local disk storage at all. Instead, all event data is stored directly in cloud object storage such as Amazon S3, Google Cloud Storage, or Azure Blob Storage.
This shift redefines Kafka’s role: no longer just a messaging platform, but a scalable, long-term, and cost-efficient storage layer for event-driven architectures. This post explores that journey, the business value behind it, and what it means to operate Kafka without brokers.
Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter, and follow me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases, including various Kafka architectures and best practices.
Kafka is now more than just an open-source project. It has become the de facto standard protocol for streaming data. Many companies still use open-source Apache Kafka or solutions built on top of it. However, others are adopting Kafka-compatible services and products that separate the protocol from traditional broker and storage infrastructure.
This approach enables producers and consumers to continue using Kafka’s familiar APIs while relying on alternative storage solutions behind the scenes. In this new world, Kafka brokers may no longer be required for certain workloads.
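To illustrate what protocol compatibility means in practice, consider a standard Java producer: the same client code runs against a self-managed cluster, a fully managed service, or a diskless Kafka-compatible backend, and only the bootstrap endpoint changes. This is a minimal sketch; the endpoint and topic name are placeholders, not any specific vendor’s address:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ProtocolCompatibilityDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The only setting that changes when switching between a classic
        // broker cluster and a diskless Kafka-compatible backend:
        props.put("bootstrap.servers", "my-kafka-endpoint:9092"); // placeholder
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Business logic is unaware of how and where the event is stored.
            producer.send(new ProducerRecord<>("orders", "order-42", "{\"status\":\"created\"}"));
        }
    }
}
```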
As outlined in the Data Streaming Landscape, the Kafka protocol has become the foundation of modern data streaming platforms leveraging an event-driven architecture. And as storage and retrieval methods evolve, the focus shifts from infrastructure management to protocol consistency.
Some of these innovations eventually return to the open-source project. Diskless Kafka, for instance, might be added to Apache Kafka. Several KIPs are under discussion to evolve Kafka’s storage model:

- KIP-1150 (Aiven): Diskless Topics, storing topic data directly in object storage instead of on broker disks.
- KIP-1176 (Slack): Fast-tiering via a cloud-based Write-Ahead Log (WAL).
- KIP-1183 (AutoMQ): A vendor-specific approach to shared storage.

All three KIPs reflect growing momentum to modernize Kafka’s storage. However, they also show how complex and long the path to adoption can be.
But let’s take a step back first.
The introduction of Tiered Storage marked a turning point in Kafka’s evolution. It separates short-term and long-term storage by allowing Kafka to offload older data from local disks to object storage.
Business Value of Tiered Storage for Apache Kafka:

- Lower storage costs: historical data moves from expensive broker disks to cheap object storage.
- Longer retention: topics can keep weeks, months, or even years of history available for replay, analytics, and compliance.
- Better elasticity: brokers hold less local state, so scaling, rebalancing, and recovery become faster.
Tiered Storage helped many organizations lower their total cost of ownership while expanding the functional value of Kafka. But the journey didn’t stop there.
Tiered Storage started as a proprietary feature and is now available through an open interface in Apache Kafka. “Why Tiered Storage for Apache Kafka is a BIG THING” explores the evolution and concepts in more detail.
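As a hands-on illustration, the following sketch creates a topic with Tiered Storage enabled via the Kafka Admin API. It assumes the brokers already run with remote.log.storage.system.enable=true and a remote storage plugin configured; the topic name, partition count, and retention values are illustrative only:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class TieredTopicSetup {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("clickstream", 6, (short) 3)
                    .configs(Map.of(
                            "remote.storage.enable", "true", // offload closed segments to object storage
                            "local.retention.ms", "3600000", // keep ~1 hour on local broker disks
                            "retention.ms", "2592000000"     // retain 30 days overall, mostly tiered
                    ));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```

The key idea: local.retention.ms controls how much data stays on fast broker disks, while retention.ms governs total retention including the tiered portion.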
The next stage is more radical: Diskless Kafka.
In this model, Kafka brokers disappear completely. Producers and consumers still interact using the Kafka protocol, but the storage and control plane are entirely reimagined.
How It Works:

- Producers and consumers connect to a stateless service that speaks the Kafka protocol, instead of to stateful brokers.
- Event data is written directly to cloud object storage such as Amazon S3, Google Cloud Storage, or Azure Blob Storage.
- Metadata such as topics, offsets, and consumer groups is handled by a separate control plane.
- Because no node owns local partition data, there is no disk management, no inter-broker replication traffic, and no partition rebalancing.
This approach removes the operational burden of managing Kafka brokers while maintaining API compatibility. It changes the game.
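To make the write path tangible, here is a heavily simplified, hypothetical sketch of such a stateless write agent; it is not any vendor’s actual implementation. Record batches are buffered in memory and flushed as immutable objects to S3 (using the AWS SDK v2). The bucket name, key layout, and flush threshold are illustrative assumptions:

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.io.ByteArrayOutputStream;

public class DisklessWriteAgent {
    private static final int FLUSH_BYTES = 4 * 1024 * 1024; // flush roughly every 4 MiB

    private final S3Client s3 = S3Client.create();           // default region and credentials
    private final String bucket = "kafka-segments";          // hypothetical bucket name
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private long nextOffset = 0;
    private long segmentBaseOffset = 0;

    /** Append one record batch; flush to object storage once the buffer is full. */
    public synchronized void append(byte[] recordBatch) {
        buffer.writeBytes(recordBatch); // Java 11+
        nextOffset++;
        if (buffer.size() >= FLUSH_BYTES) {
            flushSegment();
        }
    }

    /** Write buffered batches as one immutable object; no broker disk is involved. */
    private void flushSegment() {
        String key = String.format("topic-a/segment-%020d.log", segmentBaseOffset);
        s3.putObject(PutObjectRequest.builder().bucket(bucket).key(key).build(),
                RequestBody.fromBytes(buffer.toByteArray()));
        // A real system would now commit the (key, offset range) mapping to the
        // control plane so consumers can locate the segment, and handle retries.
        segmentBaseOffset = nextOffset;
        buffer.reset();
    }
}
```

Note the crucial trade-off in the batching: larger flushes reduce object storage request costs but increase end-to-end latency, which is exactly why this architecture favors throughput over low latency.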
As early as May 2024, WarpStream explained why Diskless Kafka is better for the end user when showcasing its architecture.
Companies are already pioneering brokerless Kafka models. Some have operated them in production for several quarters. Others are just getting started or are new startups focused entirely on this architecture.
WarpStream offers a Kafka API-compatible solution without brokers, relying fully on object storage. Deployed directly in a customer’s cloud account, it dramatically lowers infrastructure and operational costs.
WarpStream also emphasizes security and zero trust architecture based on the Bring Your Own Cloud (BYOC) concept to allow deployments within private environments such as a customer’s VPC or on-premises infrastructure. Learn more in their documentation.
I explored the benefits of Bring Your Own Cloud (BYOC) in a dedicated blog post.
Confluent has implemented this architecture within its serverless Confluent Cloud. By separating compute and storage, customers get near-infinite scalability and pay only for what they use. In some cases, this has led to up to 90% cost reduction compared to traditional clusters.
The ecosystem is growing fast, with differentiation emerging through architecture, cost models, and security approaches.
Meanwhile, more startups are entering this space. Some, like Buf or AutoMQ, already offer Kafka-compatible services built entirely on object storage, while others are just beginning to explore diskless Kafka implementations.
Aiven created KIP-1150: Diskless Topics to bring brokerless Kafka into the open-source project, following the same collaborative approach seen with Tiered Storage.
Object Store-Only Kafka without the need for brokers brings tangible benefits:

- Lower infrastructure costs: no broker disks and no cross-availability-zone replication traffic, which often dominates cloud networking bills.
- Elastic scalability: compute scales independently of storage, which is effectively unlimited in the cloud.
- Simplified operations: no broker sizing, no disk capacity planning, and no partition rebalancing.
This architecture is NOT for every use case. It’s most suitable when latency requirements are moderate and workloads are centered around analytics or historical processing.
Diskless Kafka is ideal for:

- Analytical and historical workloads where seconds of end-to-end latency are acceptable.
- High-throughput ingestion of logs, metrics, telemetry, and other observability data.
- Long-term retention and replay of event streams.
- Cost-sensitive deployments where storage and cross-zone networking dominate the bill.
- Even critical operational and transactional workloads, as long as latency requirements are moderate.
The last point is particularly noteworthy. Diskless Kafka is not limited to analytical workloads. Because object storage operates differently than traditional disk systems, it can support strict durability and consistency guarantees, making it a strong fit even for critical operational and transactional applications. My article about “Multi-Region Kafka using Synchronous Replication for Disaster Recovery with Zero Data Loss (RPO=0)” explores the WarpStream implementation for this scenario.
Diskless Kafka is NOT ideal for:

- Use cases that demand very low end-to-end latency, typically in the low milliseconds, because every write and read goes through object storage and latency grows to hundreds of milliseconds or more.
That’s the main disadvantage. In summary, if you don’t need very low latency and have access to object storage, diskless Kafka might be the better choice from a value and TCO perspective.
When talking about low latency, also keep in mind that Kafka and similar competing technologies were never built for hard real-time, deterministic, safety-critical systems. Use cases such as robotics or autonomous systems are built on embedded systems with programming languages like C or Rust. Kafka is great for connecting these systems with the rest of the IT infrastructure, leveraging low latency in the range of milliseconds.
Always define what ‘real-time’ means for your use case. From a latency perspective, Diskless Kafka is sufficient for most scenarios.
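A simple way to define and verify your latency budget is to measure produce-to-consume latency directly. The sketch below compares the consumer’s wall clock against the producer-assigned record timestamp (Kafka’s default CreateTime); it assumes reasonably synchronized clocks, and the endpoint, group ID, and topic name are placeholders:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class LatencyProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "my-kafka-endpoint:9092"); // cluster under test
        props.put("group.id", "latency-probe");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // record.timestamp() is the producer-assigned CreateTime by default.
                    long latencyMs = System.currentTimeMillis() - record.timestamp();
                    System.out.printf("offset=%d end-to-end latency=%d ms%n",
                            record.offset(), latencyMs);
                }
            }
        }
    }
}
```

Run the same probe against a broker-based cluster and a diskless deployment to see whether the difference actually matters for your use case.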
Most organizations won’t replace Kafka brokers entirely. Instead, they will adopt a multi-cluster strategy to align architecture with workload requirements:

- Classic broker-based clusters for low-latency operational and transactional workloads.
- Diskless clusters for high-throughput, cost-sensitive, and analytical workloads.
- Fully managed or hybrid deployments where operational simplicity or data sovereignty matters most.
Enterprise architectures with multiple Kafka clusters are becoming the standard, not an exception! Organizations run multiple clusters optimized for specific use cases, all unified by the Kafka protocol. This enables seamless integration and consistent tooling. In my blog “Apache Kafka Cluster Type Deployment Strategies”, I explored various deployment scenarios such as multi-cloud, hybrid, disaster recovery, aggregation, edge, and more.
Whether using fully managed offerings like Confluent Cloud, brokerless alternatives like WarpStream, or hybrid deployments, teams can align infrastructure choices with their latency, cost, and scalability goals.
The shift to diskless Kafka is more than a technical evolution. It’s a strategic transformation. Kafka’s core value is moving away from broker infrastructure toward protocol standardization. The protocol has become the foundation that unifies real-time and historical processing, regardless of the underlying storage or compute architecture.
Kafka brokers and Object Store-Only Kafka deployments will coexist. This flexibility in storage backend allows organizations to support a wide range of workloads – operational, analytical, real-time, and historical – while maintaining one consistent protocol. Managed services will continue to dominate due to their ability to reduce operational complexity, and hybrid or edge deployments will become more common in industries like manufacturing, automotive, and energy.
Startups are pushing the boundaries with Kafka-compatible solutions that bypass traditional brokers entirely. At the same time, Kafka contributors are advancing efforts to modernize storage through multiple competing KIPs for diskless Apache Kafka. KIP-1150 from Aiven proposes diskless Kafka, KIP-1176 from Slack introduces fast-tiering via a cloud-based Write-Ahead Log (WAL), and KIP-1183 from AutoMQ outlines a vendor-specific approach to shared storage. While each proposal targets similar goals – decoupling Kafka from local disks – they take different technical paths, adding to the complexity and extending the timeline for consensus and adoption.
Still, this diversity of approaches highlights a broader shift: Kafka is evolving from a tightly coupled broker-based system toward a protocol-centric architecture. Recognizing all three proposals offers a more balanced view of this transition, even if the community ultimately consolidates around one direction.
Companies that embrace this shift to Diskless Kafka will benefit from lower infrastructure costs, easier operations, and highly scalable streaming platforms. All of this comes without sacrificing compatibility or vendor neutrality – thanks to the Kafka protocol-first approach.