Apache Kafka is a powerful foundation for event-driven architectures. It enables true decoupling between producers and consumers. A single event can be consumed by multiple services, each evolving independently. The pull-based consumption model in Kafka is a key enabler of this flexibility, giving consumers full control over how and when to process data.
This architecture works well in most deployments. In fact, in many real-world scenarios, having multiple independent consumers per Kafka topic is the right pattern. It aligns with microservices best practices and supports autonomous teams building independent applications leveraging domain-driven design (DDD).
However, as data volumes grow and the number of consumer applications increases, organizations start hitting scalability and operational limits. Companies like Wix and Uber have published excellent insights into the problems they encountered, and how they adapted Kafka consumption for their scale.
This article highlights those challenges, explores emerging solutions, and outlines what data streaming vendors could offer to make large-scale consumption more efficient – without losing the benefits of Kafka’s core architecture.
Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter, and follow me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases, including various Kafka architectures and best practices.
While Kafka brokers scale horizontally with ease, consumers introduce complexity at scale. Let’s look at the key challenges.
Wix observed that spinning up many microservices, each with its own Kafka consumer group, significantly increased cluster load and costs. More consumer groups meant more partition assignments, more metadata overhead, and more compute required across the platform.
This model also led to resource fragmentation. Each service needed its own scaling logic, retry handling, and monitoring. The operational burden grew rapidly.
Kafka guarantees message ordering within a partition, which can create bottlenecks. A single slow or failing message – often called a poison pill – can block all following records in that partition. This is known as head-of-line blocking.
Without parallelism or proper isolation, one bad message can impact SLAs for critical pipelines.
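Head-of-line blocking is easiest to see in a small sketch. The following is plain Python with no Kafka client, simulating strictly in-order processing of one partition; the record values and the failing handler are made-up examples:

```python
# Illustrative sketch: one partition's records must be processed in offset
# order, so a single failing record (a poison pill) blocks every record
# behind it. No real Kafka consumer is involved here.

def process_partition(records, handler):
    """Process records in offset order; stop at the first failure."""
    processed = []
    for offset, record in enumerate(records):
        try:
            handler(record)
        except Exception:
            # Head-of-line blocking: skipping ahead would violate
            # Kafka's per-partition ordering guarantee.
            return processed, offset
        processed.append(record)
    return processed, None

def handler(record):
    if record == "poison":
        raise ValueError(f"cannot process {record}")

done, blocked_at = process_partition(["ok-1", "ok-2", "poison", "ok-3"], handler)
```

Everything before the poison pill is processed; everything after it waits, no matter how healthy those records are.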
Handling failures at scale is difficult. Many companies implement dead letter queues (DLQs) to isolate problematic messages. But Kafka doesn’t have a built-in DLQ mechanism. Teams have to build custom logic to identify failures, forward them to DLQs, monitor the flow, and reprocess data.
Managing DLQs across hundreds of services becomes an architectural burden and an operational risk.
Handling failures and implementing dead letter queues at scale demand thoughtful solutions. The blog post “Error Handling via Dead Letter Queue in Apache Kafka” outlines common DLQ patterns, trade-offs, and resilient design approaches for Kafka-based systems. It features in-depth case studies from Uber, CrowdStrike, Santander Bank, and Robinhood, showcasing how each organization tackled error handling in production environments.
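The custom logic teams end up building typically looks like the following sketch. This is not a built-in Kafka feature; the topic name, header layout, and the in-memory `produce` stub are illustrative stand-ins for a real producer:

```python
# Illustrative sketch of custom DLQ routing: retry a record a bounded
# number of times, then forward it to a dead letter topic so the
# partition can keep moving. `produce` stands in for a Kafka producer.

dlq = []

def produce(topic, record, headers):
    # Stand-in for producer.send(topic, record, headers=headers)
    dlq.append((topic, record, headers))

def consume_with_dlq(record, handler, max_retries=3, dlq_topic="orders.dlq"):
    last_error = None
    for _ in range(max_retries):
        try:
            handler(record)
            return "processed"
        except Exception as err:
            last_error = err
    # Retries exhausted: isolate the poison pill instead of blocking.
    produce(dlq_topic, record, {"error": str(last_error), "retries": str(max_retries)})
    return "dead-lettered"

def bad_handler(record):
    raise ValueError("boom")

result = consume_with_dlq("r1", bad_handler)
```

Multiply this logic (plus monitoring and reprocessing) across hundreds of services, and the operational burden described above becomes clear.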
Kafka consumers pull messages at their own pace. This is ideal for true decoupling using domain-driven design with microservices and data products. It remains the best approach for many scenarios, regardless of whether the communication protocol is streaming, request-response, or message queue.
But it creates challenges when downstream systems are under pressure. For example, a consumer may pull and process messages faster than a database or external API can handle.
Without explicit rate-limiting or buffering, services may crash or trigger cascading failures.
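One common form of explicit rate limiting is a token bucket placed between the poll loop and the slow downstream system. The sketch below is generic Python, not tied to any Kafka client; the rate and capacity values are arbitrary examples:

```python
# Illustrative sketch: a token-bucket rate limiter in front of a slow
# database or API. When the bucket is empty, the consumer should pause
# its poll loop and back off instead of overwhelming the downstream system.
import time

class TokenBucket:
    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self):
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should pause/back off, not crash

bucket = TokenBucket(rate_per_sec=1, capacity=2)
```

In a real consumer, a `False` result would map to pausing the partition assignment (or simply sleeping) rather than dropping records.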
To address these issues, large-scale Kafka users are adopting two main strategies: centralized proxy layers and enhanced consumer libraries. In some scenarios, last-mile fan-out approaches are used instead for mobile, web, or IoT clients that require internet-scale delivery.
Kafka’s pull-based model enables flexibility, but some organizations are evolving their consumption architecture for greater efficiency. Both Uber and Wix introduced a push-based consumer proxy to decouple ingestion from processing. More details:
Before I go on with the details: Don’t (try to) solve a problem that you do NOT have yet!
This challenge mainly appears in extreme-scale deployments. Uber processes trillions of messages and multiple petabytes of data per day. Wix runs 7% of the internet's websites. These are extreme scales with challenges most organizations never face.
How It Works:
- The proxy consumes from Kafka on behalf of downstream services, so each service no longer runs its own consumer group.
- Messages are pushed to the consumer services over gRPC instead of being pulled.
- The proxy tracks acknowledgments from each service and commits offsets accordingly.
- Retries and dead letter handling are centralized in the proxy rather than duplicated in every service.
Here is the high-level architecture of Uber's Kafka Consumer Proxy using gRPC and message queues:
Benefits:
- Decouples message ingestion from processing, isolating slow or failing consumers.
- Parallelism is no longer limited by the number of partitions.
- Centralized retry and dead letter queue handling across all services.
- Fewer consumer groups reduce cluster load and metadata overhead.
Trade-offs:
- An additional infrastructure component that must be built, scaled, and operated.
- The push model gives up some of the consumer's control over processing pace.
- Requires a dedicated platform team with deep Kafka expertise.
Here is a deeper look into Wix's push-based Kafka Consumer Proxy using gRPC:
This pattern works well for companies with dedicated platform teams and extreme consumption throughput. Wix reported a 30% reduction in Kafka costs. Uber improved fault isolation and throughput by decoupling processing from Kafka itself.
But this solution isn’t a fit for every organization. For small and medium Kafka deployments, building a consumer proxy adds unnecessary complexity, overhead, and risk.
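The core loop of such a proxy can be sketched in a few lines. This is a conceptual stand-in, not Uber's or Wix's actual implementation: the batch is a plain list of (offset, record) pairs, and `push_to_service` stands in for the gRPC call to a consumer service:

```python
# Illustrative sketch of the push-based proxy idea: pull a batch from
# Kafka (stubbed as a list), push each record to a consumer service, and
# commit only offsets that were acknowledged. `push_to_service` stands in
# for the gRPC delivery call; unacknowledged records go to retry/DLQ.

def proxy_loop(batch, push_to_service):
    acked, failed = [], []
    for offset, record in batch:
        if push_to_service(record):          # service returns an ack
            acked.append(offset)
        else:
            failed.append((offset, record))  # candidate for retry or DLQ
    # Commit only up to the highest contiguous acknowledged offset, so a
    # gap (a failed record) is never silently skipped over.
    commit = -1
    for off in sorted(acked):
        if off == commit + 1:
            commit = off
        else:
            break
    return commit, failed

batch = [(0, "a"), (1, "b"), (2, "c")]
commit, failed = proxy_loop(batch, lambda record: record != "b")
```

The contiguous-commit detail is what lets the proxy keep delivering records behind a failure without losing track of what still needs retrying.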
For many teams, improving how consumers process messages is more practical than introducing a proxy. This is where Confluent’s Parallel Consumer open source library comes into play.
What It Does:
- Processes records within a single partition concurrently instead of one at a time.
- Offers configurable ordering modes: unordered, key-level, or partition-level.
- Tracks offsets per record so that only completed work is committed.
- Runs inside the consumer application as a library, with no extra infrastructure required.
Benefits:
- Much higher throughput when calling slow external systems such as databases or HTTP APIs.
- Parallelism beyond the partition count without repartitioning topics.
- Preserves per-key ordering where the use case requires it.
Trade-offs:
- A Java library, so it only helps JVM-based consumer applications.
- More complex offset and commit semantics than the standard consumer.
- Another dependency the team must understand, tune, and keep up to date.
Here is the Kafka consumer architecture without the parallel consumer:
And the solution architecture using Confluent's open source Parallel Consumer library for Apache Kafka:
The Parallel Consumer is a strong fit for teams that want to (and can) stay in the Kafka consumption model but need better performance and flexibility when connecting to slower external services like a database or HTTP request-response API.
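The key idea behind key-level parallelism can be demonstrated in a short sketch. Note that the Parallel Consumer itself is a Java library; this Python stand-in only illustrates the concept, and the record values and handler are invented examples:

```python
# Illustrative sketch of key-level parallelism: records with the same key
# are processed sequentially (preserving order), while different keys run
# concurrently. This mirrors the idea behind Confluent's Parallel Consumer
# but is NOT its API; it is a conceptual demo in plain Python.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def process_by_key(records, handler, max_workers=4):
    # Group records by key, preserving arrival order within each key.
    queues = defaultdict(list)
    for key, value in records:
        queues[key].append(value)

    results = []  # list.append is thread-safe in CPython

    def drain(key):
        for value in queues[key]:       # sequential per key
            results.append(handler(key, value))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pool.map(drain, queues)         # different keys run concurrently
    return results

records = [("a", 1), ("a", 2), ("b", 1)]
results = process_by_key(records, lambda k, v: f"{k}{v}")
```

With this model, the effective parallelism is bounded by the number of distinct keys, not the number of partitions, which is exactly what helps when the bottleneck is a slow database or HTTP API.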
Another important pattern extends beyond internal services: last-mile fan-out. Many organizations need to deliver curated Kafka data directly to millions of web browsers, mobile apps, or connected IoT devices. Specialized push-based proxies provide this capability for internet clients by maintaining long-lived WebSocket connections and pushing updates only when data changes.
In the IoT space, MQTT brokers serve a similar role, connecting Kafka pipelines to fleets of sensors, vehicles, or machines.
These delivery layers handle critical concerns like backpressure, reconnection, and state snapshots at scale. By offloading internet- and device-scale distribution to dedicated systems, Kafka consumption on the backend remains clean and efficient. This separation of concerns aligns well with organizational structures, where dedicated teams often manage mobile or IoT delivery platforms while Kafka continues to power the enterprise event backbone.
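The "push updates only when data changes" behavior of such a delivery layer can be sketched with a per-key snapshot. The `send` callback below is a hypothetical stand-in for a WebSocket or MQTT publish; the class and field names are illustrative:

```python
# Illustrative sketch of last-mile fan-out: keep the latest snapshot per
# key and push to connected clients only when the value actually changes,
# so idle connections receive no redundant traffic. `send` stands in for
# a WebSocket/MQTT publish call.

class FanOut:
    def __init__(self, send):
        self.snapshot = {}
        self.send = send

    def on_kafka_record(self, key, value):
        if self.snapshot.get(key) == value:
            return False             # unchanged: nothing pushed
        self.snapshot[key] = value
        self.send(key, value)        # push the delta to subscribers
        return True

pushes = []
fanout = FanOut(lambda k, v: pushes.append((k, v)))
first = fanout.on_kafka_record("price:AAPL", 187.2)
second = fanout.on_kafka_record("price:AAPL", 187.2)
```

The snapshot also doubles as the state a newly connected client receives on join, which is one of the concerns these delivery layers handle at scale.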
Kafka’s architecture is powerful because of its decoupling and pull-based model. These are not weaknesses – they are strengths in most scenarios and differentiators from traditional message brokers like IBM MQ or RabbitMQ. The key is to choose the right model for the right use case.
Still, there is room for improvement. Here’s what a data streaming platform could include to avoid the need for external proxies or client libraries:
Imagine a built-in service that provides:
- Push-based delivery to registered consumer endpoints.
- Automatic retries with a managed dead letter queue.
- Rate limiting and backpressure handling for slow downstream systems.
- Parallelism beyond the partition count, with configurable ordering guarantees.
- Monitoring and alerting out of the box.
This would eliminate the need for most teams to build their own proxies or manage retries. It would combine the efficiency of a proxy, the control of a library, and the low TCO of a managed service.
New features are valuable. But enterprises also care deeply about cost, simplicity, and risk. Both solutions discussed above – building a proxy or adopting a library – have hidden costs:
- Engineering time to build, integrate, and maintain custom components.
- Operational risk when bespoke infrastructure fails in production.
- Onboarding and training effort for every new team that adopts the pattern.
- No vendor support when something goes wrong.
A vendor-managed solution could offer sensible defaults, simplified configuration, and end-to-end support to reduce not just technical challenges, but also the TCO of large-scale event streaming.
Confluent Cloud, as an elastic SaaS, addresses some of these challenges by removing most operational overhead and providing scale and resilience out of the box. Unlike open source Kafka, it is not as sensitive to Consumer Group fan-out and scales orders of magnitude higher thanks to its cloud-native architecture. If needed, a cloud Kafka service can be combined with the patterns explored in this article to scale consumers more efficiently.
A central Kafka proxy solves a different (but very relevant) problem. It is not about scaling consumers like Uber’s or Wix’s proxies; instead, it addresses compliance, security, and governance challenges.
A (server-side) Kafka proxy sits between clients and brokers to enforce rules without changing applications or Kafka itself. This is useful for PCI/GDPR compliance, multi-tenant SaaS isolation, and protecting against insider threats.
Solutions like Kroxylicious or Confluent’s built-in proxy features follow this model. They use filter chains to handle encryption, audit, policy enforcement, and access control while passing through traffic with minimal overhead.
The key: keep proxies focused on infrastructure-level concerns, not business logic. Think of them as Kafka’s version of API management gateways, but protocol-aware.
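A filter chain in this style can be sketched as a pipeline of small functions applied to each record in flight. The filter names and record fields below are hypothetical examples, not Kroxylicious or Confluent APIs:

```python
# Illustrative sketch of a proxy-side filter chain: each filter inspects
# or transforms a record as it passes through, keeping infrastructure
# concerns (audit, masking, policy) out of the business applications.
# Filter and field names here are invented for the example.

def audit_filter(record, context):
    context["audit_log"].append(record["topic"])
    return record

def mask_filter(record, context):
    # e.g., redact a PII field for PCI/GDPR compliance
    if "card_number" in record:
        record = {**record, "card_number": "****"}
    return record

def apply_chain(record, filters, context):
    for f in filters:
        record = f(record, context)
        if record is None:           # a policy filter may reject the record
            break
    return record

context = {"audit_log": []}
out = apply_chain(
    {"topic": "payments", "card_number": "4111"},
    [audit_filter, mask_filter],
    context,
)
```

The applications on either side see ordinary Kafka traffic; only the chain in the middle knows about masking and audit, which is exactly the separation of concerns the pattern is after.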
A full blog post about use cases, products, and best practices for (server-side) Kafka proxies will come soon. Make sure to follow my newsletter.
Kafka remains the best-in-class solution for event-driven data architectures. Its pull-based model and decoupled design are core to its value. For many use cases, the native pull-based consumer model with multiple consumers per topic is the best approach.
However, at large scale, teams must re-evaluate how they consume Kafka events. Whether through a push-based proxy or smarter client libraries, organizations are evolving their consumer architecture to meet growing demands.
The next step is for a data streaming platform to abstract this complexity without sacrificing flexibility. Serverless consumption models could bring the same leap forward that managed Kafka clusters brought to event streaming infrastructure.
Choose the right solution for your scale. Keep your architecture simple – until it isn’t enough. Then, evolve with purpose.