Request-response communication with REST / HTTP is simple, well understood, and supported by most technologies, products, and SaaS cloud services. Contrarily, data streaming with Apache Kafka is a fundamental change to process data continuously. HTTP and Kafka complement each other in various ways. This post explores the architectures and use cases to leverage request-response together with data streaming in the control plane for management or in the data plane for producing and consuming events.
Request-response (HTTP) versus data streaming (Apache Kafka)
Prior to discussing the relationship between HTTP/REST and Apache Kafka, let’s explore the concepts behind both. Traditionally, request-response and data streaming are two different paradigms.
Request-response with REST/HTTP
The following characteristics make HTTP so prevalent in software engineering for request-response (aka request-reply) communication:
- The foundation of data communication for the World Wide Web
- The standard application layer protocol in the internet protocol suite, commonly known as TCP/IP
- Simple and well understood
- Supported by most open source frameworks, proprietary products, and SaaS cloud services
- Pre-defined API with GET, POST, PUT, and DELETE commands
- Typically synchronous communication, but chunked transfer encoding (i.e., streaming) is also possible
- Point-to-point message exchange between two applications (like a client and server or two independent microservices)
Data streaming with Apache Kafka
HTTP is about communication between two applications. On the contrary, data streaming is much more than just data communication between a client and a server. Hence, data streaming platforms like Apache Kafka have very different characteristics:
- No official web standard like HTTP
- Plenty of open-source and proprietary implementations exist
- An event-driven system with asynchronous communication using truly decoupled producers and consumers due to the storage of the streaming platform
- General-purpose events instead of pre-defined APIs – contract management using schemas is crucial in larger projects for API enforcement and data governance
- Continuous data processing in real-time at any scale – a fundamental change for developers that are used to web services and databases for building applications
- Backpressure handling, slow consumers, and replayability of historical events are core concepts built-in out-of-the-box
- Data integration and data processing capabilities are built into the data streaming platform, i.e., it is not just a message queue
Please note that this article specifically discusses Apache Kafka as it is the established de facto standard for data streaming. It powers most data streaming distributions like Confluent, Red Hat, IBM, Cloudera, TIBCO, and many more, plus cloud services like Confluent Cloud and Amazon MSK. Nevertheless, other frameworks and cloud services like Apache Pulsar, Redpanda, Apache Flink, AWS Kinesis, and many other data streaming technologies follow the same principles. Just be aware of the technical differences and trade-offs between data streaming products.
Request-response and data streaming are complementary
Most architectures need request-response for point-to-point communication (e.g., between a server and mobile app) and data streaming for continuous event processing.
Event sourcing with CQRS is the better design for most data streaming scenarios. However, developers can implement the request-response message exchange pattern natively with Apache Kafka.
Nevertheless, direct HTTP communication with Kafka is the easier and often better approach for appropriate use cases. With this in mind, let’s look at use cases where HTTP is used with Kafka and how they complement each other.
Why is HTTP / REST so popular?
Most developers and administrators are familiar with REST APIs. They are the natural option for many best practices and security guidelines. Here are some good reasons why this will not change in the future:
- Avoiding technology lock-in: Sometimes, you want to embed the communication or proxy it with a more agnostic API.
- Familiarity with a known technology: Developers are familiar with REST endpoints and if they are under pressure or need a quick result, it’s quicker than learning how to use a new API.
- Supported by almost all products: Most open-source frameworks, commercial products, and SaaS cloud provide HTTP APIs.
- Security: HTTP ports are much easier to open by security teams compared to the TCP ports of the Kafka-native protocol used by client APIs from programming languages such as Java, Go, C++, or Python. For instance, in DMZ pass-through requirements, InfoSec owns the F5 proxies in the DMZ. A Kafka REST Proxy makes the integration easier.
- Domain-driven design (DDD): Often, HTTP/REST and Kafka are combined to leverage the best of both worlds: Kafka for decoupling and HTTP for synchronous client-server communication. A service mesh using Kafka with REST APIs is a common architecture.
Other great implementations exist for request-response communication. For instance:
- gRPC: Efficient request-response communication via a cross-platform open source high-performance Remote Procedure Call framework
- GraphQL Data query and manipulation language for APIs and a runtime for fulfilling queries with existing data
Nevertheless, HTTP is the first choice in most projects. gRPC, GraphQL, or other implementations are chosen for specific problems if HTTP is not good enough.
Use cases for HTTP / REST APIs with Apache Kafka
RESTful interfaces to an Apache Kafka cluster make it easy to produce and consume messages, view the cluster’s metadata, and perform administrative actions using standard HTTP(S) instead of the native TCP-based Kafka protocol or clients.
Each scenario differs significantly in its purpose. Some use cases are implemented out of convenience, while others are required because of technical specifications.
There are two major categories of use cases for combining HTTP with Kafka. In terms of cloud-native architectures, this can be divided into the management plane (i.e., administration and configuration) and the data plane (i.e., producing and consuming data).
Management plane with HTTP and Kafka
The management and administration of a Kafka cluster involve various tasks, such as:
- Cluster management: Creation of a cluster, expanding or shrinking a cluster, etc.
- Cluster configuration: Management of Kafka topics, consumer groups, key management, role-based access control (RBAC), etc.
- CI/CD and DevOps integration: HTTP APIs are the most popular way to build delivery pipelines and automate administration, instead of using Python or other alternative scripting options.
- Data governance: Tools for data lineage, data catalogs, and policy enforcement need to be configured using APIs.
- 3rd party monitoring integration: Connect metrics APIs, alerts, and other notifications into systems like Datadog, Slack, etc.
Data plane with HTTP and Kafka
Many scenarios require or prefer the usage of REST APIs for producing and consuming messages to/from Kafka, such as
- Natural request-response applications such as mobile apps: These applications and the frameworks almost always require integration via HTTP and request-response. WebSockets, Server-Sent Events (SSE), and similar concepts are a better fit for data streaming with Kafka. They are in the client framework, though often not supported.
- Legacy application and third-party tool integration: Legacy applications, standard software, and traditional middleware are often proprietary. The only integration capabilities are HTTP/REST. Extract, transform, load (ETL), enterprise service bus (ESB), and other third-party tools are complementary to data streaming with Kafka. Mainframe integration using REST APIs from COBOL to Kafka is another example.
- API gateway: Most API management tools do not provide native support for data streaming and Kafka today and only work on top of REST interfaces. Kafka (via the REST interface) and API management are still very complementary for some use cases, such as service monetization or integration with partner systems.
- Other programming languages: Kafka provides Java and Scala clients. Confluent provides and supports additional clients, including Python, .NET, C, C++, and Go. More Kafka clients exist from the community, including Erlang, Kotlin, Node.js, PHP, Ruby, and Rust. Many of these community clients are not battle tested or supported. Therefore, calling the REST API from your favorite programming language is sometimes the better and easier option. Others, such as COBOL on the mainframe, don’t even provide a Kafka client at all. Hence, a REST API is the only viable solution.
Example: HTTP + Kafka with Confluent REST Proxy
The Confluent REST Proxy has been around for a long time and is available under the Confluent Community License. Many companies use it in production as a management plane and data plane as a self-managed component in conjunction with open source Apache Kafka, Confluent Platform, or Confluent Cloud.
While not being a lawyer, the short version is that you can use the Confluent REST Proxy for free – even for your production workloads at any scale – as long as you don’t build a competitive cloud service with it (say, e.g., the “AWS Kafka HTTP Proxy”) and charge per hour or volume for the serverless offering.
The Confluent REST Proxy and REST APIs are separated into both a data plane and a management plane:
While some applications require both, in many scenarios, only one or the other is used.
The management plane is typically used for very low throughout and a few API calls. The data plane, on the other hand, varies. Many applications produce and consume data continuously. The biggest limitation of the REST Proxy data plane is that it is a synchronous request-response protocol.
What scale and volumes does a REST Proxy for Kafka support?
Don’t underestimate the power of the REST Proxy as a data plane because Kafka provides batch capabilities to scale up to many parallel REST Proxy instances. There are deployments where four REST Proxy instances can handle ~20,000 events per second, which is sufficient for many use cases.
Many HTTP use cases do not require millions of events per second. Hence, the Confluent REST Proxy is often good enough. This is even true for many IoT use cases I have seen in the wild where devices or machines connect to Kafka via HTTP.
How does HTTP’s streaming data transfer fit into the architecture?
Please note that chunked transfer encoding is a streaming data transfer mechanism available in version 1.1 of HTTP. In chunked transfer encoding, the data stream is divided into a series of non-overlapping “chunks”. The chunks are sent out and received independently of one another.
Some Kafka REST Produce APIs support a streaming mode that allows sending multiple records over a single stream. The stream remains open unless explicitly terminated. The streaming mode can be achieved by setting an additional header “Transfer-Encoding: chunked” on the initial request. Check if your favorite Kafka proxy or cloud API supports the HTTP streaming mode.
Architecture options for Kafka + Rest Proxy
Different deployment options exist for the Confluent REST Proxy:
The self-managed REST Proxy instance or cluster of instances (as a “dedicated node”) is still decoupled from the open-open source Kafka broker or commercial Confluent Server. This is the ideal option for a data plane to produce and consume messages.
The management plane is also embedded as a unified REST API into Confluent Server (as a “broker plugin”) and Confluent Cloud for administrative operations. This simplifies the architecture because no additional nodes are required for using the administration APIs.
In some deployments, both approaches may be combined: The management plane is used via the embedded REST APIs in Confluent Server or in Confluent Cloud. Meanwhile, data plane use cases are decoupled into their own REST Proxy instances to easily handle scalability and be independent of the server side.
The developer does not have to care about the infrastructure or architecture for REST APIs in Confluent Cloud. The HTTP interfaces are fully managed by the vendor.
The REST APIs of the self-managed REST Proxy and Confluent Cloud are compatible. Hybrid architectures and cloud migration are possible without implementing any breaking changes.
Data governance for data streaming and HTTP services with a Schema Registry
Data governance is an important part of most data streaming projects. Kafka deployments usually include various decoupled producers and consumers, often following the DDD principle for microservice architectures. Hence, Confluent Schema Registry is used in most projects for schema enforcement and versioning.
Any Kafka client built by Confluent can leverage the Schema Registry using Avro, Protobuf, or JSON Schema. This includes programming APIs like Java, Python, Go, or Python, but also Kafka Connect sources and sink, Kafka Streams, ksqlDB, and the Confluent REST Proxy. Like the REST Proxy, Schema Registry is available under the Confluent Community License.
Schema Registry lives separately from your Kafka brokers. Confluent REST Proxy still talks to Kafka to publish and read data (messages) to topics. Concurrently, the REST Proxy can also talk to Schema Registry to send and retrieve schemas that describe the data models for the messages.
Schema Registry provides a serving layer for your metadata and enables data governance and schema enforcement for all events. It provides a RESTful interface for storing and retrieving your Avro, JSON Schema, and Protobuf schemas. The Schema Registry stores a versioned history of all schemas based on a specified subject name strategy, provides multiple compatibility settings, and allows the evolution of schemas according to the configured compatibility settings and expanded support for these schema types. It provides serializers that plug into Kafka clients that handle schema storage and retrieval for Kafka messages that are sent in any of the supported formats:
Schema enforcement happens on the client side. Additionally, Confluent Platform and Confluent Cloud provide server-side schema validation. The latter is helpful if incorrect or malicious client applications send messages to Kafka without using the client-side Schema Registry integration.
API Management and data sharing
API Management is a term from the Open API world that puts a layer on top of HTTP / REST APIs for the management, monitoring, and monetization of APIs. Solutions include Apigee, MuleSoft Anypoint, Kong, IBM API Connect, TIBCO Mashery, and many more.
Features of API gateways and API management products
API Gateway and API Management Tools provide many outstanding features:
- API Portal for creating and publishing APIs
- Enforcing usage policies and controlling access
- Technical features for data transformations
- Nurturing the subscriber community
- Collecting and analyzing usage statistics
- Reporting on performance
- Monetization and billing
These features are unavailable in any data streaming platform like Kafka, Pulsar, Kinesis, et al. on the other side, API tools like MuleSoft or Kong are not built for processing real-time data at scale with low latency.
API Management and data streaming are complementary
Hence, API Management and data streaming are complementary, not competitive! The blog post “Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?” explores this:
API == REST/HTTP for most API Management products and related API gateways. Vendors start to integrate either the Kafka API or event standards like AsyncAPI to get into event-based architectures. That’s great news!
Sharing of streaming data requires a stream data exchange instead of HTTP APIs
Data sharing becomes crucial to modern and flexible enterprise architectures that build on concepts like microservices and data mesh. Real-time data beats slow data. That’s not just true for applications but also for data replication across business units, organizations, B2B, clouds, hybrid environments, and other scenarios. Therefore, the next generation of data sharing is built on top of data streaming.
HTTP APIs make little sense in many data streaming scenarios, especially if you expect high volumes or require low latency. Hence, data sharing in real-time by linking Kafka clusters or using a stream data exchange is the much better approach:
I won’t go into more detail here. The dedicated blog post “Streaming Data Exchange with Kafka for Real-Time Data Sharing” explores the idea, its trade-offs, and some real-world examples.
Not one or the other, instead combine Kafka and HTTP/REST in your projects!
Various use cases employ HTTP/REST with Apache Kafka as a management plane or data plane. This combination will not go away in the future.
The Confluent REST Proxy can be used for HTTP(S) communication with your favorite client interface. No matter if you run open-source Apache Kafka, Confluent Platform, or Confluent Cloud. Check out the source code on GitHub or get started with an intuitive tutorial.
How do you combine data streaming with request-response principles? How do you combine HTTP and Kafka? What proxy are you using? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.