Categories: Apache KafkaDisaster RecoveryEdgeHybrid CloudQConServerless

Disaster Recovery with Kafka across the Edge and Hybrid Cloud (QCon Talk)

I spoke at QCon London in April 2022 about building disaster recovery and resilient real-time enterprise architectures with Apache Kafka. This blog post summarizes the use cases, architectures, and real-world examples. The slide deck and video recording of the presentation is included as well.

What is QCon?

QCon is a leading software development conference held across the globe for 16 years. It provides a realistic look at what is trending in tech. The QCon events are organized by InfoQ, a well-known website for professional software development with over two million unique visitors per month.

QCon in 2022 uncovers emerging software trends and practices. Developers and architects learn how to solve complex engineering challenges without the product pitches.

There is no Call for Papers (CfP) for QCon. The organizers invite trusted speakers to talk about trends, best practices, and real-world stories. This makes QCon so strong and respected in the software development community.

Disaster Recovery and Resiliency with Apache Kafka

Apache Kafka is the de facto data streaming platform for analytical AND transactional workloads. Multiple options exist to design Kafka for resilient applications. For instance, MirrorMaker 2 and Confluent Replicator enable uni- or bi-directional real-time replication between independent Kafka clusters in different data centers or clouds.

Cluster Linking is a more advanced and straightforward option from Confluent leveraging the native Kafka protocol instead of additional infrastructure and complexity using Kafka Connect (like MirrorMaker 2 and Replicator).

Stretching a single Kafka cluster across multiple regions is the best option to guarantee no downtime and seamless failover in the case of a disaster. However, it is hard to operate and only recommended (i.e., consistent, stable, and mission-critical) across distances with enhanced add-ons to open-source Kafka:

QCon Presentation: Disaster Recovery with Apache Kafka

In my QCon talk, I intentionally showed the broad spectrum of real-world success stories across industries for data streaming with Apache Kafka from companies such as BMW, JPMorgan Chase, Robinhood, Royal Caribbean, and Devon Energy.

Best practices explored how to build resilient enterprises architecture with disaster recovery with RPO (Recovery Point Object) and RTO (Recovery Time Objective) in mind. The audience learns how to get your SLAs and requirements for downtime and data loss right.

The examples looked at serverless cloud offerings integrating to the IoT edge, hybrid retail architectures, and the disconnected edge in military scenarios.

The agenda looks like this:

Resilient enterprise architectures
Real-time data streaming with the Apache Kafka ecosystem
Cloud-first and serverless Industrial IoT in automotive
Multi-region infrastructure for core banking
Hybrid cloud for customer experiences in retail
Disconnected edge for safety and security in the public sector

Slide Deck from QCon Talk:

Here is the slide deck of my presentation from QCon London 2022:

You are currently viewing a placeholder content from Default. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.

Unblock content Accept required service and unblock content

More Information

We also had a great panel that discussed lessons learned from building resilient applications on the code and infrastructure level, plus the organizational challenges and best practices:

Video Recording from QCon Talk:

With the risk of Covid in mind, InfoQ decided not to record QCon sessions live.

Instead, a pre-recorded video had to be submitted by the speakers. The video recording is already available for QCon attendees (no matter if on-site in London or at the QCon Plus virtual event):

Qcon makes conference talks available for free sometime after the event. I will update this post with the free link as soon as it is available.

Disaster Recovery with Apache Kafka across all Industries

I hope you enjoyed the slides and video on this exciting topic. Hybrid and global Kafka infrastructures for disaster recovery and other use cases are the norm, not exceptions.

Real-time data beats slow data. That is true in almost every use case. Hence, data streaming with the de facto standard Apache Kafka gets adopted more and more across all industries.

How do you leverage data streaming with Apache Kafka for building resilient applications and enterprise architectures? What architecture does your platform use? Which products do you combine with data streaming? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Kai Waehner

bridging the gap between technical innovation and business value for real-time data streaming and applied AI.

Next Machine Learning and Data Science with Kafka in Healthcare »

Previous « Real Time Analytics with Apache Kafka in the Healthcare Industry

Published by

Kai Waehner

Tags: Cluster LinkingComplianceData Integrationdata lossDisaster RecoverydowntimekafkaKafka Connectkafka streamsKSQLksqldbmiddlewareMirrorMakerReplicatorResiliencyRPORTOSLASLOTransactions

4 years ago

Choosing an ERP for Manufacturing: How AI Is Reshaping the Vendor Landscape

ERP vendor selection for manufacturing is not a product decision. It is a strategic bet…

5 days ago

Process Intelligence

Process Intelligence Explained: Mining, Orchestration, and the Decision Gate

Process intelligence is not a single tool. It combines process mining, process orchestration, and a…

1 week ago

ERP Migration to SAP S/4HANA and Beyond: Lessons Learned from German Manufacturing

ERP modernization fails when the technology leads and the process work follows. Three German manufacturers…

3 weeks ago

Data Catalog

Beyond Enterprise Data Lineage: The Case for a Platform-Independent Data Catalog

Most organizations start their data governance journey by asking how to track where data comes…

1 month ago

Data Ownership in the Age of Agentic AI: Why SAP’s API Policy Forces a Data Integration Reckoning for Every Enterprise

Every enterprise is being told to go agentic. Meanwhile, the platforms holding your most critical…

2 months ago

Complex Event Processing

Flink CEP and Agentic AI: Real-Time Pattern Detection as the Foundation for Autonomous Decisions

AI agents fail in production when they are connected directly to raw event streams. Flink…