ETL Archives - Kai Waehner

The Rise of Kappa Architecture in the Age of Agentic AI with Data Streaming using Apache Kafka and Flink

15.7K views
9 minute read

The Rise of Kappa Architecture in the Era of Agentic AI and Data Streaming

ByKai Waehner
8. July 2025

The shift from Lambda to Kappa architecture reflects the growing demand for unified, real-time data pipelines that serve both analytical and operational needs. With the rise of Agentic AI and streaming-first systems, Kappa—powered by Apache Kafka and Apache Flink—delivers low-latency, event-driven infrastructure that supports modern applications, from scalable data products to autonomous AI agents. Open table formats and Shift Left principles further establish Kappa as the foundation for consistent, governed, and future-ready data platforms.

Enterprise Application Integration with Confliuent and Databricks for Oracle SAP Salesforce Servicenow et al

12.5K views
12 minute read

Databricks and Confluent in the World of Enterprise Software (with SAP as Example)

ByKai Waehner
12. May 2025

Enterprise data lives in complex ecosystems—SAP, Oracle, Salesforce, ServiceNow, IBM Mainframes, and more. This article explores how Confluent and Databricks integrate with SAP to bridge operational and analytical workloads in real time. It outlines architectural patterns, trade-offs, and use cases like supply chain optimization, predictive maintenance, and financial reporting, showing how modern data streaming unlocks agility, reuse, and AI-readiness across even the most SAP-centric environments.

Shift Left Architecture with Confluent Data Streaming and Databricks Lakehouse Medallion

12.9K views
9 minute read

Shift Left Architecture for AI and Analytics with Confluent and Databricks

ByKai Waehner
9. May 2025

Confluent and Databricks enable a modern data architecture that unifies real-time streaming and lakehouse analytics. By combining shift-left principles with the structured layers of the Medallion Architecture, teams can improve data quality, reduce pipeline complexity, and accelerate insights for both operational and analytical workloads. Technologies like Apache Kafka, Flink, and Delta Lake form the backbone of scalable, AI-ready pipelines across cloud and hybrid environments.

Confluent and Databricks for Data Integration and Stream Processing

17.9K views
10 minute read

Confluent Data Streaming Platform vs. Databricks Data Intelligence Platform for Data Integration and Processing

ByKai Waehner
5. May 2025

This blog explores how Confluent and Databricks address data integration and processing in modern architectures. Confluent provides real-time, event-driven pipelines connecting operational systems, APIs, and batch sources with consistent, governed data flows. Databricks specializes in large-scale batch processing, data enrichment, and AI model development. Together, they offer a unified approach that bridges operational and analytical workloads. Key topics include ingestion patterns, the role of Tableflow, the shift-left architecture for earlier data validation, and real-world examples like Uniper’s energy trading platform powered by Confluent and Databricks.

21.3K views
5 minute read

The Top 20 Problems with Batch Processing (and How to Fix Them with Data Streaming)

ByKai Waehner
1. April 2025

Batch processing introduces delays, complexity, and data quality issues that modern businesses can no longer afford. This article outlines the most common problems with batch workflows—ranging from outdated insights to compliance risks—and illustrates each with real-world examples. It also highlights how real-time data streaming offers a more reliable, scalable, and future-proof alternative.

19.6K views
8 minute read

Apache Flink: Overkill for Simple, Stateless Stream Processing and ETL?

ByKai Waehner
14. January 2025

Discover when Apache Flink is the right tool for your stream processing needs. Explore its role in stateful and stateless processing, the advantages of serverless Flink SaaS solutions like Confluent Cloud, and how it supports advanced analytics and real-time data integration together with Apache Kafka. Dive into the trade-offs, deployment options, and strategies for leveraging Flink effectively across cloud, on-premise, and edge environments, and when to use Kafka Streams or Single Message Transforms (SMT) within Kafka Connect for ETL instead of Flink.

Apache Iceberg Open Table Format for Data Lake Lakehouse Streaming wtih Kafka Flink Databricks Snowflake AWS GCP Azure

47.0K views
11 minute read

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

ByKai Waehner
13. July 2024

An open table format framework like Apache Iceberg is essential in the enterprise architecture to ensure reliable data management and sharing, seamless schema evolution, efficient handling of large-scale datasets and cost-efficient storage. This blog post explores market trends, adoption of table format frameworks like Iceberg, Hudi, Paimon, Delta Lake and XTable, and the product strategy of leading vendors of data platforms such as Snowflake, Databricks (Apache Spark), Confluent (Apache Kafka / Flink), Amazon Athena and Google BigQuery.

39.2K views
8 minute read

The Shift Left Architecture – From Batch and Lakehouse to Real-Time Data Products with Data Streaming

ByKai Waehner
15. June 2024

Data integration is a hard challenge in every enterprise. Batch processing and Reverse ETL are common practices in a data warehouse, data lake or lakehouse. Data inconsistency, high compute cost, and stale information are the consequences. This blog post introduces a new design pattern to solve these problems: The Shift Left Architecture enables a data mesh with real-time data products to unify transactional and analytical workloads with Apache Kafka, Flink and Iceberg. Consistent information is handled with streaming processing or ingested into Snowflake, Databricks, Google BigQuery, or any other analytics / AI platform to increase flexibility, reduce cost and enable a data-driven company culture with faster time-to-market building innovative software applications.

Snowflake and Apache Kafka Data Integration Anti Patterns Zero Reverse ETL

16.9K views
9 minute read

Snowflake Integration Patterns: Zero ETL and Reverse ETL vs. Apache Kafka

ByKai Waehner
19. April 2024

Snowflake is a leading cloud-native data warehouse. Integration patterns include batch data integration, Zero ETL and near real-time data ingestion with Apache Kafka. This blog post explores the different approaches and discovers its trade-offs. Following industry recommendations, it is suggested to avoid anti-patterns like Reverse ETL and instead use data streaming to enhance the flexibility, scalability, and maintainability of enterprise architecture.

The State of Data Streaming for Healthcare in 2023 with Apache Kafka and Flink

9.2K views
6 minute read

The State of Data Streaming for Healthcare with Apache Kafka and Flink

ByKai Waehner
27. November 2023

This blog post explores the state of data streaming for the healthcare industry powered by Apache Kafka and Apache Flink. IT modernization and innovation with pioneering technologies like sensors, telemedicine, or AI/machine learning are explored. I look at enterprise architectures and customer stories from Humana, Recursion, BHG (former Bankers Healthcare Group), and more. A complete slide deck and on-demand video recording are included.

Technology Evangelist

Kai Waehner

ETL

The Top 20 Problems with Batch Processing (and How to Fix Them with Data Streaming)

Apache Flink: Overkill for Simple, Stateless Stream Processing and ETL?

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

Global Executive Technology Strategist

Apache Kafka vs. Middleware (MQ, ETL, ESB) – Slides + Video

Deep Learning Example: Apache Kafka + Python + Keras + TensorFlow + Deeplearning4j

Process Intelligence Explained: Mining, Orchestration, and the Decision Gate

Beyond Enterprise Data Lineage: The Case for a Platform-Independent Data Catalog