The Shift Left Architecture – From Batch and Lakehouse to Real-Time Data Products with Data Streaming

Data integration is a hard challenge in every enterprise. Batch processing and Reverse ETL are common practices in a data warehouse, data lake, or lakehouse. The consequences are data inconsistency, high compute cost, and stale information. This blog post introduces a design pattern to solve these problems: the Shift Left Architecture enables a data mesh with real-time data products to unify transactional and analytical workloads with Apache Kafka, Flink, and Iceberg. Consistent information is handled with stream processing or ingested into Snowflake, Databricks, Google BigQuery, or any other analytics / AI platform to increase flexibility, reduce cost, and enable a data-driven company culture with faster time-to-market for building innovative software applications.
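To make the pattern concrete, here is a minimal sketch using Flink's Java Table API: one stream processing job consumes a raw Kafka topic, applies the transformation once, close to the source, and shares the curated result as an Iceberg table that any analytics platform can query. Topic names, schemas, and catalog settings are illustrative assumptions, not taken from the post.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ShiftLeftSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.inStreamingMode());

        // Source: raw order events arriving on a Kafka topic (hypothetical schema).
        tEnv.executeSql(
            "CREATE TABLE orders_raw (" +
            "  order_id STRING, amount DOUBLE, currency STRING" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'orders.raw'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'format' = 'json'," +
            "  'scan.startup.mode' = 'earliest-offset')");

        // Sink: a curated Iceberg table that Snowflake, Databricks, or BigQuery
        // can read without re-running the transformation per platform.
        tEnv.executeSql(
            "CREATE TABLE orders_curated (" +
            "  order_id STRING, amount_eur DOUBLE" +
            ") WITH (" +
            "  'connector' = 'iceberg'," +
            "  'catalog-name' = 'demo'," +
            "  'catalog-type' = 'hadoop'," +
            "  'warehouse' = 'file:///tmp/iceberg')");

        // The transformation runs once, "shifted left" into the streaming layer,
        // instead of being repeated in every downstream warehouse or lakehouse.
        tEnv.executeSql(
            "INSERT INTO orders_curated " +
            "SELECT order_id, amount * 0.92 AS amount_eur " +
            "FROM orders_raw WHERE currency = 'USD'");
    }
}
```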
Read More

Snowflake Data Integration Options for Apache Kafka (including Iceberg)

The integration between Apache Kafka and Snowflake is often cumbersome. Options include near real-time ingestion with a Kafka Connect connector, batch ingestion from large files, or leveraging a standard table format like Apache Iceberg. This blog post explores the alternatives and discusses their trade-offs. It concludes by showing how data streaming helps with hybrid architectures where data needs to be ingested from a private data center into Snowflake in the public cloud.
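For the connector-based option, the integration boils down to a short configuration file. The sketch below shows roughly what a Snowflake sink connector with Snowpipe Streaming looks like; account, credentials, database, and topic names are placeholders, so check the official connector documentation for the authoritative property list.

```properties
name=snowflake-sink
connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
topics=orders.curated
snowflake.url.name=<account>.snowflakecomputing.com:443
snowflake.user.name=<kafka_connector_user>
snowflake.private.key=<private_key>
snowflake.database.name=KAFKA_DB
snowflake.schema.name=PUBLIC
snowflake.role.name=KAFKA_CONNECTOR_ROLE
# Snowpipe Streaming ingests rows with lower latency than file-based Snowpipe.
snowflake.ingestion.method=SNOWPIPE_STREAMING
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
```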
Read More

SAP Datasphere and Apache Kafka as Data Fabric for S/4HANA ERP Integration

SAP is the leading ERP solution across industries around the world. Data integration with other data platforms, applications, databases, and APIs is one of the hardest challenges in the IT and software landscape. This blog post explores how SAP Datasphere, in conjunction with the data streaming platform Apache Kafka, enables a reliable, scalable, and open data fabric for connecting SAP business objects from ECC and S/4HANA ERP systems with other real-time, batch, or request-response interfaces.
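Once SAP change events land in Kafka, any downstream system can consume them in real time with the plain consumer API. The following is a hedged sketch; the topic name and payload format are hypothetical, not an official SAP Datasphere contract.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SapBusinessObjectConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "crm-sync");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Hypothetical topic carrying business partner change events.
            consumer.subscribe(List.of("sap.s4hana.businesspartner"));
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Forward each change to a CRM, warehouse, or REST API.
                    System.out.printf("BusinessPartner %s: %s%n",
                            record.key(), record.value());
                }
            }
        }
    }
}
```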
Read More

Apache Kafka as Workflow and Orchestration Engine

Business process automation with a workflow engine or BPM suite has existed for decades. However, using the data streaming platform Apache Kafka as the backbone of a workflow engine provides better scalability, higher availability, and a simplified architecture. This blog post explores case studies across industries to show how enterprises such as Salesforce and Swisscom implement stateful workflow automation and orchestration with Kafka.
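A minimal sketch of the idea with Kafka Streams: the latest workflow step per order is materialized in a fault-tolerant state store, so Kafka itself holds the durable, replayable workflow state. Topic and store names are hypothetical, and real orchestration (timers, compensation, human tasks) needs more logic than shown here.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class OrderWorkflowSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-workflow");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
                Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
                Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Each event carries the step an order has reached ("RECEIVED",
        // "PAID", "SHIPPED"), keyed by order id. Keeping the latest step per
        // key in a state store turns the topic into the workflow state.
        builder.<String, String>stream("order-events")
               .groupByKey()
               .reduce((previousStep, currentStep) -> currentStep,
                       Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(
                               "order-workflow-state"));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```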
Read More

Apache Kafka for Data Consistency (and Real-Time Data Streaming)

Real-time data beats slow data in almost all use cases. But data consistency across all systems, including non-real-time legacy systems and modern request-response APIs, is just as essential. Apache Kafka’s most underestimated feature is its storage component, based on the append-only commit log. It enables loose coupling for domain-driven design with microservices and independent data products in a data mesh. This blog post explores how Kafka enables data consistency with a real-world case study from financial services.
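The commit log idea is easy to demonstrate: keyed events in a compacted topic give every consumer, real-time or batch, fast or slow, the same source of truth to rebuild state from. A minimal sketch, with hypothetical topic and event names:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ConsistentLogSketch {
    public static void main(String[] args) throws Exception {
        Properties common = new Properties();
        common.put("bootstrap.servers", "localhost:9092");

        // A compacted topic retains the latest event per key indefinitely, so
        // a real-time consumer, a nightly batch job, and a brand-new
        // microservice all reconstruct the same state from the same log.
        try (Admin admin = Admin.create(common)) {
            NewTopic topic = new NewTopic("customer-state", 3, (short) 1)
                    .configs(Map.of("cleanup.policy", "compact"));
            admin.createTopics(List.of(topic)).all().get();
        }

        Properties producerProps = new Properties();
        producerProps.putAll(common);
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer =
                     new KafkaProducer<>(producerProps)) {
            // Events for the same key land in the same partition, so every
            // consumer sees them in the same order.
            producer.send(new ProducerRecord<>("customer-state", "customer-42",
                    "{\"status\":\"KYC_APPROVED\"}"));
        }
    }
}
```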
Read More

Streaming ETL with Apache Kafka in the Healthcare Industry

IT modernization and innovative new technologies are changing the healthcare industry significantly. This blog series explores how data streaming with Apache Kafka enables real-time data processing and business process automation. This post is part three: Streaming ETL. Examples include Babylon Health and Bayer.
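As a rough illustration of streaming ETL (a sketch, not an example from the post), the Kafka Streams topology below validates and reshapes every event the moment it arrives instead of waiting for a nightly batch job. Topic names and the payload format are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class VitalsStreamingEtl {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "vitals-etl");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
                Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
                Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("patient-vitals-raw")
               // Extract/Transform: drop malformed events, normalize payloads.
               .filter((patientId, payload) -> payload != null && !payload.isBlank())
               .mapValues(String::trim)
               // Load: publish the curated stream for analytics and applications.
               .to("patient-vitals-curated");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```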
Read More