Snowflake Archives - Kai Waehner

Shift Left Architecture at Siemens with Stream Processing using Apache Kafka and Flink

11.3K views
6 minute read

Shift Left Architecture at Siemens: Real-Time Innovation in Manufacturing and Logistics with Data Streaming

ByKai Waehner
11. April 2025

Industrial enterprises face increasing pressure to move faster, automate more, and adapt to constant change—without compromising reliability. Siemens Digital Industries addresses this challenge by combining real-time data streaming, modular design, and Shift Left principles to modernize manufacturing and logistics. This blog outlines how technologies like Apache Kafka, Apache Flink, and Confluent Cloud support scalable, event-driven architectures. A real-world example from Siemens’ Modular Intralogistics Platform illustrates how this approach improves data quality, system responsiveness, and operational agility.

Apache Iceberg Open Table Format for Data Lake Lakehouse Streaming wtih Kafka Flink Databricks Snowflake AWS GCP Azure

44.2K views
11 minute read

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

ByKai Waehner
13. July 2024

An open table format framework like Apache Iceberg is essential in the enterprise architecture to ensure reliable data management and sharing, seamless schema evolution, efficient handling of large-scale datasets and cost-efficient storage. This blog post explores market trends, adoption of table format frameworks like Iceberg, Hudi, Paimon, Delta Lake and XTable, and the product strategy of leading vendors of data platforms such as Snowflake, Databricks (Apache Spark), Confluent (Apache Kafka / Flink), Amazon Athena and Google BigQuery.

Apache Kafka and Snowflake Cost Efficiency and Data Governance

17.2K views
10 minute read

Apache Kafka + Flink + Snowflake: Cost Efficient Analytics and Data Governance

ByKai Waehner
26. April 2024

Snowflake is a leading cloud data warehouse and transitions into a data cloud that enables various use cases. The major drawback of this evolution is the significantly growing cost of the data processing. This blog post explores how data streaming with Apache Kafka and Apache Flink enables a “shift left architecture” where business teams can reduce cost, provide better data quality, and process data more efficiently. The real-time capabilities and unification of transactional and analytical workloads using Apache Iceberg’s open table format enable new use cases and a best of breed approach without a vendor lock-in and the choice of various analytical query engines like Dremio, Starburst, Databricks, Amazon Athena, Google BigQuery, or Apache Flink.

Snowflake with Apache Kafka and Iceberg Connector

18.8K views
8 minute read

Snowflake Data Integration Options for Apache Kafka (including Iceberg)

ByKai Waehner
22. April 2024

The integration between Apache Kafka and Snowflake is often cumbersome. Options include near real-time ingestion with a Kafka Connect connector, batch ingestion from large files, or leveraging a standard table format like Apache Iceberg. This blog post explores the alternatives and discusses its trade-offs. The end shows how data streaming helps with hybrid architectures where data needs to be ingested from the private data center into Snowflake in the public cloud.

Snowflake and Apache Kafka Data Integration Anti Patterns Zero Reverse ETL

16.2K views
9 minute read

Snowflake Integration Patterns: Zero ETL and Reverse ETL vs. Apache Kafka

ByKai Waehner
19. April 2024

Snowflake is a leading cloud-native data warehouse. Integration patterns include batch data integration, Zero ETL and near real-time data ingestion with Apache Kafka. This blog post explores the different approaches and discovers its trade-offs. Following industry recommendations, it is suggested to avoid anti-patterns like Reverse ETL and instead use data streaming to enhance the flexibility, scalability, and maintainability of enterprise architecture.

6.6K views
3 minute read

When NOT to Use Apache Kafka? (Lightboard Video)

ByKai Waehner
26. March 2024
1 share

Apache Kafka is the de facto standard for data streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How to qualify Kafka out as it is not the right tool for the job? This blog post contains a lightboard video that gives you a twenty-minute explanation of the DOs and DONTs.

16.6K views
5 minute read

The Heart of the Data Mesh Beats Real-Time with Apache Kafka

ByKai Waehner
28. July 2022
1 share

If there were a buzzword of the hour, it would undoubtedly be “data mesh”! This new architectural paradigm unlocks analytic and transactional data at scale and enables rapid access to an ever-growing number of distributed domain datasets for various usage scenarios. The data mesh addresses the most common weaknesses of the traditional centralized data lake or data platform architecture. And the heart of a decentralized data mesh infrastructure must be real-time, reliable, and scalable. Learn how the de facto standard for data streaming, Apache Kafka, plays a crucial role in building a data mesh.

Stream Exchange for Data Sharing with Apache Kafka in a Data Mesh

17.0K views
10 minute read

Streaming Data Exchange with Kafka and a Data Mesh in Motion

ByKai Waehner
14. November 2021

Data Mesh is a new architecture paradigm that gets a lot of buzzes these days. This blog post looks into this principle deeper to explore why no single technology is the perfect fit to build a Data Mesh. Examples show why an open and scalable decentralized real-time platform like Apache Kafka is often the heart of the Data Mesh infrastructure, complemented by many other data platforms to solve business problems.

Reverse ETL Anti Pattern vs Event Streaming with Apache Kafka

15.4K views
9 minute read

When to Use Reverse ETL and when it is an Anti-Pattern

ByKai Waehner
30. September 2021
1 share

This blog post explores why software vendors (try to) introduce new solutions for Reverse ETL, when Reverse ETL is really needed, and how it fits into the enterprise architecture. The involvement of event streaming to process data in motion is a key piece of Reverse ETL for real-time use cases.

Technology Evangelist

Kai Waehner

Snowflake

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

When to Use Reverse ETL and when it is an Anti-Pattern

Global Executive Technology Strategist

Apache Kafka vs. Middleware (MQ, ETL, ESB) – Slides + Video

Deep Learning Example: Apache Kafka + Python + Keras + TensorFlow + Deeplearning4j

Data Ownership in the Age of Agentic AI: Why SAP’s API Policy Forces a Data Integration Reckoning for Every Enterprise

Complex Event Processing (CEP) with Apache Flink: What It Is and When (Not) to Use It

MCP vs. REST/HTTP API vs. Kafka: The Architect’s Guide to Agentic AI Integration