
What is Microsoft Fabric for Azure Cloud (Beyond the Buzz) and how it Competes with Snowflake and Databricks

If you ask your favorite large language model, Microsoft Fabric appears to be the ultimate solution for any data challenge you can imagine. That’s also the impression many people get from Microsoft’s sales teams. But is it really the silver bullet it’s made out to be? This article takes a closer look, first exploring the glossy marketing and sales definition of the platform and then deconstructing it from a more practical perspective. Learn what Microsoft Fabric is truly built for and how it fits into the wider data landscape, especially in comparison to other major players in the data analytics market such as Databricks and Snowflake.

The Shift Left Architecture – From Batch and Lakehouse to Real-Time Data Products with Data Streaming

Data integration is a hard challenge in every enterprise. Batch processing and Reverse ETL are common practices in a data warehouse, data lake, or lakehouse; data inconsistency, high compute cost, and stale information are the consequences. This blog post introduces a new design pattern to solve these problems: the Shift Left Architecture enables a data mesh with real-time data products to unify transactional and analytical workloads with Apache Kafka, Flink, and Iceberg. Consistent information is handled with stream processing or ingested into Snowflake, Databricks, Google BigQuery, or any other analytics / AI platform to increase flexibility, reduce cost, and enable a data-driven company culture with faster time-to-market for innovative software applications.
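As a minimal illustration of the shift-left idea, the following Python sketch consumes raw events, applies the validation and enrichment that would otherwise run as batch jobs inside the warehouse, and publishes a curated data product topic. It assumes the confluent-kafka client and uses hypothetical topic and field names; it is a sketch of the pattern, not a reference implementation.

```python
# Minimal shift-left sketch: curate data once, in the stream, instead of in every
# downstream warehouse job. Topic names and fields are illustrative assumptions.
import json
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-curation",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})

consumer.subscribe(["orders.raw"])          # raw events from the transactional app

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    order = json.loads(msg.value())

    # "Shift left": enforce data quality and enrichment here, not in the lakehouse.
    if order.get("amount", 0) <= 0:
        continue                            # drop invalid records early
    order["currency"] = order.get("currency", "EUR").upper()

    # Publish the curated data product; Snowflake, Databricks, BigQuery, or any
    # other consumer can ingest this topic without re-cleaning the data.
    producer.produce("orders.curated", json.dumps(order).encode("utf-8"), key=msg.key())
    producer.poll(0)
```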

Apache Kafka + Flink + Snowflake: Cost Efficient Analytics and Data Governance

Snowflake is a leading cloud data warehouse and is transitioning into a data cloud that enables many more use cases. The major drawback of this evolution is the significantly growing cost of data processing. This blog post explores how data streaming with Apache Kafka and Apache Flink enables a “shift left architecture” in which business teams reduce cost, provide better data quality, and process data more efficiently. The real-time capabilities and the unification of transactional and analytical workloads via Apache Iceberg’s open table format enable new use cases and a best-of-breed approach without vendor lock-in, with a free choice of analytical query engines such as Dremio, Starburst, Databricks, Amazon Athena, Google BigQuery, or Apache Flink.
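To give a feel for the engine-agnostic side of the open table format, here is a small sketch that reads an Iceberg table from Python with pyiceberg; the same table could be written by a Kafka/Flink pipeline and queried by Snowflake, Dremio, or Athena. The catalog URI, namespace, and table name are assumptions for illustration only.

```python
# Sketch: read an Iceberg table from Python that a streaming pipeline writes and
# that other query engines can also access. Names below are placeholder assumptions.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default", **{"uri": "http://localhost:8181"})  # REST catalog
orders = catalog.load_table("streaming.orders_curated")

# Scan only the columns needed; the open table format keeps the data engine-agnostic.
df = orders.scan(selected_fields=("order_id", "amount", "currency")).to_pandas()
print(df.head())
```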

Snowflake Data Integration Options for Apache Kafka (including Iceberg)

The integration between Apache Kafka and Snowflake is often cumbersome. Options include near real-time ingestion with a Kafka Connect connector, batch ingestion from large files, or leveraging a standard table format like Apache Iceberg. This blog post explores these alternatives and discusses their trade-offs. It closes by showing how data streaming helps with hybrid architectures where data needs to be ingested from the private data center into Snowflake in the public cloud.
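For the Kafka Connect path, registering the Snowflake sink connector against the Connect REST API might look roughly like the sketch below. The endpoint, credentials, and topic-to-table mapping are placeholders, and exact property names can vary between connector versions, so treat this as an outline rather than a ready-to-run configuration.

```python
# Sketch: register a Snowflake sink connector on a Kafka Connect cluster.
# Endpoint, credentials, and topic names are placeholder assumptions.
import requests

connector = {
    "name": "snowflake-orders-sink",
    "config": {
        "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
        "tasks.max": "2",
        "topics": "orders.curated",
        "snowflake.url.name": "myaccount.snowflakecomputing.com:443",
        "snowflake.user.name": "KAFKA_CONNECTOR",
        "snowflake.private.key": "<private-key>",
        "snowflake.database.name": "ANALYTICS",
        "snowflake.schema.name": "STREAMING",
        "snowflake.topic2table.map": "orders.curated:ORDERS",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector, timeout=30)
resp.raise_for_status()
print(resp.json())
```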

Snowflake Integration Patterns: Zero ETL and Reverse ETL vs. Apache Kafka

Snowflake is a leading cloud-native data warehouse. Integration patterns include batch data integration, Zero ETL, and near real-time data ingestion with Apache Kafka. This blog post explores the different approaches and discusses their trade-offs. Following industry recommendations, it suggests avoiding anti-patterns like Reverse ETL and instead using data streaming to improve the flexibility, scalability, and maintainability of the enterprise architecture.
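To illustrate the streaming alternative to Reverse ETL, the sketch below (again using the confluent-kafka client, with a hypothetical CRM endpoint and topic name) lets an operational system consume the curated stream directly instead of periodically extracting the same data back out of the warehouse.

```python
# Sketch: an operational consumer reads the curated stream directly, so no
# Reverse ETL job has to pull the data back out of the data warehouse.
# The CRM endpoint and topic name are illustrative assumptions.
import json
import requests
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "crm-sync",
    "auto.offset.reset": "latest",
})
consumer.subscribe(["orders.curated"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    order = json.loads(msg.value())
    # Push the fresh event to the operational application in near real time.
    requests.post("https://crm.example.com/api/orders", json=order, timeout=10)
```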

Data Warehouse and Data Lake Modernization: From Legacy On-Premise to Cloud-Native Infrastructure

The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary for solving business problems. Unfortunately, the underlying technologies are often misunderstood, overused to build monolithic and inflexible architectures, and pitched for the wrong use cases by vendors. Let’s explore this dilemma in a blog series. This is part 3: Data Warehouse Modernization: From Legacy On-Premise to Cloud-Native Infrastructure.

When to Use Reverse ETL and when it is an Anti-Pattern

This blog post explores why software vendors (try to) introduce new solutions for Reverse ETL, when Reverse ETL is really needed, and how it fits into the enterprise architecture. Event streaming to process data in motion is a key building block of Reverse ETL for real-time use cases.