If there were a buzzword of the hour, it would undoubtedly be “data mesh”! This new architectural paradigm unlocks analytic and transactional data at scale and enables rapid access to an ever-growing number of distributed domain datasets for various usage scenarios. The data mesh addresses the most common weaknesses of the traditional centralized data lake or data platform architecture. And the heart of a decentralized data mesh infrastructure must be real-time, reliable, and scalable. Learn how the de facto standard for data streaming, Apache Kafka, plays a crucial role in building a data mesh.
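To make the idea a bit more concrete, here is a minimal sketch of how a domain team might publish its events to Kafka as a self-serve data product that other domains can consume. The topic name payments.orders.v1, the broker address, and the JSON payload are illustrative assumptions, not something prescribed by the data mesh paradigm itself:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker address; in a data mesh this would point to the shared streaming backbone.
        props.put("bootstrap.servers", "kafka.payments.internal:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The domain-owned topic acts as the published "data product" other domains subscribe to.
            ProducerRecord<String, String> record = new ProducerRecord<>(
                "payments.orders.v1", "order-4711",
                "{\"orderId\":\"4711\",\"status\":\"CONFIRMED\"}");
            producer.send(record);
            producer.flush();
        }
    }
}
```

The key point is ownership: the payments domain owns the topic, its schema, and its quality, while the Kafka cluster only provides the scalable, real-time transport between domains.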
The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary for solving business problems. Unfortunately, the underlying technologies are often misunderstood, overused for monolithic and inflexible architectures, and pitched for the wrong use cases by vendors. A blog series explores this dilemma across several parts:
- Part 1: Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
- Part 3: Data Warehouse Modernization: From Legacy On-Premise to Cloud-Native Infrastructure
- Part 4: Case Studies for Cloud-Native Data Streaming and Data Warehouses
- Part 5: Best Practices for Building a Cloud-Native Data Warehouse or Data Lake
Apache Kafka is the de facto standard for event streaming to process data in motion. This blog post explores when NOT to use Apache Kafka. Which use cases are not a good fit for Kafka? What limitations does Kafka have? And when should you qualify Kafka out because it is not the right tool for the job?
Data Mesh is a new architecture paradigm that gets a lot of buzz these days. This blog post takes a deeper look at the principle to explore why no single technology is the perfect fit for building a Data Mesh. Examples show why an open and scalable decentralized real-time platform like Apache Kafka is often the heart of the Data Mesh infrastructure, complemented by many other data platforms, to solve business problems.
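For illustration, here is a hedged sketch of the consuming side: another domain subscribing to the payments.orders.v1 topic from the earlier producer sketch. The group id shipping-domain and the broker address are hypothetical:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ShippingDomainConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka.payments.internal:9092"); // hypothetical address
        props.put("group.id", "shipping-domain");                       // the consuming domain's own group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest"); // replay the full history of the data product if needed

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments.orders.v1"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("Shipping domain received order event: %s%n", record.value());
                }
            }
        }
    }
}
```

Because each consuming domain keeps its own consumer group and offsets, teams can read the same data product independently, at their own pace, and with whatever downstream data platform fits their use case.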
Real-time data beats slow data. That’s true for almost every use case. Nevertheless, enterprise architects build new infrastructures with the Lambda architecture, which includes separate batch and real-time layers. This blog post explores why a single real-time pipeline, called Kappa architecture, is the better fit. Real-world examples from companies such as Disney, Shopify, Uber, and Twitter illustrate the benefits of Kappa, and also show how batch processing still fits into the picture without the need for Lambda.
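As a rough sketch of what a single Kappa-style pipeline can look like, the following Kafka Streams topology reads one input topic, transforms the events, and writes one output topic. The topic names, the application id, and the toy "enrichment" step are assumptions for illustration only:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class ClickstreamKappaPipeline {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "clickstream-enrichment"); // hypothetical application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");      // hypothetical broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // One pipeline serves both "speed" and "batch" consumers, because the input topic
        // can be replayed from the beginning at any time instead of maintaining a separate batch layer.
        KStream<String, String> clicks = builder.stream("clickstream.raw");
        clicks
            .filter((key, value) -> value != null && !value.isBlank())
            .mapValues(value -> value.toUpperCase()) // stand-in for real enrichment logic
            .to("clickstream.enriched");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Historical questions are answered by resetting offsets and replaying the same topic through the same code, which is exactly the simplification Kappa promises over maintaining two parallel Lambda layers.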
Apache Kafka became the de facto standard for processing data in motion. Kafka is open, flexible, and scalable. Unfortunately, operating it at scale is a challenge for many teams. Ideally, teams can use a serverless Kafka SaaS offering to focus on business logic. However, hybrid scenarios require a cloud-native platform that provides automated and elastic tooling to reduce the operations burden. This blog post explores how to leverage cloud-native and serverless Kafka offerings in a hybrid cloud architecture. We start from the perspective of data at rest with a data lake and explore its relation to data in motion with Kafka.
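To show how little the application code itself has to change between self-managed and fully managed clusters, here is a minimal client configuration sketch for a serverless Kafka service. The endpoint, the API key placeholders, and the SASL/PLAIN choice are assumptions that depend on the specific cloud offering:

```java
import java.util.Properties;

public class ServerlessKafkaClientConfig {
    // A minimal configuration sketch for connecting a Kafka client to a fully managed, serverless cluster.
    // The endpoint and credentials are placeholders; real values come from the provider's console or CLI.
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "pkc-xxxxx.europe-west1.gcp.example.cloud:9092"); // hypothetical endpoint
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"<API_KEY>\" password=\"<API_SECRET>\";");
        // Pointing the same application at a self-managed on-premise cluster only requires swapping
        // these connection properties, which is what makes hybrid architectures practical.
        return props;
    }
}
```

In a hybrid setup, producers and consumers on both sides share the same client API; only the connection and security settings differ between the on-premise cluster and the serverless cloud service.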