AWS Archives - Kai Waehner

5.4K views
11 minute read

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

By Kai Waehner
13. July 2024
No comments

An open table format framework like Apache Iceberg is essential in the enterprise architecture to ensure reliable data management and sharing, seamless schema evolution, efficient handling of large-scale datasets and cost-efficient storage. This blog post explores market trends, adoption of table format frameworks like Iceberg, Hudi, Paimon, Delta Lake and XTable, and the product strategy of leading vendors of data platforms such as Snowflake, Databricks (Apache Spark), Confluent (Apache Kafka / Flink), Amazon Athena and Google BigQuery.

6.2K views
10 minute read

Why Tiered Storage for Apache Kafka is a BIG THING…

By Kai Waehner
5. December 2023
No comments

Apache Kafka added Tiered Storage to separate compute and storage. The capability enables more scalable, reliable and cost-efficient enterprise architectures. This blog post explores the architecture, use cases, benefits, and a case study for storing petabytes of data in the Kafka commit log. It closes by discussing why Tiered Storage does NOT replace other databases and how Apache Iceberg might change future Kafka architectures even more.

6.0K views
13 minute read

The Data Streaming Landscape 2023

By Kai Waehner
21. December 2022
1 share
3 comments

Data streaming is a new software category to process data in motion. Apache Kafka is the de facto standard used by over 100,000 organizations. Plenty of vendors offer Kafka platforms and cloud services. Many complementary stream processing engines like Apache Flink and SaaS offerings have emerged. And competitive technologies like Pulsar and Redpanda try to get market share. This blog post explores the data streaming landscape of 2023 to summarize existing solutions and market trends.

7.1K views
13 minute read

When NOT to choose Amazon MSK Serverless for Apache Kafka?

By Kai Waehner
30. August 2022
One comment

Apache Kafka became the de facto standard for data streaming. Various cloud offerings have emerged and improved in recent years. Amazon MSK Serverless is the latest Kafka product from AWS. This blog post looks at its capabilities to explore how it relates to “the normal” partially managed Amazon MSK, when the serverless version is a good choice, and when other fully managed cloud services like Confluent Cloud are the better option.

5.6K views
5 minute read

The Heart of the Data Mesh Beats Real-Time with Apache Kafka

By Kai Waehner
28. July 2022
1 share
No comments

If there were a buzzword of the hour, it would undoubtedly be “data mesh”! This new architectural paradigm unlocks analytic and transactional data at scale and enables rapid access to an ever-growing number of distributed domain datasets for various usage scenarios. The data mesh addresses the most common weaknesses of the traditional centralized data lake or data platform architecture. And the heart of a decentralized data mesh infrastructure must be real-time, reliable, and scalable. Learn how the de facto standard for data streaming, Apache Kafka, plays a crucial role in building a data mesh.

5.4K views
10 minute read

Streaming Data Exchange with Kafka and a Data Mesh in Motion

By Kai Waehner
14. November 2021
No comments

Data Mesh is a new architecture paradigm that is getting a lot of buzz these days. This blog post takes a deeper look at the principle to explore why no single technology is the perfect fit to build a Data Mesh. Examples show why an open and scalable decentralized real-time platform like Apache Kafka is often the heart of the Data Mesh infrastructure, complemented by many other data platforms to solve business problems.

2.1K views
6 minute read

Apache Kafka in the Public Sector – Part 2: Smart City

By Kai Waehner
12. October 2021
No comments

The public sector includes many different areas. Some groups, like the military, leverage cutting-edge technology. Others, like the public administration, are years or even decades behind. This blog series explores both ends of the spectrum to show how data in motion powered by Apache Kafka adds value for building innovative new applications and modernizing legacy IT infrastructures. This is part 2: Use cases and architectures for a Smart City.

5.4K views
12 minute read

Serverless Kafka in a Cloud-native Data Lake Architecture

By Kai Waehner
25. June 2021
1 share
No comments

Apache Kafka became the de facto standard for processing data in motion. Kafka is open, flexible, and scalable. Unfortunately, the latter makes operations a challenge for many teams. Ideally, teams can use a serverless Kafka SaaS offering to focus on business logic. However, hybrid scenarios require a cloud-native platform that provides automated and elastic tooling to reduce the operations burden. This blog post explores how to leverage cloud-native and serverless Kafka offerings in a hybrid cloud architecture. We start from the perspective of data at rest with a data lake and explore its relation to data in motion with Kafka.

13.3K views
21 minute read

Comparison of Open Source Apache Kafka vs Vendors including Confluent, Cloudera, Red Hat, Amazon MSK

By Kai Waehner
20. April 2021
3 shares
2 comments

Apache Kafka became the de facto standard for event streaming. Various vendors added Kafka and related tooling to their offerings or provide a Kafka cloud service. This blog post uses the car analogy – from the motor engine to the self-driving car – to explore the different Kafka offerings available on the market. The goal is not a feature-by-feature comparison. Instead, the intention is to educate about the different deployment models, product strategies, and trade-offs from the available options.

7.1K views
25 minute read

Pulsar vs Kafka – Comparison and Myths Explored

By Kai Waehner
9. June 2020
10 shares
No comments

Pulsar vs Kafka – which one is better? This blog post explores pros and cons, popular myths, and…

