Data Streaming Archives

5.4K views
11 minute read

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

ByKai Waehner
13. July 2024
No comments

An open table format framework like Apache Iceberg is essential in the enterprise architecture to ensure reliable data management and sharing, seamless schema evolution, efficient handling of large-scale datasets and cost-efficient storage. This blog post explores market trends, adoption of table format frameworks like Iceberg, Hudi, Paimon, Delta Lake and XTable, and the product strategy of leading vendors of data platforms such as Snowflake, Databricks (Apache Spark), Confluent (Apache Kafka / Flink), Amazon Athena and Google BigQuery.

7.0K views
8 minute read

The Shift Left Architecture – From Batch and Lakehouse to Real-Time Data Products with Data Streaming

ByKai Waehner
15. June 2024
No comments

Data integration is a hard challenge in every enterprise. Batch processing and Reverse ETL are common practices in a data warehouse, data lake or lakehouse. Data inconsistency, high compute cost, and stale information are the consequences. This blog post introduces a new design pattern to solve these problems: The Shift Left Architecture enables a data mesh with real-time data products to unify transactional and analytical workloads with Apache Kafka, Flink and Iceberg. Consistent information is handled with streaming processing or ingested into Snowflake, Databricks, Google BigQuery, or any other analytics / AI platform to increase flexibility, reduce cost and enable a data-driven company culture with faster time-to-market building innovative software applications.

Data Streaming with Apache Kafka for Industrial IoT in the Automotive Industry at Brose

757 views
3 minute read

Apache Kafka in Manufacturing at Automotive Supplier Brose for Industrial IoT Use Cases

ByKai Waehner
13. June 2024
No comments

Data streaming unifies OT/IT workloads by connecting information from sensors, PLCs, robotics and other manufacturing systems at the edge with business applications and the big data analytics world in the cloud. This blog post explores how the global automotive supplier Brose deploys a hybrid industrial IoT architecture using Apache Kafka in combination with Eclipse Kura, OPC-UA, MuleSoft and SAP.

Snowflake with Apache Kafka and Iceberg Connector

1.7K views
8 minute read

Snowflake Data Integration Options for Apache Kafka (including Iceberg)

ByKai Waehner
22. April 2024
No comments

The integration between Apache Kafka and Snowflake is often cumbersome. Options include near real-time ingestion with a Kafka Connect connector, batch ingestion from large files, or leveraging a standard table format like Apache Iceberg. This blog post explores the alternatives and discusses its trade-offs. The end shows how data streaming helps with hybrid architectures where data needs to be ingested from the private data center into Snowflake in the public cloud.

Snowflake and Apache Kafka Data Integration Anti Patterns Zero Reverse ETL

1.9K views
9 minute read

Snowflake Integration Patterns: Zero ETL and Reverse ETL vs. Apache Kafka

ByKai Waehner
19. April 2024
No comments

Snowflake is a leading cloud-native data warehouse. Integration patterns include batch data integration, Zero ETL and near real-time data ingestion with Apache Kafka. This blog post explores the different approaches and discovers its trade-offs. Following industry recommendations, it is suggested to avoid anti-patterns like Reverse ETL and instead use data streaming to enhance the flexibility, scalability, and maintainability of enterprise architecture.

Google Apache Kafka for BigQuery GCP Cloud Service

2.1K views
7 minute read

When (Not) to Choose Google Apache Kafka for BigQuery?

ByKai Waehner
10. April 2024
No comments

Google announced its Apache Kafka for BigQuery cloud service at its conference Google Cloud Next 2024 in Las Vegas. Welcome to the data streaming club joining Amazon, Microsoft, IBM, Oracle, Confluent, and others. This blog post explores this new managed Kafka offering for GCP, reviews the current status of the data streaming landscape, and shares some criteria to evaluate when Kafka in general and Google Apache Kafka in particular should (not) be used.

1.5K views
3 minute read

When NOT to Use Apache Kafka? (Lightboard Video)

ByKai Waehner
26. March 2024
1 share
No comments

Apache Kafka is the de facto standard for data streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How to qualify Kafka out as it is not the right tool for the job? This blog post contains a lightboard video that gives you a twenty-minute explanation of the DOs and DONTs.

ESG and Sustainability powered by Data Streaming with Apache Kafka and Flink

1.8K views
13 minute read

Green Data, Clean Insights: How Kafka and Flink Power ESG Transformations

ByKai Waehner
10. February 2024
No comments

This blog post explores the synergy between Environmental, Social, and Governance (ESG) principles and Kafka and Flink’s real-time data processing capabilities, unveiling a powerful alliance that transforms intentions into impactful change. Beyond just buzzwords, real-world deployments architectures across industries show the value of data streaming for better ESG ratings.

SAP Datasphere and Apache Kafka as Data Fabric for ERP Integration

3.4K views
12 minute read

SAP Datasphere and Apache Kafka as Data Fabric for S/4HANA ERP Integration

ByKai Waehner
3. January 2024
No comments

SAP is the leading ERP solution across industries around the world. Data integration with other data platforms, applications, databases, and APIs is one of the hardest challenges in the IT and software landscape. This blog post explores how SAP Datasphere in conjunction with the data streaming platform Apache Kafka enables a reliable, scalable and open data fabric for connecting SAP business objects of ECC and S/4HANA ERP with other real-time, batch, or request-response interfaces.

Data Streaming Landscape 2024 around Kafka Flink and Cloud

7.1K views
21 minute read

The Data Streaming Landscape 2024

ByKai Waehner
21. December 2023
1 share
No comments

The research company Forrester defines data streaming platforms as a new software category in a new Forrester Wave. Apache Kafka is the de facto standard used by over 100,000 organizations. Plenty of vendors offer Kafka platforms and cloud services. Many complementary open source stream processing frameworks like Apache Flink and related cloud offerings emerged. And competitive technologies like Pulsar, Redpanda, or WarpStream try to get market share leveraging the Kafka protocol. This blog post explores the data streaming landscape of 2024 to summarize existing solutions and market trends. The end of the article gives an outlook to potential new entrants in 2025.

Technology Evangelist

Kai Waehner

Data Streaming

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

Technology Evangelist

Apache Kafka vs. Middleware (MQ, ETL, ESB) – Slides + Video

Deep Learning Example: Apache Kafka + Python + Keras + TensorFlow + Deeplearning4j

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

The Digitalization of Airport and Airlines with IoT and Data Streaming using Kafka and Flink

Energy Trading with Apache Kafka and Flink