Hadoop Archives - Kai Waehner

Apache Iceberg Open Table Format for Data Lake Lakehouse Streaming with Kafka Flink Databricks Snowflake AWS GCP Azure

47.0K views
11 minute read

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

By Kai Waehner
13. July 2024

An open table format framework like Apache Iceberg is essential in the enterprise architecture to ensure reliable data management and sharing, seamless schema evolution, efficient handling of large-scale datasets and cost-efficient storage. This blog post explores market trends, adoption of table format frameworks like Iceberg, Hudi, Paimon, Delta Lake and XTable, and the product strategy of leading vendors of data platforms such as Snowflake, Databricks (Apache Spark), Confluent (Apache Kafka / Flink), Amazon Athena and Google BigQuery.

Kappa Architecture vs Lambda Architecture for Apache Kafka Pulsar Data Lakes

77.5K views
17 minute read

Kappa Architecture is Mainstream Replacing Lambda

By Kai Waehner
23. September 2021

Real-time data beats slow data. That’s true for almost every use case. Nevertheless, enterprise architects build new infrastructures with the Lambda architecture, which includes separate batch and real-time layers. This blog post explores why a single real-time pipeline, called Kappa architecture, is the better fit. Real-world examples from companies such as Disney, Shopify, Uber, and Twitter illustrate the benefits of Kappa and show how batch processing still fits into the picture without the need for Lambda.

58.9K views
20 minute read

Can Apache Kafka Replace a Database?

By Kai Waehner
12. March 2020

Can and should Apache Kafka replace a database? How long can and should I store data in Kafka?…

By Kai Waehner
13. February 2018

At the OOP 2018 conference in Munich, I presented an updated version of my talk about building scalable, mission-critical…

By Kai Waehner
7. September 2017

I do a lot of presentations these days at meetups and conferences about how to leverage Apache Kafka and Kafka Streams to apply analytic models (built with H2O, TensorFlow, DeepLearning4J and other frameworks) to scalable, mission-critical environments. As many attendees have asked for it, I created a video recording of this talk (focusing on live demos).

By Kai Waehner
1. May 2017

After three great years at TIBCO Software, I move back to open source and join Confluent, the company behind the open source project Apache Kafka, to build mission-critical, scalable infrastructures for messaging, integration and stream processing. In this blog post, I want to share why I see the future for middleware and big data analytics in open source technologies, why I really like Confluent, what I will focus on in the next months, and why I am so excited about this next step in my career.

By Kai Waehner
15. November 2016

Streaming Analytics Comparison of Open Source Frameworks, Products and Cloud Services. Includes Apache Storm, Flink, Spark, TIBCO, IBM, AWS Kinesis, Striim, Zoomdata, …

By Kai Waehner
20. October 2016

Build intelligent Microservices by applying Machine Learning and Advanced Analytics. Leverage Apache Hadoop / Spark with Visual Analytics and Stream Processing.

By Kai Waehner
20. October 2016

Log Analytics is the right framework or tool for monitoring Distributed Microservices. Comparison of Open Source, SaaS and Enterprise Products. Plus the relation to big data components such as Apache Hadoop / Spark.

By Kai Waehner
3. March 2016

Closed Big Data Loop: 1) Finding Insights with R, H2O, Apache Spark MLlib, PMML and TIBCO Spotfire. 2) Putting Analytic Models into Action via Event Processing and Streaming Analytics.

Technology Evangelist

Kai Waehner

Hadoop

Machine Learning Trends of 2018 combined with the Apache Kafka Ecosystem

Kafka Streams + H2O.ai + TensorFlow (Video Recording / Live Demo)

Why I Move (Back) to Open Source for Messaging, Integration and Stream Processing

Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Services

Streaming Analytics with Analytic Models (R, Spark MLlib, H2O, PMML)

Global Executive Technology Strategist

Apache Kafka vs. Middleware (MQ, ETL, ESB) – Slides + Video

Deep Learning Example: Apache Kafka + Python + Keras + TensorFlow + Deeplearning4j

Process Intelligence Explained: Mining, Orchestration, and the Decision Gate

Beyond Enterprise Data Lineage: The Case for a Platform-Independent Data Catalog