Spark Archives - Kai Waehner

The Data Streaming Landscape 2023

By Kai Waehner
21. December 2022

Data streaming is a new software category to process data in motion. Apache Kafka is the de facto standard used by over 100,000 organizations. Plenty of vendors offer Kafka platforms and cloud services. Many complementary stream processing engines like Apache Flink and SaaS offerings have emerged. Competing technologies such as Pulsar and Redpanda are trying to gain market share. This blog post explores the data streaming landscape of 2023 to summarize existing solutions and market trends.

The Heart of the Data Mesh Beats Real-Time with Apache Kafka

By Kai Waehner
28. July 2022

If there were a buzzword of the hour, it would undoubtedly be “data mesh”! This new architectural paradigm unlocks analytic and transactional data at scale and enables rapid access to an ever-growing number of distributed domain datasets for various usage scenarios. The data mesh addresses the most common weaknesses of the traditional centralized data lake or data platform architecture. And the heart of a decentralized data mesh infrastructure must be real-time, reliable, and scalable. Learn how the de facto standard for data streaming, Apache Kafka, plays a crucial role in building a data mesh.

Kappa Architecture is Mainstream Replacing Lambda

By Kai Waehner
23. September 2021

Real-time data beats slow data. That’s true for almost every use case. Nevertheless, enterprise architects still build new infrastructures with the Lambda architecture, which includes separate batch and real-time layers. This blog post explores why a single real-time pipeline, called Kappa architecture, is the better fit. Real-world examples from companies such as Disney, Shopify, Uber, and Twitter illustrate the benefits of Kappa and also show how batch processing still has a place without requiring Lambda.
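The core Kappa idea, as it is usually described, is one processing code path over an append-only log, with reprocessing done by replaying that log instead of maintaining a separate batch layer. The following is a minimal in-memory sketch of that idea, not the post's implementation. It assumes a hypothetical `EventLog` class as a stand-in for a Kafka topic partition (no Kafka client is used), and the processor classes and revenue example are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Hypothetical in-memory stand-in for an append-only Kafka topic partition."""
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1  # offset of the newly appended event

    def read_from(self, offset):
        # Yield (offset, event) pairs starting at the given offset.
        for i in range(offset, len(self.events)):
            yield i, self.events[i]


class RevenuePerCustomer:
    """Single stream processor: the same code serves live data and reprocessing."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.next_offset = 0  # position of the next unprocessed event

    def process(self, event):
        customer, amount = event
        self.totals[customer] += amount

    def catch_up(self, log):
        # Consume every event not processed yet, then remember the position.
        for offset, event in log.read_from(self.next_offset):
            self.process(event)
            self.next_offset = offset + 1


class LargeOrderRevenue(RevenuePerCustomer):
    """Hypothetical new version of the logic: only orders of at least 5.0 count."""

    def process(self, event):
        customer, amount = event
        if amount >= 5.0:
            self.totals[customer] += amount


log = EventLog()
for e in [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)]:
    log.append(e)

live = RevenuePerCustomer()
live.catch_up(log)            # "real-time" path processes what has arrived
log.append(("bob", 1.0))
live.catch_up(log)            # only the new event is processed

# Reprocessing: no batch layer, just replay the whole log from offset 0.
replayed = RevenuePerCustomer()
replayed.catch_up(log)
assert replayed.totals == live.totals

# Changed logic is deployed by replaying the same log into a fresh instance.
v2 = LargeOrderRevenue()
v2.catch_up(log)
```

The design choice being illustrated: because historical and live data flow through the same `process` method, there is only one codebase to maintain, and "batch" becomes a replay of retained events rather than a separate system.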

Can Apache Kafka Replace a Database?

By Kai Waehner
12. March 2020

Can and should Apache Kafka replace a database? How long can and should I store data in Kafka?…

Big Data Spain: Talk about KSQL – The Streaming SQL Engine for Apache Kafka

By Kai Waehner
15. November 2018

KSQL – The Open Source Streaming SQL Engine for Apache Kafka: the slides from my talk at Big Data Spain 2018 are online. Check them out!

Machine Learning Trends of 2018 combined with the Apache Kafka Ecosystem

By Kai Waehner
13. February 2018

At the OOP 2018 conference in Munich, I presented an updated version of my talk about building scalable, mission-critical…

Kafka Streams + H2O.ai + TensorFlow (Video Recording / Live Demo)

By Kai Waehner
7. September 2017

These days I give many presentations at meetups and conferences about how to leverage Apache Kafka and Kafka Streams to apply analytic models (built with H2O, TensorFlow, DeepLearning4J, and other frameworks) in scalable, mission-critical environments. Because many attendees asked for it, I created a video recording of this talk (focusing on live demos).

Apache Kafka Streams + Machine Learning (Spark, TensorFlow, H2O.ai)

By Kai Waehner
23. May 2017

Use Apache Kafka Streams to build real-time streaming microservices. Apply machine learning / deep learning using Spark, TensorFlow, H2O.ai, etc. to add AI. Embed Kafka Streams into a Java app, Docker, Kubernetes, Mesos, or anything else.

Why I Move (Back) to Open Source for Messaging, Integration and Stream Processing

By Kai Waehner
1. May 2017

After three great years at TIBCO Software, I am moving back to open source and joining Confluent, the company behind the open source project Apache Kafka, to build mission-critical, scalable infrastructures for messaging, integration, and stream processing. In this blog post, I want to share why I see the future for middleware and big data analytics in open source technologies, why I really like Confluent, what I will focus on in the next months, and why I am so excited about this next step in my career.

Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Services

By Kai Waehner
15. November 2016

Streaming Analytics Comparison of Open Source Frameworks, Products and Cloud Services. Includes Apache Storm, Flink, Spark, TIBCO, IBM, AWS Kinesis, Striim, Zoomdata, …
