Cloudera Archives - Kai Waehner



Fully Managed (SaaS) vs. Partially Managed (PaaS) Cloud Services for Data Streaming with Kafka and Flink

By Kai Waehner
18. January 2025

The cloud revolution has reshaped how businesses deploy and manage data streaming with solutions like Apache Kafka and Flink. Distinctions between SaaS and PaaS models significantly impact scalability, cost, and operational complexity. Bring Your Own Cloud (BYOC) expands the options, giving businesses greater flexibility in cloud deployment. Misconceptions around terms like “serverless” highlight the need for deeper analysis to avoid marketing pitfalls. This blog explores deployment options, enabling informed decisions tailored to your data streaming needs.


Apache Flink: Overkill for Simple, Stateless Stream Processing and ETL?

By Kai Waehner
14. January 2025

Discover when Apache Flink is the right tool for your stream processing needs. Explore its role in stateful and stateless processing, the advantages of serverless Flink SaaS solutions like Confluent Cloud, and how it supports advanced analytics and real-time data integration together with Apache Kafka. Dive into the trade-offs, deployment options, and strategies for leveraging Flink effectively across cloud, on-premise, and edge environments, and when to use Kafka Streams or Single Message Transforms (SMT) within Kafka Connect for ETL instead of Flink.


The Data Streaming Landscape 2025

By Kai Waehner
4. December 2024

Data streaming is a new software category. It has grown from niche adoption to becoming a fundamental part of modern data architecture, leveraging open source technologies like Apache Kafka and Flink. With real-time data processing transforming industries, the ecosystem of tools, platforms, and cloud services has evolved significantly. This blog post explores the data streaming landscape of 2025, analyzing key players, trends, and market dynamics shaping this space.


The Data Streaming Landscape 2023

By Kai Waehner
21. December 2022

Data streaming is a new software category for processing data in motion. Apache Kafka is the de facto standard used by over 100,000 organizations. Plenty of vendors offer Kafka platforms and cloud services. Many complementary stream processing engines like Apache Flink and SaaS offerings have emerged, and competing technologies like Pulsar and Redpanda are trying to gain market share. This blog post explores the data streaming landscape of 2023 to summarize existing solutions and market trends.


Pulsar vs Kafka – Comparison and Myths Explored

By Kai Waehner
9. June 2020

Pulsar vs Kafka – which one is better? This blog post explores pros and cons, popular myths, and…

By Kai Waehner
14. April 2015

Apache Hadoop is becoming more and more relevant, not just for big data processing (e.g. MapReduce) but also for fast data processing (e.g. stream processing). Recently, I published two blog posts on the TIBCO blog showing how you can leverage TIBCO BusinessWorks 6 and TIBCO StreamBase to realize big data and fast data Hadoop use cases.

By Kai Waehner
13. May 2014

Slides from my talk “Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about Real Time?”…

By Kai Waehner
25. September 2013

Slides from my session “Big Data beyond Apache Hadoop – How to Integrate ALL your Data” at JavaOne 2013 in San Francisco are online.

By Kai Waehner
26. April 2013

Slides from my talk “Big Data beyond Apache Hadoop – How to integrate ALL your data” at NoSQLmatters 2013 in Cologne are online.


By Kai Waehner
14. March 2013

In March 2013, I was at 33rd Degree – “A Conference for Java Masters”. I gave two talks, including a new one: “You are not Facebook or Google? Why you should still care about Big Data”. The talk gives an overview of big data, especially from a business perspective (paradigm shift, business value, challenges). I also cover big data alternatives from a technology perspective, mainly the de facto standard Apache Hadoop, its ecosystem, distributions, and tooling (i.e. big data suites).


TIBCO BusinessWorks and StreamBase for Big Data Integration and Streaming Analytics with Apache Hadoop and Impala

“Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about Real Time?” – Slides (including TIBCO Examples) from JAX 2014 Online

Slides online: “Big Data beyond Apache Hadoop – How to Integrate ALL your Data” – JavaOne 2013

Slides from NoSQLmatters: “Big Data beyond Apache Hadoop – How to integrate ALL your data with Apache Camel and Talend”

You are not Facebook or Google? Why you should still care about Big Data and Apache Hadoop Ecosystem (Pig, Hive, Hortonworks, Cloudera, MapR, Informatica, Talend)
