Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

An open table format framework like Apache Iceberg is essential in an enterprise architecture to ensure reliable data management and sharing, seamless schema evolution, efficient handling of large-scale datasets, and cost-efficient storage. This blog post explores market trends, the adoption of table format frameworks like Iceberg, Hudi, Paimon, Delta Lake and XTable, and the product strategy of leading data platform vendors such as Snowflake, Databricks (Apache Spark), Confluent (Apache Kafka / Flink), Amazon Athena and Google BigQuery.

My Data Streaming Journey with Kafka & Flink: 7 Years at Confluent

Time flies… I joined Confluent seven years ago, when Apache Kafka was mainly used by a few tech giants and the company had ~100 employees. This blog post explores my data streaming journey, including Kafka becoming a de facto standard for over 100,000 organizations, Confluent's IPO on the NASDAQ stock exchange, 5000+ customers adopting a data streaming platform, and emerging design approaches and technologies like data mesh, GenAI, and Apache Flink. I look at the past, present, and future of my personal data streaming journey, both from the perspective of evolving technology trends and as a Confluent employee who started at a Silicon Valley startup that is now part of a global software and cloud company.

When (Not) to Choose Google Apache Kafka for BigQuery?

Google announced its Apache Kafka for BigQuery cloud service at its Google Cloud Next 2024 conference in Las Vegas. Welcome to the data streaming club, joining Amazon, Microsoft, IBM, Oracle, Confluent, and others. This blog post explores this new managed Kafka offering for GCP, reviews the current status of the data streaming landscape, and shares some criteria to evaluate when Kafka in general, and Google Apache Kafka in particular, should (not) be used.

The Past, Present and Future of Stream Processing

Stream processing has existed for decades. Adoption is growing with open-source frameworks like Apache Kafka and Apache Flink in combination with fully managed cloud services. This blog post explores the past, present and future of stream processing, including its relationship with machine learning and GenAI, streaming databases, and the integration between data streaming and data lakes with Apache Iceberg.

JavaScript, Node.js and Apache Kafka for Full-Stack Data Streaming

JavaScript is a pivotal technology for web applications. With the emergence of Node.js, JavaScript became relevant for both client-side and server-side development, enabling a full-stack development approach with a single programming language. Both Node.js and Apache Kafka are built around event-driven architectures, making them naturally compatible for real-time data streaming. This blog post explores open-source JavaScript clients for Apache Kafka and discusses the trade-offs and limitations of JavaScript Kafka producers and consumers compared to stream processing technologies such as Kafka Streams or Apache Flink.
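
To make the producer side of this concrete, here is a minimal sketch of a Node.js producer written in TypeScript with the open-source kafkajs client. The broker address, client id, topic name, and event payload are illustrative assumptions for this sketch, not details taken from the blog post.

    // Minimal Node.js Kafka producer sketch using the open-source kafkajs client.
    // Broker address, client id, topic, and payload are hypothetical placeholders.
    import { Kafka } from "kafkajs";

    const kafka = new Kafka({
      clientId: "web-backend",        // hypothetical client id
      brokers: ["localhost:9092"],    // assumed local broker
    });

    const producer = kafka.producer();

    async function publishOrderEvent(): Promise<void> {
      await producer.connect();
      // In an event-driven Node.js service, this send would typically be triggered
      // by an HTTP request or another asynchronous event.
      await producer.send({
        topic: "orders",              // hypothetical topic
        messages: [{ key: "order-42", value: JSON.stringify({ status: "created" }) }],
      });
      await producer.disconnect();
    }

    publishOrderEvent().catch(console.error);

For stateful operations such as joins, aggregations, or windowing, the post points to stream processing technologies like Kafka Streams or Apache Flink rather than plain producers and consumers.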

How Apache Kafka helps Dish Wireless build cloud-native 5G Telco Infrastructure

5G telco infrastructure provides the basic foundation for data movement and increasingly unlocks new capabilities for low latency and critical SLAs. Real-time data processing with data streaming using Apache Kafka enables innovation across industries. This blog post explores the success story of Dish Wireless and its cloud-native standalone 5G infrastructure leveraging data streaming.

Decentralized Data Mesh with Data Streaming in Financial Services

Digital transformation requires agility and fast time to market as critical factors for success in any enterprise. Decentralization with a data mesh separates applications and business units into independent domains. Data sharing in real time with data streaming helps provide information in the proper context to the correct application at the right time. This blog post explores a case study from the financial services sector where a data mesh was built across countries for loosely coupled data sharing combined with standardized enterprise-wide data governance.

When NOT to choose Amazon MSK Serverless for Apache Kafka?

Apache Kafka became the de facto standard for data streaming. Various cloud offerings have emerged and improved in recent years. Amazon MSK Serverless is the latest Kafka product from AWS. This blog post looks at its capabilities to explore how it relates to “the normal” partially managed Amazon MSK, when the serverless version is a good choice, and when other fully managed cloud services like Confluent Cloud are the better option.

Streaming Data Exchange with Kafka and a Data Mesh in Motion

Data Mesh is a new architecture paradigm that gets a lot of buzz these days. This blog post looks deeper into this principle to explore why no single technology is the perfect fit to build a Data Mesh. Examples show why an open and scalable decentralized real-time platform like Apache Kafka is often the heart of the Data Mesh infrastructure, complemented by many other data platforms to solve business problems.