Cloud Archives - Kai Waehner

939 views
13 minute read

Hello, K.AI – How I Trained a Chatbot of Myself Without Coding Evaluating OpenAI Custom GPT, Chatbase, Botsonic, LiveChatAI

By Kai Waehner
23. June 2024

Generative AI (GenAI) enables many new use cases for enterprises and private citizens. While I work on real-time, enterprise-scale AI/ML deployments with data streaming, big data analytics and cloud-native software applications in my daily business life, I also wanted to train a conversational chatbot for myself. This blog post describes my journey of training K.AI without coding. K.AI is a personal chatbot that lets you learn about data streaming and its most successful use cases in a conversational format. Yes, it is also based on my expertise, domain knowledge and opinion, which are available as public internet data, such as my hundreds of blog articles, LinkedIn shares, and YouTube videos.

7.0K views
8 minute read

The Shift Left Architecture – From Batch and Lakehouse to Real-Time Data Products with Data Streaming

By Kai Waehner
15. June 2024

Data integration is a hard challenge in every enterprise. Batch processing and Reverse ETL are common practices in a data warehouse, data lake or lakehouse. Data inconsistency, high compute cost, and stale information are the consequences. This blog post introduces a new design pattern to solve these problems: The Shift Left Architecture enables a data mesh with real-time data products to unify transactional and analytical workloads with Apache Kafka, Flink and Iceberg. Consistent information is handled with stream processing or ingested into Snowflake, Databricks, Google BigQuery, or any other analytics/AI platform to increase flexibility, reduce cost and enable a data-driven company culture with faster time-to-market for innovative software applications.

755 views
3 minute read

Apache Kafka in Manufacturing at Automotive Supplier Brose for Industrial IoT Use Cases

By Kai Waehner
13. June 2024

Data streaming unifies OT/IT workloads by connecting information from sensors, PLCs, robotics and other manufacturing systems at the edge with business applications and the big data analytics world in the cloud. This blog post explores how the global automotive supplier Brose deploys a hybrid industrial IoT architecture using Apache Kafka in combination with Eclipse Kura, OPC-UA, MuleSoft and SAP.

1.8K views
5 minute read

Real-Time GenAI with RAG using Apache Kafka and Flink to Prevent Hallucinations

By Kai Waehner
30. May 2024

How do you prevent hallucinations from large language models (LLMs) in GenAI applications? LLMs need real-time, contextualized, and trustworthy data to generate the most reliable outputs. This blog post explains how RAG and a data streaming platform with Apache Kafka and Flink make that possible. A lightboard video shows how to build a context-specific real-time RAG architecture. Also, learn how the travel agency Expedia leverages data streaming with Generative AI using conversational chatbots to improve the customer experience and reduce the cost of service agents.

2.0K views
11 minute read

My Data Streaming Journey with Kafka & Flink: 7 Years at Confluent

By Kai Waehner
3. May 2024

Time flies… I joined Confluent seven years ago when Apache Kafka was mainly used by a few tech giants and the company had ~100 employees. This blog post explores my data streaming journey, including Kafka becoming a de facto standard for over 100,000 organizations, Confluent doing an IPO on the NASDAQ stock exchange, 5000+ customers adopting a data streaming platform, and emerging new design approaches and technologies like data mesh, GenAI, and Apache Flink. I look at the past, present and future of my personal data streaming journey, both in terms of the evolution of technology trends and in terms of my journey as a Confluent employee who started in a Silicon Valley startup and is now part of a global software and cloud company.

2.2K views
10 minute read

Apache Kafka + Flink + Snowflake: Cost Efficient Analytics and Data Governance

By Kai Waehner
26. April 2024

Snowflake is a leading cloud data warehouse that is transitioning into a data cloud enabling various use cases. The major drawback of this evolution is the significantly growing cost of data processing. This blog post explores how data streaming with Apache Kafka and Apache Flink enables a “shift left architecture” where business teams can reduce cost, provide better data quality, and process data more efficiently. The real-time capabilities and unification of transactional and analytical workloads using Apache Iceberg’s open table format enable new use cases and a best-of-breed approach without vendor lock-in, with a choice of various analytical query engines like Dremio, Starburst, Databricks, Amazon Athena, Google BigQuery, or Apache Flink.

1.7K views
8 minute read

Snowflake Data Integration Options for Apache Kafka (including Iceberg)

By Kai Waehner
22. April 2024

The integration between Apache Kafka and Snowflake is often cumbersome. Options include near real-time ingestion with a Kafka Connect connector, batch ingestion from large files, or leveraging a standard table format like Apache Iceberg. This blog post explores the alternatives and discusses their trade-offs. The final section shows how data streaming helps with hybrid architectures where data needs to be ingested from the private data center into Snowflake in the public cloud.

1.9K views
9 minute read

Snowflake Integration Patterns: Zero ETL and Reverse ETL vs. Apache Kafka

By Kai Waehner
19. April 2024

Snowflake is a leading cloud-native data warehouse. Integration patterns include batch data integration, Zero ETL and near real-time data ingestion with Apache Kafka. This blog post explores the different approaches and discusses their trade-offs. In line with industry recommendations, it suggests avoiding anti-patterns like Reverse ETL and instead using data streaming to enhance the flexibility, scalability, and maintainability of the enterprise architecture.

2.1K views
7 minute read

When (Not) to Choose Google Apache Kafka for BigQuery?

By Kai Waehner
10. April 2024

Google announced its Apache Kafka for BigQuery cloud service at its conference Google Cloud Next 2024 in Las Vegas. Welcome to the data streaming club, which already includes Amazon, Microsoft, IBM, Oracle, Confluent, and others. This blog post explores this new managed Kafka offering for GCP, reviews the current status of the data streaming landscape, and shares some criteria to evaluate when Kafka in general and Google Apache Kafka in particular should (not) be used.

5.8K views
8 minute read

Apache Kafka and Tinybird (ClickHouse) for Streaming Analytics HTTP APIs

By Kai Waehner
4. April 2024

Apache Kafka became the de facto standard for data streaming. However, the combination of an event-driven architecture with request-response APIs is crucial for most enterprise architectures. This blog post explores how Tinybird innovates with a REST/HTTP layer on top of the open source analytics database ClickHouse in the cloud. Integrating Kafka with Tinybird, the benefits of fully managed services like Confluent Cloud, and customer stories from Factorial and FanDuel show why Kafka and analytics databases complement each other for more innovation and faster time-to-market.

Technology Evangelist

Kai Waehner

Apache Kafka vs. Middleware (MQ, ETL, ESB) – Slides + Video

Deep Learning Example: Apache Kafka + Python + Keras + TensorFlow + Deeplearning4j

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

The Digitalization of Airport and Airlines with IoT and Data Streaming using Kafka and Flink

Energy Trading with Apache Kafka and Flink