dwh Archives - Kai Waehner

Data Warehouse and Data Lake Modernization with Data Streaming

13.1K views
9 minute read

Data Warehouse and Data Lake Modernization: From Legacy On-Premise to Cloud-Native Infrastructure

ByKai Waehner
15. July 2022

The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary to solving business problems. Unfortunately, the underlying technologies are often misunderstood, overused for monolithic and inflexible architectures, and pitched for wrong use cases by vendors. Let’s explore this dilemma in a blog series. This is part 3: Data Warehouse Modernization: From Legacy On-Premise to Cloud-Native Infrastructure.

Serverless Kafka for Data in Motion as Rescue for Data at Rest in the Data Lake

15.9K views
12 minute read

Serverless Kafka in a Cloud-native Data Lake Architecture

ByKai Waehner
25. June 2021
1 share

Apache Kafka became the de facto standard for processing data in motion. Kafka is open, flexible, and scalable. Unfortunately, the latter makes operations a challenge for many teams. Ideally, teams can use a serverless Kafka SaaS offering to focus on business logic. However, hybrid scenarios require a cloud-native platform that provides automated and elastic tooling to reduce the operations burden. This blog post explores how to leverage cloud-native and serverless Kafka offerings in a hybrid cloud architecture. We start from the perspective of data at rest with a data lake and explore its relation to data in motion with Kafka.

ByKai Waehner
10. September 2014

The article discusses what stream processing is, how it fits into a big data architecture with Hadoop and a data warehouse (DWH), when stream processing makes sense, and what technologies and products you can choose from. Comparison of open source and proprietary stream processing / streaming analytics alternatives: Apache Storm, Spark, IBM InfoSphere Streams, TIBCO StreamBase, Software AG’s Apama, etc.

ByKai Waehner
13. May 2014

Slides from my talk “Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about Real Time?”…

ByKai Waehner
26. June 2013

In this blog post, I will show you how to „ETL“ all kinds of data to Amazon’s cloud data warehouse Redshift wit Talend’s big data components. You need not be a cloud or DWH expert, or an expert developer to integrate with Amazon’s cloud data warehouse Redshift. It is very easy with Talend’s integration solutions. Just drag&drop, configure, do some graphical mappings / transformations (if necessary), that’s it. Code is generated. Job runs. With Talend, you can easily „ETL“ all data from different sources to Redshift and store it there for under $1,000 per terabyte per year – even with the open source version!

Technology Evangelist

Kai Waehner

dwh

Data Warehouse and Data Lake Modernization: From Legacy On-Premise to Cloud-Native Infrastructure

Comparison of Stream Processing and Streaming Analytics Alternatives (Apache Storm, Spark, IBM InfoSphere Streams, TIBCO StreamBase, Software AG Apama)

“Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about Real Time?” – Slides (including TIBCO Examples) from JAX 2014 Online

Global Executive Technology Strategist

Apache Kafka vs. Middleware (MQ, ETL, ESB) – Slides + Video

Deep Learning Example: Apache Kafka + Python + Keras + TensorFlow + Deeplearning4j

Process Intelligence Explained: Mining, Orchestration, and the Decision Gate

Beyond Enterprise Data Lineage: The Case for a Platform-Independent Data Catalog