Read More

Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming

An open table format framework like Apache Iceberg is essential in the enterprise architecture to ensure reliable data management and sharing, seamless schema evolution, efficient handling of large-scale datasets and cost-efficient storage. This blog post explores market trends, adoption of table format frameworks like Iceberg, Hudi, Paimon, Delta Lake and XTable, and the product strategy of leading vendors of data platforms such as Snowflake, Databricks (Apache Spark), Confluent (Apache Kafka / Flink), Amazon Athena and Google BigQuery.
Read More
The Shift Left Architecture
Read More

The Shift Left Architecture – From Batch and Lakehouse to Real-Time Data Products with Data Streaming

Data integration is a hard challenge in every enterprise. Batch processing and Reverse ETL are common practices in a data warehouse, data lake or lakehouse. Data inconsistency, high compute cost, and stale information are the consequences. This blog post introduces a new design pattern to solve these problems: The Shift Left Architecture enables a data mesh with real-time data products to unify transactional and analytical workloads with Apache Kafka, Flink and Iceberg. Consistent information is handled with streaming processing or ingested into Snowflake, Databricks, Google BigQuery, or any other analytics / AI platform to increase flexibility, reduce cost and enable a data-driven company culture with faster time-to-market building innovative software applications.
Read More
SAP Datasphere and Apache Kafka as Data Fabric for ERP Integration
Read More

SAP Datasphere and Apache Kafka as Data Fabric for S/4HANA ERP Integration

SAP is the leading ERP solution across industries around the world. Data integration with other data platforms, applications, databases, and APIs is one of the hardest challenges in the IT and software landscape. This blog post explores how SAP Datasphere in conjunction with the data streaming platform Apache Kafka enables a reliable, scalable and open data fabric for connecting SAP business objects of ECC and S/4HANA ERP with other real-time, batch, or request-response interfaces.
Read More
The Heart of the Data Mesh Beats Real Time with Apache Kafka
Read More

The Heart of the Data Mesh Beats Real-Time with Apache Kafka

If there were a buzzword of the hour, it would undoubtedly be “data mesh”! This new architectural paradigm unlocks analytic and transactional data at scale and enables rapid access to an ever-growing number of distributed domain datasets for various usage scenarios. The data mesh addresses the most common weaknesses of the traditional centralized data lake or data platform architecture. And the heart of a decentralized data mesh infrastructure must be real-time, reliable, and scalable. Learn how the de facto standard for data streaming, Apache Kafka, plays a crucial role in building a data mesh.
Read More
Best Practices for Data Analytics with AWS Azure Googel BigQuery Spark Kafka Confluent Databricks
Read More

Best Practices for Building a Cloud-Native Data Warehouse or Data Lake

The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary to solving business problems. Unfortunately, the underlying technologies are often misunderstood, overused for monolithic and inflexible architectures, and pitched for wrong use cases by vendors. Let’s explore this dilemma in a blog series. This is part 5: Best Practices for Building a Cloud-Native Data Warehouse or Data Lake.
Read More
Case Studies for Cloud Native Analytics with Data Warehouse Data Lake Data Streaming Lakehouse
Read More

Case Studies: Cloud-native Data Streaming for Data Warehouse Modernization

The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary to solving business problems. Unfortunately, the underlying technologies are often misunderstood, overused for monolithic and inflexible architectures, and pitched for wrong use cases by vendors. Let’s explore this dilemma in a blog series. This is part 4: Case Studies for cloud-native data streaming and data warehouses.
Read More
Data Warehouse and Data Lake Modernization with Data Streaming
Read More

Data Warehouse and Data Lake Modernization: From Legacy On-Premise to Cloud-Native Infrastructure

The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary to solving business problems. Unfortunately, the underlying technologies are often misunderstood, overused for monolithic and inflexible architectures, and pitched for wrong use cases by vendors. Let’s explore this dilemma in a blog series. This is part 3: Data Warehouse Modernization: From Legacy On-Premise to Cloud-Native Infrastructure.
Read More
Data Warehouse vs Data Lake vs Data Streaming Comparison
Read More

Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?

The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary to solving business problems. Unfortunately, the underlying technologies are often misunderstood, overused for monolithic and inflexible architectures, and pitched for wrong use cases by vendors. Let’s explore this dilemma in a blog series. This is part 1: Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Read More
Apache Kafka Transactions API vs Big Data Lake and Batch Analytics
Read More

Analytics vs. Transactions in Data Streaming with Apache Kafka

Workloads for analytics and transactions have very unlike characteristics and requirements. Many people think that Apache Kafka is not built for transactions and should only be used for big data analytics. This blog post explores when and how to use Kafka in resilient, mission-critical architectures and when to use the built-in Transaction API.
Read More
Reverse ETL Anti Pattern vs Event Streaming with Apache Kafka
Read More

When to Use Reverse ETL and when it is an Anti-Pattern

This blog post explores why software vendors (try to) introduce new solutions for Reverse ETL, when Reverse ETL is really needed, and how it fits into the enterprise architecture. The involvement of event streaming to process data in motion is a key piece of Reverse ETL for real-time use cases.
Read More