
Apache Kafka + Flink + Snowflake: Cost Efficient Analytics and Data Governance

Snowflake is a leading cloud data warehouse that is transitioning into a data cloud that enables various use cases. The major drawback of this evolution is the significantly growing cost of data processing. This blog post explores how data streaming with Apache Kafka and Apache Flink enables a “shift left architecture” where business teams can reduce cost, provide better data quality, and process data more efficiently. The real-time capabilities and the unification of transactional and analytical workloads using Apache Iceberg’s open table format enable new use cases and a best-of-breed approach without vendor lock-in, with a choice of analytical query engines such as Dremio, Starburst, Databricks, Amazon Athena, Google BigQuery, or Apache Flink.

Snowflake Data Integration Options for Apache Kafka (including Iceberg)

The integration between Apache Kafka and Snowflake is often cumbersome. Options include near real-time ingestion with a Kafka Connect connector, batch ingestion from large files, or leveraging a standard table format like Apache Iceberg. This blog post explores the alternatives and discusses their trade-offs. The final section shows how data streaming helps with hybrid architectures where data needs to be ingested from the private data center into Snowflake in the public cloud.
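The connector-based option can be illustrated with a Kafka Connect configuration. The Python snippet below is a minimal sketch, assuming the Snowflake Kafka connector with Snowpipe Streaming as the ingestion method; the helper `snowflake_sink_config`, the connector name, and every `<...>` value are hypothetical placeholders, not settings from the original post. The resulting JSON is the kind of body one would submit to the Kafka Connect REST API (`POST /connectors`).

```python
import json


def snowflake_sink_config(topic: str, table: str) -> dict:
    """Build a hypothetical Snowflake sink connector definition for Kafka Connect."""
    return {
        # Hypothetical connector name derived from the topic.
        "name": f"snowflake-sink-{topic}",
        "config": {
            "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
            "tasks.max": "1",
            "topics": topic,
            # Map the Kafka topic to a Snowflake target table.
            "snowflake.topic2table.map": f"{topic}:{table}",
            # Placeholder account details (assumptions): replace with real values.
            "snowflake.url.name": "<account>.snowflakecomputing.com:443",
            "snowflake.user.name": "<user>",
            "snowflake.private.key": "<private-key>",
            "snowflake.role.name": "<role>",
            "snowflake.database.name": "<database>",
            "snowflake.schema.name": "<schema>",
            # Snowpipe Streaming writes rows directly instead of staging files.
            "snowflake.ingestion.method": "SNOWPIPE_STREAMING",
            "key.converter": "org.apache.kafka.connect.storage.StringConverter",
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "false",
            # Flush buffered records to Snowflake after at most 10 seconds.
            "buffer.flush.time": "10",
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the Kafka Connect REST API.
    print(json.dumps(snowflake_sink_config("orders", "ORDERS"), indent=2))
```

The buffer settings are where the latency versus cost trade-off shows up: shorter flush intervals lower latency but increase the number of writes into Snowflake.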

Snowflake Integration Patterns: Zero ETL and Reverse ETL vs. Apache Kafka

Snowflake is a leading cloud-native data warehouse. Integration patterns include batch data integration, Zero ETL, and near real-time data ingestion with Apache Kafka. This blog post explores the different approaches and discusses their trade-offs. Following industry recommendations, it suggests avoiding anti-patterns like Reverse ETL and using data streaming instead to enhance the flexibility, scalability, and maintainability of the enterprise architecture.

Data Warehouse and Data Lake Modernization: From Legacy On-Premise to Cloud-Native Infrastructure

The concepts and architectures of a data warehouse, a data lake, and data streaming complement each other in solving business problems. Unfortunately, the underlying technologies are often misunderstood, overused for monolithic and inflexible architectures, and pitched by vendors for the wrong use cases. Let’s explore this dilemma in a blog series. This is part 3: Data Warehouse Modernization: From Legacy On-Premise to Cloud-Native Infrastructure.

When to Use Reverse ETL and when it is an Anti-Pattern

This blog post explores why software vendors (try to) introduce new solutions for Reverse ETL, when Reverse ETL is really needed, and how it fits into the enterprise architecture. Event streaming, which processes data in motion, is a key piece of Reverse ETL for real-time use cases.