Comparison: Data Preparation vs. Inline Data Wrangling in Machine Learning and Deep Learning Projects

Posted in Analytics, Big Data, Business Intelligence, Hadoop on February 13th, 2017 by Kai Wähner

I want to highlight a new presentation about Data Preparation in Data Science projects:

“Comparison of Programming Languages, Frameworks and Tools for Data Preprocessing and (Inline) Data Wrangling in Machine Learning / Deep Learning Projects”

Data Preparation as Key for Success in Data Science Projects

A key task in creating appropriate analytic models in machine learning or deep learning is the integration and preparation of data sets from various sources such as files, databases, big data stores, sensors or social networks. This step can take up to 80% of the whole project.
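To make the integration and preparation step a bit more concrete, here is a minimal sketch in plain Python. It is not taken from the presentation: the two sources (a CSV export and a list of sensor records), the field names and the cleaning rules are all hypothetical and only illustrate the kind of work involved.

```python
import csv
import io

# Hypothetical source 1: a CSV export with some missing values.
CSV_EXPORT = """customer_id,age,country
1,34,DE
2,,US
3,51,
"""

# Hypothetical source 2: aggregated sensor readings keyed by customer.
SENSOR_RECORDS = [
    {"customer_id": 1, "avg_temp": 21.5},
    {"customer_id": 3, "avg_temp": 19.0},
]


def prepare(csv_text, sensor_records):
    """Join both sources on customer_id and make missing values explicit."""
    sensors = {r["customer_id"]: r["avg_temp"] for r in sensor_records}
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        cid = int(raw["customer_id"])
        rows.append({
            "customer_id": cid,
            # An empty age field becomes None instead of an empty string.
            "age": int(raw["age"]) if raw["age"] else None,
            # An empty country field gets an explicit placeholder.
            "country": raw["country"] or "UNKNOWN",
            # Customers without sensor data get None.
            "avg_temp": sensors.get(cid),
        })
    return rows


prepared = prepare(CSV_EXPORT, SENSOR_RECORDS)
for row in prepared:
    print(row)
```

Even in this toy case, most of the code deals with joining sources and handling gaps rather than with modeling, which is the point of the 80% figure above.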


Comparison Of Log Analytics for Distributed Microservices – Open Source Frameworks, SaaS and Enterprise Products

Posted in Analytics, Big Data, Business Intelligence, Cloud, Hadoop, Microservices, SOA on October 20th, 2016 by Kai Wähner

I had two sessions at the O’Reilly Software Architecture Conference in London in October 2016. It was the first #OReillySACon in London: a very well-organized conference with plenty of great speakers and sessions. I can really recommend this conference and its siblings in other cities such as San Francisco or New York if you want to learn about good software architectures and new concepts, best practices and technologies. Besides microservices, some of the hot topics this year were DevOps, serverless architectures and big data analytics.

I want to share the slides of my session comparing open source frameworks, SaaS and enterprise products for log analytics in distributed microservices:


Framework and Product Comparison for Big Data Log Analytics and ITOA

Posted in Analytics, Big Data, Hadoop, Microservices on February 4th, 2016 by Kai Wähner

In February 2016, I presented a brand new talk at OOP in Munich: “Comparison of Frameworks and Tools for Big Data Log Analytics and IT Operations Analytics”. The talk discusses different open source frameworks, SaaS cloud offerings and enterprise products for analyzing large volumes of distributed log events. This topic is getting much more traction these days with the emerging architecture concept of Microservices.

Key Take-Aways

  • Log Analytics enables IT Operations Analytics for Machine Data
  • Correlation of Events is the Key for Added Business Value
  • Log Management is complementary to other Big Data Components

Difference between a Data Warehouse and a Live Datamart?

Posted in Analytics, Big Data, Business Intelligence, In Memory on October 9th, 2015 by Kai Wähner

Data Warehouses have existed for many years in almost every company. While they are still as good and relevant for the same use cases as they were 20 years ago, they cannot solve the new challenges that exist today, or those sure to come, in an ever-changing digital world. The upcoming sections will clarify when to still use a Data Warehouse and when to use a modern Live Datamart instead.

What is a Data Warehouse (DWH)?

A Data Warehouse is a central repository of integrated data from one or more disparate sources. It stores historical data to create analytical reports for knowledge workers throughout the enterprise. A DWH includes a server, which stores the historical data, and a client for analysis and reporting.


Comparison of Stream Processing and Streaming Analytics Alternatives (Apache Storm, Spark, IBM InfoSphere Streams, TIBCO StreamBase, Software AG Apama)

Posted in Analytics, Big Data, Business Intelligence, Hadoop on September 10th, 2014 by Kai Wähner

The demand for stream processing is increasing rapidly. Frameworks (Apache Storm, Spark) and products (e.g. IBM InfoSphere Streams, TIBCO StreamBase, Software AG Apama) for stream processing and streaming analytics are getting a lot of attention. The reason is that processing big volumes of data is often not enough. Data has to be processed fast, so that a firm can react to changing business conditions in real time. This is required for trading, fraud detection, system monitoring, and many other use cases. A “too late architecture” cannot realize these use cases.
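As a rough illustration of what "processing data while it is in motion" means, the following sketch evaluates a sliding time window per account as each event arrives, instead of storing everything first and querying later. It is plain Python and does not use the API of any of the frameworks or products named above; the class name, the 60-second window, the threshold and the sample events are all hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # hypothetical window length
LIMIT = 1000.0        # hypothetical alert threshold per window


class SlidingWindowMonitor:
    """Keeps a running per-account total over a sliding time window."""

    def __init__(self, window=WINDOW_SECONDS, limit=LIMIT):
        self.window = window
        self.limit = limit
        self.events = defaultdict(deque)   # account -> (timestamp, amount)
        self.totals = defaultdict(float)   # account -> total inside window

    def process(self, account, timestamp, amount):
        """Handle one event; return True if the account exceeds the limit."""
        queue = self.events[account]
        queue.append((timestamp, amount))
        self.totals[account] += amount
        # Evict events that have fallen out of the window.
        while queue and queue[0][0] <= timestamp - self.window:
            _, old_amount = queue.popleft()
            self.totals[account] -= old_amount
        return self.totals[account] > self.limit


monitor = SlidingWindowMonitor()
stream = [
    ("acct-1", 0, 400.0),
    ("acct-1", 20, 500.0),
    ("acct-2", 25, 50.0),
    ("acct-1", 40, 300.0),
    ("acct-1", 120, 100.0),
]
alerts = [(acct, ts) for acct, ts, amt in stream
          if monitor.process(acct, ts, amt)]
print(alerts)
```

The decision is made at the moment the third acct-1 event arrives, which is what a batch-oriented “too late architecture” cannot offer. Real stream processing engines add distribution, fault tolerance and out-of-order handling on top of this basic idea.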


“Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about Real Time?” – Slides (including TIBCO Examples) from JAX 2014 Online

Posted in Analytics, Big Data, Business Intelligence, Cloud, ESB, Hadoop on May 13th, 2014 by Kai Wähner

Slides from my talk “Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about Real Time?” at JAX 2014 (Twitter #jaxcon) in Mainz are online. JAX is a great conference with interesting topics and many good speakers!

Content (Data Warehouse, Business Intelligence, Hadoop, Stream Processing)

Big data represents a significant paradigm shift in enterprise technology. Big data radically changes the nature of the data management profession as it introduces new concerns about the volume, velocity and variety of corporate data. New business models based on predictive analytics, such as recommendation systems or fraud detection, are more relevant than ever before. Apache Hadoop seems to be becoming the de facto standard for implementing big data solutions. For that reason, solutions from many different vendors have emerged on top of Hadoop.


Integration of Amazon Redshift Cloud Data Warehouse (AWS SaaS DWH) with Talend Data Integration (DI) / Big Data (BD) / Enterprise Service Bus (ESB)

Posted in Cloud, EAI, ESB on June 26th, 2013 by Kai Wähner

In this blog post, I will show you how to „ETL“ all kinds of data to Amazon’s cloud data warehouse Redshift with Talend’s big data components. Let’s begin with a short introduction to Amazon Redshift (copied from the website):

„Amazon Redshift is [part of Amazon Web Services (AWS) and] a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. With a few clicks in the AWS Management Console, customers can launch a Redshift cluster, starting with a few hundred gigabytes and scaling to a petabyte or more, for under $1,000 per terabyte per year.
Traditional data warehouses require significant time and resource to administer, especially for large datasets. In addition, the financial cost associated with building, maintaining, and growing self-managed, on-premise data warehouses is very high. Amazon Redshift not only significantly lowers the cost of a data warehouse, but also makes it easy to analyze large amounts of data very quickly.“
