Comparison: Data Preparation vs. Inline Data Wrangling in Machine Learning and Deep Learning Projects

Posted in Analytics, Big Data, Business Intelligence, Hadoop on February 13th, 2017 by Kai Wähner

I want to highlight a new presentation about Data Preparation in Data Science projects:

“Comparison of Programming Languages, Frameworks and Tools for Data Preprocessing and (Inline) Data Wrangling in Machine Learning / Deep Learning Projects”

Data Preparation as Key for Success in Data Science Projects

A key task in creating appropriate analytic models in machine learning or deep learning is integrating and preparing data sets from various sources such as files, databases, big data stores, sensors, or social networks. This step can take up to 80% of the whole project.


Characteristics of a Good Visual Analytics and Data Discovery Tool

Posted in Analytics, Big Data, Business Intelligence, Hadoop on July 28th, 2016 by Kai Wähner

Visual Analytics and Data Discovery allow analysis of big data sets to find insights and valuable information. This is much more than just classical Business Intelligence (BI). See this article for more details and motivation: “Using Visual Analytics to Make Better Decisions: the Death Pill Example”. Let’s take a look at the characteristics that matter when choosing the right tool for your use cases.

Visual Analytics Tool Comparison and Evaluation

Several tools are available on the market for Visual Analytics and Data Discovery. Three of the best-known options are Tableau, Qlik, and TIBCO Spotfire. Use the following list to compare and evaluate different tools and make the right decision for your project:


Comparison of Stream Processing Frameworks and Products

Posted in Analytics, Business Intelligence, Hadoop, In Memory on October 25th, 2015 by Kai Wähner

See how products, libraries, and frameworks that fall under ‘streaming data analytics’ use cases are categorized and compared.

Streaming Analytics processes data in real time while it is in motion. This concept and technology emerged several years ago in financial trading, but it is becoming increasingly important due to digitalization and the Internet of Things (IoT). The following slide deck from a recent conference talk covers:

  • Real world success stories from different industries (Manufacturing, Retailing, Sports)
  • Alternative Frameworks and Products for Stream Processing
  • Complementary Relationship to Data Warehouse, Apache Hadoop, Statistics, Machine Learning, Open Source R, SAS, Matlab, etc.

Comparison of Stream Processing and Streaming Analytics Alternatives (Apache Storm, Spark, IBM InfoSphere Streams, TIBCO StreamBase, Software AG Apama)

Posted in Analytics, Big Data, Business Intelligence, Hadoop on September 10th, 2014 by Kai Wähner

The demand for stream processing is increasing a lot these days. Frameworks (Apache Storm, Spark) and products (e.g. IBM InfoSphere Streams, TIBCO StreamBase, Software AG Apama) for stream processing and streaming analytics are getting a lot of attention. The reason is that processing large volumes of data is often not enough. Data has to be processed fast, so that a firm can react to changing business conditions in real time. This is required for trading, fraud detection, system monitoring, and many other use cases. A “too late architecture” cannot realize these use cases.


Fundamentals of Stream Processing (IBM InfoSphere Streams, TIBCO StreamBase, Apache Storm) – Book Review

Posted in Analytics, Big Data, Hadoop on July 1st, 2014 by Kai Wähner

The Internet of Things, cloud, and mobile are the major drivers for stream processing. Use cases include network monitoring and intelligent surveillance, but also less technical ones such as inventory management or fraud detection. The book helps a lot in gaining a basic understanding of the history, concepts, and patterns of the stream processing paradigm.

“Fundamentals of Stream Processing: Application Design, Systems, and Analytics” (www.amazon.com/Fundamentals-Stream-Processing-Application-Analytics/dp/1107015545) is one of only a few books available about stream processing. It was published in 2014 by Cambridge University Press. The authors are Henrique C. M. Andrade (JP Morgan, New York), Bugra Gedik (Bilkent University, Turkey), and Deepak S. Turaga (IBM Thomas J. Watson Research Center, New York).
