Comparison: Data Preparation vs. Inline Data Wrangling in Machine Learning and Deep Learning Projects

Posted in Analytics, Big Data, Business Intelligence, Hadoop on February 13th, 2017 by Kai Wähner

I want to highlight a new presentation about Data Preparation in Data Science projects:

“Comparison of Programming Languages, Frameworks and Tools for Data Preprocessing and (Inline) Data Wrangling in Machine Learning / Deep Learning Projects”

Data Preparation as Key for Success in Data Science Projects

A key task in building appropriate analytic models in machine learning or deep learning is the integration and preparation of data sets from various sources such as files, databases, big data stores, sensors, or social networks. This step can take up to 80% of the whole project.

“Present, past and future of NoSQL, Big Data and Hadoop” – Speaker Interview with Kai Wähner at NoSQL Roadshow

Posted in Persistence on September 13th, 2013 by Kai Wähner

Just a short blog post with a link to an interview that I gave for NoSQL Roadshow 2013 in Zurich. I talk about the present, past and future of NoSQL…

Here is the link to the NoSQL interview: http://nosqlroadshow.com/nosql-zurich-2013/interviewkai

I appreciate any feedback or discussion via @KaiWaehner, konktakt@kai-waehner.de, or social networks (LinkedIn, Xing).

You are not Facebook or Google? Why you should still care about Big Data and Apache Hadoop Ecosystem (Pig, Hive, Hortonworks, Cloudera, MapR, Informatica, Talend)

Posted in Uncategorized on March 14th, 2013 by Kai Wähner

In March 2013, I was at 33rd Degree – “A Conference for Java Masters”. I gave two talks, including a new one: “You are not Facebook or Google? Why you should still care about Big Data”. The talk gives an overview of big data, especially from a business perspective (paradigm shift, business value, challenges). However, I also talk about big data alternatives from a technology perspective, mainly the de facto standard Apache Hadoop, its ecosystem (Hive, Pig, HBase, Oozie, Sqoop, etc.), distributions (Cloudera, Hortonworks, MapR), and tooling (big data suites from vendors such as Talend, Informatica, Oracle, and IBM).
