Comparison: Data Preparation vs. Inline Data Wrangling in Machine Learning and Deep Learning Projects

I want to highlight a new presentation about Data Preparation in Data Science projects:

“Comparison of Programming Languages, Frameworks and Tools for Data Preprocessing and (Inline) Data Wrangling in Machine Learning / Deep Learning Projects”

Data Preparation as Key for Success in Data Science Projects

A key task in creating appropriate analytic models in machine learning or deep learning is the integration and preparation of data sets from various sources like files, databases, big data stores, sensors or social networks. This step can take up to 80% of the total project effort.
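To make the integration step concrete, here is a minimal sketch in plain Python. Everything in it is assumed for illustration: the CSV export, the database rows, the field names and the cleaning rules are hypothetical and not taken from the presentation. It joins two sources on a shared key, normalizes a text field and fills in a missing value.

```python
import csv
import io
from statistics import mean

# Hypothetical inputs: a CSV file export and rows from a database query.
CSV_EXPORT = """customer_id,age,country
1,34,de
2,,US
3,50,us
"""

DB_ROWS = [
    {"customer_id": 1, "total_spend": 120.0},
    {"customer_id": 2, "total_spend": 75.5},
    {"customer_id": 4, "total_spend": 10.0},  # no matching CSV record
]


def prepare(csv_text, db_rows):
    """Join both sources on customer_id and clean the result."""
    records = list(csv.DictReader(io.StringIO(csv_text)))

    # Impute missing ages with the mean of the known ages (an assumed rule).
    known_ages = [int(r["age"]) for r in records if r["age"]]
    default_age = round(mean(known_ages))

    spend_by_id = {row["customer_id"]: row["total_spend"] for row in db_rows}

    prepared = []
    for r in records:
        cid = int(r["customer_id"])
        prepared.append({
            "customer_id": cid,
            "age": int(r["age"]) if r["age"] else default_age,
            # Normalize inconsistent country codes such as "de" and "US".
            "country": r["country"].strip().upper(),
            # Left join: customers without spend data get 0.0.
            "total_spend": spend_by_id.get(cid, 0.0),
        })
    return prepared


rows = prepare(CSV_EXPORT, DB_ROWS)
for row in rows:
    print(row)
```

Even in this toy case, most of the code deals with missing values, inconsistent formats and unmatched keys rather than with modeling, which is where the preparation effort tends to go.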

This session compares alternative techniques for preparing data: extract-transform-load (ETL) batch processing (like Talend, Pentaho), streaming analytics ingestion (like Apache Storm, Flink, Apex, TIBCO StreamBase, IBM Streams, Software AG Apama), and data wrangling (DataWrangler, Trifacta) within visual analytics. Live demos show the options and their trade-offs using different advanced analytics technologies and open source frameworks such as R, Python, Apache Hadoop, Spark, KNIME or RapidMiner. The session also discusses how data preparation relates to visual analytics tools (like TIBCO Spotfire) and shows best practices for how the data scientist and the business analyst should work together to build good analytic models.
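The difference between batch preparation and preparation during streaming ingestion can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sensor records, the field names and the `clean` function are hypothetical, and the code does not represent how any of the products named above work internally. The point is that the same transformation can run over a complete data set at once or over records one at a time as they arrive.

```python
def clean(record):
    """One preparation step, shared by both modes."""
    return {
        "sensor": record["sensor"].strip().lower(),
        # Convert Fahrenheit to Celsius, rounded to one decimal place.
        "celsius": round((record["fahrenheit"] - 32) * 5 / 9, 1),
    }


def prepare_batch(records):
    """Batch style: the full data set is available and processed in one pass."""
    return [clean(r) for r in records]


def prepare_stream(source):
    """Streaming style: each record is cleaned as soon as it arrives."""
    for record in source:
        yield clean(record)


# Hypothetical sensor readings.
READINGS = [
    {"sensor": " Pump-A ", "fahrenheit": 212.0},
    {"sensor": "pump-b", "fahrenheit": 32.0},
]

batch_result = prepare_batch(READINGS)
stream_result = list(prepare_stream(iter(READINGS)))
print(batch_result == stream_result)
```

Keeping the transformation in one function means the logic developed during exploration can be reused unchanged when the data later arrives as a stream.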

Key Takeaway: Inline Data Wrangling Within Visual Analytics Tooling

Key takeaways of this session:

–    Learn various options for preparing data sets to build analytic models
–    Understand the pros and cons and the targeted persona for each option
–    See different technologies and open source frameworks for data preparation
–    Understand how data preparation relates to visual analytics and streaming analytics, and how these concepts are used to build the analytic model once the data is prepared

Slide Deck

The following shows the slide deck:

http://www.slideshare.net/KaiWaehner/data-preparation-vs-inline-data-wrangling-in-data-science-and-machine-learning

Video Recording: Data Preparation vs. (Inline) Data Wrangling

Here is the video recording:

Kai Waehner

