Data integration and processing is a huge challenge in Industrial IoT (IIoT, aka Industry 4.0 or Automation Industry) due to monolithic systems and proprietary protocols. Apache Kafka, its ecosystem (Kafka Connect, KSQL) and Apache PLC4X are a great open source choice to implement this IIoT integration end to end in a scalable, reliable and flexible way.
This blog post gives a high-level overview of the challenges and of a flexible architecture to solve them. At the end, I share a video recording and the corresponding slide deck, which provide many more details and insights.
Challenges in IIoT / Industry 4.0
Here are some of the key challenges in IIoT / Industry 4.0:
- IoT != IIoT: The automation industry does not use MQTT or other open standards. Its technology is typically slow, insecure, not scalable and proprietary.
- Product lifecycles are very long (tens of years), with no simple changes or upgrades.
- IIoT usually uses incompatible protocols, typically proprietary and just built for one specific vendor.
- The automation industry uses proprietary and expensive monoliths which are neither scalable nor extensible.
- Machines and PLCs are insecure by nature with no authentication, no authorization, no encryption.
This is still the state of the art in the automation industry. That is no surprise with such long product lifecycles, but it is still very concerning.
Evolution of Convergence between IT and Automation Industry
Today, everybody talks about cloud, big data analytics, machine learning and real time processing at scale. The convergence between IT and Automation Industry is coming, as the analyst report from IoT research company IOT Analytics shows:
There is huge demand to build an open, flexible, scalable platform. This creates many opportunities from a business and technical perspective:
- Cost reduction
So, how do you get from legacy technologies and proprietary IIoT protocols to cloud, big data, machine learning and real time processing? How do you build a reliable, scalable and flexible architecture and infrastructure?
Apache Kafka and Apache PLC4X for End-to-End IIoT Integration
I assume you already know it: Apache Kafka is the de facto standard for real-time event streaming. It provides
- Open Source (Apache 2.0 License)
- Persistent Storage
- Stream Processing
If you need more details about Apache Kafka, check out the Kafka website, the extensive Confluent documentation or some free video recordings and slides from any Kafka Summit to learn about the technology and use cases.
The only very important thing I want to point out is that Apache Kafka includes Kafka Connect and Kafka Streams:
Kafka Connect enables reliable and scalable integration of Kafka with other systems. Kafka Streams lets you write standard Java apps and microservices that continuously process your data in real time with a lightweight stream processing API. And finally, KSQL enables stream processing using SQL-like semantics.
Apache PLC4X for PLC Integration (Siemens S7, Modbus, Allen Bradley, Beckhoff ADS, etc.)
Apache PLC4X is less established on the market than Apache Kafka. It also “just covers a niche” (a big one, of course) compared to Kafka, which is used in any industry for many different use cases. However, PLC4X is a very interesting top-level Apache project for the automation industry.
The goal is to open up PLC interfaces from the IIoT world to the outside world. PLC4X allows vertical integration and lets you write software independent of the PLCs, using JDBC-like adapters for various protocols like Siemens S7, Modbus, Allen Bradley, Beckhoff ADS, OPC-UA, Emerson, Profinet, BACnet, Ethernet.
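To make the “JDBC-like” idea more concrete, here is a minimal, self-contained sketch of the pattern: application code asks a registry for a connection by URL, and the URL scheme decides which protocol driver is used. This is a toy model and not the PLC4X API. The names UnifiedPlcApiSketch, PlcConnection, DriverRegistry and defaultRegistry, the “s7” and “modbus” stand-in drivers, and the example addresses are all assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of a "JDBC-like" API: the connection string's scheme picks the driver.
// All names here are hypothetical; this is NOT the PLC4X API.
final class UnifiedPlcApiSketch {

    /** What application code programs against, independent of the protocol. */
    interface PlcConnection {
        String read(String address);
    }

    /** Registry mapping a URL scheme (e.g. "s7") to a driver factory. */
    static final class DriverRegistry {
        private final Map<String, Function<String, PlcConnection>> drivers = new HashMap<>();

        void register(String scheme, Function<String, PlcConnection> factory) {
            drivers.put(scheme, factory);
        }

        PlcConnection connect(String url) {
            int sep = url.indexOf("://");
            if (sep < 0) {
                throw new IllegalArgumentException("Expected scheme://host, got: " + url);
            }
            String scheme = url.substring(0, sep);
            String host = url.substring(sep + 3);
            Function<String, PlcConnection> factory = drivers.get(scheme);
            if (factory == null) {
                throw new IllegalArgumentException("No driver for scheme: " + scheme);
            }
            return factory.apply(host);
        }
    }

    /** Registers two stand-in drivers that only describe what a real driver would request. */
    static DriverRegistry defaultRegistry() {
        DriverRegistry registry = new DriverRegistry();
        registry.register("s7", host -> address -> "S7 read of " + address + " on " + host);
        registry.register("modbus", host -> address -> "Modbus read of " + address + " on " + host);
        return registry;
    }

    public static void main(String[] args) {
        DriverRegistry registry = defaultRegistry();
        // The calling code is identical for both protocols; only the URL differs.
        for (String url : new String[] {"s7://192.0.2.10", "modbus://192.0.2.20"}) {
            PlcConnection connection = registry.connect(url);
            System.out.println(connection.read("sensor-1"));
        }
    }
}
```

The point of the pattern is that switching from one PLC vendor to another changes a connection string, not the application code. A real driver would of course speak the native protocol instead of returning a description.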
PLC4X provides a Kafka Connect connector. Therefore, you can leverage the benefits of Apache Kafka (high availability, high throughput, high scalability, reliability, real time processing) to deploy PLC4X integration pipelines. With this, you can build one single architecture and infrastructure for
- legacy IIoT connectivity using PLC4X and Kafka Connect
- data processing using Kafka Streams / KSQL
- integration with the rest of the enterprise using Kafka Connect and any other sink (database, big data analytics, machine learning, ERP, CRM, cloud services, custom business applications, etc.)
As Kafka decouples the producers from the consumers, you can consume the IIoT machine sensor data from any application – some might be real time, some might be batch, and some might be request-response communication for human interaction on a web or mobile app.
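The decoupling described above can be illustrated with a small in-memory model: producers append to a log, and every consumer tracks its own read position, so a fast real-time consumer and a slow batch consumer never affect each other. This is a simplified sketch for intuition only. It does not use the Kafka client API, and the class name DecoupledLogSketch, the consumer ids and the sample events are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of a log with per-consumer offsets.
// Hypothetical names throughout; this is NOT the Kafka client API.
final class DecoupledLogSketch {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> offsets = new HashMap<>();

    /** Producer side: append an event; the producer knows nothing about consumers. */
    void append(String event) {
        log.add(event);
    }

    /** Consumer side: return up to maxEvents events after this consumer's own offset. */
    List<String> poll(String consumerId, int maxEvents) {
        int from = offsets.getOrDefault(consumerId, 0);
        int to = Math.min(log.size(), from + maxEvents);
        offsets.put(consumerId, to); // only this consumer's position moves
        return new ArrayList<>(log.subList(from, to));
    }

    public static void main(String[] args) {
        DecoupledLogSketch topic = new DecoupledLogSketch();
        topic.append("temp=71");
        topic.append("temp=72");
        topic.append("temp=73");

        // A real-time consumer reads one event at a time ...
        System.out.println("realtime: " + topic.poll("realtime-alerts", 1));
        // ... while a batch consumer reads everything at once, starting from its own offset.
        System.out.println("batch:    " + topic.poll("nightly-batch", 100));
        // The real-time consumer continues where it left off, unaffected by the batch read.
        System.out.println("realtime: " + topic.poll("realtime-alerts", 1));
    }
}
```

Because the events stay in the log after being read, a new consumer can be added later and still start from the beginning without any change on the producer side.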
Apache PLC4X vs. OPC-UA
A little bit off-topic: How do you choose between Apache PLC4X (open source framework for IIoT) and OPC-UA (open standard for IIoT)? In short, they are different things and can also be complementary. Here is a comparison:
OPC-UA:
- Open standard
- All the pros and cons of an open standard (works with different vendors; slow adoption; inflexible, etc.)
- Often poorly implemented by the vendors
- Requires an app server on top of the PLC
- Every device has to be retrofitted with the ability to speak a new protocol and use a common client to speak with these devices
- Often over-engineered for just reading the data
- Activating OPC-UA support on existing PLCs greatly increases the load on the PLCs
- Licensing cost for every machine
Apache PLC4X:
- Open source framework (Apache 2.0 license)
- Provides unified API by implementing drivers for communicating with most industrial controllers in the protocols they natively understand
- No need to modify existing hardware
- No increased load on the PLCs
- No need to pay for licenses to activate OPC-UA support
- Drivers are implemented from the specs or by reverse engineering the protocols in order to be fully Apache 2.0 licensed
- PLC4X adapter for OPC-UA available -> Both can be used together!
As you see, both have their pros and cons. To me, and this is clearly my subjective opinion, PLC4X provides a great alternative with high flexibility and a low footprint.
Confluent and IoT Platform Solutions
Many IoT Platform Solutions are available on the market. This includes products like Siemens MindSphere or Cisco Kinetic, and cloud services from the major cloud providers like AWS, GCP or Azure. And you have Kafka + PLC4X, as you just learned above. Often, this is not an “either … or” decision:
You can use
- just Kafka and PLC4X for lightweight and flexible IIoT integration based on a scalable, reliable and open event streaming platform
- just an IoT Platform Solution if the pros of such a specific product (dedicated for a specific vendor protocol, nice GUI, etc.) outweigh the cons (like high cost, proprietary and inflexible solution)
- both together where you use the IoT Platform Solution to integrate with the PLCs and then send the data to Kafka to integrate with the rest of the enterprise (with all the benefits and added value Kafka brings)
- both together where you use Kafka and PLC4X for PLC integration and one of the consumers is the IoT Platform Solution (while other consumers can also get the data from Kafka – fully decoupled from the IoT Platform Solution)
All alternatives have their pros and cons. There is no single solution which fits every use case! Therefore, it is no surprise that most IoT Platform Solutions provide Kafka source and sink connectors.
Apache Kafka and Apache PLC4X – Slides / Video Recording / Github Code Example
If you are curious about more details and insights, please check out my video recording and slide deck.
Slide Deck – Apache Kafka and PLC4X:
Video Recording – Apache Kafka and PLC4X:
Github Code Example – Apache Kafka and PLC4X:
We are also building a nice and simple demo on Github these days:
Kafka-native end-to-end IIoT Data Integration and Processing with Kafka Connect, KSQL and Apache PLC4X
PLC4X gets most exciting when you try it out yourself and connect it to your own machines or tools. So, check out the example and adjust it to connect to your infrastructure.
Feedback and Questions?
Please let me know your feedback and questions about Kafka, its ecosystem and PLC4X for IIoT integration. Let’s also connect on LinkedIn to discuss interesting IIoT use cases and technologies in the future.
Hi Kai, Do you know if PLC4X is suitable for production? The project still looks pretty immature, version 0.1 I think, and it’s been that way for quite a long time.
Apache PLC4X is production-ready. Several companies use it in their OT/IT infrastructure. Having said that, it is “just” a framework. You need to operate and support it by yourself (or find a vendor or system integrator who helps).