
Kafka for Cybersecurity (Part 5 of 6) – Zero Trust and Air-Gapped Environments

Apache Kafka has become the de facto standard for processing data in motion across enterprises and industries. Cybersecurity is a key success factor across all use cases. Kafka is not just used as a backbone and source of truth for data; it also monitors, correlates, and proactively acts on events from various real-time and batch data sources to detect anomalies and respond to incidents. This blog series explores use cases and architectures for Kafka in the cybersecurity space, including situational awareness, threat intelligence, forensics, air-gapped and zero trust environments, and SIEM / SOAR modernization. This post is part five: Air-Gapped and Zero Trust Environments.

Blog series: Apache Kafka for Cybersecurity

This blog series explores why security features such as RBAC, encryption, and audit logs are only the foundation of a secure event streaming infrastructure. Learn about use cases, architectures, and reference deployments for Kafka in the cybersecurity space:

  • Part 1 – Data in motion as cybersecurity backbone
  • Part 2 – Cyber situational awareness
  • Part 3 – Cyber threat intelligence
  • Part 4 – Digital forensics
  • Part 5 – Air-gapped and zero trust environments (THIS POST)
  • Part 6 – SIEM / SOAR modernization

Subscribe to my newsletter to get an update immediately after each publication. I will also update the list above with direct links to the posts in this series as soon as they are published.

Zero Trust – A Firewall is NOT Good Enough

The zero trust security model (also known as zero trust architecture or zero trust network architecture, and sometimes as perimeterless security) describes an approach to the design and implementation of IT systems. The main concept behind zero trust is that devices and other interfaces should not be trusted by default, even if they are connected to a managed corporate network such as the corporate LAN and even if they were previously verified.

In most modern enterprise environments, corporate networks consist of many interconnected segments, cloud-based services and infrastructure, connections to remote and mobile environments, and a growing number of connections to non-conventional IT, such as IoT devices. The traditional approach of trusting devices within a notional corporate perimeter, or devices connected to it via a VPN, makes little sense in such highly diverse and distributed environments. Instead, the zero trust approach advocates mutual authentication, including checking the identity and integrity of devices regardless of location, and providing access to applications and services based on the confidence in device identity and device health in combination with user authentication.

Technical Requirements for Zero Trust Environments

Authentication and authorization are key success factors for secure software. Therefore, enterprise-ready software requires features such as role-based access control (RBAC), Active Directory / LDAP integration, audit logs, end-to-end encryption in motion and at rest, bring your own key (BYOK), and more. But that is just one piece of the puzzle!
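
As a simple illustration of the authorization piece, here is a minimal Java sketch that uses Kafka's AdminClient to grant a principal read access to a topic via an ACL. The broker address, principal name, and topic are hypothetical placeholders:

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class CreateReadAcl {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical broker address; adjust to your cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker.internal:9093");

        try (Admin admin = Admin.create(props)) {
            // Allow the (hypothetical) principal "User:soc-analyst"
            // to read the "alerts" topic from any host.
            AclBinding binding = new AclBinding(
                new ResourcePattern(ResourceType.TOPIC, "alerts", PatternType.LITERAL),
                new AccessControlEntry("User:soc-analyst", "*",
                    AclOperation.READ, AclPermissionType.ALLOW));
            admin.createAcls(List.of(binding)).all().get();
        }
    }
}
```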

Additionally, zero trust environments require:

  • Protection of everything, not just firewalls and computing assets
  • Threat intelligence that includes cyber, network, and human intelligence
  • Safe IT/OT integration at industrial sites
  • Unidirectional communication enforced through hardware- and/or software-based security solutions
  • Replica servers instead of direct access
  • Surveillance for safety and theft protection

Obviously, implementing all these requirements is tough and expensive. Hence, some environments might accept a less secure infrastructure. A risk assessment, regulations, and due diligence will tell you how much investment in security is required.

For this reason, many so-called zero trust environments are not as secure as technically possible. A firewall in combination with RBAC, BYOK, and other security features is good enough for some teams. It is always a trade-off. But especially safety-critical infrastructure in air-gapped environments requires a true zero trust architecture.

Air-Gapped Environments and OT/IT Integration

A truly air-gapped environment requires the strongest security implementation. These infrastructures require a zero trust architecture. Relevant examples include power plants, oil rigs, water utilities, railway networks, manufacturing, airplanes (between flight control units and in-flight entertainment systems), and more.

From a technical perspective, zero trust includes unidirectional communication and strict surveillance of people and things entering/leaving the site.

Much of the content in this section is based on my notes on material from Waterfall Security, a leading provider of unidirectional hardware gateways.

Unidirectional Gateway (= Unidirectional Network = Data Diode)

A zero trust infrastructure requires a robust network segmentation between IT and OT networks. A firewall is not enough. More on this discussion later.

A unidirectional gateway (also referred to as a unidirectional network or data diode) is a network appliance or device that allows data to travel in only one direction.

After years of development, unidirectional networks have evolved. They started as pure network appliances or devices that allow raw data to travel in only one direction, used to guarantee information security and to protect critical digital systems, such as industrial control systems, from inbound cyber attacks. Today, they are combinations of hardware and software running on proxy computers in the source and destination networks.

The hardware permits data to flow from one network to another but is physically unable to send any information back into the source network. The software replicates databases and emulates protocol servers and devices.

Server replication software replicates databases and other servers from industrial networks to enterprise networks through the unidirectional gateway hardware. Users and other applications interact naturally with the replica servers in normal enterprise IT environments (e.g., replication of an OT historian database to an IT replica for business users).

The product solutions from vendors differ a lot. The hardware gateways from some vendors have limited or no software support. And, no surprise, the branding differs, too: some vendors sell data diodes, others sell unidirectional gateways, and others tell you that unidirectional gateways are better than data diodes. I stick with the term unidirectional gateway in this post, but it covers all these products.

Use Cases for Unidirectional Gateways

The main reason why enterprises and governments accept the cost and effort of deploying unidirectional gateways is the guarantee of a secure, hardware-based OT/IT bridge.

Here are some use cases for unidirectional gateways:

  • Database replication
  • File transfer
  • Cloud integration
  • Device emulation and data sniffing
  • Remote diagnostics and maintenance
  • Real-time monitoring of safety-critical networks
  • Scheduled application and operating system updates

Various architecture options enforce unidirectional communication. This does not mean that you cannot communicate in both directions: different patterns exist, such as switching the direction for some time or deploying two unidirectional gateways (one in each direction).

The book Secure Operations Technology by Andrew Ginter is a great resource for learning about the use cases, challenges, best practices, and various architectures of zero trust environments powered by unidirectional gateways.

Firewall (Software) vs. Unidirectional Gateway (Hardware)

When I talk to customers, most people say their infrastructure is secure because they use a firewall plus infrastructure- and application-specific configuration such as RBAC, encryption, BYOK, etc. And indeed, this is good enough for many use cases because the risk assessment compares the cost to the risk.

Unfortunately, there is no such thing as a “unidirectional firewall”. All TCP and other connections through firewalls are intrinsically bidirectional; their full-duplex SYN/ACK handshake is easily exploited. This is why UDP is preferred for cyber data. Cyber data is not transactional data, so losing bits of it, not having all of it, or not getting it in the right order at the right time really doesn’t matter. Threat intelligence is about pattern recognition, not revenue recognition.

Therefore, requirements in an air-gapped zero trust environment are different: Instead of discussing data loss or guaranteed ordering, the main goal is unidirectional communication and no technical ability to send data from the outside in.
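
To make the difference concrete, here is a minimal Java sketch of a one-way UDP sender. The send() call completes without any handshake, acknowledgment, or return traffic, while a TCP connection could not even be established without packets flowing back to the source. The target address and port are hypothetical placeholders:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class OneWaySender {
    public static void main(String[] args) throws Exception {
        // Hypothetical receiver on the other side of the one-way link.
        InetAddress target = InetAddress.getByName("10.0.0.42");
        int port = 5000;

        byte[] payload = "sensor-reading".getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            // UDP is fire-and-forget: this call needs no handshake and expects
            // no acknowledgment. A TCP connect() would already fail here because
            // the SYN/ACK handshake requires packets in both directions.
            socket.send(new DatagramPacket(payload, payload.length, target, port));
        }
    }
}
```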

A firewall is just software. Waterfall’s whitepaper describes the risks of a firewall compared to a hardware-based unidirectional gateway and shows that a firewall is not good enough for some safety scenarios.

With this long introduction, let’s discuss how Apache Kafka fits into this discussion around air-gapped environments and unidirectional zero trust communication.

Kafka in Zero Trust Edge Environments

Event streaming with Apache Kafka at the edge is not cutting edge anymore. It is a common approach to providing the same open, flexible, and scalable architecture at the edge as in the cloud or data center.

Possible locations for a Kafka edge deployment include retail stores, cell towers, trains, small factories, restaurants, etc. I already discussed the concepts and architectures in detail in the past: “Apache Kafka is the New Black at the Edge” and “Use Cases and Architectures for Kafka at the Edge”.

Kafka in an Air-Gapped Environment

Data in motion in air-gapped and zero trust environments requires a self-managed deployment of the Kafka infrastructure. This ensures disconnected local event streaming without the need for internet communication. Ideally, engineering teams leverage provisioning and operations tools for Kafka, such as Confluent’s open-source Ansible playbooks and installer or the Confluent Operator for Kubernetes.
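
As a minimal sketch of such a disconnected deployment, the following Java producer talks only to a local, self-managed broker and encrypts and authenticates its traffic via mTLS. All hostnames, file paths, passwords, and the topic name are hypothetical placeholders:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class EdgeProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Local, self-managed broker at the edge; no internet connectivity required.
        props.put("bootstrap.servers", "edge-broker.local:9093");
        // Encrypt traffic in motion and authenticate the client via mTLS.
        props.put("security.protocol", "SSL");
        props.put("ssl.truststore.location", "/etc/kafka/secrets/truststore.jks");
        props.put("ssl.truststore.password", "changeit"); // placeholder
        props.put("ssl.keystore.location", "/etc/kafka/secrets/keystore.jks");
        props.put("ssl.keystore.password", "changeit");   // placeholder
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish a local OT event; everything stays inside the air-gapped site.
            producer.send(new ProducerRecord<>("ot-events", "pump-17", "pressure=4.2"));
        }
    }
}
```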

Some environments are truly air-gapped and disconnected. However, in many cases, at least a unidirectional replication of some information to the outside world is required.

Hybrid Architecture with Air-Gapped Edge and Cloud Kafka

I wrote a lot about deploying Kafka in hybrid architectures. Check out the following good summary of Kafka deployment patterns: “Architecture patterns for distributed, hybrid, edge and global Apache Kafka deployments”.

Some Confluent customers deploy Confluent in front of other air-gapped edge infrastructure to provide a secure intermediary (on Linux) between the existing (Windows) hardware and modern (Linux) infrastructure:

Kafka-native tools for bidirectional replication between the air-gapped environments and the public data center or cloud include MirrorMaker 2, Confluent Replicator, and Confluent Cluster Linking. The latter is my recommended option as it uses the native Kafka protocol for replication. Hence, it does not require separate infrastructure via Kafka Connect like the other two options.

Initiation from the Air-Gapped Kafka and Replication to the Cloud

In many zero trust environments, only the secured site can initiate communication with remote sites, not the other way round. This makes sense: it is much harder for external attackers to get communication running because the firewall configuration is very restrictive. However, this requirement is challenging to implement in some scenarios, and not all tools support it.

Confluent provides a feature called “Source Connection Origination” that solves exactly this problem. The source application always initiates communication from the secure on-prem site. You don’t need to get an exception from your network security team!

Hence, data is sent both ways between Kafka clusters without opening up any on-prem firewalls:

If you need even better security, then a unidirectional Kafka gateway is the way to go.

Kafka-native Unidirectional Data Diode

A hardware-based unidirectional gateway is the foundation of a true zero trust infrastructure. In the past, the software running on top focused on safely replicating files and databases from the edge site to the remote site.

Air-gapped safety environments have the same challenges as any other data center:

  • Higher volumes of events with the need for elastic scale
  • Demand for real-time capabilities and flexible deployment of applications
  • Processing data in motion instead of just storing data at rest for batch processing

Hence, Kafka is obviously the right choice for building a software-based unidirectional gateway for zero trust security architectures. This enables streaming data from industrial networks to enterprise networks.

Confluent Data Diode for Streaming OT/IT Replication

Confluent Data Diode combines UDP-based Kafka Connect source and sink connectors for high-volume streaming replication between OT and IT. Like any other software for unidirectional gateways, it should run over a one-way hardware interface (an Ethernet cable, an Owl Cyber Defense or Waterfall WF-500 gateway, or any other gateway).

The open architecture of the Confluent Data Diode enables the implementation of additional data processing scenarios such as filtering, anomaly detection, analytics, receiving upstream traffic, etc. The blog post “Apache Kafka as Modern Data Historian” explores the capabilities and the relation to traditional historians in more detail.

The following diagram shows the architecture of the Confluent Data Diode:

The Data Diode connector serves a similar purpose as Confluent Replicator; however, the big difference is that the Data Diode connector works over UDP, while Confluent Replicator requires TCP/IP.

The Data Diode connector is meant to be used in a high-security unidirectional network. In such networks, the network settings do not permit TCP/IP packets, and UDP packets are only allowed in one direction. The sink connector serializes one or more Kafka records into a datagram packet and sends it to a remote server running the Data Diode source connector. The sink connector must be installed in the source Kafka cluster.
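
To illustrate the pattern conceptually (this is a minimal sketch, not the actual connector implementation), the following Java snippet shows the sink side: it consumes records from the OT-side Kafka cluster, serializes them, and fires them through the one-way link as UDP datagrams. The source side would do the reverse and produce the received datagrams into the IT-side cluster. All addresses and topic names are hypothetical placeholders:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DiodeSinkSide {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "ot-broker.local:9092"); // OT-side cluster
        props.put("group.id", "diode-sink");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        InetAddress itSide = InetAddress.getByName("192.168.100.2"); // IT side of the diode
        int port = 7000;

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             DatagramSocket socket = new DatagramSocket()) {
            consumer.subscribe(List.of("ot-events"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    // Serialize the record and push it through the one-way link.
                    // No acknowledgment can ever come back, so delivery is
                    // best-effort, which is acceptable for telemetry data.
                    byte[] payload = (record.key() + "\t" + record.value())
                            .getBytes(StandardCharsets.UTF_8);
                    socket.send(new DatagramPacket(payload, payload.length, itSide, port));
                }
            }
        }
    }
}
```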

Example: Border Protection with Kafka for Security and Surveillance

Let’s take a look at an example I have seen at various customers in similar forms: real-time security and video surveillance at the edge (such as at a border control):

The example is generic, and the architecture is very flexible. Deployment is possible in:

  • Zero trust environments using unidirectional gateways and Confluent’s Data Diode
  • Less safety-critical (but still critical) environments secured with firewalls, VPNs, etc., leveraging source connection origination to replicate events in real time from the edge to the remote data center or cloud
  • Disconnected edge environments that operate completely offline (or maybe replicate data in a manual process from time to time, e.g., using a USB stick)

Use cases include data integration at the edge, data processing (such as image recognition, data correlation, threat detection, and alerting in real-time), and replication to the remote data center.
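
As a minimal sketch of the stream processing part, the following Kafka Streams application filters suspicious events at the edge and writes them to a dedicated alert topic in real time. The trivial string match stands in for real detection logic (such as the result of image recognition), and all topic names are hypothetical placeholders:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class EdgeThreatDetection {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "edge-threat-detection");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "edge-broker.local:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("sensor-events");

        // Flag suspicious events and write them to a dedicated alert topic,
        // which downstream consumers (dashboards, SOAR tools, the diode
        // replication path) can process in real time.
        events.filter((sensorId, event) -> event.contains("UNAUTHORIZED"))
              .to("alerts");

        new KafkaStreams(builder.build(), props).start();
    }
}
```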

Kafka Enables Disconnected Edge Computing and Unidirectional Zero Trust Replication

The firewall is NOT a secure solution for zero trust environments. That might be the key lesson learned for many IT-related people. Frankly, I had never heard of hardware-based unidirectional gateways until a few months ago, when we had OT/IT conversations with some customers.

The Confluent Data Diode is an exciting solution to modernize the OT/IT integration in a zero trust environment via streaming replication at scale. This is the next step after many companies already run Kafka at the disconnected edge or in hybrid scenarios using “traditional Kafka replication methods” such as MirrorMaker or Confluent Replicator.

Do you already deploy Kafka in air-gapped and zero trust environments? How do you solve the security questions? Do you use a hardware-based unidirectional gateway? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.
