Apache Kafka

Kafka for Cybersecurity (Part 3 of 6) – Cyber Threat Intelligence

Apache Kafka became the de facto standard for processing data in motion across enterprises and industries. Cybersecurity is a key success factor across all use cases. Kafka is not just used as a backbone and source of truth for data. It also monitors, correlates, and proactively acts on events from various real-time and batch data sources to detect anomalies and respond to incidents. This blog series explores use cases and architectures for Kafka in the cybersecurity space, including situational awareness, threat intelligence, forensics, air-gapped and zero trust environments, and SIEM / SOAR modernization. This post is part three: Cyber Threat Intelligence.

Blog series: Apache Kafka for Cybersecurity

This blog series explores why security features such as RBAC, encryption, and audit logs are only the foundation of a secure event streaming infrastructure. Learn about use cases,  architectures, and reference deployments for Kafka in the cybersecurity space:

Subscribe to my newsletter to get updates immediately after the publication. Besides, I will also update the above list with direct links to this blog series’s posts as soon as published.

Cyber Threat Intelligence

Threat intelligence, or cyber threat intelligence, reduces harm by improving decision-making before, during, and after cybersecurity incidents reducing operational mean time to recovery, and reducing adversary dwell time for information technology environments.

Threat intelligence is evidence-based knowledge, including context, mechanisms, indicators, implications, and action-oriented advice about an existing or emerging menace or hazard to assets. This intelligence can be used to inform decisions regarding the subject’s response to that menace or hazard.

Threat intelligence solutions gather raw data about emerging or existing threat actors & threats from various sources. This data is then analyzed and filtered to produce threat intel feeds and management reports that contain information that automated security control solutions can use.

Threat intelligence keeps organizations informed of the risks of advanced persistent threats, zero-day threats and exploits, and how to protect against them.

Situational Awareness is Not Enough…

… but the foundation to collect and pre-process data in real-time at scale. Only real-time situational awareness enables real-time threat intelligence to provide huge benefits to the enterprise:

  • Mitigate harmful events in cyberspace
  • Proactive cybersecurity posture that is predictive, not just reactive
  • Bolster overall risk management policies
  • Improved detection of threats
  • Better decision-making during and following the detection of a cyber intrusion

In summary, threat intelligence allows to:

  • See the whole board. And see it more quickly.
  • See around corners.
  • See the enemy before they see you.

Threat Intelligence for Prevention or Mitigation across the Cyber Kill Chain

Threat intelligence is the knowledge that allows you to prevent or mitigate cyberattacks. It covers all the phases of the so-called “Cyber Kill Chain“:

Threat intelligence provides several benefits:

  • Empowers organizations to develop a proactive cybersecurity posture and to bolster overall risk management policies
  • Drives momentum toward a cybersecurity posture that is predictive, not just reactive
  • Enables improved detection of threats
  • Informs better decision-making during and following the detection of a cyber intrusion

Transactional Data vs. Analytics Data

Most use cases around data-in-motion are about all the data. This is true for all transactional use cases and even for many analytical use cases. Each event is valuable: A sale, an order, a payment, an alert from a machine, etc.

However, data is often full of noise. As I discussed earlier in this blog series, the goal in the cybersecurity space is to find the needle in the haystack and to reduce false-positive alerts.

SIEM, SOAR, OT, and ICS are almost always analytic processing regimes, BUT knowing when they are not is important. Kafka can configure topics to be tuned for transactions or analytics.  That is unprecedented in the history of data processing. Threat intelligence (= awareness-in-motion) assumes the PATTERN is valuable, not the data.

Analytics in Motion powered by Kafka Streams / ksqlDB

As you can hopefully imagine from the above requirements and characteristics, event streaming with Apache Kafka and its streaming analytics ecosystem is a perfect fit for the technical infrastructure for threat intelligence.

Threat detection makes sense of the signal and the noise of the data by continuously processing signatures. This enables to detect, contain and neutralize threats proactively:

Analytics can be many things in such a scenario:

On a high level, the advantages of using Kafka Streams or ksqlDB for threat intelligence can be described as follows:

  • A single scalable and reliable real-time infrastructure for end-to-end data integration and data processing
  • Flexibility to write custom rules and embed other rules engines, frameworks, or trained models
  • Integration with other threat detection systems like IDS, SIEM, SOAR

The business logic for cyber threat detection looks different for every use case. Known attack patterns like MITRE ATT&ACK help with the implementation. However, situational awareness and threat detection also need to detect unknown anomalies.

Let’s now take a look at a concrete example.

Intel’s Cyber Intelligence Platform

Let me quote Intel themselves:

As cyber threats continuously grow in sophistication and frequency, companies need to quickly acclimate to effectively detect, respond, and protect their environments. At Intel, we’ve addressed this need by implementing a modern, scalable Cyber Intelligence Platform (CIP) based on Splunk and Confluent. We believe that CIP positions us for the best defense against cyber threats well into the future.

Our CIP ingests tens of terabytes of data each day and transforms it into actionable insights through streams processing, context-smart applications, and advanced analytics techniques. Kafka serves as a massive data pipeline within the platform. It provides us the ability to operate on data in-stream, enabling us to reduce Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR). Faster detection and response ultimately leads to better prevention.

Let’s explore Intel’s CIP for threat intelligence in more detail.

Detecting Vulnerabilities with Stream Processing

Intel’s CIP leverages the whole Kafka ecosystem provided by Confluent:

  • Ingestion: Kafka producer clients for various sources such as databases, scanning engines, IP address management, asset management inventory, etc.
  • Streaming analytics: Kafka Streams for filtering vulnerabilities by business unit, joining asset ownership with vulnerable assets, etc.
  • Egress: Kafka Connect sink connectors for data lakes, IT partners, other business units, SIEM, SOAR, etc.
  • High availability: Multi-Region Clusters (MRC) for high availability across regions
  • And much more…

Here is a high-level architecture:

Intel’s Kafka Maturity Timeline

Building a cybersecurity infrastructure is not a big bang. A step-by-step approach starts with integrating the first sources and sinks, some simple stream processing, and deployment as a pilot project. Over time, more and more data sources and sinks are added, the business logic gets more powerful, and the scale increases.

Intel’s Kafka maturity timeline shows their learning curve:

Kafka Benefits to Intel

Intel describes their benefits for leveraging event streaming as follows:

  • Economies of scale
  • Operate on data in the stream
  • Reduce technical debt and downstream costs
  • Generates contextually rich data
  • Global scale and reach
  • Always on
  • Modern architecture with a thriving community
  • Kafka leadership through Confluent expertise

That’s pretty much the same reasons I use in many of my other blog posts to explain the rise of data in motion powered by Apache Kafka across industries and use cases… 🙂

For more intel on Intel’s Cyber Intelligence Platform powered by Confluent and Splunk, check out their whitepaper and Kafka Summit talk.

Scalable Real-time Cyber Threat Intelligence with Kafka

Kafka is not just used as a backbone and source of truth for data. It also monitors, correlates, and proactively acts on events from various real-time and batch data sources to implement cyber threat intelligence.

The Cyber Intelligence Platform from Intel is a great example of a Kafka-powered cybersecurity solution. It leverages the whole Kafka ecosystem to build a scalable and reliable real-time integration and processing layer. The streaming analytics logic depends on the use case. It can cover simple business logic but also external rules engines or analytic models.

How do you fight against cybersecurity risks? What technologies and architectures do you use to implement cyber threat intelligence? How does Kafka complement other tools? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Kai Waehner

builds cloud-native event streaming infrastructures for real-time data processing and analytics

Recent Posts

Open Standards for Data Lineage: OpenLineage for Batch AND Streaming

One of the greatest wishes of companies is end-to-end visibility in their operational and analytical…

3 days ago

My Data Streaming Journey with Kafka & Flink: 7 Years at Confluent

Time flies… I joined Confluent seven years ago when Apache Kafka was mainly used by…

2 weeks ago

Apache Kafka + Flink + Snowflake: Cost Efficient Analytics and Data Governance

Snowflake is a leading cloud data warehouse and transitions into a data cloud that enables…

3 weeks ago

Snowflake Data Integration Options for Apache Kafka (including Iceberg)

The integration between Apache Kafka and Snowflake is often cumbersome. Options include near real-time ingestion…

3 weeks ago

Snowflake Integration Patterns: Zero ETL and Reverse ETL vs. Apache Kafka

Snowflake is a leading cloud-native data warehouse. Integration patterns include batch data integration, Zero ETL…

4 weeks ago

When (Not) to Choose Google Apache Kafka for BigQuery?

Google announced its Apache Kafka for BigQuery cloud service at its conference Google Cloud Next…

1 month ago