The Future of Data Streaming with Apache Flink for Agentic AI Supporting A2A and MCP

Agentic AI is moving into production: autonomous, tool-using, goal-driven systems that need real-time data and context. Apache Kafka and Flink provide the event-driven foundation to run these agents at scale. With the new Flink Agents project (FLIP-531), Flink will natively support long-running, system-triggered AI agents integrated with LLMs, tools, and emerging protocols like MCP and A2A. This marks a major step toward reliable, enterprise-grade Agentic AI.

Agentic AI is changing how enterprises think about automation and intelligence. Agents are no longer reactive systems. They are goal-driven, context-aware, and capable of autonomous decision-making. But to operate effectively, agents must be connected to the real-time pulse of the business. This is where data streaming with Apache Kafka and Apache Flink becomes essential.

Apache Flink is entering a new phase with the proposal of Flink Agents, a sub-project designed to power system-triggered, event-driven AI agents natively within Flink’s streaming runtime. Let’s explore what this means for the future of agentic systems in the enterprise.


Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter, and follow me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases, including various real-world examples of AI-related topics like fraud detection and generative AI for customer service.

The State of Agentic AI

Agentic AI is no longer experimental. It is still at an early stage of the adoption lifecycle, but it is starting to move into production for the first critical use cases.

Agents today are expected to:

  • Make real-time decisions
  • Maintain memory across interactions
  • Use tools autonomously
  • Collaborate with other agents

But these goals face real infrastructure challenges. Existing frameworks like LangChain or LlamaIndex are great for prototyping. But, without the help of other tools, they are not designed for long-running, system-triggered workflows that need high availability, fault tolerance, and deep integration with enterprise data systems.

The real problem is integration. Agents must operate on live data, interact with tools and models, and work across systems. This complexity demands a new kind of runtime. One that is real-time, event-driven, and deeply contextual.

Open Standards and Protocols for Agentic AI: MCP and A2A

Standards are emerging to build scalable, interoperable AI agents. Two of the most important are:

  • Model Context Protocol (MCP) by Anthropic: A standardized interface for agents to access context, use tools, and generate responses. It abstracts how models interact with their environment, enabling plug-and-play workflows.
  • Agent2Agent (A2A) protocol by Google: A protocol for communication between autonomous agents. It defines how agents discover each other, exchange messages, and collaborate asynchronously.
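
To make this concrete, here is a minimal sketch of what an MCP tool invocation looks like on the wire. MCP is based on JSON-RPC 2.0 and defines a "tools/call" request; the tool name and arguments below are hypothetical placeholders, not part of either specification.

```python
import json

# MCP is built on JSON-RPC 2.0. An agent asking an MCP server to run a tool
# sends a "tools/call" request. The tool name and arguments here are
# hypothetical placeholders for illustration only.
mcp_tool_call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "lookup_customer",          # hypothetical tool
        "arguments": {"customer_id": "42"}  # hypothetical arguments
    },
}

print(json.dumps(mcp_tool_call, indent=2))
```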

These standards help define what agents do and how they do it. But protocols alone are not enough. Enterprises need a runtime to execute these agent workflows in production—with consistency, scale, and reliability. This is where Flink fits in.

Similar to microservices a decade ago, Agentic AI and protocols like MCP and A2A risk creating tight coupling and point-to-point spaghetti architectures if used in isolation. An event-driven data streaming backbone ensures these standards deliver scalable, resilient, and governed agent ecosystems instead of repeating past mistakes.

Apache Kafka and Flink together form the event-driven backbone for Agentic AI.

  • Apache Kafka provides durable, replayable, real-time event streams. It decouples producers and consumers, making it ideal for asynchronous agent communication and shared context.
  • Apache Flink provides low-latency, fault-tolerant stream processing. It enables real-time analytics, contextual enrichment, complex event processing, and now—agent execution.

Agentic AI with Apache Kafka as Event Broker

Agentic AI requires real-time data ingestion to ensure agents can react instantly to changes as they happen. It also depends on stateful processing to maintain memory across interactions and decision points. Coordination between agents is essential so that tasks can be delegated, results can be shared, and workflows can be composed dynamically. Finally, seamless integration with tools, models, and APIs allows agents to gather context, take action, and extend their capabilities within complex enterprise environments.

Apache Flink provides all of these natively. Instead of stitching together multiple tools, Flink can host the entire agentic workflow (see the sketch after this list):

  • Ingest event streams from Kafka
  • Enrich and process data with Flink’s Table and DataStream APIs
  • Trigger LLMs or external tools via UDFs
  • Maintain agent memory with Flink state
  • Enable agent-to-agent messaging using Kafka or Flink’s internal mechanisms
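
Here is a minimal sketch of this pattern with today's PyFlink APIs. The topic name, the LLM endpoint, and the call_llm helper are assumptions for illustration, and the Kafka connector jar must be on the classpath; the upcoming Flink Agents APIs are expected to provide higher-level constructs for the same flow.

```python
import json
import urllib.request

from pyflink.common import WatermarkStrategy
from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors.kafka import (KafkaOffsetsInitializer,
                                                 KafkaSource)


def call_llm(prompt: str) -> str:
    """Hypothetical helper: POST the prompt to an assumed LLM endpoint."""
    req = urllib.request.Request(
        "http://llm-gateway:8080/v1/complete",        # assumed endpoint
        data=json.dumps({"prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()


env = StreamExecutionEnvironment.get_execution_environment()

# 1. Ingest an event stream from Kafka (topic name is an assumption).
source = (KafkaSource.builder()
          .set_bootstrap_servers("localhost:9092")
          .set_topics("payment-events")
          .set_group_id("agent-demo")
          .set_starting_offsets(KafkaOffsetsInitializer.latest())
          .set_value_only_deserializer(SimpleStringSchema())
          .build())

events = env.from_source(
    source, WatermarkStrategy.no_watermarks(), "payment-events")

# 2. Enrich each event and trigger the LLM as an external tool.
decisions = events.map(lambda e: call_llm(f"Assess risk of event: {e}"))

decisions.print()
env.execute("llm-agent-sketch")
```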

A2A and MCP define how autonomous agents communicate and access context. With Kafka as the event broker and Flink as the stream processor, enterprises can build scalable, decoupled, and context-aware Agentic AI systems.

A2A and MCP with Apache Kafka as Event Broker and Flink as Stream Processor for truly decoupled, scalable, and contextual Agentic AI

Agents can still communicate point-to-point with each other via protocols like A2A. But their information must often reach many systems, and multiple point-to-point links create brittle architectures. Kafka solves this by acting as the scalable, decoupled event backbone.
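
As a sketch of that backbone idea: one agent publishes a single task event to a Kafka topic, and any number of consumer groups (other agents, auditing, analytics) read it independently. The envelope fields and topic name are illustrative assumptions, not the normative A2A schema.

```python
import json
import uuid

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

# Illustrative A2A-style task envelope; field names are placeholders,
# not the normative A2A schema.
task = {
    "task_id": str(uuid.uuid4()),
    "from_agent": "fraud-detector",
    "to_capability": "case-management",
    "payload": {"transaction_id": "tx-123", "risk_score": 0.97},
}

# One publish; every consumer group on this topic (case agent, audit,
# analytics) receives the event independently -- no point-to-point links.
producer.produce("agent-tasks", key=task["task_id"],
                 value=json.dumps(task))
producer.flush()
```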

More details: Agentic AI with the Agent2Agent Protocol (A2A) and MCP using Apache Kafka as Event Broker.

While this is already possible to implement today, Flink still needs more built-in support for standard protocols like MCP and A2A, along with native AI and ML capabilities, to fully meet the demands of enterprise-grade agentic systems.

FLIP-531: Initiate Flink Agents as a new Sub-Project is an exciting “Flink Improvement Proposal” led by Xintong Song, Sean Falconer, and Chris Meyers. It introduces a native framework for building and running AI agents within Flink.

FLIP-531: Initiate Flink Agents as a New Sub-Project (Source: Apache)

Key Objectives

  • Provide an execution framework for event-driven, long-running agents
  • Integrate with LLMs, tools, and context providers via MCP
  • Support agent-to-agent communication (A2A)
  • Leverage Flink’s state management as agent memory
  • Enable replayability for testing and auditing
  • Offer familiar Java, Python, and SQL APIs for agent development

With FLIP-531, Apache Flink goes beyond orchestration and data preparation in Agentic AI environments. It now provides a native runtime to build, run, and manage autonomous AI agents at scale.

Apache Flink Supporting Agentic AI via MCP and A2A (FLIP-531)

Developer Experience

Flink Agents will extend familiar Flink constructs. Developers can define agents using constructs coming soon to Flink's Table and DataStream APIs. They can connect to large language model (LLM) endpoints, register models, call tools, and manage context, all from within Flink.

Sample APIs are already available for Java, Python (PyFlink), and SQL. These include support for:

  • Agent workflows with tools and prompts
  • UDF-based tool invocation
  • Integration with MCP and external model providers
  • Stateful agent logic and multi-step workflows
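
Since the final Flink Agents API is still being designed under FLIP-531, here is a hedged sketch of the UDF-based tool invocation pattern using today's stable PyFlink Table API; the tool logic is a toy placeholder for what would normally be an MCP or LLM call, and the table data is an in-memory stand-in for a Kafka-backed stream.

```python
from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
from pyflink.table.udf import udf

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())


@udf(result_type=DataTypes.STRING())
def invoke_tool(event: str) -> str:
    """Placeholder tool: a real agent would call an MCP server or an
    LLM endpoint here instead of this toy rule."""
    return "escalate" if "error" in event.lower() else "ignore"


t_env.create_temporary_function("invoke_tool", invoke_tool)

# A tiny in-memory table standing in for a Kafka-backed stream.
events = t_env.from_elements(
    [("sensor-1", "ERROR: overheating"), ("sensor-2", "ok")],
    DataTypes.ROW([DataTypes.FIELD("id", DataTypes.STRING()),
                   DataTypes.FIELD("msg", DataTypes.STRING())]))
t_env.create_temporary_view("events", events)

# Tool invocation from SQL via the registered UDF.
t_env.sql_query(
    "SELECT id, invoke_tool(msg) AS action FROM events").execute().print()
```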

Roadmap Milestones

The Flink Agents project is moving fast with a clear roadmap focused on rapid delivery and community-driven development:

  • Q2 2025: MVP design finalized
  • Q3 2025: MVP with model support, replayability, and tool invocation
  • Q4 2025: Multi-agent communication and example agents
  • Late 2025: First formal release and community expansion

The team is prioritizing execution and fast iteration, with a GitHub-based development model and lightweight governance to accelerate innovation.

The most impactful agents in the enterprise aren’t chatbots or assistants waiting for user input. They are always-on components, embedded in infrastructure, continuously observing and acting on real-time business events.

Apache Flink Agents are built for this model. Instead of waiting for a request, these agents run asynchronously as part of an event-driven architecture. They monitor streams of data, maintain memory, and trigger actions automatically—similar to observability agents, but for decision-making and automation.

This always-on approach is critical for modern use cases. Enterprises can’t afford delays in fraud detection, equipment failures, customer engagement, or supply chain response. Agents must act instantly—based on the data flowing through the system—not hours later via batch processing or human intervention.

Apache Flink provides the ideal foundation. Its low-latency, stateful stream processing enables agents to (see the sketch after this list):

  • Observe and react to real-time signals from events, APIs, databases, or external SaaS requests
  • Maintain state across workflows and business processes
  • Collaborate asynchronously
  • Trigger tools, request-response APIs, or downstream actions
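
A minimal sketch of such an always-on, stateful agent in today's PyFlink, assuming a toy count-based fraud rule and in-memory test input in place of a Kafka stream:

```python
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import KeyedProcessFunction, RuntimeContext
from pyflink.datastream.state import ValueStateDescriptor


class FraudAgent(KeyedProcessFunction):
    """Keeps per-card memory in Flink state and reacts to every event."""

    def open(self, runtime_context: RuntimeContext):
        self.tx_count = runtime_context.get_state(
            ValueStateDescriptor("tx_count", Types.INT()))

    def process_element(self, value, ctx):
        count = (self.tx_count.value() or 0) + 1
        self.tx_count.update(count)
        if count > 3:  # illustrative threshold
            yield f"ALERT {value[0]}: {count} rapid transactions"


env = StreamExecutionEnvironment.get_execution_environment()
# Toy input standing in for a Kafka stream of (card_id, amount) events.
txs = env.from_collection(
    [("card-1", 10.0)] * 5 + [("card-2", 99.0)],
    type_info=Types.TUPLE([Types.STRING(), Types.FLOAT()]))

alerts = (txs.key_by(lambda t: t[0])
             .process(FraudAgent(), output_type=Types.STRING()))
alerts.print()
env.execute("stateful-agent-sketch")
```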

These aren’t chatbot wrappers—they’re autonomous services embedded in production. They function as part of your business nervous system, adapting in real time to changing conditions and continuously improving outcomes.

This architecture is what enterprises truly need: automation that is fast, reliable, and context-aware. It reduces time to detect and resolve issues, improves SLA adherence, and enables proactive decisions across the organization.

Always-on, embedded agents are the future of AI in business. Apache Flink is ready to power them. Let’s explore a few excellent use case examples across different industries in the next section.

Agentic AI use cases are emerging across industries. These systems demand real-time responsiveness, contextual intelligence, and full autonomy. Traditional REST APIs, manual orchestration, and batch jobs fall short in these environments. Instead, enterprises need infrastructure that is continuous, stateful, event-driven, and always-on, as discussed in the previous section.

Apache Kafka and Apache Flink already power the data backbone of many digital-native organizations.

With Flink Agents, developers can build:

  • Always-on and ReAct-style agents with structured workflows
  • Retrieval-augmented agents with semantic search (sketched below)
  • Long-lived stateful agents with memory
  • Fully autonomous systems with tool and API access
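
For the retrieval-augmented pattern, here is a hedged sketch using a tiny in-memory corpus and bag-of-words cosine similarity as stand-ins for a real embedding model and vector database:

```python
import numpy as np

# Toy stand-ins: a real system would use an embedding model and a vector
# database; a fixed-vocabulary bag-of-words vector is enough to show the
# retrieval-augmented pattern.
VOCAB = ["refund", "invoice", "shipping", "delay", "payment"]

def embed(text: str) -> np.ndarray:
    words = text.lower().split()
    return np.array([float(words.count(w)) for w in VOCAB])

CORPUS = [
    "Refund requests are processed within 5 days.",
    "Shipping delays are communicated by email.",
]
corpus_vecs = [embed(doc) for doc in CORPUS]

def retrieve(query: str) -> str:
    """Return the corpus document most similar to the query."""
    q = embed(query)
    sims = [float(q @ v) / ((np.linalg.norm(q) * np.linalg.norm(v)) or 1.0)
            for v in corpus_vecs]
    return CORPUS[int(np.argmax(sims))]

# The agent enriches its prompt with retrieved context before calling an LLM.
event = "customer asks about a refund delay"
context = retrieve(event)
prompt = f"Context: {context}\nEvent: {event}\nRespond appropriately."
print(prompt)
```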

Here are some examples where Flink Agents will shine:

Finance

  • Fraud detection and risk scoring
  • Compliance monitoring
  • Adaptive trading systems with feedback loops

Manufacturing

  • Predictive maintenance for industrial equipment
  • Smart factory optimization
  • Supply chain agents managing demand and inventory

Retail

  • Real-time product tagging and catalog enrichment
  • Personalized promotions based on customer behavior
  • Inventory rebalancing and logistics agents

Healthcare

  • Patient monitoring and alerting
  • Claims processing and document triage
  • Compliance audits

Telecommunications

  • Self-healing networks
  • Customer support automation with feedback loops
  • Dynamic QoS optimization

Gaming

  • Adaptive AI opponents that respond to player behavior
  • Dynamic content generation for evolving game environments
  • Real-time moderation for abuse and cheating detection

Public Sector

  • Traffic and energy optimization in a smart city
  • Automated citizen service assistants
  • Public safety agents for emergency detection and response

The Future of Agentic AI is Event-Driven

The rise of Agentic AI means a shift in infrastructure priorities.

It’s not enough to invest in model quality or prompt engineering. Enterprises must also modernize their data and execution layer.

Point-to-point communication between agents is fine for direct interaction, but at scale the real value comes from an event-driven backbone like Kafka and Flink that ensures information reliably reaches all required systems.

Flink Agents offers a production-grade, enterprise-ready foundation for agentic systems. It turns brittle demos into reliable applications by providing:

  • Consistent real-time data via Kafka
  • Stateful, fault-tolerant execution via Flink
  • Standardized protocols via MCP and A2A
  • Developer productivity via familiar APIs

This combination reduces time-to-market, increases system resilience, and lowers operational costs. It gives developers and architects the tools to build agents like real software. Scalable, testable, and observable.

This shift is NOT a replacement for data lakes, lakehouses, or AI platforms. It complements them by enabling real-time, event-driven execution alongside batch and analytical workloads.

To explore how streaming and AI platforms work together in practice, check out my Confluent + Databricks blog series, which highlights the value of combining Kafka, Flink, and lakehouse architectures for modern AI use cases.

The future of Agentic AI is event-driven. Apache Flink is ready to power it.

