Telecom OSS Modernization with Data Streaming: From Legacy Burden to Cloud-Native Agility

Telecom OSS Modernization with Data Streaming using Apache Kafka and Flink for Cloud-Native BSS and OTT Integration
OSS is critical for service delivery in telecom, yet legacy platforms have become rigid and costly. They slow innovation just as 5G, cloud-native networks, and OTT partnerships demand agility. This article explores how a data streaming platform with Apache Kafka and Flink helps telcos modernize OSS step by step, cut costs, accelerate time to market, and turn OSS into the real-time backbone for AI and event-driven operations.

Telecom networks are under pressure to deliver more services, faster, and with higher reliability. Yet many operators remain held back by legacy Operational Support Systems (OSS) that were designed for another era. These platforms, once the backbone of service delivery and assurance, now often slow down innovation, block real-time automation, and drive up costs. At the same time, Business Support Systems (BSS) and new Over-the-Top (OTT) services demand seamless integration with OSS to meet customer expectations. This article explores how a data streaming platform powered by Apache Kafka and Flink transforms OSS into a cloud-native, real-time nervous system of the telco to enable modernization step by step while supporting AI, multi-cloud, and event-driven operations.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter, and follow me on LinkedIn or X (formerly Twitter) to stay in touch. You can also download my free book about data streaming use cases, which includes a dedicated chapter about the telecom sector.

OSS as the Nervous System of Telecom Operations

Operational Support Systems (OSS) have always been the backbone of telecom operations. They orchestrate provisioning, assurance, inventory, and workforce management. Without them, no service is delivered, monitored, or billed correctly. OSS ensures that customer promises are fulfilled in the network, often in near real time.

OSS is the Nervous System of the Telco for BSS and OTT

In the modern telco stack, OSS works hand in hand with Business Support Systems (BSS). The BSS defines the “what”: the product catalog, pricing, eligibility, and customer orders. The OSS executes the “how”: activating those services in the network, assuring their performance, and managing the underlying resources. This separation of concerns is essential to avoid overloading one layer with responsibilities that belong to the other.

The rise of OTT (Over-the-Top) services adds another dimension. OTT services are third-party digital offerings such as video, messaging, or cloud apps that telcos must provision and assure alongside their own network services, and customers expect them to activate as seamlessly as traditional telco services. That requires tight integration between BSS, which captures the commercial offer, and OSS, which ensures that both network and partner services are provisioned, assured, and synchronized.

OSS is therefore more than a supporting player. OSS is the nervous system of the telco, bridging customer intent in the BSS with fulfillment in the network and with external OTT ecosystems.

The Pain of Legacy OSS

Yet, while OSS should enable innovation, it often slows it down. Legacy OSS, built for a world of static services and manual processes, is increasingly misaligned with the demands of 5G, edge computing, and cloud-native applications.

As Gustavo Mársico outlined in his article “A Strategic Approach to Modernize Telco OSS”, the legacy challenge is not that OSS was built “wrong.” It was simply built for another era. Over the years, systems piled up like bricks, making the environment fragile and expensive:

  • High OPEX from maintaining monolithic platforms and professional services-heavy integrations.
  • Slow time-to-market, where introducing a new service can take months because provisioning logic is buried in outdated workflows.
  • Rigid, batch-driven architecture that cannot handle real-time telemetry, intent-based orchestration, or closed-loop automation.
  • Vendor lock-in, as proprietary integrations make OSS migration risky and costly.

The result is an OSS estate that acts as an anchor rather than a launchpad. Gustavo’s article inspired this article, which explores in more detail how data streaming helps build a cloud-native OSS infrastructure.

Data Streaming as the Cloud-Native Middleware for OSS Transformation

Data streaming becomes the backbone of event-driven telecom architectures. Many operators are only starting this journey. This is where data streaming with the de facto standards Apache Kafka and Flink comes into play: Kafka connectors integrate billing systems, compliance platforms, and legacy OSS/BSS. Real-time telemetry from the radio access network, IoT gateways, and 5G core functions flows into Kafka Topics. Flink processes these events for fraud detection, quality monitoring, subscriber behavior insights, and real-time alerts.

Event-driven Architecture with Data Streaming using Apache Kafka and Flink in Telco Industry
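To make the quality-monitoring idea more concrete, here is a minimal sketch of a tumbling-window aggregation, the kind of computation a Flink job might run over a telemetry topic. It is plain Python over an in-memory list, not Flink code: the event shape, the 60-second window, and the alert threshold are all assumptions chosen for illustration, and a real job would also have to deal with event time, watermarks, and late data.

```python
from collections import defaultdict

# Hypothetical telemetry events: (timestamp_seconds, cell_id, dropped_calls).
EVENTS = [
    (0, "cell-a", 1),
    (20, "cell-a", 0),
    (45, "cell-b", 4),
    (61, "cell-a", 3),
    (75, "cell-b", 2),
    (119, "cell-b", 5),
]

WINDOW_SECONDS = 60   # assumed window size
ALERT_THRESHOLD = 5   # assumed per-window limit of dropped calls


def tumbling_window_totals(events, window_seconds):
    """Sum dropped calls per (window_start, cell_id) in fixed, non-overlapping windows."""
    totals = defaultdict(int)
    for timestamp, cell_id, dropped in events:
        # Every timestamp belongs to exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        totals[(window_start, cell_id)] += dropped
    return dict(totals)


def alerts(totals, threshold):
    """Return the (window_start, cell_id) keys whose total reaches the threshold."""
    return sorted(key for key, total in totals.items() if total >= threshold)


totals = tumbling_window_totals(EVENTS, WINDOW_SECONDS)
print(alerts(totals, ALERT_THRESHOLD))
```

In a streaming deployment the same logic would run continuously and emit each alert as a new event, so that downstream OSS components can react without polling.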

Data streaming introduces a new integration fabric that is:

  • Cloud-native and open: Kafka and Flink run in hybrid and multi-cloud environments, at the edge, or on-premises, avoiding vendor lock-in.
  • Real-time and scalable: OSS can ingest, process, and act on events from networks, devices, and applications as they happen.
  • Democratized and reusable: Data products can be shared across OSS, BSS, and beyond, creating a common data backbone.
  • Cost-efficient: By decoupling legacy systems with event-driven bridges, modernization can follow a strangler fig pattern, replacing old systems step by step without a “big bang.”

With a data streaming platform, OSS no longer needs to rely on fragile point-to-point integrations. Instead, every system (assurance, inventory, workforce, catalog, orchestration) can publish and consume streams of events. This makes OSS a dynamic hub, ready for continuous change.
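The difference from point-to-point integration can be sketched with a small in-memory stand-in for a topic. This is not the Kafka client API; the `Topic` class, the consumer names, and the event fields are hypothetical. The point it illustrates is that the producer publishes once, and each consumer reads the same log at its own pace by tracking its own offset.

```python
class Topic:
    """In-memory stand-in for an append-only topic (not the Kafka client API)."""

    def __init__(self):
        self._log = []       # ordered, append-only list of events
        self._offsets = {}   # next offset to read, per consumer name

    def publish(self, event):
        """Append an event and return its offset."""
        self._log.append(event)
        return len(self._log) - 1

    def poll(self, consumer):
        """Return events this consumer has not seen yet and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._log[start:]
        self._offsets[consumer] = len(self._log)
        return batch


inventory_events = Topic()
inventory_events.publish({"type": "port_allocated", "port": "olt-7/3"})

# Two hypothetical consumers read the same event independently.
assurance_batch = inventory_events.poll("assurance")
workforce_batch = inventory_events.poll("workforce")

# A later event reaches each consumer the next time it polls.
inventory_events.publish({"type": "port_released", "port": "olt-7/3"})
assurance_second = inventory_events.poll("assurance")
```

Adding a new consumer requires no change to the producer or to the existing consumers: it simply starts reading the log from the beginning.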

Readers seeking an overview of the Data Streaming Platform market should review “The Data Streaming Landscape 2025”:

The Data Streaming Landscape 2025 with Kafka Flink Confluent Amazon MSK Cloudera Event Hubs and Other Platforms

Data Streaming and Business Process Management: Partners, Not Rivals

In telecom, Business Process Management (BPM) provides workflow orchestration for use cases such as service activation, order management, and assurance. Data streaming and BPM address different needs. Used together, they combine real-time data flow with structured process orchestration:

  • Business process-led orchestration with tools around standards like BPMN gives structure. It defines workflows, decomposes orders, and drives fulfillment.
  • Data streaming with Kafka and Flink provides the event-driven nervous system. Every event, from network telemetry to customer updates, flows reliably across OSS, BSS, and partners.

OSS modernization needs both. BPM tools and workflow engines leverage BPMN to provide the model by visualizing and structuring business processes such as order decomposition and fulfillment. Kafka and Flink ensure those processes actually run in an event-driven world. This keeps a clear separation of concerns: BSS manages offers and customer intent, while OSS executes activation and assurance with speed and accuracy.

Data Streaming as Workflow Orchestration Engine

When Kafka can be the business process engine: Skip a BPM tool if the workflow has no human steps, is code-first, and fits an event/state machine pattern. Use Kafka Topics for persistence and ordering, compacted topics for current state, replay for recovery, and the Saga pattern for multi-step consistency.
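The relationship between replay and compacted state can be shown with a small sketch. It uses a plain Python list as the event log; the order keys, status names, and transition table are assumptions made for illustration, not taken from any specific OSS product. `replay` rebuilds the current state by applying every event in order, while `compact` keeps only the latest record per key, which roughly mirrors what a compacted topic retains over time.

```python
# Hypothetical order events as (key, value) pairs in arrival order.
ORDER_LOG = [
    ("order-1", "RECEIVED"),
    ("order-2", "RECEIVED"),
    ("order-1", "PROVISIONING"),
    ("order-1", "ACTIVE"),
    ("order-2", "PROVISIONING"),
]

# Allowed transitions of an assumed activation state machine.
TRANSITIONS = {
    None: {"RECEIVED"},
    "RECEIVED": {"PROVISIONING"},
    "PROVISIONING": {"ACTIVE", "FAILED"},
}


def replay(log):
    """Rebuild the current state per order by applying every event in order."""
    state = {}
    for key, new_status in log:
        current = state.get(key)
        if new_status not in TRANSITIONS.get(current, set()):
            raise ValueError(f"illegal transition {current} -> {new_status} for {key}")
        state[key] = new_status
    return state


def compact(log):
    """Keep only the latest record per key, preserving the order of those records."""
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [record for i, record in enumerate(log) if last_index[record[0]] == i]


current_state = replay(ORDER_LOG)
compacted = compact(ORDER_LOG)
```

Reading the compacted log yields the same current state as replaying the full history, but with far fewer records. The full log remains the better source when the transition history itself matters, for example for audits.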

Here is an example where data streaming with Kafka and Flink is used as a stateful workflow engine to provide observability across multiple sites to calculate billing, adjust prices, incorporate late-arriving information, etc.:

Observability Across Multiple Sites in Telco Infrastructure with Kafka and Flink as Stateful Workflow Engine

More details about data streaming for managing stateful business processes: Apache Kafka as Workflow and Orchestration Engine.

For long-running transactions: Bring in a durable execution engine like Temporal or Restate. They add built-in durability, retries, timers, and compensation for machine-to-machine workflows. Integrate them with Kafka to run reliable, multi-step activations and jeopardy handling at telco scale. More details: The Rise of the Durable Execution Engine (Temporal, Restate) in an Event-driven Architecture (Apache Kafka).
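What retries and compensation mean for a multi-step activation can be sketched in a few lines. This is plain Python and deliberately not the Temporal or Restate API; the step names and the `run_saga` helper are hypothetical. The journal lives in memory here, whereas a durable execution engine would persist it so that a workflow can resume after a crash.

```python
def run_saga(steps, max_attempts=3):
    """Run steps in order; on a permanent failure, undo completed steps in reverse.

    Each step is a (name, action, compensation) tuple. Returns (succeeded, journal).
    """
    journal = []     # ordered record of what happened
    completed = []   # steps whose action succeeded, kept for rollback
    for name, action, compensation in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                action()
                journal.append(f"{name}:ok")
                completed.append((name, compensation))
                break
            except RuntimeError:
                journal.append(f"{name}:attempt{attempt}-failed")
        else:
            # Every attempt failed: compensate in reverse order and stop.
            for done_name, undo in reversed(completed):
                undo()
                journal.append(f"{done_name}:compensated")
            return False, journal
    return True, journal


calls = {"configure_network": 0}


def configure_network():
    calls["configure_network"] += 1
    if calls["configure_network"] < 2:
        raise RuntimeError("transient timeout")  # first attempt fails, retry succeeds


def notify_partner():
    raise RuntimeError("partner API unavailable")  # always fails in this sketch


ACTIVATION = [
    ("reserve_number", lambda: None, lambda: None),
    ("configure_network", configure_network, lambda: None),
    ("notify_partner", notify_partner, lambda: None),
]

succeeded, journal = run_saga(ACTIVATION)
```

In this run the transient network failure is absorbed by a retry, while the permanently failing partner step triggers compensation of the two earlier steps in reverse order.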

Proof Point: EchoStar’s Dish Wireless and the Event-Driven OSS/BSS Merger

As part of EchoStar, Dish Wireless built its greenfield 5G network in the United States with an event-driven architecture at its core. By using Kafka as the central nervous system, Dish merged OSS and BSS into a cloud-native stack, orchestrating everything from provisioning to assurance in real time.

While this is not a brownfield modernization but a greenfield build, it is still highly relevant for the topic of Telecom OSS Modernization with Data Streaming. Dish demonstrates the target state that incumbents can aim for: a streaming-first, cloud-native OSS/BSS stack. Dish’s approach offers a blueprint for operators with legacy estates, showing how data streaming can gradually transform existing OSS from rigid and batch-driven to agile and event-driven.

DISH Wireless Telecom 5G Platform powered by Confluent using Apache Kafka
Source: Dish Network

Instead of siloed legacy workflows, Dish runs on a streaming-first foundation. OSS is no longer a passive afterthought but a proactive engine of customer experience. This demonstrates the power of starting fresh – but also provides a blueprint for incumbents to gradually evolve legacy OSS through the same event-driven model.

More details, including an interview with Dish, in the following article: How Apache Kafka helps Dish Wireless building cloud-native 5G Telco Infrastructure.

AI in Modern Telecom OSS: From Reactive to Predictive and Agentic

OSS modernization is not only about speed; it is about intelligence. With Kafka and Flink feeding real-time events into AI systems, telcos can evolve from reactive fault management to predictive and agent-driven operations:

  • Predictive AI: Detecting anomalies in telemetry data before service degradation occurs.
  • Generative AI: Assisting operations teams by summarizing incidents or suggesting workflows.
  • Agentic AI: Acting autonomously on OSS events, triggering closed-loop automation via the event-driven streaming backbone.
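As a simple illustration of the predictive case, the sketch below flags telemetry readings that deviate strongly from a rolling baseline. A rolling z-score is a basic statistical stand-in, not a trained model, and the window size, threshold, and latency values are assumptions picked for the example.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the mean of the preceding window."""
    history = deque(maxlen=window)
    for index, value in enumerate(values):
        # Only score a reading once a full window of earlier readings exists.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield index, value
        history.append(value)


# Hypothetical latency readings in milliseconds for one cell.
LATENCY_MS = [20, 21, 19, 20, 22, 21, 20, 48, 21, 20]
anomalies = list(detect_anomalies(LATENCY_MS))
print(anomalies)
```

Because the function consumes readings one at a time, the same logic could sit behind a stream consumer and publish each anomaly as an event for closed-loop automation. One known limitation of this sketch is that an outlier enters the window afterwards and temporarily widens the baseline.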

The following chart illustrates Agentic AI powered by data streaming across OSS and BSS layers:

Agentic AI using A2A MCP and Data Streaming with Kafka and Flink in the Telco OSS BSS OTT Infrastructure

A subscriber agent expresses intent, which the OSS/BSS agent interprets and executes by orchestrating network services, billing, and compliance. Events flow through Kafka Topics, where Flink processes them for assurance, fraud detection, and quality monitoring across the telco stack.

Data streaming ensures AI has the fuel it needs: complete, real-time, contextual data across OSS and BSS.

Business Outcomes: Step-by-Step OSS Modernization with Streaming

Adopting data streaming for OSS modernization unlocks measurable outcomes:

  • Reduced OPEX: By decoupling legacy systems and reducing custom integration costs.
  • Faster time-to-market: New services can be launched in weeks, not months.
  • Agility at scale: Deploy anywhere — on the edge, in private data centers, or across multiple clouds.
  • Data democratization: OSS is no longer a silo, but a platform that shares data products with BSS, CRM, assurance, and analytics.
  • Future-proofing with AI: A foundation ready for predictive and intent-based operations.

The strangler fig pattern makes this achievable. Legacy OSS modules can be replaced incrementally:

Strangler Fig Pattern to Integrate, Migrate, Replace

Kafka and Flink act as the bridge — keeping old systems alive while new cloud-native components are introduced. Over time, the telco moves from rigid batch to agile streaming, without operational disruption.
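The incremental cut-over can be sketched as a small router that decides, per service type, whether an order goes to the legacy module or to its cloud-native replacement. The handler functions, service names, and the `StranglerRouter` class are hypothetical. In a streaming setup the same decision would more likely be expressed through topics and consumers than through a Python class.

```python
def legacy_oss(order):
    """Stand-in for the existing OSS module."""
    return f"legacy:{order['service']}"


def cloud_native_oss(order):
    """Stand-in for the new cloud-native component."""
    return f"new:{order['service']}"


class StranglerRouter:
    """Route each order to the new component once its service type is migrated."""

    def __init__(self, legacy, modern):
        self._legacy = legacy
        self._modern = modern
        self._migrated = set()

    def migrate(self, service):
        """Mark one service type as handled by the new component from now on."""
        self._migrated.add(service)

    def handle(self, order):
        target = self._modern if order["service"] in self._migrated else self._legacy
        return target(order)


router = StranglerRouter(legacy_oss, cloud_native_oss)
before = router.handle({"service": "fixed-broadband"})
router.migrate("fixed-broadband")
after = router.handle({"service": "fixed-broadband"})
untouched = router.handle({"service": "iptv"})
```

Each call to `migrate` moves one slice of traffic, so the legacy module shrinks in scope until nothing routes to it and it can be retired.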

Turning OSS into a Growth Engine

Legacy OSS has become a bottleneck in a world where telcos must innovate faster and deliver more reliable services. Data streaming with Apache Kafka and Flink provides the foundation to modernize step by step: reducing OPEX, cutting time-to-market, and enabling real-time automation.

By bridging OSS, BSS, and OTT with an event-driven backbone, telcos transform OSS from a cost center into a growth engine. The result is an agile, cloud-native nervous system that powers predictive operations today and prepares for agent-driven automation tomorrow.
