How to Build a Real-Time Advertising Platform with Apache Kafka and Flink


An advertising platform requires real-time capabilities to provide dynamic targeting, ad personalization, ad fraud detection, budget allocation, and event-driven marketing. This blog post explores how data streaming with Apache Kafka and Apache Flink enables context-specific advertising at any scale. Real-world success stories from Pinterest, Uber, Reddit, Unity, Buzzvil, and TV-Insight show different solutions and architectures for serving ads in marketing campaigns, embedded into mobile apps, and as SaaS software products.


What is a digital advertising platform?

An advertising (ads) platform is a digital system or service that allows businesses and advertisers to create, manage, and optimize their advertising campaigns across various channels. These platforms provide tools and features to target specific audiences, allocate budgets, track performance, and measure the effectiveness of advertising efforts.

Digital Marketing

Examples of advertising platforms include Google Ads, Facebook Ads, and programmatic advertising platforms that automate ad placement across websites and apps. These platforms play a crucial role in digital marketing, enabling advertisers to reach their target audience online and achieve their marketing objectives.

Challenges of an advertising platform

  1. Competition: Advertisers often face fierce competition for ad space. This can lead to higher costs and the need for effective targeting strategies.
  2. Ad Fraud: Digital advertising is susceptible to various forms of ad fraud, including click and impression fraud. Advertisers need to implement measures to protect their campaigns from fraudulent activity.
  3. Data Privacy Regulations: Stricter data privacy regulations, such as GDPR and CCPA, impact how advertisers collect and use customer data for targeting. Advertisers must comply with these regulations to avoid legal consequences.
  4. Ad Quality and Relevance: Ensuring that ads are of high quality and relevance to the target audience is essential. Poorly designed or irrelevant ads can lead to wasted ad spend and a negative user experience.
  5. Ad Fatigue: Showing the same ads repeatedly to users can lead to ad fatigue, causing users to ignore or block the ads. Advertisers need to manage frequency and creative refresh to combat this.
  6. Measurement and Attribution: Accurately measuring the impact of advertising campaigns and attributing conversions to specific ads or channels can be challenging, especially in a multi-channel marketing environment.
  7. Platform Changes: Advertising platforms frequently update their algorithms and policies. Advertisers need to adapt to these changes and stay informed to maintain campaign effectiveness.
  8. Budget Management: Effective budget allocation across various channels and campaigns can be complex. Balancing the budget to achieve the best results is an ongoing challenge.
  9. Creative Variation: Creating and testing different ad creatives to find the most effective ones requires ongoing effort and creativity.
  10. Ad Placement: Choosing the right placements on websites, apps, and social media is crucial. Advertisers must consider where their target audience spends their time.

Navigating these challenges requires a data-driven platform, a deep understanding of the digital advertising landscape, and constant monitoring and optimization.

Why does an ads platform need to be real-time?

An advertising platform should be real-time for several important reasons:

  1. Timely Campaign Adjustments: Real-time data allows advertisers to adjust their advertising campaigns promptly. They can respond quickly to market conditions, user behavior, or campaign performance changes. For example, if a particular ad is not performing well or if a sudden surge in user interest occurs, advertisers can pause or modify their campaigns immediately to optimize results.
  2. Dynamic Targeting: Real-time data enables dynamic and precise targeting. Advertisers can adjust their targeting criteria on the fly based on real-time user actions and data, ensuring that ads are delivered to the most relevant audience at the right moment.
  3. Optimized Bidding: Real-time bidding (RTB) is a crucial component of programmatic advertising. Advertisers can bid on ad inventory in real-time based on user data, maximizing their chances of winning ad placements at the best prices.
  4. Ad Personalization: Real-time data allows for highly personalized ad experiences. Advertisers can serve ads tailored to individual user preferences and behavior, increasing the likelihood of engagement and conversion.
  5. Ad Fraud Detection: Real-time monitoring and analysis of ad traffic can help detect and prevent ad fraud as it occurs. Ad platforms can identify suspicious patterns and take action to mitigate fraud, protecting advertisers’ investments.
  6. Budget Allocation: Real-time data informs budget allocation decisions. Advertisers can allocate more budget to high-performing campaigns and reduce spending on underperforming ones in real-time, ensuring efficient use of resources.
  7. Competitive Advantage: Real-time capabilities can provide a significant advantage in a competitive advertising landscape. Advertisers who can react swiftly to market changes and trends can capture opportunities that slower competitors might miss.
  8. User Engagement: Real-time advertising can engage users at the most opportune moments. For example, an e-commerce platform can display retargeting ads to users who abandoned their shopping carts in real-time, encouraging them to complete their purchases.
  9. Event-Driven Marketing: Real-time capabilities enable event-driven marketing. Advertisers can trigger ads based on specific user actions or external events, such as holidays or significant news events, making their campaigns more relevant and timely.
  10. Measurement and Attribution: Real-time data allows for immediate measurement of ad performance and attribution of conversions. Advertisers can track which ads and channels drive results and adjust their strategies accordingly.
  11. User Experience: Real-time ads can enhance the user experience by delivering current and contextually relevant content and offers. This can improve user engagement and satisfaction.

In today’s fast-paced digital advertising landscape, where user behavior and market conditions can change rapidly, real-time capabilities are essential for advertisers to stay competitive, make data-driven decisions, and maximize the impact of their advertising campaigns. Real-time advertising platforms empower advertisers to be more agile, responsive, and effective in reaching their target audience.
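To make the fraud detection point from the list above more concrete, here is a minimal sketch of a sliding-window click-rate check in plain Python. It is an illustration only, not how any of the platforms in this post implement fraud detection: the class name `ClickRateMonitor`, the window length, and the click threshold are assumptions, and the sketch assumes events arrive in timestamp order. In a real deployment, this kind of keyed, windowed state would typically live in a stream processor such as Flink or Kafka Streams.

```python
from collections import defaultdict, deque


class ClickRateMonitor:
    """Flags a source that exceeds max_clicks within a sliding time window.

    Hypothetical helper for illustration; thresholds are made-up values.
    """

    def __init__(self, window_seconds: float = 10.0, max_clicks: int = 5):
        self.window_seconds = window_seconds
        self.max_clicks = max_clicks
        # source id -> timestamps of clicks still inside the window
        self._clicks = defaultdict(deque)

    def observe(self, source_id: str, timestamp: float) -> bool:
        """Record one click; return True if the source now looks suspicious."""
        window = self._clicks[source_id]
        window.append(timestamp)
        # Evict clicks that have fallen out of the sliding window.
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        return len(window) > self.max_clicks


monitor = ClickRateMonitor(window_seconds=10.0, max_clicks=3)
events = [("device-a", t) for t in (0.0, 1.0, 2.0, 3.0, 4.0)] + [("device-b", 2.5)]

# Process events in timestamp order, as a stream processor would per key.
flags = [
    (source, ts, monitor.observe(source, ts))
    for source, ts in sorted(events, key=lambda e: e[1])
]
```

The point of the sketch is that the decision is made per event as it arrives, so a suspicious source can be blocked while the burst is still happening instead of being discovered in a nightly batch report.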

How does Apache Kafka help build an advertising platform?

Apache Kafka combines real-time messaging at any scale with true decoupling through its event store. The data streaming platform collects data, correlates real-time and historical events with stream processing, and shares created information with downstream consumers.

Data Streaming with Apache Kafka and Apache Flink for Advertisement Platform and Ads

One of Apache Kafka's most underestimated capabilities is that it ensures data consistency across real-time and non-real-time systems out of the box. The heart of the enterprise architecture is real-time, scalable, and reliable. But any near real-time, batch, or request-response application can produce or consume at its own pace with its own API or programming language.
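The idea of consumers reading at their own pace can be sketched with a toy append-only log in plain Python. This is a conceptual model only and not the Kafka client API: `EventLog`, `poll`, and the group names are hypothetical. It shows the one property that matters here, namely that the log keeps the events and every consumer group tracks its own read position.

```python
class EventLog:
    """Toy append-only log; each consumer group tracks its own read offset.

    Conceptual illustration only, not a Kafka client.
    """

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer group -> next offset to read

    def append(self, record) -> int:
        """Store a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group: str, max_records: int = 100) -> list:
        """Return the next batch for this group and advance only its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = EventLog()
for i in range(5):
    log.append({"type": "impression", "ad_id": f"ad-{i}"})

realtime_batch = log.poll("bidder", max_records=5)        # fast consumer reads everything
batch_first = log.poll("nightly-report", max_records=2)   # slow consumer reads two records
log.append({"type": "click", "ad_id": "ad-0"})
batch_second = log.poll("nightly-report", max_records=10)  # resumes at offset 2
```

Because the slow consumer never blocks the fast one, and neither loses events, both see the same data in the same order. That is the basis of the consistency argument above.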

Apache Flink is ideal for data correlation, whether the task is data integration (aka streaming ETL) or advanced stateful business and application logic. Apache Kafka and Apache Flink are a match made in heaven for data streaming.

Real Time Bidding and Fraud Detection Advertisement Platform with Apache Kafka and Flink

Real-world success stories show how data streaming with Kafka and Flink helps build a next-generation advertising platform. These technologies solve the abovementioned challenges to provide real-time and consistent information across all applications.

Advertising platforms are either directly embedded into customer-facing applications or built as software or SaaS products that other companies buy and leverage.

The following success stories explore ad platforms built with data streaming:

  • Pinterest: Image-sharing and natural engagement with ads (Kafka Streams).
  • Buzzvil: Lock screen advertisement for smartphones (Kafka and Confluent Cloud).
  • TV-Insight: Live decisions of regular TV ad blocks (Kafka, Flink, and Confluent Cloud).
  • Unity: Monetization network for gaming (Kafka and Confluent).
  • Uber Eats: Ads in the mobile food delivery app (Kafka, Flink, Pinot).
  • Reddit: Ad placement with real-time budget planning that avoids over-delivery and under-delivery (Kafka, Flink, Druid).

Pinterest – Social media natural engagement with ads

Pinterest is an American image-sharing and social media service designed to enable the saving and discovery of information (specifically “ideas”) like recipes, home, style, motivation, and inspiration on the internet.

The content of ads is very close to the actual content. Naturally, users engage with the content and ads:

Pinterest Mobile App Home Feed Search and Ads

Pinterest talked about its Kafka-powered advertising platform for the first time in 2018 at a Kafka Summit. The ad platform leverages Kafka for the data ingestion pipeline and stream processing with Kafka Streams to enable a real-time feedback loop. Critical use cases include the recommendation engine (via machine learning), budgeting, and the exploration of new ads.

Pinterest Ads Engine built with Kafka Streams

The continuous feedback loop enables real-time updates in seconds. Stateful stream processing with Kafka Streams correlates events from users, ads, budget, and other interfaces to decide on ads serving.

Stateful Stream Processing with Kafka Streams at Pinterest

Real-time (even at an extreme scale) is critical for Pinterest. When a new ad is created, the ads platform does not know about the user engagement with this ad on different surfaces. The faster the ads platform knows about the performance of the newly created ad, the better value can be provided to the user.

There is a balance between exploiting good ads and exploring new ads. The solution was adding a boosting factor to new ads to increase the probability of winning an auction.
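The post does not describe how Pinterest computes the boosting factor, so the following is only a sketch of the general idea under stated assumptions: the auction score is `bid * predicted_ctr * boost`, and the boost decays linearly from a maximum for a brand-new ad to 1.0 once the ad has collected enough impressions. The names `AdCandidate`, `exploration_boost`, `max_boost`, and `ramp_impressions` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdCandidate:
    ad_id: str
    bid: float
    predicted_ctr: float
    impressions: int  # impressions served so far


def exploration_boost(impressions: int, max_boost: float = 2.0,
                      ramp_impressions: int = 1000) -> float:
    """Linear decay from max_boost (brand-new ad) to 1.0 (enough data).

    Assumed formula for illustration, not Pinterest's actual method.
    """
    progress = min(impressions / ramp_impressions, 1.0)
    return max_boost - (max_boost - 1.0) * progress


def auction_score(ad: AdCandidate) -> float:
    """Expected value of showing the ad, scaled by the exploration boost."""
    return ad.bid * ad.predicted_ctr * exploration_boost(ad.impressions)


def run_auction(candidates: list) -> AdCandidate:
    """Return the candidate with the highest boosted score."""
    return max(candidates, key=auction_score)


established = AdCandidate("established", bid=1.00, predicted_ctr=0.030, impressions=50_000)
fresh = AdCandidate("fresh", bid=1.00, predicted_ctr=0.020, impressions=0)

# The new ad wins while it is still being explored ...
winner = run_auction([established, fresh])

# ... but loses once the boost has decayed and its estimate stays lower.
fresh_later = AdCandidate("fresh", bid=1.00, predicted_ctr=0.020, impressions=1000)
winner_later = run_auction([established, fresh_later])
```

The faster engagement events flow back into the impression counts and the click-through estimate, the shorter the period in which the platform has to rely on the boost instead of real performance data.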

Listen to the talk from Pinterest for more details, best practices, and lessons learned in developing and operating a scalable, real-time advertising platform with stateful stream processing using Kafka Streams.

Buzzvil – Lock screen advertising platform

Buzzvil provides a lock screen advertising platform that connects partners and advertisers:

Buzzvil – AdTech for Publishers and Advertisers

Buzzvil’s advertising platform is data-driven and built with Apache Kafka in the cloud. It optimizes ad spending through automation, behavioral analytics, audience targeting, rewards programs, and more. Data streaming enables a single source of truth for real-time ad transaction data.



Buzzvil - Advertisement Platform built with Apache Kafka in Confluent Cloud

They built the ad platform with Apache Kafka in a fully managed Confluent Cloud to focus on business logic and faster time-to-market.

Data streaming with Apache Kafka enables 18x faster data updates for ad bidding. Confluent Cloud saves 20-30% of the infrastructure cost.

TV-Insight – Live decisions of regular TV ad blocks

TV-Insight developed a solution to help Joint Industry Committees (JIC), Broadcasters, and Advertisers to improve and evolve the data quality of existing TV measurement panels using return path data of connected devices.

The problem of monitoring classical TV

The essential difference between TV-Insight and all other “panel boosting” initiatives and products is that TV-Insight uses real-time data. Therefore, it can provide a live TV reach for live decisions of regular TV ad blocks.

TV-Insight Live Reach Prediction in Real-Time

The TV-Insight application collects data from the Smart TV or Set-Top Box via GDPR-compliant device tracking. The live extrapolation enables advertising optimization:

TV Insight Enterprise Architecture for Real-Time Ads

The technical architecture and data pipeline look as follows: Apache Kafka is the real-time messaging platform and event store. Apache Flink’s stateful stream processing correlates events to calculate real-time ad serving in the advertising platform.

Apache Kafka and Apache Flink for Advertisement Platform at TV Insight

Unity Ads – Monetization network for gaming

Unity is a cross-platform game engine developed by Unity Technologies. The engine has since been gradually extended to support a variety of desktop, mobile, console, and virtual reality platforms. The engine can create three-dimensional (3D) and two-dimensional (2D) games, interactive simulations, and other experiences. Industries outside video gaming have adopted the engine, such as film, automotive, architecture, engineering, and construction.

In 2019, Unity apps and content were installed 33 billion times, reaching 3 billion devices worldwide.

The 3D development platform and game engine is not the only product of Unity Technologies. Unity Ads is one of the largest monetization networks in the world:

  • Reward players for watching ads
  • Incorporate banner ads
  • Incorporate Augmented Reality (AR) ads
  • Playable ads
  • Cross-Promotions
  • IAPs (in-app purchases)

Unity is a data-driven company:

  • Average about half a million events per second
  • Handles millions of dollars in monetary transactions
  • Data infrastructure based on Apache Kafka

A single data pipeline provides the foundational infrastructure for analytics, R&D, monetization, cloud services, etc., for real-time and batch processing leveraging Apache Kafka:

  • Real-time monetization network
  • Feed machine learning models in real-time
  • Data lake went from two-day latency down to 15 minutes

If you want to learn about Unity’s success story of migrating this platform from self-managed Kafka to the cloud, read the post on the Confluent Blog: “How Unity uses Confluent for real-time event streaming at scale”.

Uber Eats – Ads embedded into food delivery app

Uber Eats, Uber’s food delivery app, embeds ads. This capability brought new challenges that Uber needed to solve, such as systems for ad auctions, bidding, attribution, reporting, and more.

Uber wrote an excellent article on how they leveraged open source technology to build Uber’s first near real-time exactly-once events processing system. Uber leverages Kafka, Flink, and Pinot for its advertising platform, a combination that covers event streaming, stream processing, and real-time analytics.

Uber Eats Architecture of the Advertisement Platform using Kafka Flink and Pinot

As Uber writes: “With every ad served, there are corresponding events per user (impressions, clicks). The responsibility of the ad events processing system is to manage the flow of events, cleanse them, aggregate clicks and impressions, attribute them to orders, and provide this data in an accessible format for reporting and analytics as well as dependent clients (e.g., other ads systems).”

While speed, scale, and reliability are always crucial for such a system, I want to emphasize the part about accuracy and why exactly-once processing with Kafka and Flink was a critical piece of the architecture.

The Aggregation Job implemented with Apache Flink does a lot of the heavy lifting: Data cleansing, persistence for order attribution, aggregation, and record UUID generation.

Uber Aggregation Job with Apache Flink
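The steps of such an aggregation job (cleansing, aggregation, record ID generation) can be sketched in plain Python. This is not Uber's Flink code. The field names, the one-minute bucket size, and the choice to derive the record ID deterministically with `uuid.uuid5` from the ad ID and bucket start are assumptions made for illustration.

```python
import uuid
from collections import defaultdict

VALID_TYPES = {"impression", "click"}


def is_valid(event: dict) -> bool:
    """Cleansing step: keep only well-formed impression and click events."""
    return (
        bool(event.get("ad_id"))
        and event.get("type") in VALID_TYPES
        and isinstance(event.get("ts"), (int, float))
    )


def aggregate(events: list, bucket_seconds: int = 60) -> list:
    """Count impressions and clicks per ad and per fixed time bucket."""
    counts = defaultdict(lambda: {"impression": 0, "click": 0})
    for event in events:
        if not is_valid(event):
            continue  # drop malformed events
        bucket_start = int(event["ts"] // bucket_seconds) * bucket_seconds
        counts[(event["ad_id"], bucket_start)][event["type"]] += 1

    records = []
    for (ad_id, bucket_start), totals in sorted(counts.items()):
        # Assumption: a deterministic ID, so reprocessing the same bucket
        # produces the same record ID and a downstream store can upsert.
        record_id = uuid.uuid5(uuid.NAMESPACE_URL, f"{ad_id}/{bucket_start}")
        records.append({
            "record_id": str(record_id),
            "ad_id": ad_id,
            "bucket_start": bucket_start,
            "impressions": totals["impression"],
            "clicks": totals["click"],
        })
    return records


sample_events = [
    {"ad_id": "ad-1", "type": "impression", "ts": 5},
    {"ad_id": "ad-1", "type": "click", "ts": 30},
    {"ad_id": "ad-1", "type": "impression", "ts": 65},
    {"ad_id": "ad-1", "type": "hover", "ts": 10},  # unknown type, dropped
    {"type": "click", "ts": 12},                   # missing ad_id, dropped
]
records = aggregate(sample_events)
```

A stable record ID per aggregate is one way to make writes to the serving store idempotent: if the job restarts and emits the same bucket again, the record is overwritten instead of duplicated.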

Exactly-once with Kafka and Flink is very important, as their blog post explains: “Uber can’t afford to overcount events. Double counting clicks results in overcharging advertisers and overreporting the success of ads. Both being poor customer experiences, this requires processing events exactly-once. Uber is the marketplace in which ads are being served, therefore our ad attribution must be 100% accurate.”
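Why redelivered events matter for billing can be shown with a small, simplified stand-in. The sketch below uses idempotent deduplication by event ID in plain Python. It is not Flink's checkpointing or Kafka's transaction mechanism, which are what provide exactly-once guarantees in the real system. `IdempotentClickCounter` and the event fields are hypothetical names.

```python
from collections import Counter


class IdempotentClickCounter:
    """Counts each click at most once, even if it is delivered repeatedly.

    Simplified stand-in for exactly-once processing: dedup by event ID.
    """

    def __init__(self):
        self._seen_event_ids = set()
        self.clicks_per_ad = Counter()

    def process(self, event: dict) -> bool:
        """Count the event once; return False for a redelivered duplicate."""
        event_id = event["event_id"]
        if event_id in self._seen_event_ids:
            return False
        self._seen_event_ids.add(event_id)
        self.clicks_per_ad[event["ad_id"]] += 1
        return True


counter = IdempotentClickCounter()
delivered = [
    {"event_id": "e1", "ad_id": "ad-7"},
    {"event_id": "e2", "ad_id": "ad-7"},
    {"event_id": "e1", "ad_id": "ad-7"},  # redelivery after a simulated retry
    {"event_id": "e3", "ad_id": "ad-9"},
]
results = [counter.process(event) for event in delivered]
```

Without the duplicate check, the retry would bill the advertiser for three clicks on `ad-7` instead of two, which is exactly the overcounting the quote warns about. A production system also has to bound the set of seen IDs, for example with a time-to-live, which this sketch leaves out.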

Reddit – Ads Placing with Budget Planning avoiding Over- or Under-Delivery

Reddit is an American social news aggregation, content rating, and discussion website. Registered users submit content to the site such as links, text posts, images, and videos, which other members then vote up or down.

Reddit’s ads platform allows advertisers to create ad campaigns and set both daily and lifetime budgets for a campaign. Here is Reddit’s decision tree to place advertisements:

Reddit Decision Tree to Place Advertisements

The data pipeline leverages Kafka, Flink, and Druid to analyze campaign budgets in real time. The platform combines real-time and historical user activity data to decide which ad to place. Each decision completes within 30 milliseconds, which helps avoid over-delivery and under-delivery (budget spent too quickly or too slowly).

Reddit Ads Serving Platform using Apache Kafka, Flink and Druid
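The notion of over-delivery and under-delivery can be made concrete with a minimal pacing check. This is a sketch under simple assumptions, not Reddit's algorithm: it compares the actual spend with an even, linear spend target for the current point in the day. The function name `pacing_decision`, the 5% tolerance, and the returned labels are hypothetical.

```python
def pacing_decision(spent: float, daily_budget: float, seconds_elapsed: float,
                    day_seconds: int = 86_400, tolerance: float = 0.05) -> str:
    """Compare actual spend with an even spend target for this point in the day.

    Assumes linear pacing; real systems often weight the target by traffic.
    """
    if spent >= daily_budget:
        return "stop"  # budget exhausted, serving more would over-deliver
    target = daily_budget * (seconds_elapsed / day_seconds)
    if spent > target * (1 + tolerance):
        return "throttle"    # spending too quickly
    if spent < target * (1 - tolerance):
        return "accelerate"  # spending too slowly
    return "on_track"


noon = 43_200  # seconds since midnight
decisions = {
    "ahead": pacing_decision(spent=80.0, daily_budget=100.0, seconds_elapsed=noon),
    "behind": pacing_decision(spent=20.0, daily_budget=100.0, seconds_elapsed=noon),
    "even": pacing_decision(spent=50.0, daily_budget=100.0, seconds_elapsed=noon),
    "exhausted": pacing_decision(spent=100.0, daily_budget=100.0, seconds_elapsed=noon),
}
```

The check itself is trivial. The hard part, and the reason for the streaming pipeline, is keeping `spent` current enough that the decision is still correct when it is made for every single ad request.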

Watch Reddit’s talk from Druid Summit “Low Latency Real-Time Ads Pacing Queries” to learn more about their ads platform and use cases.

Real-world success stories from Pinterest, Uber, Reddit, Unity, Buzzvil, and TV-Insight showed how to embed real-time advertising into your applications or build a dedicated marketing product.

Data streaming with Apache Kafka and Apache Flink enables context-specific advertising at scale in real time. The cloud makes it possible to focus on business logic and faster time-to-market with a fully managed data streaming platform.

How do you leverage data streaming in marketing and advertising use cases? Do you deploy at the edge, in the cloud, or both? Or do you integrate 3rd party marketing platforms into your advertising platforms? Let’s connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.
