
Pinterest Fights Spam and Abuse with Kafka and Flink: A Deep Dive into the Guardian Rules Engine

Spam, abuse, and fraud are major threats to every social media platform. Attackers constantly evolve, which makes detection difficult and renders slow batch systems ineffective. This blog post explores how Pinterest tackled this challenge by building a real-time detection platform called the Guardian rules engine, powered by data streaming with Apache Kafka and Apache Flink. It highlights how data streaming helps protect hundreds of millions of users through faster response, scalable architecture, and smarter rule enforcement.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and follow me on LinkedIn or X (former Twitter) to stay in touch. And make sure to download my free book about data streaming use cases.

Social Networks: Spam and Abuse Challenges

Spam, abuse, and fraud are serious problems in every social network. Attackers create fake accounts, post misleading content, and try to exploit users or the platform itself. These actions hurt user trust, reduce engagement, and impact the business model.

Detecting abuse is not simple. Attack patterns change fast. Attackers adapt quickly to detection systems. Relying only on batch-based systems means reacting too late. Speed, flexibility, and scalability are essential.

Pinterest, like other large-scale platforms, needed a better way to protect its users. The company built a real-time detection platform, Guardian, using Apache Kafka and Apache Flink. This article explores the talk "FlinkSQL-Powered Asynchronous Data Processing in Pinterest’s Rule Engine Platform" by Xinyu Chen and Abhishek Tiwari, presented at the Current 2025 Data Streaming Conference.

To stay ahead of attackers, many social networks have moved to an event-driven architecture powered by data streaming. Apache Kafka provides the foundation to collect and transport user activity and content events in real time. Apache Flink adds real-time stream processing for detecting patterns, running business logic, and taking instant action.

This streaming approach means no waiting for batch jobs. Rules execute as soon as events occur. Results are faster. Risks are lower. This is a major upgrade compared to traditional abuse detection workflows.
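To make the latency argument concrete, here is a minimal Python sketch that compares when a suspicious event would be noticed by a streaming rule versus a scheduled batch job. The `Event` class, the 0.5 second processing delay, and the hourly batch interval are all illustrative assumptions, not figures from Pinterest.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user_id: str
    kind: str
    ts: float  # event time, in seconds since the start of the day


def streaming_detection_time(event: Event, processing_delay: float = 0.5) -> float:
    """A streaming rule sees the event shortly after it occurs (assumed delay)."""
    return event.ts + processing_delay


def batch_detection_time(event: Event, batch_interval: float = 3600.0) -> float:
    """A batch job only sees the event at the next scheduled run (assumed hourly)."""
    return (math.floor(event.ts / batch_interval) + 1) * batch_interval


# Hypothetical event that happens ten minutes into the hour.
event = Event(user_id="u1", kind="signup", ts=600.0)
streaming_lag = streaming_detection_time(event) - event.ts
batch_lag = batch_detection_time(event) - event.ts
print(f"streaming lag: {streaming_lag:.1f}s, batch lag: {batch_lag:.1f}s")
```

Under these assumptions the batch job reacts 50 minutes after the event, while the streaming rule reacts in under a second. The exact numbers will differ in any real deployment; the point is that batch latency is bounded by the schedule, not by the processing itself.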

Pinterest’s Rules Engine Guardian: A Platform to Protect Users from Spam, Abuse, and Fraud

Pinterest’s Guardian platform detects and stops spam and abuse in real time. It’s built on Apache Kafka and Flink. The system protects hundreds of millions of users and supports trust and safety at massive scale.

What Is Pinterest?

Pinterest is a social network and visual discovery engine used by hundreds of millions of people around the world. Users—called Pinners—save and explore content related to food, fashion, travel, home design, and more. The platform earns revenue through advertising and commerce partnerships.

Source: Pinterest (Example Searching for Video Game Room Ideas)

Trust and safety are critical for Pinterest’s business. Spam and abuse reduce content quality and can drive users away. The Guardian platform is key to keeping Pinterest clean, safe, and enjoyable for everyone.

Pinterest: Operating Apache Kafka at Planet Scale

Pinterest is one of the most impressive examples of a large-scale data streaming platform in action. Its use of Apache Kafka is not just massive, it is foundational. The numbers are staggering, and they highlight what’s possible when an organization embraces real-time data streaming across the entire business.

This story began years ago, and the scale Pinterest had reached by 2022 was already beyond what most companies ever plan for:

  • ~1 Exabyte of data in AWS S3
  • ~4 Trillion messages per day flowing through Kafka
  • ~80 GB/s throughput
  • 50+ Kafka clusters in production
  • ~4,000 Kafka brokers
  • ~3,000 Kafka topics
  • Over 500,000 partitions
Source: Keynote at Current 2022 with Pinterest

These are not just vanity metrics. Each figure represents real-time business processes, machine learning models, user activity tracking, content recommendations, trust and safety pipelines, and more—all running on Kafka.
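A quick back-of-the-envelope calculation helps put these figures in perspective. The Python sketch below derives a few averages from the numbers listed above. It assumes that the daily message count and the throughput describe the same workload, that "GB" means decimal gigabytes, and that the "50+" and "over 500,000" lower bounds are close to the real values, so treat the results as rough orders of magnitude only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted above (approximate, from the Current 2022 keynote).
messages_per_day = 4e12
throughput_bytes_per_s = 80e9  # assumes decimal gigabytes
clusters = 50                  # "50+" treated as 50
brokers = 4_000
topics = 3_000
partitions = 500_000           # "over 500,000" treated as 500,000

messages_per_second = messages_per_day / SECONDS_PER_DAY
avg_message_bytes = throughput_bytes_per_s / messages_per_second
brokers_per_cluster = brokers / clusters
partitions_per_topic = partitions / topics

print(f"messages per second:  {messages_per_second:,.0f}")  # about 46.3 million
print(f"avg message size:     {avg_message_bytes:,.0f} bytes")  # about 1.7 KB
print(f"brokers per cluster:  {brokers_per_cluster:.0f}")
print(f"partitions per topic: {partitions_per_topic:.0f}")
```

In other words, these numbers imply roughly 46 million messages per second at an average size of under 2 KB, spread across clusters of about 80 brokers each.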

Guardian is the rules engine Pinterest built for its trust and safety use cases. It serves several roles across the business:

  • Subject Matter Experts (SMEs) write and test rules based on abuse patterns
  • Integrity Analysts monitor the system and identify new threats
  • Product Managers test policy changes and impact
  • Engineers extend the system with integrations and logic

Guardian provides a shared platform with a SQL-like interface, making it simple to define rules that detect abuse patterns. Each rule is a query that can trigger an action (like softban, deactivate, or monitor).
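The idea of "a rule is a query plus an action" can be sketched in a few lines of Python. The example below uses SQLite from the standard library as a stand-in for a SQL engine. It is not GSQL or FlinkSQL, and the `events` table, its columns, the rule name, and the threshold of three are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    query: str   # must return a single user_id column
    action: str  # e.g. "softban", "deactivate", "monitor"


def run_rule(conn: sqlite3.Connection, rule: Rule) -> list[tuple[str, str]]:
    """Run the rule's query and pair every matching user with the rule's action."""
    rows = conn.execute(rule.query).fetchall()
    return [(rule.action, user_id) for (user_id,) in rows]


# Hypothetical event data, held in an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, kind TEXT, domain TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "pin_create", "spam.example"),
        ("u1", "pin_create", "spam.example"),
        ("u1", "pin_create", "spam.example"),
        ("u2", "pin_create", "recipes.example"),
    ],
)

rule = Rule(
    name="repeated_spam_domain",
    query="""
        SELECT user_id
        FROM events
        WHERE kind = 'pin_create' AND domain = 'spam.example'
        GROUP BY user_id
        HAVING COUNT(*) >= 3
    """,
    action="softban",
)

decisions = run_rule(conn, rule)
print(decisions)
```

Keeping the query and the action together in one object is what lets a non-engineer define the whole rule in one place. A streaming engine would evaluate the query continuously instead of once over a static table.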

Source: Pinterest

Instead of writing complex Python scripts or manually switching between systems, users write one rule, test it, and deploy it, all within Guardian using GSQL (Guardian-SQL).

From Monolith to Event-Driven with FlinkSQL

The original abuse detection system at Pinterest (internally called Stingray) was built in Python. It worked but didn’t scale. Rules were hard to manage. The process was slow and error-prone.

Guardian introduced a modern, event-driven architecture. The key upgrade was using FlinkSQL for processing streaming data.

Source: Pinterest

This changed how rules are written, tested, and deployed. It also improved:

  • Scalability: Flink handles high event volumes
  • Performance: Fast rule execution with low latency
  • Flexibility: Easier rule updates and experiments
  • Observability: Real-time feedback with dashboards

Guardian now processes billions of rows across thousands of columns, using distributed processing and micro-batching for performance.
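Micro-batching means grouping incoming events into small chunks so that per-call overhead (lookups, writes, rule evaluation) is paid once per chunk rather than once per event. The source does not describe how Guardian forms its batches, so the following Python sketch is a generic, size-based version only. Production systems typically also flush a partial batch after a timeout.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def micro_batches(events: Iterable[T], max_size: int) -> Iterator[list[T]]:
    """Yield consecutive batches of up to max_size events, preserving order."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, max_size))
        if not batch:
            return  # the input is exhausted
        yield batch


# Hypothetical stream of seven event ids, processed three at a time.
for batch in micro_batches(range(7), max_size=3):
    print(batch)
```

The batch size is a trade-off: larger batches amortize more overhead but add latency, because the first event in a batch waits for the last one.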

Connecting with Kafka, Iceberg, StarRocks, and More

Guardian’s architecture integrates well with modern data systems:

  • Apache Kafka streams user activity and system events
  • Apache Iceberg provides scalable storage for historical analysis
  • StarRocks offers OLAP support for rich dashboards and queries
  • Internal KV Stores support counters and aggregation logic
Source: Pinterest

This flexibility allows Guardian to handle real-time detection, historical backtesting, and hybrid use cases. It also supports schema evolution and asynchronous processing, which are critical in a fast-changing environment.
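Asynchronous processing matters because enriching an event often requires a lookup in an external store, and waiting for each lookup in turn would stall the pipeline. The Python sketch below illustrates the general pattern with `asyncio`: several lookups are in flight at once, with a cap on concurrency. The in-memory `FLAG_COUNTS` dictionary, the simulated latency, and all function names are assumptions for illustration and do not reflect Pinterest's implementation.

```python
import asyncio

# Hypothetical stand-in for a remote KV store holding per-domain flag counts.
FLAG_COUNTS = {"spam.example": 12, "recipes.example": 0}


async def lookup_flag_count(domain: str) -> int:
    """Simulate a remote lookup with a small amount of network latency."""
    await asyncio.sleep(0.01)
    return FLAG_COUNTS.get(domain, 0)


async def enrich(events: list[dict], max_in_flight: int = 2) -> list[dict]:
    """Attach a flag_count to every event, with bounded concurrent lookups."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def enrich_one(event: dict) -> dict:
        async with semaphore:  # at most max_in_flight lookups run at once
            count = await lookup_flag_count(event["domain"])
        return {**event, "flag_count": count}

    # gather() returns results in the same order as the input events.
    return await asyncio.gather(*(enrich_one(e) for e in events))


events = [
    {"user_id": "u1", "domain": "spam.example"},
    {"user_id": "u2", "domain": "recipes.example"},
    {"user_id": "u3", "domain": "unknown.example"},
]
enriched = asyncio.run(enrich(events))
for event in enriched:
    print(event)
```

The concurrency cap protects the external store from being overwhelmed, which is the same concern a stream processor addresses when it limits the number of pending asynchronous requests.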

Action Rules: Driving Real-Time Enforcement

The core of Guardian is the Action Rule. It uses a SQL-like syntax to select events and apply enforcement actions. For example:

  • Deactivate a suspicious account
  • Softban a user showing bot-like behavior
  • Hide content that violates policy
  • Monitor a risky pattern for review
Source: Pinterest

Rules can be written, tested, and deployed in under an hour. Analysts can backfill past data to evaluate impact. The same query can be used for both testing and enforcement, improving consistency and reducing bugs.
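The benefit of reusing one rule definition for both testing and enforcement can be shown with a small Python sketch. The rule, the event fields, and the helper names below are hypothetical. The point is only that `backtest` and `enforce` call the very same predicate, so what was evaluated on historical data is exactly what runs live.

```python
from typing import Callable, Iterable

Event = dict
Rule = Callable[[Event], bool]


def new_account_pin_burst(event: Event) -> bool:
    """Hypothetical rule: a brand-new account creating pins at a very high rate."""
    return event["account_age_days"] < 1 and event["pins_last_hour"] > 100


def backtest(rule: Rule, historical_events: Iterable[Event]) -> dict:
    """Dry run: report which users the rule would have matched, without acting."""
    evaluated = 0
    matched_users = []
    for event in historical_events:
        evaluated += 1
        if rule(event):
            matched_users.append(event["user_id"])
    return {"evaluated": evaluated, "matched_users": matched_users}


def enforce(rule: Rule, event: Event, action: str,
            take_action: Callable[[str, str], None]) -> bool:
    """Live path: apply the action if the same rule matches the incoming event."""
    if rule(event):
        take_action(action, event["user_id"])
        return True
    return False


history = [
    {"user_id": "u1", "account_age_days": 0, "pins_last_hour": 250},
    {"user_id": "u2", "account_age_days": 400, "pins_last_hour": 300},
    {"user_id": "u3", "account_age_days": 0, "pins_last_hour": 5},
]
report = backtest(new_account_pin_burst, history)
print(report)

applied = []  # records (action, user_id) pairs instead of calling a real system
live_event = {"user_id": "u9", "account_age_days": 0, "pins_last_hour": 180}
enforce(new_account_pin_burst, live_event, "softban",
        lambda action, user_id: applied.append((action, user_id)))
print(applied)
```

An analyst can inspect the backtest report to estimate impact and false positives before the rule is allowed to take action on live traffic.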

Guardian also supports:

  • Counters: Like the number of times a domain was flagged
  • Visualizations: To spot abuse spikes from specific regions or ISPs
  • Monitoring and Alerts: Trigger alerts when a rule condition matches
  • Backfills: Run historical data through new rules

All of this happens in a system that is highly optimized for performance and low infrastructure cost.
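As a simple illustration of how a counter and an alert can work together, the Python sketch below counts how often each domain is flagged and raises an alert the first time a domain reaches a threshold. The class name, the threshold, and the in-memory storage are assumptions. A production system would keep such counters in a shared store rather than in process memory.

```python
from collections import Counter


class DomainFlagCounter:
    """Counts flags per domain and records an alert when a threshold is reached."""

    def __init__(self, alert_threshold: int) -> None:
        self.counts = Counter()
        self.alert_threshold = alert_threshold
        self.alerts = []  # domains that reached the threshold, in order

    def flag(self, domain: str) -> int:
        """Record one flag for the domain and return its updated count."""
        self.counts[domain] += 1
        # Alert exactly once, at the moment the count reaches the threshold.
        if self.counts[domain] == self.alert_threshold:
            self.alerts.append(domain)
        return self.counts[domain]


# Hypothetical sequence of flagged domains.
counter = DomainFlagCounter(alert_threshold=3)
for domain in ["spam.example", "ads.example", "spam.example", "spam.example"]:
    counter.flag(domain)

print(counter.counts["spam.example"], counter.alerts)
```

Alerting only when the count reaches the threshold, rather than on every flag above it, avoids flooding analysts with repeated notifications for the same domain.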

Real-Time Data Streaming and AI for Safer Social Networks

Pinterest’s Guardian platform is a clear example of how data streaming and real-time analytics can transform spam and abuse detection. Kafka and Flink enabled Pinterest to move from slow, batch-based workflows to responsive, scalable, and collaborative systems.

The next frontier is the integration of AI and machine learning. Rules engines are essential, but combining them with models that learn from user behavior and adapt to new threats will bring even greater accuracy and speed.

For any social network, real-time data streaming is no longer optional—it is a foundation for trust and growth. Combining streaming with AI allows platforms to react faster, scale smarter, and protect users more effectively.

Organizations looking to modernize their trust and safety operations should start by investing in streaming platforms and real-time engines like Kafka and Flink. These tools provide the flexibility, control, and performance needed to stay ahead in the fight against abuse.


Kai Waehner

bridging the gap between technical innovation and business value for real-time data streaming, processing and analytics

