Conversation
## What is Sentry Relay?

### Quick Answer

Relay is Sentry's event forwarding and ingestion service. It acts as a high-performance proxy that sits between your application (where the Sentry SDK runs) and Sentry's backend, handling the first stage of processing, including validation, filtering, rate limiting, and normalization, before forwarding data to Kafka for further processing.

### How It Works

#### What Problem Does Relay Solve?

Think of Relay as a smart gateway or bouncer for Sentry events. When millions of events arrive from SDKs around the world, something needs to validate, filter, rate-limit, and normalize that traffic at scale before it reaches the rest of the system.
Relay handles all of this at the edge, offloading work from the core Sentry backend.

#### Two Operating Modes

Relay can run in two different modes depending on your needs:

1. **Proxy Mode (Simple Forwarding)**: Relay forwards envelopes to its upstream (Sentry or another Relay) with minimal processing (see the sketch after this list).
2. **Processing Mode (Full Ingestion)**: Relay runs the full ingestion pipeline and writes the results directly to Kafka; in Sentry's own infrastructure this is what Processing Relays do.
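To see what a forwarding Relay looks like from an application's point of view, here is a minimal sketch, assuming a local Relay listening on localhost:3000; the DSN key and project ID are placeholders, and the only change from a normal setup is the DSN host:

```python
import sentry_sdk

# The DSN keeps the original public key and project ID, but its host points
# at the local Relay, which forwards envelopes upstream to Sentry.
sentry_sdk.init(
    dsn="http://examplePublicKey@localhost:3000/0",  # placeholder key and project ID
    traces_sample_rate=1.0,
)

# Captured data now travels: SDK -> local Relay -> upstream Sentry.
sentry_sdk.capture_message("hello through relay")
```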
#### The Ingestion Pipeline Flow

In processing mode (the typical setup inside Sentry's infrastructure), an event flows from the SDK into Relay, where it is validated, filtered, rate limited, and normalized, and is then written to Kafka for the rest of the pipeline to consume.

#### Key Responsibilities

Relay handles several critical functions: validation, inbound filtering, rate limiting, and normalization, all before an event reaches the core backend.
### Technical Details
### Why It Matters

Relay is critical to Sentry's architecture because it handles the first stage of processing for every incoming event and shields the core backend from raw, unvalidated traffic.
### Key Concepts

### References
### Next Steps

To deepen your understanding of Relay, start with the research sources listed below.
Research sources: getsentry/relay repository (README.md), getsentry/sentry-docs/develop-docs (data-flow.mdx, ingestion/relay/index.mdx, self-hosted/index.mdx). Generated by senpai agent (Agent ID: a1c1738).
## Sentry Envelope Formats

### Quick Answer

Envelopes are Sentry's universal wire format for sending data from SDKs to the Sentry backend. Think of an envelope like a postal package: it has a shipping label (envelope headers) and can contain one or more items inside (events, attachments, sessions, replays, profiles, logs, and more). This format replaced older single-purpose HTTP endpoints and lets SDKs batch multiple, heterogeneous pieces of data into a single HTTP request.

### Why Envelopes Exist

Before envelopes, SDKs had to use separate endpoints and separate HTTP requests for different data types: one call for an error event, another for an attachment, yet another for session data. This was inefficient, especially when you wanted to send related data together (like a crash event plus a minidump file plus a screenshot attachment). Envelopes solve several problems at once: fewer HTTP round trips, related items delivered together, and a single extensible format for new data types.
### The Structure of an Envelope

An envelope follows a simple line-delimited format (not unlike HTTP multipart form data, but simpler). In plain English, an envelope is a JSON envelope header on the first line, followed by one or more items, each consisting of a JSON item header line and that item's payload.
### Key Rules
### Concrete Example

A typical envelope with two items -- an attachment and an error event -- starts with the envelope header line, followed by an item header and payload for the attachment, then an item header and payload for the event. A minimal sketch follows.
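Here is a minimal sketch (not Sentry SDK code) that assembles such an envelope by hand; the event ID and payloads are placeholders, and real SDKs use their own envelope serializers:

```python
import json

def serialize_item(headers: dict, payload: bytes) -> bytes:
    # An item is a single-line JSON header (with an explicit byte length),
    # a newline, the raw payload, and a trailing newline.
    headers = {**headers, "length": len(payload)}
    return json.dumps(headers).encode("utf-8") + b"\n" + payload + b"\n"

attachment = serialize_item(
    {"type": "attachment", "filename": "hello.txt", "content_type": "text/plain"},
    b"Hello, world!",
)
event = serialize_item(
    {"type": "event", "content_type": "application/json"},
    json.dumps({"message": "something broke", "level": "error"}).encode("utf-8"),
)

# The envelope header comes first, then the items, one after another.
envelope_header = json.dumps({"event_id": "9ec79c33ec9942ab8353589fcb2e04dc"})
envelope = envelope_header.encode("utf-8") + b"\n" + attachment + event
print(envelope.decode("utf-8"))
```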
### Envelope Headers (the "shipping label")

The top-level envelope headers apply to the entire envelope and all its items. The most important ones are `event_id`, `dsn`, `sdk`, and `sent_at`.
### Item Headers

Every item within an envelope has its own header line. Two headers are universal: `type`, which identifies the kind of item, and `length`, the payload size in bytes (if omitted, the payload is terminated by the next newline).
Additional item-specific headers depend on the item type (attachments, for example, carry a `filename` and related metadata).

### All Item Types

This is where it gets interesting. Here is the full catalog of item types that an envelope can carry:

#### Core Event Types
#### Telemetry Data

#### Attachments

#### Replay

#### User Feedback

#### SDK Diagnostics

#### Metrics
#### Reserved (Internal Use)

Some item types are reserved for internal use and must not be written by SDK implementations.

### How Envelopes Flow Through the Ingestion Pipeline

An envelope travels from the SDK, through any local and Sentry-operated Relays, and on to Kafka and storage.
### Relay Chaining

Relays can be chained. An organization might run a local Relay (in "proxy" mode, or the since-removed "static" mode) that forwards envelopes to Sentry's hosted Relay. Each hop in the chain preserves the envelope format, including unknown item types and headers it does not understand.

### Size Limits

Relay enforces size limits on envelopes and individual items; the exact values are defined in Relay's configuration and are subject to change.
## Sentry Relay Types Overview

Sentry's Relay infrastructure has three conceptual tiers of Relays, plus two configuration modes that determine how each Relay behaves.

### The Three Tiers of Relay

Think of Sentry's ingestion pipeline as a series of layers that an event passes through on its way from an SDK to Kafka (and ultimately to storage): an optional customer-operated External Relay, then a Sentry-operated PoP Relay, then a Processing Relay.

#### 1. External Relay (Customer-Operated)
#### 2. PoP Relay (Point of Presence)

#### 3. Processing Relay
### The Two Configuration Modes (`relay.mode`)
| Mode | Behavior |
|---|---|
| `managed` (default) | Fetches project configurations from its upstream (Sentry or another Relay). Requires authentication. This is what all Sentry-operated Relays and most customer Relays use. |
| `proxy` | Forwards all events with minimal processing. Does NOT fetch project configs. Rate limiting still applies. Useful when you just want a forwarding proxy without needing project-specific behavior. |
There was previously a static mode (project configs read from local files), but it was deprecated and removed in Relay v25.9.0.
### Other Important Distinctions

#### Internal vs. External (Trust Level)

This is configured via `relay.internal: true/false` in the Relay config:
- **Internal Relays** (PoP + Processing): Receive the full `ProjectConfig` with all fields, including internal/sensitive settings.
- **External Relays** (customer): Receive a limited `LimitedProjectConfig` — Sentry's endpoint has an early return that stops populating the config for untrusted Relays.
#### Default vs. Canary (Instance Type)

From the `RelayInstance` enum in the code:
- Default: Normal production instance
- Canary: Used for gradual rollouts and experiments before hitting the full fleet
### Summary Table
| Tier | Operator | Internal? | Processing? | Mode | Gets Full Config? |
|---|---|---|---|---|---|
| External Relay | Customer | No | No | managed or proxy | No (limited) |
| PoP Relay | Sentry | Yes | No | managed | Yes (full) |
| Processing Relay | Sentry | Yes | Yes | managed | Yes (full) |
### Key Concepts
- `processing.enabled`: The boolean flag that distinguishes Processing Relays from all others. Only Processing Relays write to Kafka.
- `relay.internal`: Controls trust level and what project config data the Relay receives.
- Forward compatibility: Critical because External Relays can be running any version. Enums need catch-all variants, and unknown fields must be ignored (see the sketch after this list).
- Chaining: Relays can chain — an External Relay forwards to a PoP Relay, which forwards to a Processing Relay. Each layer adds more processing.
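A minimal Python sketch of the catch-all pattern mentioned above (Relay itself is written in Rust and handles this with serde; the enum and field names here are illustrative only):

```python
from enum import Enum

class RelayMode(Enum):
    MANAGED = "managed"
    PROXY = "proxy"
    # Catch-all so a value introduced by a newer upstream does not break
    # an older Relay that has never heard of it.
    UNKNOWN = "unknown"

def parse_mode(raw: str) -> RelayMode:
    try:
        return RelayMode(raw)
    except ValueError:
        return RelayMode.UNKNOWN

def parse_config(raw: dict) -> dict:
    # Keep only the fields this version understands and silently drop the rest.
    known_fields = {"mode", "internal"}
    return {key: value for key, value in raw.items() if key in known_fields}

config = parse_config({"mode": "managed", "internal": True, "brand_new_field": 42})
print(parse_mode(config["mode"]), config)
```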
### References
- Relay Modes documentation — Official docs on managed vs. proxy modes
- Project Config Versioning — Internal vs. External Relay, naming conventions
- `RelayMode` enum in source — The actual Rust enum definition
- Relay Best Practices — Engineering guidelines for working on Relay
- GoCD deployment configs — `processing.jsonnet` and `pops.jsonnet` show the two deployment pipelines
### Next Steps
- Explore the Relay repository to see the processing pipeline in `relay-server/src/services/processor.rs`
- Read about how project configs flow from Sentry → Relay
- Look into how dynamic sampling works differently at each Relay tier
Research sources: getsentry/sentry-docs (develop-docs/ingestion/relay/), getsentry/relay (relay-config/src/config.rs, gocd/templates/), Relay CHANGELOG.md
## The Role of Kafka in Sentry's Infrastructure

### Quick Answer

Kafka is the central nervous system of Sentry's event pipeline. It acts as the distributed message bus that decouples event ingestion from event processing and storage, allowing Sentry to handle massive volumes of error events, transactions, profiles, and more without losing data even under heavy load.

### How Kafka Fits Into Sentry's Architecture

Kafka sits right in the middle of the pipeline -- after ingestion (Relay), before processing and storage. Think of it like a conveyor belt system in a factory: Relay puts items on the belt, and various workers (consumers) pick items off to do their specialized jobs.

### Why Kafka? The Three Big Reasons

#### 1. Decoupling Producers from Consumers

Without Kafka, Relay (the ingestion service) would need to directly call Sentry's processing code. If processing slowed down, ingestion would back up, and you'd start dropping events. With Kafka in between, Relay keeps producing at full speed while consumers work through the backlog at their own pace.
#### 2. Handling Traffic Spikes (Backpressure)

Sentry processes billions of events. During traffic spikes (imagine a major outage hitting thousands of customers simultaneously), Kafka absorbs the burst. Consumers can catch up gradually without data loss.

#### 3. Fan-Out to Multiple Consumers

A single event often needs to be processed by multiple systems. Kafka lets different consumer groups independently read from the same topic. For example, after an event is saved, both Snuba (for search/analytics) and the post-process pipeline (for alerting) need to see it.

### Kafka Topics in Sentry

Kafka organizes messages into topics -- think of them as named channels. Sentry has many topics, each serving a specific purpose. Here are the major categories:

#### Ingestion Topics

These receive raw data from Relay.
#### Processed/Internal Topics

After initial processing, events flow to these topics.
#### Subscription & Alerting Topics

#### Taskworker Topics (Newer!)
#### Dead Letter Queues (DLQs)

Almost every topic has a corresponding dead-letter queue, where messages that repeatedly fail processing are parked instead of blocking the consumer.

### The Event Pipeline in Detail

An error event passes through Kafka twice -- once between Relay and the ingest consumers, and once after the event is saved, when it is fanned out to Snuba and the post-process pipeline. This two-stage design means ingestion and post-processing can scale independently.

### The Consumer Framework: Arroyo

Sentry uses a custom streaming library called Arroyo to build Kafka consumers. It provides reusable building blocks for consuming, batching, and committing offsets, so each consumer only has to implement its own processing logic. A minimal illustration of the underlying consumer-group mechanics follows below.
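The sketch below is not Sentry or Arroyo code; it uses the plain confluent-kafka client to show the consumer-group mechanics that Arroyo builds on: two consumers with different `group.id` values each receive every message on a topic independently (broker address, topic, and group names are placeholders):

```python
from confluent_kafka import Consumer

def make_consumer(group_id: str) -> Consumer:
    # Consumers in different groups each get their own copy of the stream;
    # consumers within the same group split the topic's partitions instead.
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",  # placeholder broker
        "group.id": group_id,
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["events"])  # placeholder topic name
    return consumer

search_consumer = make_consumer("snuba-like-consumers")
alerting_consumer = make_consumer("post-process-like-consumers")

for consumer in (search_consumer, alerting_consumer):
    message = consumer.poll(timeout=1.0)
    if message is not None and message.error() is None:
        print(message.value())
```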
Consumers are registered in the Sentry codebase and started with `sentry run consumer <consumer-name>`.

### Key Concepts
### References

### Next Steps
## Sentry and OpenTelemetry

### Quick Answer

Sentry and OpenTelemetry (OTel) work together at multiple levels: Sentry's newer SDKs are actually built on top of OpenTelemetry internally, Sentry can ingest native OTLP data directly, and you can link Sentry-instrumented services with OTel-instrumented services for end-to-end distributed tracing. The relationship has evolved from "separate systems you can bridge" to "deeply integrated."

### How It Works

#### 1. Sentry SDKs Are Built on OpenTelemetry (Under the Hood)

This is the most important thing to understand: the modern Sentry JavaScript SDK (v8+) uses OpenTelemetry under the hood.
In practice, this means OTel-based instrumentation works out of the box: spans created through the OpenTelemetry API show up in Sentry traces alongside the SDK's own instrumentation.
Other SDKs (Python, Java, etc.) offer OTel integration as well, though the depth of integration varies by platform.

#### 2. The Historical Context (Why This Matters)

When Sentry first built performance monitoring, OTel was still in its early stages. Sentry created its own model with a key concept OTel doesn't have: transactions (a grouping of spans representing a unit of work like an HTTP request). Over time, Sentry has been converging toward the OTel model. The integration works through two main OTel extension points: a span processor that forwards OTel spans to Sentry, and a propagator that handles trace headers between services. A sketch of this wiring for the Python SDK follows below.
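A minimal sketch following the pattern documented for the Python SDK's OpenTelemetry integration; module paths and the `instrumenter` option can differ between SDK versions, so treat the exact names as assumptions:

```python
import sentry_sdk
from opentelemetry import trace
from opentelemetry.propagate import set_global_textmap
from opentelemetry.sdk.trace import TracerProvider
from sentry_sdk.integrations.opentelemetry import SentryPropagator, SentrySpanProcessor

# Tell the Sentry SDK that OpenTelemetry drives performance instrumentation.
sentry_sdk.init(dsn="https://examplePublicKey@o0.ingest.sentry.io/0", instrumenter="otel")

# The span processor forwards finished OTel spans to Sentry; the propagator
# reads and writes the trace headers that keep distributed traces connected.
provider = TracerProvider()
provider.add_span_processor(SentrySpanProcessor())
trace.set_tracer_provider(provider)
set_global_textmap(SentryPropagator())

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("demo-span"):
    pass  # spans created here end up in Sentry
```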
#### 3. Three Ways to Get OTel Data Into Sentry

There are three primary integration patterns:

**A) Use a Sentry SDK (OTel built-in).** The simplest path. The Sentry SDK handles everything — it uses OTel internally and sends data to Sentry in Sentry's format.

**B) Direct OTLP Export (No Sentry SDK).** Sentry has native OTLP endpoints that accept standard OpenTelemetry data. You can point any OTel SDK's OTLP exporter directly at Sentry — no Sentry SDK needed at all. Use this when you cannot, or do not want to, add a Sentry SDK to a service that is already fully instrumented with OTel (a sketch of this setup follows after pattern C).
**C) Forwarding via OTel Collector / Vector / Fluent Bit.** For infrastructure-level telemetry, you can forward data through a collection pipeline that ships it on to Sentry. Use this when you want a central place to route, batch, or filter telemetry from many services before it reaches Sentry.
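A minimal sketch of pattern B using the standard OpenTelemetry Python SDK; the endpoint URL and auth header are placeholders, since the real OTLP endpoint and credentials come from your Sentry project settings:

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Point a standard OTLP/HTTP exporter at Sentry's OTLP traces endpoint.
exporter = OTLPSpanExporter(
    endpoint="https://oXXXXXX.ingest.sentry.io/api/0/otlp/v1/traces",  # placeholder URL
    headers={"x-sentry-auth": "sentry sentry_key=examplePublicKey"},   # placeholder auth
)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("otlp-export-demo"):
    pass  # this span reaches Sentry as plain OTLP, with no Sentry SDK involved
```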
#### 4. Linking Sentry + OTel Across Services (Distributed Tracing)

A common real-world scenario: you have a Sentry SDK on the frontend and an OTel-instrumented backend. To get end-to-end traces, both sides need to propagate compatible trace context headers so that a trace started on one side is continued on the other; the Sentry SDKs expose propagation options for exactly this purpose.

#### 5. Key Requirements for Full OTel Support in an SDK

For an SDK to be "fully OTel compatible," it must satisfy a set of requirements covering how it creates spans, propagates context, and maps OTel data onto Sentry's model.
### Key Concepts

### References

### Next Steps
## Test Prompt
## Senpai Response

### Quick Answer

Dynamic Sampling is Sentry's system for intelligently deciding which events to keep and which to drop after they arrive at the ingestion pipeline. Since Sentry extracts metrics from 100% of incoming events (giving you accurate counts, durations, and web vitals), Dynamic Sampling's job is to retain a representative subset of full event payloads — biased toward the most valuable data — so you can still dig into traces, spans, and custom tags without storing everything.

### How It Works — The Big Picture

#### Where It Sits in the Pipeline

Dynamic Sampling lives in Relay (our Rust-based ingestion service), and it is step 4 of 5 in the event lifecycle. The key insight: metrics are extracted before sampling, so your dashboards, alerts, and performance graphs see all your data. Dynamic Sampling only affects whether you can see the full event details (traces, spans, tags) in tools like Trace Explorer or Discover.

#### The Two-Layer Data Model
### Core Concepts

#### 1. Fidelity (Target Sample Rate)

Fidelity is the overall target sample rate for an organization. For example, a fidelity of 20% means Sentry aims to store ~20% of incoming events. There are two modes: an automatic mode in which Sentry manages the target rate, and a manual mode in which the organization sets sample rates itself.
#### 2. Rules and Biases

Dynamic Sampling works through a rule-based system. Rules live in the project configuration that Relay fetches from Sentry. Each rule has a condition (which events it matches), a sampling value (a rate, or a factor that scales the rate), and a type that determines whether it applies to a whole trace or to individual transactions. Rules are evaluated top-to-bottom; a simplified sketch of this evaluation follows.
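A simplified Python sketch of that evaluation order; the real implementation lives in Relay (in Rust) and supports more rule types and conditions, so treat this as an illustration of the top-to-bottom idea only:

```python
def effective_sample_rate(event: dict, rules: list) -> float:
    """Walk the rules top-to-bottom: factor rules scale the rate, and the
    first matching sample-rate rule ends evaluation."""
    factor = 1.0
    for rule in rules:
        if not rule["condition"](event):
            continue
        if rule["kind"] == "factor":
            factor *= rule["value"]
        else:  # a "sampleRate" rule terminates evaluation
            return min(1.0, rule["value"] * factor)
    return 1.0  # no rule matched, keep everything

rules = [
    {"kind": "factor", "value": 1.5, "condition": lambda e: e.get("release") == "1.2.3"},
    {"kind": "sampleRate", "value": 0.2, "condition": lambda e: True},
]
print(effective_sample_rate({"release": "1.2.3"}, rules))  # ~0.3
```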
#### 3. Trace vs. Transaction Sampling

Trace sampling works by seeding the random number generator with the trace ID, so every event in the same trace gets the same keep/drop decision. The Dynamic Sampling Context (DSC) propagated by SDKs makes this possible. (A sketch of the seeded decision appears at the end of this subsection.)

#### The Biases (Smart Prioritization)

Within the target sample rate, Sentry applies biases to retain more valuable data, for example boosting events from the latest release while deprioritizing low-value traffic such as health checks.
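Returning to trace sampling, here is a minimal sketch of the trace-seeded decision; Relay's real implementation derives the pseudo-random value from the trace ID differently, so the hashing below is only illustrative:

```python
import hashlib

def keep_trace(trace_id: str, sample_rate: float) -> bool:
    # Derive a deterministic value in [0, 1) from the trace ID, so every
    # event carrying the same trace ID reaches the same keep/drop decision.
    digest = hashlib.sha1(trace_id.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < sample_rate

trace_id = "771a43a4192642f0b136d5159a501700"
assert keep_trace(trace_id, 0.25) == keep_trace(trace_id, 0.25)  # always the same answer
```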
### Architecture: How Rules Get to Relay

### Key Concepts Glossary

### References

### Next Steps
Research sources: getsentry/sentry-docs develop-docs (dynamic-sampling/index.mdx, the-big-picture.mdx, architecture.mdx, fidelity-and-biases.mdx), getsentry/sentry code search.
## The EAP (Events Analytics Platform) Schema

### Quick Answer

The EAP (Events Analytics Platform) schema lives in the getsentry/snuba repository. There are a few key places to look.

### Where the Schema is Defined

#### 1. Storage Configuration (YAML) — the "current truth"
This is the main storage definition that Snuba uses at runtime. It declares the full column schema, query processors, stream loader config, and deletion settings. There are also related storage configs for the downsampled tables.
All of these live under the same configuration directory in the repository.

#### 2. Migrations (Python) — how the ClickHouse tables are actually created
This is the initial migration that creates the underlying ClickHouse tables.

#### 3. Entity Configuration
This defines the logical entity that maps to the storage(s) — it's what the query layer uses to route queries.

### Schema Structure at a Glance
The schema spreads dynamic attributes across 40 bucketed columns for string and float values. This bucketing is a sharding strategy — attributes are hashed into buckets so ClickHouse doesn't have to scan one massive Map column for every query. A minimal sketch of the idea follows.
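A minimal sketch of that idea; the actual hash function and column naming used by Snuba may differ, so the names below are illustrative:

```python
NUM_BUCKETS = 40

def fnv1a_64(key: str) -> int:
    # Simple FNV-1a hash; the real implementation may use a different function.
    h = 0xCBF29CE484222325
    for byte in key.encode("utf-8"):
        h = ((h ^ byte) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h

def attribute_bucket(key: str) -> int:
    # Hash the attribute key into one of the fixed buckets, so a query that
    # filters on this key only has to read the one map column it lands in.
    return fnv1a_64(key) % NUM_BUCKETS

print(f"attr_str_{attribute_bucket('http.method')}")  # the bucketed column for this key
```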
### Key Concepts

### References
Research sources: getsentry/snuba repository — storage YAML configs, migration files, and directory structure.
Best used as follows for full output.

Or like this, which delegates to a subagent but adds an extra step of unnecessary summarizing.

These outputs are single-question agent invocations.