senpai agent poc (ai hackweek) #45

Draft
sl0thentr0py wants to merge 1 commit into main from neel/senpai

Conversation

sl0thentr0py commented Feb 17, 2026

Best used as follows for full output:

claude --agent sentry-skills:senpai
explain relay's role in sentry's infrastructure

or like this, which delegates to the subagent but adds an unnecessary extra summarizing step:

ask senpai to explain relay's role in sentry's infrastructure

The outputs below are single-question agent invocations.


sl0thentr0py commented Feb 17, 2026

❯ senpai can you explain what relay is

What is Sentry Relay?

Quick Answer

Relay is Sentry's event forwarding and ingestion service - it acts as a high-performance proxy that sits between your application (where the Sentry SDK runs) and Sentry's backend. It handles the first stage of processing events, including validation, filtering, rate limiting, and normalization before forwarding data to Kafka for further processing.

How It Works

What Problem Does Relay Solve?

Think of Relay as a smart gateway or bouncer for Sentry events. When millions of events come in from SDKs around the world, you need something that can:

  • Quickly validate that events are legitimate and properly formatted
  • Apply rate limits to prevent overwhelming the system
  • Filter out unwanted events early (before expensive processing)
  • Perform initial data normalization

Relay handles all of this at the edge, offloading work from the core Sentry backend.

Two Operating Modes

Relay can run in two different modes depending on your needs:

1. Proxy Mode (Simple Forwarding)

  • Acts as a lightweight proxy between SDKs and Sentry
  • Validates incoming events and forwards them upstream
  • Useful for customers who want to run Relay in their own infrastructure for data privacy/compliance
  • Minimal processing overhead

2. Processing Mode (Full Ingestion)

  • This is what Sentry uses internally for SaaS and self-hosted installations
  • Performs comprehensive event processing, normalization, and filtering
  • Produces events directly into Kafka for the rest of the pipeline
  • Requires connection to Redis (for project configs) and Kafka (for event output)

The Ingestion Pipeline Flow

Here's how an event flows through Relay in processing mode (the typical setup):

1. SDK sends event → Relay (via nginx)
2. Relay validates DSN and Project ID (checks Redis for project config)
3. Relay processes the envelope (parses, normalizes, applies rate limits)
4. Relay produces processed events to Kafka topics
5. Kafka → Sentry ingest consumers pick up events for further processing

From the data flow diagram I found, the complete path looks like:

Internet → nginx → Relay → Kafka → ingest-consumers → more processing
                     ↓
                   Redis (for project configs)
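
From the application's point of view this whole pipeline is invisible. A minimal sketch of step 1 (SDK sends event to Relay), assuming the sentry-python SDK; the DSN is a placeholder:

```python
# Minimal sketch of "SDK sends event"; everything downstream of Relay is invisible here.
import sentry_sdk

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    traces_sample_rate=1.0,
)

try:
    1 / 0
except ZeroDivisionError:
    # Serialized into an envelope and POSTed to Relay, which then validates,
    # rate-limits, normalizes, and produces to Kafka.
    sentry_sdk.capture_exception()
```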

Key Responsibilities

Relay handles several critical functions:

  • Envelope Processing: SDKs send data in "envelopes" containing events, attachments, sessions, etc. Relay parses these and routes different data types appropriately
  • Project Config Management: Relay caches project configurations from Redis to know how to handle events for each project
  • Rate Limiting: Enforces quotas and rate limits at the edge
  • Data Scrubbing: Removes sensitive data (PII) based on project settings
  • Event Normalization: Ensures event data conforms to expected schema
  • Smart Filtering: Can drop events based on various rules before expensive processing

Technical Details

  • Written in Rust: Built for high performance and memory safety
  • Stateless: Relay instances can be horizontally scaled easily
  • Config-driven: Uses YAML configuration files and fetches project-specific configs from Redis
  • Kafka Integration: In processing mode, writes to Kafka topics for downstream consumers

Why It Matters

Relay is critical to Sentry's architecture because:

  1. Performance: Handles millions of events per second with minimal latency
  2. Scalability: Stateless design allows easy horizontal scaling
  3. Cost Efficiency: Filters and rate-limits early, reducing unnecessary processing
  4. Flexibility: Customers can run their own Relay instances for data residency requirements
  5. Reliability: Acts as a buffer between SDKs and the backend

Key Concepts

  • Envelope: The container format SDKs use to send data to Relay - can contain multiple items (events, attachments, sessions)
  • Project Config: Configuration data Relay fetches from Redis that tells it how to handle events for each project (rate limits, filters, PII rules)
  • Data Category: Classification of data types (errors, transactions, attachments) used for rate limiting and routing
  • DSN (Data Source Name): The URL/key that SDKs use to send data - Relay validates these

References

  1. Relay GitHub Repository - Main codebase
  2. Relay Developer Documentation - Code and development docs
  3. Relay Product Documentation - User-facing documentation
  4. Self-hosted Data Flow - Shows how Relay fits into the overall architecture
  5. Ingestion Documentation - Engineering docs on ingestion pipeline

Next Steps

To deepen your understanding of Relay:

  1. Explore the codebase: Check out the /tmp/relay repository structure to see how it's organized
  2. Run Relay locally: Try setting up a local development instance following the README
  3. Understand envelopes: Look into how SDKs format data and how Relay parses it
  4. Learn about project configs: Explore what configuration options Relay pulls from Redis
  5. Follow event flow: Trace a single event through the entire pipeline from SDK → Relay → Kafka → processing

Research sources: getsentry/relay repository (README.md), getsentry/sentry-docs/develop-docs (data-flow.mdx, ingestion/relay/index.mdx, self-hosted/index.mdx)

Generated by senpai agent (Agent ID: a1c1738)


sl0thentr0py commented Feb 17, 2026

 ❯  ask senpai to explain envelope formats to me

Sentry Envelope Formats

Quick Answer

Envelopes are Sentry's universal wire format for sending data from SDKs to the Sentry backend. Think of an envelope like a postal package: it has a shipping label (envelope headers) and can contain one or more items inside (events, attachments, sessions, replays, profiles, logs, and more). This format replaced older single-purpose HTTP endpoints and lets SDKs batch multiple, heterogeneous pieces of data into a single HTTP request.


Why Envelopes Exist

Before envelopes, SDKs had to use separate endpoints and separate HTTP requests for different data types -- one call for an error event, another for an attachment, yet another for session data. This was inefficient, especially when you wanted to send related data together (like a crash event plus a minidump file plus a screenshot attachment).

Envelopes solve several problems at once:

  • Batching: You can combine an error event, its attachments, and a client report all in one HTTP request.
  • Heterogeneous payloads: Binary attachments and JSON event payloads can coexist in a single submission.
  • Hop-to-hop communication: Envelopes work not just between SDK and server, but also between Relays in a chain, and between different SDKs (e.g., a Native SDK embedded in a mobile SDK).
  • Offline storage: The format is suitable for writing to disk when network connectivity is unavailable, then transmitting later.

The Structure of an Envelope

An envelope follows a simple line-delimited format (not unlike HTTP multipart form data, but simpler). Here is the grammar:

Envelope = EnvelopeHeaders "\n" { Item } [ "\n" ]
Item     = ItemHeaders "\n" Payload
Payload  = { arbitrary bytes }

In plain English, an envelope is:

  1. One line of JSON: the envelope headers
  2. Zero or more items, each consisting of:
    • One line of JSON: the item headers
    • The item payload (whose length is either declared in a length header or is implicitly "everything up to the next newline")
  3. An optional trailing newline

Key Rules

  • Newlines are always UNIX-style \n (ASCII 10). A \r before \n is treated as part of the payload, not as a line ending.
  • All headers are single-line, compact JSON objects.
  • Unknown header attributes must be preserved (not stripped) by any component that processes the envelope. This is important for forward compatibility.
  • An empty envelope (headers only, no items) is valid but meaningless and can be discarded.

Concrete Example

Here is a real envelope with two items -- an attachment and an error event:

{"event_id":"9ec79c33ec9942ab8353589fcb2e04dc","dsn":"https://examplePublicKey@o0.ingest.sentry.io/0"}\n
{"type":"attachment","length":10,"content_type":"text/plain","filename":"hello.txt"}\n
\xef\xbb\xbfHello\r\n\n
{"type":"event","length":41,"content_type":"application/json"}\n
{"message":"hello world","level":"error"}\n

Reading this line by line:

  1. Envelope header: Contains the event_id and dsn for authentication.
  2. Item 1 header: Declares an attachment of 10 bytes.
  3. Item 1 payload: The raw attachment bytes (exactly 10 bytes, including a BOM and Windows newline).
  4. Item 2 header: Declares an event of 41 bytes.
  5. Item 2 payload: A JSON error event.

Envelope Headers (the "shipping label")

The top-level envelope headers apply to the entire envelope and all its items. The most important ones are:

| Header | Required? | Description |
|---|---|---|
| event_id | Depends on items | UUID identifying the event/transaction. Required when the envelope contains an event, transaction, or feedback. |
| dsn | Recommended | The full DSN string. Allows the envelope to be "self-authenticating" -- it carries everything needed to route and authenticate itself. |
| sent_at | Recommended | RFC 3339 UTC timestamp of when the SDK sent the envelope. Used by Relay for clock drift correction. Should be set as close to transmission time as possible, and must appear only once. |
| sdk | Recommended | SDK name and version information. Carried at the envelope level so it applies even to items like minidumps or sessions that lack their own SDK metadata. |

Item Headers

Every item within an envelope has its own header line. Two headers are universal:

| Header | Required? | Description |
|---|---|---|
| type | Required | The kind of data in this item (e.g., "event", "transaction", "attachment", "session", "replay_event", "profile", "log", etc.) |
| length | Recommended | Payload size in bytes. If omitted, the payload extends to the next newline. For payloads containing newlines (like binary data), length must be specified. |

Additional item-specific headers depend on the item type (for example, attachments require filename).


All Item Types

This is where it gets interesting. Here is the full catalog of item types that an envelope can carry:

Core Event Types

| Item Type | Description |
|---|---|
| event | An error or default event (JSON). At most one per envelope. Mutually exclusive with transaction. |
| transaction | A performance transaction (JSON). At most one per envelope. Mutually exclusive with event. |

Telemetry Data

| Item Type | Description |
|---|---|
| session | A single session init or update for Release Health. Can appear multiple times (up to 100 per envelope). |
| sessions | Pre-aggregated session count buckets (up to 100 buckets per item). Can appear multiple times. |
| span | A standalone span (Version 2 spans protocol). |
| log | A container of log entries (new structured logging). Contains an items array. Must include item_count and content_type headers. |
| otel_log | OpenTelemetry-formatted log entries. |
| profile | A profiling payload, associated with a transaction. Must be in the same envelope as its transaction. |
| profile_chunk | A V2 profile sample (continuous profiling). Can exist independently. |
| check_in | A Cron Monitor check-in payload. At most one per envelope. |

Attachments

| Item Type | Description |
|---|---|
| attachment | A raw binary or text attachment. Can appear multiple times. Supports special attachment_type values for minidumps, Apple crash reports, Unreal Engine context, and view hierarchies. Also supports a trace attachment variant via a special content type. |

Replay

| Item Type | Description |
|---|---|
| replay_event | Replay metadata (JSON). At most once per envelope. Must be paired with a replay recording. |
| replay_recording | The actual replay DOM recording data (JSON or gzipped JSON). Must be paired with a replay event. |

User Feedback

| Item Type | Description |
|---|---|
| feedback | The current user feedback format. Contains an event with a feedback context. At most once per envelope. |
| user_report | Deprecated. The old user report format, associating comments with an existing event. |

SDK Diagnostics

| Item Type | Description |
|---|---|
| client_report | SDK-side telemetry about what happened to events before they reached Sentry (e.g., how many events were rate-limited or dropped client-side). Can appear multiple times. |

Metrics

| Item Type | Description |
|---|---|
| trace_metric | A container of trace-scoped metric entries. Follows the same container pattern as log items with item_count and content_type headers. |

Reserved (Internal Use)

These types are reserved and must not be written by SDK implementations: security, unreal_report, form_data.


How Envelopes Flow Through the Ingestion Pipeline

Here is how an envelope travels from an SDK to storage:

  SDK                    Relay                        Sentry (Django)
  ===                    =====                        ===============

  1. SDK constructs   -->  2. Relay receives        -->  5. Sentry consumers
     envelope with          envelope via POST             read from Kafka
     headers + items        /api/<project_id>/            topics and process
                            envelope/                     items into storage

                         3. Relay processes:
                            - Authenticates (DSN
                              from header, query
                              param, or envelope
                              header)
                            - Applies rate limits
                            - Filters (inbound
                              filters, sampling)
                            - Validates/normalizes
                              payloads
                            - Applies clock drift
                              correction using
                              sent_at
                            - Extracts metrics
                              from transactions

                         4. Relay forwards
                            accepted items to
                            Kafka (or upstream
                            Relay in a chain)

Step-by-step

  1. SDK creates the envelope: The SDK serializes one or more items into the envelope format. For example, when a crash occurs, it might bundle an error event, a minidump attachment, and a client report together.

  2. HTTP transmission: The SDK POSTs the envelope to POST /api/<project_id>/envelope/. The content type is application/x-sentry-envelope (though text/plain is also accepted to avoid CORS preflight requests in browsers). The entire envelope body can be gzip-compressed via standard HTTP content encoding.

  3. Relay processing: Relay is the first service to receive the envelope. It is written in Rust for high performance. Relay:

    • Authenticates the request using the DSN (from HTTP auth headers, query parameters, or the envelope's own dsn header).
    • Validates that items conform to size limits.
    • Applies rate limiting per project, per data category.
    • Filters events based on inbound filter rules and dynamic sampling decisions.
    • Normalizes event payloads (fills in defaults, trims fields, etc.).
    • Corrects clock drift using the sent_at timestamp.
    • Extracts metrics from transaction and span data.
    • Gracefully skips unknown item types (preserving them for forward compatibility).
  4. Kafka: Relay produces messages onto various Kafka topics based on item type. Events go to one topic, attachments to another, sessions to another, and so on.

  5. Sentry consumers: Django-based consumer services read from Kafka and handle final processing -- symbolication, grouping, storage in PostgreSQL/Snuba/ClickHouse, etc.

Relay Chaining

Relays can be chained. An organization might run a local Relay (in "proxy" or "static" mode) that forwards envelopes to Sentry's hosted Relay. Each hop in the chain preserves the envelope format, including unknown item types and headers it does not understand.


Size Limits

Relay enforces these limits on envelopes (values subject to change, defined in Relay's config):

| Scope | Limit |
|---|---|
| Entire envelope (decompressed) | 200 MiB |
| Event / transaction / span / log / metric items | 1 MiB each |
| Individual metric within an envelope | 2 KiB |
| Monitor check-in item | 100 KiB |
| Client report item | 4 KiB |
| Profile item | 50 MiB |
| Replay item (compressed) | 10 MiB |
| Replay item (decompressed) | 100 MiB |
| Sessions per envelope | 100 |
| Pre-aggregated session buckets per sessions item | 100 |

The length Header: Explicit vs. Implicit Payloads

This is a subtlety worth understanding:

  • Explicit length (recommended): The item header includes "length": 42, meaning the next 42 bytes are the payload, regardless of whether they contain newlines. After those bytes, the parser expects a \n or EOF.
  • Implicit length (length omitted): The payload extends until the next \n or EOF. This only works for payloads guaranteed not to contain newline characters. It is used for small, simple payloads like session updates, where omitting the length can improve compression ratios when many small items are batched together.

The general advice for SDK implementors is: always include length unless you have a specific reason not to (like optimizing compression for many tiny session items).
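
To make the explicit-length rule concrete, here is a small, hypothetical Python sketch that serializes an item with a declared length. It follows the grammar above but is illustration only, not Sentry's SDK code:

```python
import json

def serialize_item(item_type: str, payload: bytes, content_type: str) -> bytes:
    """Serialize one envelope item with an explicit length header.

    Hypothetical helper for illustration; real SDKs ship their own envelope code.
    """
    headers = {"type": item_type, "length": len(payload), "content_type": content_type}
    return json.dumps(headers, separators=(",", ":")).encode() + b"\n" + payload + b"\n"

event = json.dumps({"message": "hello world", "level": "error"}).encode()
envelope = (
    json.dumps({"event_id": "9ec79c33ec9942ab8353589fcb2e04dc"}).encode() + b"\n"
    + serialize_item("event", event, "application/json")
)
```

Because the length is computed from the payload bytes, the payload may safely contain newlines or binary data.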


Authentication Methods

Envelopes support three ways to authenticate:

  1. HTTP X-Sentry-Auth header: The traditional method. The DSN public key is embedded in the header value.
  2. Query parameter: The DSN key is passed as a query parameter on the URL.
  3. Envelope header dsn field: The full DSN is embedded in the envelope's own header JSON. This makes the envelope fully self-contained ("self-authenticating"). This method requires Relay v21.6.0+.

If multiple authentication methods are provided, they must all agree or the request is rejected.
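
As an illustration of option 3 (the self-authenticating envelope), here is a hedged sketch that POSTs an envelope whose header carries the dsn. The host, project ID, and DSN are placeholders taken from the example envelope above:

```python
import requests  # assumes the requests library is available

ENVELOPE = (
    b'{"event_id":"9ec79c33ec9942ab8353589fcb2e04dc",'
    b'"dsn":"https://examplePublicKey@o0.ingest.sentry.io/0"}\n'
    b'{"type":"event","length":41,"content_type":"application/json"}\n'
    b'{"message":"hello world","level":"error"}\n'
)

# Placeholder host and project id; the dsn inside the envelope header authenticates it.
resp = requests.post(
    "https://o0.ingest.sentry.io/api/0/envelope/",
    data=ENVELOPE,
    headers={"Content-Type": "application/x-sentry-envelope"},
)
print(resp.status_code)
```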


Key Concepts Summary

  • Envelope: A container format with a header line followed by zero or more items. Think of it as a "package" that can hold multiple types of Sentry data.
  • Item: A single piece of data within an envelope (event, attachment, session, profile, etc.), consisting of its own header line and payload.
  • Self-authenticating: An envelope with a dsn header carries everything needed to route itself, making it suitable for offline storage and deferred sending.
  • Forward compatibility: All components must preserve unknown headers and unknown item types. This lets Sentry evolve the format without breaking existing infrastructure.
  • Clock drift correction: The sent_at header enables Relay to correct for differences between the SDK's clock and the server's clock.

References

  1. Envelopes specification: https://develop.sentry.dev/sdk/data-model/envelopes/
  2. Envelope Items specification: https://develop.sentry.dev/sdk/data-model/envelope-items/
  3. Relay implementation: getsentry/relay repository, specifically relay-server/src/envelope/mod.rs
  4. Relay config (size limits): getsentry/relay/relay-config/src/config.rs
  5. Ingestion overview: https://develop.sentry.dev/ingestion/

Next Steps

  • Dynamic Sampling Context (DSC): Envelopes can carry a trace header with sampling decisions that propagate across services.
  • Explore Relay's Rust code: Clone getsentry/relay and look at relay-server/src/envelope/mod.rs to see how envelopes are parsed and validated in practice.
  • Study client reports: Understanding the client_report item type helps you see how SDKs report their own event drop/rate-limit statistics back to Sentry.
  • Look at SDK implementations: Pick any SDK (e.g., getsentry/sentry-python or getsentry/sentry-javascript) and search for "envelope" to see how they construct and serialize envelopes.


sl0thentr0py commented Feb 17, 2026

 ❯ what are the different kinds of relays in sentry's infrastructure and how many of them do we have

Sentry Relay Types Overview

Sentry's Relay infrastructure has three conceptual tiers of Relays, plus two configuration modes that determine how each Relay behaves.


The Three Tiers of Relay

Think of Sentry's ingestion pipeline as a series of layers that an event passes through on its way from an SDK to Kafka (and ultimately to storage). Here's the flow:

  SDK (your app)
       │
       ▼
┌─────────────────┐
│ External Relay   │  ← Optional, customer-operated
│ (Customer Relay) │
└───────┬─────────┘
        │
        ▼
┌─────────────────┐
│   PoP Relay      │  ← Sentry-operated, at the edge
│ (Point of        │
│  Presence)       │
└───────┬─────────┘
        │
        ▼
┌─────────────────┐
│ Processing Relay │  ← Sentry-operated, in the core region
│ (Internal)       │
└───────┬─────────┘
        │
        ▼
     Kafka → Snuba → Storage

1. External Relay (Customer-Operated)

  • Who runs it: Customers (self-hosted or on-prem users)
  • Purpose: Gives customers a local ingestion point. Events are filtered, rate-limited, and normalized before leaving the customer's network, reducing bandwidth and providing data scrubbing at the edge.
  • Key trait: Sentry has no control over the version customers run. This is why forward compatibility is so important in the Relay codebase — newer SDKs might send data to older External Relays. The developer docs explicitly note: "we practically have no deprecation policy for old versions."
  • Config: Runs in managed or proxy mode. It receives a restricted/limited project configuration (via LimitedProjectConfig) — Sentry intentionally withholds sensitive internal config from untrusted Relays.
  • How many: Variable — as many as customers choose to deploy. We don't control this number.

2. PoP Relay (Point of Presence)

  • Who runs it: Sentry (us!)
  • Purpose: These are Sentry's edge nodes distributed geographically. They sit close to customers to minimize latency for the initial SDK → Relay request. The key insight: PoP Relays respond to the SDK asynchronously — they accept the event quickly and then forward it to the Processing Relay in the background. This means SDKs don't have to wait for the event to travel all the way to our core infrastructure.
  • What they do: Rate limiting, basic normalization, filtering, metrics extraction, dynamic sampling. They do not do full event processing (no symbolication, no grouping, etc.).
  • Config: These are "internal" Relays (the config flag relay.internal: true gives them access to the full ProjectConfig, not the limited one). They run in managed mode but do not have processing.enabled = true.
  • How many: Based on the deployment configs, PoP Relays are deployed across multiple regions including US, DE (Germany), S4S (Sentry for Sentry), and other regions. The exact number of instances scales with traffic. The deployment pipeline (relay-pop) deploys to these regions with canary and primary stages.

3. Processing Relay

  • Who runs it: Sentry (us!)
  • Purpose: This is the final Relay in the chain before events hit Kafka. It does the "heavy lifting" — full event processing including symbolication coordination, grouping, PII scrubbing, metric extraction, and ultimately producing events to Kafka topics.
  • Key trait: This is the only Relay that has processing.enabled = true in its config. This flag enables features like writing directly to Kafka, accessing Redis for project config caching, and performing full normalization.
  • Config: Runs as an internal, managed Relay with the processing feature flag compiled in and enabled. It receives the full, trusted project configuration.
  • How many: Deployed in the core regions (US, DE, S4S, single-tenant regions). The deployment pipeline is relay-processing. The exact replica count depends on traffic.

The Two Configuration Modes (relay.mode)

Orthogonal to the tier, every Relay runs in one of two modes (defined in relay-config/src/config.rs):

| Mode | Behavior |
|---|---|
| managed (default) | Fetches project configurations from its upstream (Sentry or another Relay). Requires authentication. This is what all Sentry-operated Relays and most customer Relays use. |
| proxy | Forwards all events with minimal processing. Does NOT fetch project configs. Rate limiting still applies. Useful when you just want a forwarding proxy without needing project-specific behavior. |

There was previously a static mode (project configs read from local files), but it was deprecated and removed in Relay v25.9.0.


Other Important Distinctions

Internal vs. External (Trust Level)

This is configured via relay.internal: true/false in the Relay config:

  • Internal Relays (PoP + Processing): Receive the full ProjectConfig with all fields, including internal/sensitive settings.
  • External Relays (customer): Receive a limited LimitedProjectConfig — Sentry's endpoint has an early return that stops populating the config for untrusted Relays.

Default vs. Canary (Instance Type)

From the RelayInstance enum in the code:

  • Default: Normal production instance
  • Canary: Used for gradual rollouts and experiments before hitting the full fleet

Summary Table

| Tier | Operator | Internal? | Processing? | Mode | Gets Full Config? |
|---|---|---|---|---|---|
| External Relay | Customer | No | No | managed or proxy | No (limited) |
| PoP Relay | Sentry | Yes | No | managed | Yes (full) |
| Processing Relay | Sentry | Yes | Yes | managed | Yes (full) |

Key Concepts

  • processing.enabled: The boolean flag that distinguishes Processing Relays from all others. Only Processing Relays write to Kafka.
  • relay.internal: Controls trust level and what project config data the Relay receives.
  • Forward compatibility: Critical because External Relays can be running any version. Enums need catch-all variants, unknown fields must be ignored.
  • Chaining: Relays can chain — an External Relay forwards to a PoP Relay, which forwards to a Processing Relay. Each layer adds more processing.

References

  1. Relay Modes documentation — Official docs on managed vs. proxy modes
  2. Project Config Versioning — Internal vs. External Relay, naming conventions
  3. RelayMode enum in source — The actual Rust enum definition
  4. Relay Best Practices — Engineering guidelines for working on Relay
  5. GoCD deployment configs — processing.jsonnet and pops.jsonnet show the two deployment pipelines

Next Steps

  • Explore the Relay repository to see the processing pipeline in relay-server/src/services/processor.rs
  • Read about how project configs flow from Sentry → Relay
  • Look into how dynamic sampling works differently at each Relay tier

Research sources: getsentry/sentry-docs (develop-docs/ingestion/relay/), getsentry/relay (relay-config/src/config.rs, gocd/templates/), Relay CHANGELOG.md


sl0thentr0py commented Feb 17, 2026

❯ explain the role of kafka in sentry's infra

The Role of Kafka in Sentry's Infrastructure

Quick Answer

Kafka is the central nervous system of Sentry's event pipeline. It acts as the distributed message bus that decouples event ingestion from event processing and storage, allowing Sentry to handle massive volumes of error events, transactions, profiles, and more without losing data even under heavy load.

How Kafka Fits Into Sentry's Architecture

Here's the simplified flow:

Your App -> SDK -> Relay -> KAFKA -> Sentry Consumers -> ClickHouse/Postgres
                                  -> Snuba Consumers  -> ClickHouse
                                  -> Post-Process      -> Alerts, etc.

Kafka sits right in the middle -- after ingestion, before processing. Think of it like a conveyor belt system in a factory: Relay puts items on the belt, and various workers (consumers) pick items off to do their specialized jobs.

Why Kafka? The Three Big Reasons

1. Decoupling Producers from Consumers

Without Kafka, Relay (the ingestion service) would need to directly call Sentry's processing code. If processing slowed down, ingestion would back up, and you'd start dropping events. With Kafka in between:

  • Relay just writes messages to Kafka topics and moves on
  • Consumers read at their own pace
  • If a consumer falls behind, messages queue up safely in Kafka rather than being lost

2. Handling Traffic Spikes (Backpressure)

Sentry processes billions of events. During traffic spikes (imagine a major outage hitting thousands of customers simultaneously), Kafka absorbs the burst. Consumers can catch up gradually without data loss.

3. Fan-Out to Multiple Consumers

A single event often needs to be processed by multiple systems. Kafka lets different consumer groups independently read from the same topic. For example, after an event is saved, both Snuba (for search/analytics) and the post-process pipeline (for alerting) need to see it.

Kafka Topics in Sentry

Kafka organizes messages into topics -- think of them as named channels. Sentry has many topics, each serving a specific purpose. Here are the major categories:

Ingestion Topics

These receive raw data from Relay:

| Topic | What flows through it |
|---|---|
| ingest-events | Error events from SDKs |
| ingest-transactions | Performance transaction data |
| ingest-spans | Individual span data |
| ingest-attachments | File attachments |
| ingest-replay-events | Session replay events |
| ingest-feedback-events | User feedback |
| ingest-monitors | Cron monitor check-ins |
| profiles | Profiling data |
| ingest-metrics / ingest-performance-metrics | Metrics data |

Processed/Internal Topics

After initial processing, events flow to these:

| Topic | Purpose |
|---|---|
| events | Processed error events, consumed by Snuba |
| transactions | Processed transactions, consumed by Snuba |
| generic-events | Generic event stream (issue platform) |
| snuba-metrics / snuba-generic-metrics | Metrics for ClickHouse storage |

Subscription & Alerting Topics

| Topic | Purpose |
|---|---|
| events-subscription-results | Alert query results for errors |
| transactions-subscription-results | Alert query results for transactions |

Taskworker Topics (Newer!)

| Topic | Purpose |
|---|---|
| taskworker, taskworker-ingest, taskworker-email, etc. | Kafka-based task execution (replacing some Celery usage) |

Dead Letter Queues (DLQs)

Almost every topic has a corresponding -dlq topic. When a message fails processing, it goes to the DLQ instead of being lost. This is critical for reliability -- you can investigate and replay failed messages later.

The Event Pipeline in Detail

Here's how an error event flows through Kafka:

1. SDK sends event
       |
2. Relay receives, validates, rate-limits
       |
3. Relay publishes to -----------------> Kafka topic: "ingest-events"
                                              |
4. Sentry ingest consumer reads <-------------+
       |
5. Preprocesses (symbolication, etc.)
       |
6. Saves event to nodestore (Postgres)
       |
7. Publishes to -----------------------> Kafka topic: "events"
                                              |
                            +-----------------+------------------+
                            |                                    |
8a. Snuba consumer              8b. Post-process forwarder
    writes to ClickHouse             triggers alerts, integrations

Notice Kafka appears twice -- once for ingestion, once for processed events. This two-stage design means ingestion and post-processing can scale independently.

The Consumer Framework: Arroyo

Sentry uses a custom streaming library called Arroyo to build Kafka consumers. It provides:

  • A ProcessingStrategyFactory pattern for defining how messages are processed
  • Batching, parallel processing, and backpressure handling
  • Commit management (tracking which messages have been processed)

Consumers are defined in src/sentry/consumers/__init__.py and run via:

sentry run consumer <consumer-name>
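
The real consumers are built on Arroyo, but the underlying pattern is an ordinary Kafka consumer group. A hedged, generic sketch of what "reading a topic at your own pace" looks like, using confluent-kafka rather than Sentry's Arroyo framework (broker address, group ID, and processing logic are placeholders):

```python
from confluent_kafka import Consumer  # generic sketch, not Sentry's Arroyo-based consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # placeholder broker
    "group.id": "example-ingest-consumer",   # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["ingest-events"])

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    # In Sentry, an Arroyo ProcessingStrategy would normalize, save, and
    # re-publish the event here; this sketch just inspects the raw payload.
    print(len(msg.value()))
```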

Key Concepts

  • Topic: A named channel in Kafka (e.g., ingest-events). Messages in a topic are ordered within partitions.
  • Consumer: A service that reads from a topic and does work (e.g., snuba-consumer writes events to ClickHouse).
  • Producer: A service that writes to a topic (e.g., Relay produces to ingest-events).
  • Dead Letter Queue (DLQ): A fallback topic for messages that fail processing.
  • Consumer Group: Multiple instances of a consumer share the workload of a topic -- each partition is read by only one instance.
  • Arroyo: Sentry's Python library for building robust Kafka consumers with batching and backpressure.

References

  1. Kafka consumers guide -- How to create new topics and consumers
  2. Application Architecture Overview -- High-level system diagram
  3. Self-hosted Data Flow -- Detailed data flow with Kafka's role
  4. Arroyo library -- The streaming consumer framework
  5. kafka_definition.py -- All topic definitions in code
  6. consumers/__init__.py -- All consumer definitions

Next Steps

  • Hands-on: Run sentry run consumer --help in your dev environment to see all available consumers
  • Explore topics: Look at kafka_definition.py to see the full list of ~80+ topics
  • Learn Arroyo: Check out the Arroyo getting started guide if you'll be building consumers
  • Related topics: Snuba (the query engine that reads from Kafka into ClickHouse) and Relay (the ingestion service that produces to Kafka)

Research sources: getsentry/sentry-docs/develop-docs (kafka.mdx, data-flow.mdx, overview.mdx, feedback-architecture.mdx), getsentry/sentry/src/sentry/conf/types/kafka_definition.py


sl0thentr0py commented Feb 17, 2026

❯ how do sentry and opentelemetry work together

Quick Answer

Sentry and OpenTelemetry (OTel) work together at multiple levels: Sentry's newer SDKs are actually built on top of OpenTelemetry internally, Sentry can ingest native OTLP data directly, and you can link Sentry-instrumented services with OTel-instrumented services for end-to-end distributed tracing. The relationship has evolved from "separate systems you can bridge" to "deeply integrated."

How It Works

1. Sentry SDKs Are Built on OpenTelemetry (Under the Hood)

This is the most important thing to understand. The modern Sentry JavaScript SDK (v8+) uses OpenTelemetry under the hood. From the docs:

"The Sentry SDK uses OpenTelemetry under the hood. This means that any OpenTelemetry instrumentation that emits spans will automatically be picked up by Sentry without any further configuration."

This means:

  • If a user adds any OTel-compatible instrumentation library, Sentry picks up those spans automatically
  • Context isolation and trace propagation use OTel's machinery
  • Users can optionally use OTel APIs directly alongside the Sentry SDK

Other SDKs (Python, Java, etc.) offer OTel integration as well, though the depth of integration varies by platform.

2. The Historical Context (Why This Matters)

When Sentry first built performance monitoring, OTel was still in its early stages. Sentry created its own model with a key concept OTel doesn't have: transactions (a grouping of spans representing a unit of work like an HTTP request). Over time, Sentry has been converging toward the OTel model.

The integration works through two main OTel extension points:

┌─────────────────────────────────────────────┐
│           OpenTelemetry SDK                  │
│                                              │
│  ┌──────────────────┐  ┌─────────────────┐  │
│  │  SpanProcessor   │  │   Propagator    │  │
│  │  (SentrySpan     │  │  (SentryTrace   │  │
│  │   Processor)     │  │   Propagator)   │  │
│  └────────┬─────────┘  └───────┬─────────┘  │
│           │                    │             │
└───────────┼────────────────────┼─────────────┘
            │                    │
            ▼                    ▼
   ┌─────────────────┐  ┌──────────────────┐
   │  Transforms OTel │  │ Propagates       │
   │  spans → Sentry  │  │ sentry-trace +   │
   │  spans/txns      │  │ baggage headers  │
   └─────────────────┘  └──────────────────┘
  • SentrySpanProcessor: Converts OTel spans into Sentry's data model (spans + transactions)
  • SentryPropagator: Ensures sentry-trace and baggage headers are injected/extracted alongside OTel's W3C traceparent headers, enabling distributed tracing and dynamic sampling
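
In sentry-python this wiring looks roughly like the following. This is a sketch based on the documented OpenTelemetry integration; module paths and options may differ by SDK version, and the DSN is a placeholder:

```python
import sentry_sdk
from opentelemetry import trace
from opentelemetry.propagate import set_global_textmap
from opentelemetry.sdk.trace import TracerProvider
from sentry_sdk.integrations.opentelemetry import SentryPropagator, SentrySpanProcessor

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    instrumenter="otel",            # hand span creation over to OpenTelemetry
    traces_sample_rate=1.0,
)

provider = TracerProvider()
provider.add_span_processor(SentrySpanProcessor())   # OTel spans -> Sentry spans/transactions
trace.set_tracer_provider(provider)
set_global_textmap(SentryPropagator())                # injects sentry-trace + baggage headers
```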

3. Three Ways to Get OTel Data Into Sentry

There are three primary integration patterns:

A) Use a Sentry SDK (OTel built-in)

The simplest path. The Sentry SDK handles everything — it uses OTel internally and sends data to Sentry in Sentry's format.

Your App → Sentry SDK (OTel under the hood) → Sentry

B) Direct OTLP Export (No Sentry SDK)

Sentry has native OTLP endpoints that accept standard OpenTelemetry data. You can point any OTel SDK's OTLP exporter directly at Sentry — no Sentry SDK needed at all.

Your App → OTel SDK → OTLP Exporter → Sentry OTLP Endpoint

Use this when:

  • You're already fully invested in OTel and don't want another SDK
  • You want the simplest setup with no Sentry-specific code
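
A hedged sketch of this pattern with the standard OTel Python SDK; the endpoint and auth header below are placeholders, the real values come from your Sentry project's OTLP settings:

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

exporter = OTLPSpanExporter(
    endpoint="https://oXXXXX.ingest.sentry.io/...",                       # placeholder OTLP traces endpoint
    headers={"x-sentry-auth": "sentry sentry_key=examplePublicKey"},      # placeholder auth header
)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

with trace.get_tracer(__name__).start_as_current_span("example-span"):
    pass  # spans reach Sentry with no Sentry SDK installed
```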

C) Forwarding via OTel Collector / Vector / Fluent Bit

For infrastructure-level telemetry, you can forward data through a pipeline:

Infrastructure Sources → OTel Collector / Vector / Fluent Bit → Sentry

Use this when:

  • You need to collect logs from infrastructure (CloudWatch, Nginx, Kafka, syslog)
  • You want to transform or route telemetry before it reaches Sentry
  • You're aggregating data from multiple sources

4. Linking Sentry + OTel Across Services (Distributed Tracing)

A common real-world scenario: you have a Sentry SDK on the frontend and an OTel-instrumented backend. To get end-to-end traces:

┌──────────────────┐         ┌──────────────────────┐
│  Frontend         │  W3C   │  Backend              │
│  (Sentry SDK)     │──────→ │  (OTel instrumented)  │
│                   │traceparent│                     │
│  propagate        │ header │  picks up trace       │
│  TraceparentOn    │        │  context              │
└──────────────────┘         └──────────────────────┘

The Sentry SDK option propagateTraceparent sends the standard W3C traceparent header with outgoing requests, which OTel-instrumented backends pick up automatically. This links everything into a single distributed trace visible in Sentry's trace view.

5. Key Requirements for Full OTel Support in an SDK

For an SDK to be "fully OTel compatible," it must:

  1. Link errors to traces via trace context
  2. Propagate both sentry-trace and baggage headers (via the SentryPropagator)
  3. Match Span/Trace IDs between Sentry and OTel
  4. Filter out spans representing requests to Sentry itself (avoid recursion)

Key Concepts

  • OTLP (OpenTelemetry Protocol): The wire protocol for sending telemetry data. Sentry now accepts this natively.
  • SpanProcessor: An OTel extension point that lets Sentry transform OTel spans into Sentry's data model.
  • Propagator: An OTel extension point for injecting/extracting trace context headers across service boundaries.
  • sentry-trace header: Sentry's custom trace propagation header (used alongside W3C traceparent).
  • traceparent header: The W3C standard trace context header that both OTel and Sentry understand.
  • Dynamic Sampling Context (DSC): Carried in the baggage header, allows Sentry to make consistent sampling decisions across a distributed trace.

References

  1. Sentry OTel SDK Dev Docs — Technical spec for how SDKs implement OTel support (SpanProcessor, Propagator, etc.)
  2. Sentry with OTel (User Docs) — Guide for linking Sentry SDK traces with OTel-instrumented services
  3. Direct OTLP Export — Sending OTel traces/logs directly to Sentry's OTLP endpoints
  4. OTLP Forwarding — Forwarding telemetry via OTel Collector, Vector, or Fluent Bit
  5. JS SDK OpenTelemetry Docs — How the JavaScript SDK uses OTel under the hood

Next Steps

  • If you're working on SDKs: Read the OTel SDK spec to understand the SpanProcessor and Propagator implementation details
  • If you're working on ingestion (Relay): Look at how Relay handles OTLP data — search for OTLP-related code in getsentry/relay
  • If you're curious about the data model: Check out the performance monitoring research doc for historical context on how Sentry's model differs from (and is converging with) OTel's model
  • Hands-on: Try setting up a Node.js app with the Sentry SDK and adding a third-party OTel instrumentation library — you'll see the spans appear in Sentry automatically

Research sources: getsentry/sentry-docs — specifically develop-docs/sdk/telemetry/traces/opentelemetry.mdx, docs/concepts/otlp/sentry-with-otel.mdx, docs/concepts/otlp/direct/index.mdx, docs/concepts/otlp/forwarding/index.mdx, and docs/platforms/javascript/common/opentelemetry/index.mdx

sl0thentr0py changed the title from "senpai poc (ai hackweek)" to "senpai agent poc (ai hackweek)" on Feb 17, 2026
@sl0thentr0py

Test Prompt

what is dynamic sampling and how does it work

Senpai Response

Quick Answer

Dynamic Sampling is Sentry's system for intelligently deciding which events to keep and which to drop after they arrive at the ingestion pipeline. Since Sentry extracts metrics from 100% of incoming events (giving you accurate counts, durations, and web vitals), Dynamic Sampling's job is to retain a representative subset of full event payloads — biased toward the most valuable data — so you can still dig into traces, spans, and custom tags without storing everything.


How It Works — The Big Picture

Where It Sits in the Pipeline

Dynamic Sampling lives in Relay (our Rust-based ingestion service), and it's step 4 of 5 in the event lifecycle:

SDK → Relay
        │
        ├─ 1. Inbound Data Filters  (e.g., deny old browsers, blocked releases)
        ├─ 2. Quota Enforcement      (billing — all events past here count toward quota)
        ├─ 3. Metrics Extraction      (extract counts, durations, web vitals from 100% of events)
        ├─ 4. ⭐ Dynamic Sampling    (decide: keep or drop the full event payload?)
        └─ 5. Rate Limiting           (safety valve — rarely triggered)
              │
              ▼
           Stored Events (indexed, searchable)

Key insight: Metrics are extracted before sampling, so your dashboards, alerts, and performance graphs see all your data. Dynamic Sampling only affects whether you can see the full event details (traces, spans, tags) in tools like Trace Explorer or Discover.

The Two-Layer Data Model

| Layer | Source | Affected by Sampling? | Used For |
|---|---|---|---|
| Metrics | Extracted from 100% of events | No | SPM/TPM, web vitals, alerts, dashboards |
| Stored Events | Kept/dropped by Dynamic Sampling | Yes | Trace Explorer, Discover, span details, custom tags |

Core Concepts

1. Fidelity (Target Sample Rate)

Fidelity = the overall target sample rate for an organization. For example, a fidelity of 20% means Sentry aims to store ~20% of incoming events.

There are two modes:

  • Automatic Mode (default): Sentry manages sample rates per project, boosting low-volume projects so they aren't drowned out. Set at the org level via sentry:target_sample_rate.
  • Manual Mode: You set a static sample rate per project. Sentry won't adjust them.

2. Rules and Biases

Dynamic Sampling works through a rule-based system. Rules live in the project configuration that Relay fetches from Sentry. Each rule has:

  • A condition (which events it matches)
  • A sampling value (either a sampleRate or a factor that multiplies with other rules)
  • An optional time range and decaying function

Rules are evaluated top-to-bottom. factor rules accumulate multipliers; sampleRate rules produce a final decision.
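
A tiny, hypothetical Python sketch of that evaluation order (not Relay's actual Rust implementation): factor rules multiply into an accumulator until a sampleRate rule makes the final decision.

```python
# Hypothetical illustration of rule evaluation order; rule shape is made up for clarity.
def effective_sample_rate(rules, event):
    factor = 1.0
    for rule in rules:                       # rules are evaluated top-to-bottom
        if not rule["condition"](event):
            continue
        if rule["kind"] == "factor":
            factor *= rule["value"]          # accumulate, keep matching
        elif rule["kind"] == "sampleRate":
            return min(1.0, factor * rule["value"])  # final decision, stop matching
    return None                              # no match: Relay keeps the event

rules = [
    {"kind": "factor", "value": 5.0, "condition": lambda e: e.get("release") == "2.0.0"},
    {"kind": "sampleRate", "value": 0.1, "condition": lambda e: True},
]
print(effective_sample_rate(rules, {"release": "2.0.0"}))  # 0.5
```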

3. Trace vs. Transaction Sampling

Type Scope Guarantees
Trace Sampling Entire trace (all related events) All-or-nothing: keep the whole trace or drop it
Transaction Sampling Individual transaction No trace completeness guarantee

Trace sampling works by seeding the random number generator with the trace ID, so every event in the same trace gets the same keep/drop decision. The Dynamic Sampling Context (DSC) propagated by SDKs makes this possible.
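
Conceptually, that deterministic decision looks like the following hedged sketch; Relay's actual hashing and random-number scheme differ:

```python
# Hedged illustration of trace-consistent sampling; not Relay's actual algorithm.
import hashlib

def keep_trace(trace_id: str, sample_rate: float) -> bool:
    # Hash the trace ID to a number in [0, 1); every event in the same trace
    # computes the same value, so the keep/drop decision is consistent.
    digest = hashlib.sha256(trace_id.encode()).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < sample_rate

tid = "771a43a4192642f0b136d5159a501700"
assert keep_trace(tid, 0.25) == keep_trace(tid, 0.25)  # same trace, same decision
```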


The Biases (Smart Prioritization)

Within the target sample rate, Sentry applies biases to retain more valuable data:

| Bias | What It Does | Configurable? |
|---|---|---|
| Prioritize New Releases | Boosts sample rate for new releases, decaying as adoption increases | ✅ Yes |
| Prioritize Dev Environments | 100% sample rate for *dev*, *local*, *test*, etc. | ✅ Yes |
| Prioritize Low-Volume Projects | Boosts small projects so they're not overshadowed (Automatic Mode only) | No (automatic) |
| Prioritize Low-Volume Transactions | Rebalances so rare transactions get more samples | No (automatic) |
| Deprioritize Health Checks | Reduces rate for /healthz, /heartbeat, etc. | ✅ Yes |

Architecture: How Rules Get to Relay

┌─────────────────────────────────────────────────────────┐
│                        Sentry                           │
│                                                         │
│  ┌──────────┐    ┌───────────┐    ┌──────────────────┐  │
│  │  Celery   │───▶│   Redis   │◀───│  Rule Generator  │  │
│  │  Tasks    │    │  (shared  │    │  (project config │  │
│  │ (periodic │    │   state)  │    │   recomputation) │  │
│  │  cron)    │    └───────────┘    └────────┬─────────┘  │
│  └──────────┘                               │            │
│   Computes:                                 │            │
│   - Low-volume project rates                ▼            │
│   - Low-volume txn rates        ┌──────────────────┐    │
│   - Org recalibration           │ Project Config   │    │
│                                 │ (cached in Redis)│    │
│                                 └────────┬─────────┘    │
└──────────────────────────────────────────│──────────────┘
                                           │ pull (periodic)
                                           ▼
                                    ┌─────────────┐
                                    │    Relay     │
                                    │              │
                                    │ Evaluates    │
                                    │ rules per    │
                                    │ event, makes │
                                    │ keep/drop    │
                                    │ decision     │
                                    └─────────────┘
  1. Celery tasks run periodically, querying Snuba for traffic volumes and computing per-project and per-transaction sample rates. Results go to Redis.
  2. Rule generation in Sentry reads from Redis and DB, assembles sampling rules, and caches the project configuration.
  3. Relay pulls project configs periodically and evaluates rules for every incoming event.
  4. If no rule matches, Relay keeps the event (better to oversample than lose important data).

Key Concepts Glossary

  • DSC (Dynamic Sampling Context): Metadata propagated by SDKs across a trace so all events in a trace can be sampled consistently.
  • Fidelity: The target overall sample rate for an organization.
  • Factor rule: A multiplier that adjusts sample rate without making a final decision (matching continues).
  • SampleRate rule: Sets the base rate and terminates matching (final decision).
  • Recalibration: Periodic adjustment to ensure the actual sample rate stays close to the target sample rate.

References

  1. Developer Docs (primary source): https://develop.sentry.dev/application-architecture/dynamic-sampling/
  2. Fidelity & Biases: https://develop.sentry.dev/application-architecture/dynamic-sampling/fidelity-and-biases/
  3. Architecture: https://develop.sentry.dev/application-architecture/dynamic-sampling/architecture/
  4. Code — Rule generation: src/sentry/dynamic_sampling/rules/base.py in getsentry/sentry
  5. Code — Biases: src/sentry/dynamic_sampling/rules/biases/ directory in getsentry/sentry
  6. Code — Celery tasks: src/sentry/dynamic_sampling/tasks/ directory in getsentry/sentry
  7. Relay sampling implementation: relay_sampling docs

Next Steps

  • Explore the biases code at src/sentry/dynamic_sampling/rules/biases/ to see how each bias generates rules
  • Read about the DSC at https://develop.sentry.dev/sdk/telemetry/traces/dynamic-sampling-context/ to understand how SDKs propagate trace context
  • Look at the Celery tasks in src/sentry/dynamic_sampling/tasks/ to understand how sample rates are periodically recalculated
  • Try the UI: Go to Project Settings → Performance to see which biases are enabled and how they can be toggled

Research sources: getsentry/sentry-docs develop-docs (dynamic-sampling/index.mdx, the-big-picture.mdx, architecture.mdx, fidelity-and-biases.mdx), getsentry/sentry code search for dynamic_sampling

@sl0thentr0py

Prompt: where is the EAP schema located

Quick Answer

The EAP (Events Analytics Platform) schema lives in the getsentry/snuba repository. There are two key places to look:

Where the Schema is Defined

1. Storage Configuration (YAML) — the "current truth"

snuba/datasets/configuration/events_analytics_platform/storages/eap_items.yaml

This is the main storage definition that Snuba uses at runtime. It declares the full column schema, query processors, stream loader config, and deletion settings. There are also related storage configs for downsampled tables:

  • eap_items.yaml — main table
  • eap_items_downsample_8.yaml
  • eap_items_downsample_64.yaml
  • eap_items_downsample_512.yaml
  • eap_item_co_occurring_attrs.yaml

All live under:

snuba/datasets/configuration/events_analytics_platform/storages/

2. Migrations (Python) — how the ClickHouse tables are actually created

snuba/snuba_migrations/events_analytics_platform/0024_items.py

This is the initial migration that creates the eap_items_1_local and eap_items_1_dist ClickHouse tables. It defines the columns, engine (ReplacingMergeTree), partition key, sort order, and indices. Subsequent migrations (0026 through 0053) add columns, indices, downsampled views, etc.

3. Entity Configuration

snuba/datasets/configuration/events_analytics_platform/entities/eap_items.yaml

This defines the logical entity that maps to the storage(s) — it's what the query layer uses to route queries.

Schema Structure at a Glance

The eap_items table has this structure:

| Column | Type | Purpose |
|---|---|---|
| organization_id | UInt64 | Org scoping |
| project_id | UInt64 | Project scoping |
| item_type | UInt8 | Type of item (span, log, etc.) |
| timestamp | DateTime | When the event occurred |
| trace_id | UUID | Distributed trace ID |
| item_id | UInt128 | Unique item identifier |
| sampling_weight | UInt64 | For extrapolation |
| sampling_factor | Float64 | Sampling correction factor |
| retention_days | UInt16 | TTL control |
| attributes_bool | Map(String, Bool) | Boolean attributes |
| attributes_int | Map(String, Int64) | Integer attributes |
| attributes_string_0..39 | Map(String, String) | String attributes (40 buckets!) |
| attributes_float_0..39 | Map(String, Float64) | Float attributes (40 buckets!) |

The 40 buckets for string/float attributes are a sharding strategy — attributes are hashed into buckets so ClickHouse doesn't have to scan one massive Map column for every query.
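
A hypothetical sketch of the bucketing idea; the real hash function and bucket assignment live in Snuba/Relay and may differ:

```python
# Hypothetical illustration of attribute bucketing; not the production hash function.
import zlib

NUM_BUCKETS = 40

def attribute_column(attr_name: str, kind: str = "string") -> str:
    # Hash the attribute name into one of 40 buckets so queries only touch
    # the Map column that can contain that attribute.
    bucket = zlib.crc32(attr_name.encode()) % NUM_BUCKETS
    return f"attributes_{kind}_{bucket}"

print(attribute_column("http.method"))                 # -> attributes_string_<0..39>
print(attribute_column("db.duration", kind="float"))   # -> attributes_float_<0..39>
```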

Key Concepts

  • EAP = Events Analytics Platform — Sentry's generalized storage for trace items (spans, logs, etc.) in ClickHouse
  • ReplacingMergeTree — The ClickHouse engine used, which deduplicates rows with the same sort key
  • Downsampled tables — Pre-aggregated views at 8x, 64x, and 512x reduction for faster queries on large datasets

References

  1. Storage config: snuba/datasets/configuration/events_analytics_platform/storages/eap_items.yaml
  2. Initial migration: snuba/snuba_migrations/events_analytics_platform/0024_items.py
  3. Rust processor: rust_snuba/src/processors/eap_items.rs (how Kafka messages are processed into rows)
  4. RPC resolvers: snuba/web/rpc/v1/resolvers/R_eap_items/ (how queries are served)

Research sources: getsentry/snuba repository — storage YAML configs, migration files, and directory structure
