
Trace sampling #228

Closed
Geal opened this issue Dec 1, 2021 · 2 comments · Fixed by #224

Geal commented Dec 1, 2021

related:

Request tracing can be expensive; we should have a way to select which traces are sent to Jaeger or OpenTelemetry.

@Geal Geal added the triage label Dec 1, 2021
@Geal Geal self-assigned this Dec 1, 2021
@Geal Geal added 2021-11 and removed triage labels Dec 1, 2021
@abernix abernix added 2021-12 and removed 2021-11 labels Dec 3, 2021

Geal commented Dec 8, 2021

Following up on the tests in #224 and #237, and after discussing it: adapting the sampling rate to the result (for example, aggressive sampling for successful queries and no sampling for failed ones, so we receive all of them) has tradeoffs we might not want to make yet. It requires tail sampling, where we store all of the spans until the end of the query and only then decide whether to send the trace.

This can result in high memory usage, especially with long-running queries, and it will be difficult to get right.
So as a first phase, I'll set up tracing-opentelemetry's head-based sampling, with a rate defined in the configuration, where the sampling decision is made once at the beginning of the trace.
If needed, users will be able to change the configuration and reload it to adapt the rate for debugging.
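
For context, a minimal sketch of what that head-based setup could look like with tracing-opentelemetry and the Jaeger exporter. The service name, ratio, and builder methods here are illustrative only (they assume the opentelemetry / opentelemetry-jaeger crate APIs of that era), not the router's actual configuration:

    use opentelemetry::sdk::trace::{self, Sampler};
    use tracing_subscriber::prelude::*;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Head-based sampling: the keep/drop decision is made once, when the
        // root span is created. TraceIdRatioBased(0.1) keeps roughly 10% of traces.
        let tracer = opentelemetry_jaeger::new_pipeline()
            .with_service_name("router")
            .with_trace_config(trace::config().with_sampler(Sampler::TraceIdRatioBased(0.1)))
            .install_simple()?;

        tracing_subscriber::registry()
            .with(tracing_opentelemetry::layer().with_tracer(tracer))
            .init();

        // ... run the application; spans from unsampled traces are never exported.
        Ok(())
    }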

@Geal Geal closed this as completed in #224 Dec 13, 2021

jwilm commented Dec 14, 2021

Hi @Geal,

I saw your issue link and thought I'd drop a note here. We ended up implementing a proof-of-concept tail sampling for tracing-opentelemetry. The code is available publicly, but we still consider it experimental. That said, we have it deployed as part of a high volume application, and it has been running without issue for about 1 week. Actually, we've had better memory utilization than without it due to fewer spans being sent out of the system.

To utilize the tail sampling layer, we internally added an extension method to the tracing::Span type that looks like this:

    fn drop_this_trace(&self) {
        dispatcher::get_default(|d| {
            // Downcast the current subscriber to the Registry so we can look
            // up the current span's stored data.
            let registry = d
                .downcast_ref::<Registry>()
                .expect("Subscriber is not Registry; this is a bug.");

            let span = d.current_span();
            let spanref = registry.span(&span.id().unwrap()).unwrap();
            let extensions = spanref.extensions();
            let trace_context = extensions.get::<TraceContext>().unwrap();

            // Mark the whole trace so the tail-sampling layer drops it
            // instead of exporting it.
            let mut trace_ext = trace_context.trace.extensions_mut();
            trace_ext.insert(SampleDecision {
                record_trace: false,
            });
        });
    }

For now we are just calling this in our application code, but it should be possible to write a tracing layer, inserted before the tracing-opentelemetry layer, that looks for span attributes with a particular status code (if I'm understanding your use case correctly) and adds the SampleDecision to the trace context extensions as in the code snippet above; a rough sketch of such a layer follows.
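
A rough sketch of what such a layer could look like. The `SampleDecision` struct and the `http.status_code` field name below are placeholders standing in for whatever the proof-of-concept actually uses, and the sketch writes the decision into the span's own extensions rather than the fork's trace context:

    use tracing::field::{Field, Visit};
    use tracing::span;
    use tracing_subscriber::layer::{Context, Layer};
    use tracing_subscriber::registry::LookupSpan;

    /// Placeholder for the proof-of-concept's sampling marker.
    struct SampleDecision {
        record_trace: bool,
    }

    /// Visitor that picks out an `http.status_code` field, if the span records one.
    #[derive(Default)]
    struct StatusVisitor {
        status: Option<i64>,
    }

    impl Visit for StatusVisitor {
        fn record_i64(&mut self, field: &Field, value: i64) {
            if field.name() == "http.status_code" {
                self.status = Some(value);
            }
        }

        fn record_debug(&mut self, _field: &Field, _value: &dyn std::fmt::Debug) {}
    }

    struct ErrorSamplingLayer;

    impl<S> Layer<S> for ErrorSamplingLayer
    where
        S: tracing::Subscriber + for<'a> LookupSpan<'a>,
    {
        fn on_record(&self, id: &span::Id, values: &span::Record<'_>, ctx: Context<'_, S>) {
            let mut visitor = StatusVisitor::default();
            values.record(&mut visitor);

            if let (Some(status), Some(span)) = (visitor.status, ctx.span(id)) {
                // Keep traces for failed requests and drop the rest; a downstream
                // tail-sampling layer would read this marker to make the final call.
                span.extensions_mut().insert(SampleDecision {
                    record_trace: status >= 500,
                });
            }
        }
    }

In a real subscriber stack this layer would be registered before the tracing-opentelemetry layer, as described above.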

If you want to try it out, I'd be happy to answer any questions or take feedback you may have.

Cheers!

o0Ignition0o added a commit that referenced this issue Jan 11, 2022
# [v0.1.0-alpha.3] 2022-01-11

## 🚀🌒 Public alpha release

> An alpha or beta release is in volatile, active development. The release might not be feature-complete, and breaking API changes are possible between individual versions.

## ✨ Features

- Trace sampling [#228](#228): Tracing each request can be expensive. The router now supports sampling, which allows us to send traces for only a fraction of the received requests.

- Health check [#54](#54)

## 🐛 Fixes

- Schema parse errors [#136](#136): The router wouldn't display what went wrong when parsing an invalid Schema. It now displays exactly where the parsing error occurred, and why.

- Various tracing and telemetry fixes [#237](#237): The router wouldn't display what went wrong when parsing an invalid Schema. It now displays exactly where the parsing error occurred, and why.

- Query variables validation [#62](#62): Now that we have a schema parsing feature, we can validate the variables and their types against the schemas and queries.
tinnou pushed a commit to Netflix-Skunkworks/router that referenced this issue Oct 16, 2023
There was an issue with recent attempted publishes due to a bad secret
rotation. This PR temporarily disables cargo publishes in our CI
pipelines while we create GitHub releases. It will be re-enabled once the
packages are properly released.