Description
Environment
sentry 0.29.1 with features contexts, panic, reqwest, and rustls
sentry-tower in version 0.29.1 with the feature http
sentry-tracing in version 0.29.1 with default features
Steps to Reproduce
This is most likely to be reproduced with traces_sample_rate set to 1.0, but it will eventually happen with lower sample rates as well.
- Configure the DSN to something that will cause requests to run into a timeout.
- Instrument your code with tracing and register the sentry-tracing layer (see the sketch below).
- Run your code in such a way that it produces 32 or more traces in a few seconds.
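A minimal sketch of such a setup (assuming tracing-subscriber is used to register the layer; the DSN, span name, and workload loop are placeholders, not taken from our actual code):

```rust
use tracing_subscriber::prelude::*;

fn main() {
    // A DSN pointing at a host where requests hang until a timeout.
    let _guard = sentry::init((
        "https://public@sentry.example.com/1",
        sentry::ClientOptions {
            traces_sample_rate: 1.0,
            ..Default::default()
        },
    ));

    // Register the sentry-tracing layer on top of the tracing subscriber.
    tracing_subscriber::registry()
        .with(sentry_tracing::layer())
        .init();

    // Produce more than 32 traces in quick succession.
    for i in 0..64u32 {
        let span = tracing::info_span!("some_operation", iteration = i);
        let _entered = span.enter();
        // ... instrumented work; the root span is sent as a transaction when it closes ...
    }
}
```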
Expected Result
The overhead from the sentry tracing layer should be minimal.
Actual Result
The instrumented code is blocking, waiting for the traces to be sent to sentry.
Detailed explanation
I've spent almost a week tracking down hanging integration tests after moving a test machine from one location to another.
I ultimately found out that we were configuring a DSN of https://public@sentry.example.com/1 in integration tests, which, of course, is not a valid DSN, but it had always worked fine because it caused connection errors immediately. Only in the particular network the machine was moved to did it not produce a connection error immediately, but only after a timeout of somewhere between ~30s and ~90s. This led to a channel in sentry filling up and the integration tests essentially hanging for hours in CI.
The channel in question is used in TransportThread here: it is a completely synchronous, bounded channel with a capacity of 30. (That's why things start slowing down on the 32nd request: one envelope is already in flight, 30 are in the channel, and the 32nd is the one that hangs, waiting to be written into the channel.)
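To illustrate the mechanism in isolation, here is a small standalone sketch using std::sync::mpsc::sync_channel (the crate's actual channel type and task layout may differ; the consumer's sleep stands in for the network timeout):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

fn main() {
    // A bounded, synchronous channel with a capacity of 30, analogous to the
    // one described above.
    let (sender, receiver) = sync_channel::<u32>(30);

    // A slow consumer, standing in for a transport that hangs on network timeouts.
    thread::spawn(move || {
        for envelope in receiver {
            thread::sleep(Duration::from_secs(60)); // simulate the hanging request
            println!("sent envelope {envelope}");
        }
    });

    for i in 0..40u32 {
        // The first envelope is taken by the consumer, the next 30 fill the
        // channel, and from then on every send blocks the caller until the
        // consumer frees a slot.
        println!("enqueueing envelope {i}");
        sender.send(i).unwrap();
    }
}
```

Running this prints the first 32 "enqueueing" lines almost instantly and then stalls, mirroring the behavior described above.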
The envelopes are enqueued here: sentry-rust/sentry/src/transports/tokio_thread.rs, lines 87 to 89 in 616587b.
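That enqueue is a blocking send into the bounded channel; presumably something along the lines of the following sketch (an assumption based on the behavior described above, not a verbatim copy of the referenced lines):

```rust
pub fn send(&self, envelope: Envelope) {
    // Blocking send: if the bounded channel is full, this waits until the
    // transport thread frees a slot. (Sketch, not the exact source.)
    let _ = self.sender.send(Task::SendEnvelope(envelope));
}
```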
And this is the code where things were blocked waiting for the network timeout:
Although this issue didn't happen in production for us, it very well could if that same call blocks for some reason (e.g. a network issue or an issue in your sentry.io backend infrastructure). Imagine, for example, that sentry.io has an issue where incoming requests are accepted but no data flows: all customers using sentry-rust would suddenly find that every sampled operation blocks for multiple tens of seconds.
How to fix
The easiest fix I can come up with is to start dropping transactions if the TransportThread can't keep up, because I think a running application is more important than not losing any traces.
This could, for example, be achieved by changing:
sentry-rust/sentry/src/transports/tokio_thread.rs, lines 87 to 89 in 616587b
To:
```rust
pub fn send(&self, envelope: Envelope) {
    if self.sender.try_send(Task::SendEnvelope(envelope)).is_err() {
        // Some error message that an envelope was dropped
    }
}
```
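With try_send, a full channel means the envelope is dropped and the call returns immediately, so the instrumented code never waits on the transport.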
There might be other ways or places to achieve that same goal. But ultimately, the tracing instrumentation should not, under any circumstances, be allowed to block the instrumented code to the point of waiting for an HTTP request to go through.