WIP: Pipeline monitoring metrics #249

Closed
wants to merge 21 commits into from
Binary file added text/images/otel-pipeline-monitoring.png
364 changes: 364 additions & 0 deletions text/metrics/0238-pipeline-monitoring.md
@@ -0,0 +1,364 @@
# OpenTelemetry Telemetry Pipeline metrics

Propose a uniform standard for telemetry pipeline metrics generated by
OpenTelemetry SDKs and Collectors.

WIP: this is a work-in-progress draft, derived from OTEP 238 which
was closed pending further development.

TL;DR: this proposes two metric instrument semantics, three
instruments overall considering SDKs and Collectors:

- `otelcol_consumed_items`: Received and inserted data items (Collector)
- `otelcol_produced_items`: Exported, dropped, and discarded items (Collector)
Member:

The producer/consumer terminology makes these definitions a bit confusing for me. Intuitively I would expect inserted items to be a producer behavior, and dropped/discarded items to be a consumer behavior.

Contributor Author:

Yeah -- I had this same realization, the terms feel ambiguous.

How would you feel about

otelcol_input_items and otelcol_output_items?

Member:

Much clearer

- `otelsdk_produced_items`: Exported, dropped, and discarded items (SDK)

## Motivation

OpenTelemetry desires to standardize conventions for the metrics
emitted by SDKs about success and failure of telemetry reporting. At
the same time, the OpenTelemetry Collector has existing conventions
which are expected to connect with metrics emitted by SDKs and have
similar definitions.

## Explanation

We use the term "pipeline" to describe an arrangement of system
components which produce, consume, and process telemetry on its way
from the point of origin to the endpoint(s) in its journey. Pipeline
components included in this specification are:

- OpenTelemetry SDKs: As telemetry producers, these components are the
start of a pipeline.
- OpenTelemetry Collectors: The OpenTelemetry collector contains an
arrangement of components which act as both consumers and producers.

The terms "following"/"follower" and "preceding"/"preceder" refer to
the relative components after or before another component in a
pipeline. The preceding component ("preceder") produces data that is
consumed by the following component ("follower").

An arrangement of pipeline components acting as a single unit, such as
@0x006EA1E5 (May 15, 2024):

Is the intention that there will be similar otelcol_*_items metrics for the segments as well as the components? It's not clear to me how these two concepts apply here.

When it comes to "data loss", I am often more interested in the network boundary between "segments", e.g., when using the loadbalancingexporter to route to a following Collector instance.
Currently, I compare the component-level loadbalancingexporter and following otlpreceiver metrics to try to understand data loss, but really what I care about is the segment-level view.

Reply:

@0x006EA1E5, I'm working on writing out more details on data loss between segments. Here is my current scribble that looks at how a resource exhausted response would look.

implemented by the OpenTelemetry Collector, is called a segment. Each
segment consists of a receiver, zero or more processors, and an
exporter. The terms "following" and "preceding" apply to pipeline
@0x006EA1E5 (May 15, 2024):

If a OTel Collector pipeline is configured with more than one receiver / exporter, is this then considered to be multiple, logical segments?

How about when the routingconnector is used? Will this be multiple segments contained within a single Collector instance?

Reply:

@0x006EA1E5, I would enjoy continuing this discussion on this new PR, but my short response is:

If a OTel Collector pipeline is configured with more than one receiver / exporter, is this then considered to be multiple, logical segments?

yes! A single Collector pipeline can have multiple segments

How about when the routingconnector is used? Will this be multiple segments contained within a single Collector instance?

My new PR includes an example with the spanmetrics connector, but the short answer is also yes. 🙂 A connector is both the end of one segment and the start of the following one. I'm not as familiar with the routing connector so will look into it more to get a better understanding. It looks like it would be a good example to include.

segments with the same meaning as for components. For example, an
agent pipeline segment forwards to a gateway pipeline.

### Detailed design

#### Producer and Consumer instruments

We choose to specify two metric instruments for use counting outcomes,
Member:

Not sure if this is what was intended but this doesn't read right to me.

Suggested change
We choose to specify two metric instruments for use counting outcomes,
We choose to specify two metric instruments for use in counting outcomes,

one instrument to account for producer outcomes and one to account for
consumer outcomes. In an ideal pipeline, a conservation rule exists
between what goes in (i.e., is consumed) and what goes out (i.e., is
produced). The use of producer and consumer metric instruments is
designed to enable this form of consistency check. When the pipeline
Member:

From an accounting perspective, I see why we would want to group received + inserted items (so that this total matches exported + dropped + discarded). But the language here is difficult to reconcile with the external vs internal nature of the operations.

Taking a step back, I agree with the categories you've identified (received, exported, inserted, discarded, dropped), but there are several ways to organize them. This proposal organizes the categories in terms of incremental (received, inserted) vs decremental (discarded, dropped, exported) because it gives us the desirable property that the two instruments should be equal. However, I wonder if these same categories can be modeled in a different way while still giving us the ability to check consistency.

Would it be enough that all categories should sum to 0 by subtracting the decremental operations from the incremental ones? Organized according to real data flow, it would be received - discarded + inserted - (dropped + exported) = 0. I think that by separating the incremental from the decremental, it allows this to work for backends, but alternately, could we require that the decremental categories are reported as negative numbers within the same instrument? To me this seems more intuitive but I'm not sure all backends can handle this.

Contributor Author:

I'm happy with the equation received - discarded + inserted - (dropped + exported) = 0.

I don't think I see a difference between received and inserted. If the telemetry has the component name, it'll be clear whether it was a processor or a receiver, and could be just a semantic question. If we added another attribute to identify the kind of component, or required it to be included in the otel.component attribute, is that enough to distinguish received and inserted?

My thinking, in creating a discarded and dropped designation specifically was to have enough decomposition in the data that you could perform the equation as you wrote it, meaning to count received (receivers), subtract discarded, add received (processors), subtract dropped, leaving exported, which is the thing you'll compare with the next segment, potentially.

Contributor Author:

Continuing --

Your suggestion about negative-values, as opposed to the positive-only expression I've used, brings to mind several related topics. I think this is the "best" way to do it from a metrics data model perspective, but I want to point out other ways we can map these metric events.

Consider that each item of telemetry that enters the pipeline has an associated trace context. There is:

a. The UpDownCounter formulation -- for every item arriving, add 1; for every item departing, subtract 1. This can tell us the number of items for attribute sets that are symmetric. If we add one for every item that is input/consumed, then subtract one for every item that is output/produced, the resulting tally is the number of in-flight items, but this mapping has to ignore the outcome/success labels for the +1/-1 to balance out.
b. The Span formulation -- when the receiver starts a new request (or the processor inserts some new data), there is an effective span start event (or a log about the arrival of some telemetry) for some items of telemetry. When the outcome is known for those points (having called the follower), there is a span finish event which can be annotated w/ the subtotal for each outcome/success matching the number of items consumed.
c. The LogRecord formulation -- (same as span formula, but one log record per event, vs span start/end events).

I'm afraid to keep adding text to the document, but I would go further with the above suggestions. If we are using metrics to monitor the health of all the SDKs, then we will be missing a signal when the metrics SDK itself is failing. I want the metrics SDK to have a span encapsulating each export operation.

Member:

I don't think I see a difference between received and inserted. If the telemetry has the component name, it'll be clear whether it was a processor or a receiver, and could be just a semantic question. If we added another attribute to identify the kind of component, or required it to be included in the otel.component attribute, is that enough to distinguish received and inserted?

Looks like I missed an important part of the design: processors are responsible for counting items only when the number changes while passing through a processor

I was thinking that we should report "received" and "exported" for processors in order to account for situations where data streams are merged. For example, a collector pipeline with two receivers will combine streams into the first processor, so from that processor's perspective it seems important to report the total "received". Likewise, similar problems could arise from receivers or exporters used in multiple pipelines.

To use a concrete example:

```
pipelines:
  logs/1:
    receivers: [R1, R2]
    processors: [P1]
    exporters: [E1, E2]
  logs/2:
    receivers: [R1]
    processors: [P2]
    exporters: [E1]
```

| component | received | discarded | inserted | dropped | exported |
|-----------|----------|-----------|----------|---------|----------|
| R1        | 10       | -         | -        | -       | -        |
| R2        | 20       | -         | -        | -       | -        |
| P1        | 30       | 25        | 0        | -       | 5        |
| P2        | 10       | 10        | 2        | -       | 2        |
| E1        | -        | -         | -        | 0       | 7        |
| E2        | -        | -         | -        | 0       | 5        |

In this example, it seems much easier to understand what's going on with P1 when it reports receiving 30.

Contributor Author:

My earlier design for this proposal included what you're suggesting -- the idea that every processor in the pipeline will independently report complete totals. I think this is excessive, as there is a lot of redundancy, but the problem can be framed this way. In fact, the current design can be applied the way you describe by a simple redefinition rule -- if you consider a pipeline segment to be an individual receiver, an individual processor, or an individual exporter, you'll get the metrics you're expecting. I think this might even be appropriate in complex pipelines.

The defect I'm aware of, when each processor counts independent totals, is that it becomes easy to aggregate adjacent pipeline segments together, which results in overcounting from a pipeline perspective. This problem is not unique to processor metrics -- it arises when a metric query aggregates more than one collector belonging to the same pipeline, or more than one exporter, or more than one processor. My goal is to make it easy to write queries that encompass

In my current proposal, if you aggregate the total for otelcol_consumed_items grouping by all attributes to a single total, the result will be the number of collector pipeline segments times the number of items. If you restrict your query to one segment (meaning one pipeline and one collector), then the aggregate equals the number of items. This property holds because each segment has one exporter and one receiver.

Since there are multiple processors in a pipeline segment, if each processor counts a total, then the aggregate for that segment will equal the number of processors times the number of items, which is not a useful measure to compare against adjacent pipeline segments. When each processor reports a total, you have to aggregate down to an individual processor to understand its behavior. But then, the logic to check whether the receiver and exporter are consistent, given processor behavior, becomes complicated at best--the aggregation would have to filter the dropped and discarded categories from the processor metrics, and then we'd be able to recover the pipeline equations in this proposal.

This is why I ended up proposing that processors count changes in item count, because the changes in item count aggregate correctly despite multiple processors.

Member:

Thanks for explaining further. The tradeoffs are tough here, but if we're defining a segment as having only one receiver and one exporter, it excludes a large percentage (maybe a substantial majority?) of collector configurations. Even in a simple pipeline like the one below, change counts for P1 have little meaning.

    receivers: [R1, R2]
    processors: [P1]
    exporters: [E1]

Contributor Author:

Question about the example, specifically.
Why are there two paths between R1 and E1? This fact will make it difficult to monitor the pipeline, because it appears to double the input on purpose. The pipeline equations will show this happening, but it will be up to interpretation to say whether it's on purpose or not.

The way I would monitor the setup in your example is to compute all the paths for which I expect the conservation rule to hold. They are:

(R1 + R2) -> P1 -> E1
(R1 + R2) -> P1 -> E2
R1 -> P2 -> E1

Since two paths lead to E1, the pipeline equations have to be combined. For E1, the equation will include a factor of 2 for R1.

2*Received(R1) + Received(R2) = Dropped(P1) + Dropped(P2) + Exported(E1)

This kind of calculation can be automated and derived from the metrics I'm proposing, if you have the graph. I mean, if you want to know that P1 received 30 items of telemetry, just add R1 and R2's consumed item totals, that should be easy.

Contributor Author:

we're defining a segment as having only one receiver and one exporter

This is an interesting statement -- I've definitely not been clear on this topic. I didn't mean to say one receiver and one exporter. I meant all receivers and one exporter, because that's where the conservation principle holds. The sum of all receivers reaches every exporter, and that is a pipeline segment, so your second example,

    receivers: [R1, R2]
    processors: [P1]
    exporters: [E1]

is exactly the kind of simple pipeline segment that will be easy to monitor, and it will be easy to monitor even if it has a bunch of processors too.

Member:

Why are there two paths between R1 and E1?

I agree it's likely not useful. It's a contrived example but I wanted to include the full set of possible stream merges and fan outs:

- single pipeline concerns
  - merge before first processor
  - fan out after last processors
- inter-pipeline concerns
  - fan out after receiver shared by multiple pipelines
  - merge before exporter shared by multiple pipelines

(R1 + R2) -> P1 -> E1
(R1 + R2) -> P1 -> E2

I think this is perhaps where I'm getting tripped up. Could we define a segment as being able to have more than one receiver? This still aggregates correctly. I see why we cannot include multiple exporters, because data is fanned out within the segment, but the fanout that occurs when a receiver is shared between pipelines does not affect the counts for an individual pipeline.

Member:

I didn't mean to say one receiver and one exporter. I meant all receivers and one exporter, because that's where the conservation principle holds.

I commented before seeing this but I see we arrived at the same conclusion. 👍

is properly functioning and instrumented, we expect the sum of
producer and consumer across all outcomes to be equal.

The alternative, which uses one metric instrument per producer outcome
and one metric instrument per consumer outcome, has known
difficulties. To define a ratio between any one outcome and the total
requires a metric formula defined by all the outcomes. On other hand,
Member:

Suggested change
requires a metric formula defined by all the outcomes. On other hand,
requires a metric formula defined by all the outcomes. On the other hand,

it is common practice using OpenTelemetry metrics to aggregate by
attribute. It is possible and convenient, when a single metric
instrument is used per outcome, to define ratios and build area charts
from a single metric instrument.

The use of exclusive counters, one per outcome, is also logically
confusing. Existing OpenTelemetry collector metrics for exporters
have both `sent` and `send_failed` metrics. However, `sent` only
counts success outcomes. A user could easily believe that the failure
ratio is defined as `send_failed / sent`, since (logically) something
has to be sent before the send can fail. The correct failure ratio,
using exclusive counters, is `send_failed / (sent + send_failed)`, but
from experience, users can easily miss this detail. Moreover, when
exclusive counters have been defined in this manner, it is impossible
to define new outcomes, as every formula would need to be updated.
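
To illustrate the difference, the failure ratio under each scheme
might be written as follows; this is a sketch only, and the
single-instrument form assumes the attribute conventions proposed
later in this document:

```
FailureRatio (exclusive counters) = send_failed / (sent + send_failed)
FailureRatio (single instrument)  = produced_items{otel.success=false} / produced_items
```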

#### Processors count as producers and consumers

As specified, processors are responsible for counting items only when
the number changes while passing through a processor. Processors are
responsible for counting producer outcomes when they remove an item of
telemetry from a request, including:

- `discarded` outcomes, which do not pass the data and return success
- `dropped` outcomes, which do not pass the data and return failure.
- `deferred:dropped` outcomes, which (eventually) do not pass the data
but (immediately) return success.

Data that passes through a processor component, otherwise, should not
be counted as produced because following components are responsible
for counting the outcome. Considering the combined producer outcomes
for a pipeline segment, the total will include the `discarded` and
`dropped` subtotals from processors, combined with all potential
outcomes from the exporter component.

Data that is inserted by a processor component, meaning new items of
telemetry that were not consumed from the previous component in the
pipeline, should be counted as consumer outcomes by the component that
inserted them, since the preceding component does not know about them.
Processors that insert items will:

1. Wait until the next component returns from the Consume() operation.
2. Count a consumer outcome according to the return value for the
number of points inserted.

By these rules, a processor that inserts data and then immediately
drops or discards the same data will raise the count equally for
both consumer and producer metrics.
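
To make the insertion rule concrete, here is a minimal Go sketch,
assuming a hypothetical log processor holding an
`otelcol_consumed_items` counter. The `insertingProcessor` type, the
`insertDerivedRecords` helper, and the error-to-outcome mapping are
illustrative assumptions, not part of this proposal's normative text
or the collector codebase:

```
package example

import (
	"context"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/plog"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

type insertingProcessor struct {
	next          consumer.Logs       // the following pipeline component
	consumedItems metric.Int64Counter // otelcol_consumed_items
}

func (p *insertingProcessor) ConsumeLogs(ctx context.Context, ld plog.Logs) error {
	before := ld.LogRecordCount()
	p.insertDerivedRecords(ld) // hypothetical: adds new log records to ld
	inserted := ld.LogRecordCount() - before

	// 1. Wait until the next component returns from the Consume() operation.
	err := p.next.ConsumeLogs(ctx, ld)

	// 2. Count a consumer outcome for the inserted items according to the
	//    return value (outcome mapping simplified for illustration).
	outcome, success := "accepted", true
	if err != nil {
		outcome, success = "rejected", false
	}
	p.consumedItems.Add(ctx, int64(inserted), metric.WithAttributes(
		attribute.Bool("otel.success", success),
		attribute.String("otel.outcome", outcome),
		attribute.String("otel.signal", "logs"),
		attribute.String("otel.component", "insertingprocessor"),
	))
	return err
}

func (p *insertingProcessor) insertDerivedRecords(ld plog.Logs) {
	// Hypothetical stand-in for whatever records the processor generates.
}
```

Note that items merely passing through this sketch are not counted;
only the inserted items produce a consumer outcome, in keeping with
the rules above.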

#### Distinct prefixes for SDKs and Collectors

There is a potential to use the same metric names to describe SDK
producers and Collector producers. However, we find two reasons this
unification should be avoided.

First, we seek to avoid aggregations that combine first-class SDK
producers, which do not have corresponding consumer metrics, with SDK
consumer components (i.e., bridges) and Collector producers.

Second (really a specific case of the first), we seek to avoid
aggregations that combine OpenTelemetry Collector pipeline metrics
with SDK pipeline metrics in the same process, when the OpenTelemetry
SDK instruments the OpenTelemetry Collector.

#### Pipeline equations

The behavior of a pipeline section consisting of one or more elements
can be reduced to distinct operation categories. In all cases (i.e.,
for both producer and consumer metric instruments), the item of
telemetry has a definite outcome determined by a single component,
with a list of outcomes specified below. Each item also has a
definite success or failure boolean property.

The consumer categories, leading to the first pipeline segment equation:

- **Received**: An item of telemetry was exported from a preceding pipeline segment
- **Inserted**: An item of telemetry was inserted by this pipeline segment

The first equation:

```
Consumed(Segment) == Recieved(Segment) + Inserted(Segment)
Contributor:

Suggested change
Consumed(Segment) == Recieved(Segment) + Inserted(Segment)
Consumed(Segment) == Received(Segment) + Inserted(Segment)

```

The producer categories, leading to the second pipeline segment equation:

- **Exported**: An attempt was made to export the telemetry to a following pipeline segment
Contributor:

is this an attempt or rather the data was successfully exported to the follower?

Contributor Author:

An attempt. When the attempt is made, there is at least some expectation that the next pipeline segment has seen the data. Exported includes success and failed cases, and I'm not sure how I can change the words to improve this understanding. I mean to count cases where an RPC was made, essentially, whether it fails or not, because it sets up our expectation for the next segment.

Reply:

So, just to be clear, which metric do we use for an exporter that failed to even establish a connection to a downstream receiver?
For example, if I configure the collector with an OTLP exporter with a bad endpoint, and the HTTP/GRPC connection cannot be made, the export will "fail" but there is no expectation that any following receiver will ever see the data (so won't count it).
It seems Exported doesn't fit here by your definition. Would it be Dropped?

Reply:

It seems Exported doesn't fit here by your definition. Would it be Dropped?

Yes

- **Discarded**: Considered success, an item of telemetry was eliminated (i.e., export never attempted)
- **Dropped**: Considered failure, an item of telemetry was eliminated (i.e., export never attempted)

The second equation:

```
Produced(Segment) == Discarded(Segment) + Dropped(Segment) + Exported(Segment)
```

The third equation states that the sum of all items-consumed outcomes
for a pipeline segment equals the sum of all items-produced outcomes
for that segment.

```
Consumed(Segment) == Produced(Segment)
```

These invariants are an idealization, because the consumer and
producer operations happen independently, without ordering
requirements. Nevertheless, after a pipeline has been drained and the
individual components shut down, we expect the producer and consumer
instrument values to match exactly.
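
As a hypothetical worked example (all numbers are illustrative):
suppose a segment receives 100 items, a processor inserts 5 and
discards 20, and the exporter drops 3 and attempts export for the
remainder. The equations then balance as:

```
Consumed(Segment) == Received(100) + Inserted(5)                 == 105
Produced(Segment) == Discarded(20) + Dropped(3) + Exported(82)   == 105
Consumed(Segment) == Produced(Segment)                           == 105
```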

#### Producer and consumer outcomes are asymmetric

There are several reasons why the producer and consumer outcomes
counted by pipeline monitoring will reflect contradictory information.

For example, timeouts may be configured independently, such that a
preceding segment's timeout is smaller than a following segment's
timeout. The preceding exporter's timeout, if less than the following
exporter's timeout, may cause consumer `timeout` outcomes without
corresponding producer `timeout` outcomes.

#### Outcomes may be deferred

In some configurations, Collector pipeline segments have asynchronous
elements, in an arrangement where the `Consume()` operation called by
the preceder on the follower returns success, and responsibility for
delivery transfers to the follower. When deferred outcomes are in
use, consumer metrics will generally indicate 100% `accepted`
outcomes.

For these cases, exporters are expected to use the `deferred:` outcome
categories to signal to monitoring systems, in particular, the failure
outcomes that were not seen by producers.
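
For example (hypothetical numbers, shown in the notation of the
pipeline equations above), a queue-backed exporter that accepts 100
items synchronously but later fails to deliver 10 of them before
their deadline might report:

```
Produced(Exporter) == Exported(90, outcome=accepted)
                    + Exported(10, outcome=deferred:timeout)
```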

#### WIP

Considering an Exporter and Receiver pair connecting two OpenTelemetry
Collector pipeline segments, we expect:

```
Exported(Preceder) == Received(Follower)
```

Dropped and discarded are special

#### Resource-exhausted is special

Be specific about this one, it impacts SLOs. Does not apply to "this"
component.

### WIP

The specified counter names are:

- `otelcol_consumed_items`
- `otelcol_produced_items`
- `otelsdk_produced_items`


### Recommended conventional attributes

- `otel.success` (boolean): This is true or false depending on whether the
Member:

Is otel being used to namespace these attributes so they wouldn't conflict with other attribute names? I think we should add some more clarity in the name to make it clear these are attributes of an otel pipeline: how do you feel about the otel.pipeline. prefix?

component considers the outcome a success or a failure.
- `otel.outcome` (string): This describes the outcome in a more specific
way than `otel.success`, with recommended values specified below.
- `otel.signal` (string): This is the name of the signal (e.g., "logs",
"metrics", "traces")
- `otel.component` (string): Name of the component in a pipeline.
- `otel.pipeline` (string): Name of the pipeline in a collector.
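
As an illustration (the component and pipeline names are
hypothetical, and the rendering below is informal pseudo-notation
rather than any particular exposition format), a single data point on
one of these counters might carry the following attributes:

```
otelcol_produced_items{otel.success=false, otel.outcome="timeout",
                       otel.signal="traces", otel.component="otlp/exporter",
                       otel.pipeline="traces/gateway"} == 125
```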

### Specified `otel.outcome` attribute values

The `otel.outcome` attribute indicates extra information about a
success or failure. A set of standard conventional attribute values
is supplied and is considered a closed set. If these outcomes do not
accurately explain the reason for a success or failure outcome, they
SHOULD be extended by OpenTelemetry.

For success=true:

- `accepted`: Indicates a normal, synchronous request success case.
The item was consumed by the next stage of the pipeline, which
returned success. Note the item could have been deferred by a
subsequent component, but as far as this component knows, the
request was successful.
- `discarded`: Indicates a successful outcome in which the next stage
of the pipeline does not handle the event, as by a sampling
processor.
- `deferred:<failure outcome>`: Deferred cases are where the
caller receives a success response and the true outcome is failure,
but this is not known until later. The item is counted as
`deferred:` combined with the failure outcome that would otherwise
have been counted.

For success=false, transient and potentially retryable cases:

- `dropped`: The component introduced an original failure and did not
send to the next stage in the pipeline.
- `timeout`: The item was in the process of being sent but the request
timed out, or its deadline was exceeded. In this case, it is
undetermined whether the consuming pipeline saw the item or not.
- `exhausted`: The item was handled by the next stage of the pipeline,
which returned an error code indicating that it was overloaded. If
the resource being exhausted is local and the item was not handled
by the next stage of the pipeline, record the item as `dropped` and
return a resource-exhausted status code to the producer, which will
record an `exhausted` outcome.
- `retryable`: The item was handled by the next stage of the pipeline,
which returned a retryable error status not covered by any of the
above values.

For success=false, permanent cases:

- `rejected`: The item was handled by the next stage of the pipeline,
which returned a permanent error status or partial success status
indicating that some items could not be accepted.
- `unknown`: May be used when the component is suppressing errors and
not actually counting successes and failures. As a special case,
the outcome `deferred:unknown` indicates that a success response
was given and no information about the actual outcome is available.


#### Success, Outcome matrix

| Outcome | Export Attempted? | Caller Success? | Metrics Success? | Meaning |
|--------------------|-------------------|-----------------|------------------|---------------------------------------------------------------|
| accepted | true | true | true | Data (successfully) sent |
| discarded | false | true | true | Data (successfully) discarded |
| dropped | false | false | false | Request never started, error returned |
| timeout | true | false | false | Request started, timed out, error returned |
| exhausted | true | false | false | Request started, insufficient resources, error returned |
| retryable | true | false | false | Request started, retryable error status, error returned |
| rejected | true | false | false | Request completed, permanent error status, error returned |
| deferred:dropped | false | true | false | Request never started, error NOT returned |
| deferred:timeout | true | true | false | Request started, timed out, error NOT returned |
| deferred:exhausted | true | true | false | Request started, insufficient resources, error NOT returned |
| deferred:retryable | true | true | false | Request started, retryable error status, error NOT returned |
| deferred:rejected | true | true | false | Request completed, permanent error status, error NOT returned |
| deferred:unknown | true | true | false | Request has unknown outcome, error NOT returned |

#### Examples of each outcome

##### Success, Accepted

This is the common success case. The item(s) were sent to the next
stage in the pipeline while blocking the producer.

##### Success, Discarded

A processor was configured with instructions not to pass certain data.

##### Success, Deferred-Accepted

A component returned success to its producer, and later the outcome
was successful.

##### Failure, Dropped and Success, Deferred-Dropped

(If deferred: A component returned success to its producer, then ...)

The component never sent the item(s) due to limits in effect. For
example, shutdown was ordered and the queue could not be drained in
time due to a limit on parallelism.

##### Failure, Deadline exceeded and Success, Deferred-Deadline exceeded

(If deferred: A component returned success to its producer, then ...)

The component attempted sending the item(s), but the item(s) did not
succeed before the deadline expired. If there were attempts to retry,
this is the outcome of the final attempt.

##### Failure, Resource exhausted and Success, Deferred-Resource exhausted

(If deferred: A component returned success to its producer, then ...)

The component attempted sending the item(s), but the consumer
indicated its (or its consumers') resources were exceeded. If there
were attempts to retry, this is the outcome of the final attempt.

##### Failure, Retryable and Success, Deferred-Retryable

(If deferred: A component returned success to its producer, then ...)

The component attempted sending the item(s), but the consumer
indicated some kind of transient condition other than deadline- or
resource-related (e.g., connection not accepted). If there were
attempts to retry, this is the outcome of the final attempt.

##### Failure, Rejected and Success, Deferred-Rejected

(If deferred: A component returned success to its producer, then ...)

The component attempted sending the item(s), but the consumer
returned a permanent error.