Should the `stage` tag for `component_errors_total` be removed?
#17538
Labels
domain: observability
Anything related to monitoring/observing Vector
source: internal_metrics
Anything `internal_metrics` source related
type: tech debt
A code change that does not add user value.
Context
During some internal discussion around the purpose of the `stage` tag for error events, the question came up of whether that tag should be removed entirely.

The error event already dictates specifying an `error_type` tag, which, in practice, is already fairly descriptive: `out_of_order`, `acknowledgement_failed`, `invalid_metric`, and so on. The `stage` tag is ostensibly meant to indicate where in the component the error occurred -- `receiving`, `processing`, and `sending` mapping roughly to a component getting events in, processing them as necessary, and then sending those events out -- but the specificity of `error_type` can generally inform what stage the error occurred in.

Even further, the `stage` tag generally doesn't actually facet the metrics, which is to say, during a cursory search, I could not find an example where the same `error_type` is emitted in a component but at multiple, distinct stages. We're specifying and emitting `stage` seemingly for no actual benefit to breaking down the error metric any further than the `error_type`.

Question / Proposal
Should we simply drop the `stage` tag from the specification for component errors? This would greatly simplify the emission of those events in the codebase, and remove an unnecessary additional tag that otherwise pollutes the tag set for internal metrics.
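
To make the "doesn't facet the metrics" claim above checkable, here is a minimal sketch of the test it implies: group observed error series by `error_type` and report any type that shows up with more than one distinct `stage`. The `ErrorSeries` struct, the function name, and the `error_type`/`stage` pairings in `main` are all hypothetical illustrations, not taken from Vector's codebase; in practice the input would come from scraping `component_errors_total` or grepping the emit sites.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One observed `component_errors_total` series, reduced to the two tags
/// under discussion. Hypothetical type, for illustration only.
struct ErrorSeries {
    error_type: &'static str,
    stage: &'static str,
}

/// Returns the error types that appear with more than one distinct stage,
/// i.e. the only cases where `stage` adds information beyond `error_type`.
fn error_types_faceted_by_stage(series: &[ErrorSeries]) -> Vec<&'static str> {
    let mut stages_by_type: BTreeMap<&'static str, BTreeSet<&'static str>> = BTreeMap::new();
    for s in series {
        stages_by_type.entry(s.error_type).or_default().insert(s.stage);
    }
    stages_by_type
        .into_iter()
        .filter(|(_, stages)| stages.len() > 1)
        .map(|(error_type, _)| error_type)
        .collect()
}

fn main() {
    // Assumed pairings: the issue names these error types and stages, but
    // does not say which stage each error type is emitted at.
    let observed = [
        ErrorSeries { error_type: "out_of_order", stage: "receiving" },
        ErrorSeries { error_type: "acknowledgement_failed", stage: "sending" },
        ErrorSeries { error_type: "invalid_metric", stage: "processing" },
        ErrorSeries { error_type: "invalid_metric", stage: "processing" },
    ];

    let faceted = error_types_faceted_by_stage(&observed);
    println!("error types emitted at multiple stages: {:?}", faceted);
}
```

An empty result supports dropping the tag, since `stage` would then be fully determined by `error_type`. A non-empty result lists exactly the error types that would lose a distinction if `stage` were removed.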