Cloud output v2 #3117

Closed
@codebien

Description

Context

#2954 introduces the new experimental Cloud output with a Protobuf-based protocol.

Memory usage

After the first iteration, memory usage is higher than required. With Trend metrics in particular, it is very easy to saturate the bandwidth, with payloads ranging from many kilobytes up to the remote limit (1 MB).

We also decided to denormalize some fields to reduce the workload and keep the implementation simple on the remote server. However, the resulting load on the client is high, so we should revisit this decision.

Fault tolerance

The current flush process could be more fault tolerant: it doesn't retry on failures.

Validation

__name__ and test_run_id are reserved labels for the remote service. If a test also sets them, the conflict produces unexpected behavior for the user. A more developer-friendly UX should be implemented.
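
To make the validation idea concrete, here is a small sketch that rejects the two reserved names up front with an explicit error message. The names reservedTagNames and validateTags are hypothetical, and where the check would run (option parsing, output start, or per sample) is left open.

```go
package main

import (
	"fmt"
	"sort"
)

// reservedTagNames lists the labels reserved by the remote service,
// as stated in this issue.
var reservedTagNames = map[string]struct{}{
	"__name__":    {},
	"test_run_id": {},
}

// validateTags is a hypothetical check: it returns an error naming every
// reserved tag that the test tries to set, or nil if there is no conflict.
func validateTags(tags map[string]string) error {
	var conflicts []string
	for name := range tags {
		if _, reserved := reservedTagNames[name]; reserved {
			conflicts = append(conflicts, name)
		}
	}
	if len(conflicts) == 0 {
		return nil
	}
	sort.Strings(conflicts) // stable error message regardless of map order
	return fmt.Errorf("tag names %q are reserved by the Cloud output", conflicts)
}

func main() {
	fmt.Println(validateTags(map[string]string{"env": "staging"}))
	fmt.Println(validateTags(map[string]string{"test_run_id": "42", "env": "staging"}))
}
```

Failing early with a message that names the offending tags should be friendlier than letting the remote service handle the conflict silently.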

Proposal

We identified some actions that should drive us to the goal:

  • A more compact Protobuf representation for Histogram.
  • Split the flush into multiple requests when the number of time series exceeds the MaxMetricSamplesPerPackage variable.
  • Normalize the fields that are common across time series as fields of MetricSet.
  • Fault-tolerant flush operation.
  • Exclude __name__ and test_run_id from the allowed tag names.
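
The request-splitting item above could be sketched roughly as follows. TimeSeries and splitIntoBatches are hypothetical stand-ins rather than the actual types of the output, and maxPerRequest plays the role of MaxMetricSamplesPerPackage.

```go
package main

import "fmt"

// TimeSeries is a hypothetical placeholder for whatever the flush collects.
type TimeSeries struct {
	Name string
}

// splitIntoBatches cuts series into consecutive slices of at most
// maxPerRequest items. A non-positive limit is treated as "no limit" and
// yields a single batch. The batches share the backing array of the input.
func splitIntoBatches(series []TimeSeries, maxPerRequest int) [][]TimeSeries {
	if len(series) == 0 {
		return nil
	}
	if maxPerRequest <= 0 {
		return [][]TimeSeries{series}
	}

	count := (len(series) + maxPerRequest - 1) / maxPerRequest
	batches := make([][]TimeSeries, 0, count)
	for start := 0; start < len(series); start += maxPerRequest {
		end := start + maxPerRequest
		if end > len(series) {
			end = len(series)
		}
		batches = append(batches, series[start:end])
	}
	return batches
}

func main() {
	series := make([]TimeSeries, 5)
	for i := range series {
		series[i].Name = fmt.Sprintf("metric_%d", i)
	}
	for i, batch := range splitIntoBatches(series, 2) {
		fmt.Printf("request %d: %d time series\n", i+1, len(batch))
	}
}
```

Each batch would then be sent as its own request, which also gives the fault-tolerant flush a natural unit to retry.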

Acceptance criteria

Change the Cloud output default version to 2.

Worklog

Nice to have (in case we need to reduce the scope)

  1. codebien
  2. codebien
  3. cloud lower prio tests
    olegbespalov
  4. codebien
  5. cloud enhancement performance
