vector gauge to distribution aggregation mode #24001


### Discussed in https://github.com/vectordotdev/vector/discussions/23970

<div type='discussions-op-text'>

<sup>Originally posted by **jlambatl** October 9, 2025</sup>
### Question

Hello,

**Problem Statement**

Currently, Vector emits one event per metric observation, which can result in high event volumes, especially in high-cardinality environments. This affects:

* Network bandwidth and storage costs
* Downstream system performance
* Processing overhead in analytics pipelines

**Proposed Solution**

Add a new `distribution` aggregation mode to the `aggregate` transform that:

* Collects all absolute metric values that share the same series (name, tags, etc.) during a flush interval.
* Aggregates them into a single distribution metric containing every individual value as a sample.
* Ignores incremental metrics (e.g. counters) to preserve their semantics.
* Reduces the number of emitted events while increasing payload density.
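A rough Python model of these rules, to make the intended semantics concrete. It is a hypothetical sketch, not Vector's Rust implementation or its event types: `Metric`, `Distribution`, `series_key`, and `flush_distribution` are illustrative names, and forwarding incremental metrics unchanged is an assumption about what "ignores" means here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical event model for illustration; not Vector's internal types.
@dataclass
class Metric:
    name: str
    tags: Dict[str, str]
    kind: str    # "absolute" (e.g. gauges) or "incremental" (e.g. counters)
    value: float

@dataclass
class Distribution:
    name: str
    tags: Dict[str, str]
    samples: List[float]

SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]

def series_key(m: Metric) -> SeriesKey:
    # A series is identified by name plus tags; tag order must not matter.
    return (m.name, tuple(sorted(m.tags.items())))

def flush_distribution(events: List[Metric]) -> List[Union[Distribution, Metric]]:
    """Fold one flush interval's events into one Distribution per series."""
    samples: Dict[SeriesKey, List[float]] = {}  # dicts preserve insertion order
    passthrough: List[Metric] = []
    for m in events:
        if m.kind != "absolute":
            # Assumption: incremental metrics are forwarded unchanged.
            passthrough.append(m)
            continue
        samples.setdefault(series_key(m), []).append(m.value)
    out: List[Union[Distribution, Metric]] = [
        Distribution(name, dict(tags), values)
        for (name, tags), values in samples.items()
    ]
    return out + passthrough

# The three gauge events from the example in this issue.
events = [
    Metric("gauge_cpu_usage", {"host": "web01"}, "absolute", 45.2),
    Metric("gauge_cpu_usage", {"host": "web01"}, "absolute", 47.1),
    Metric("gauge_cpu_usage", {"host": "web01"}, "absolute", 44.8),
]
result = flush_distribution(events)
print(result)  # one Distribution for host=web01 with samples [45.2, 47.1, 44.8]
```

Keying on sorted tags means `{"a": "1", "b": "2"}` and `{"b": "2", "a": "1"}` land in the same series, which is what "same series" should mean regardless of how tags were ordered on the wire.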

**Benefits**

* Volume Reduction: N individual metrics → 1 distribution metric per series per flush interval
* Preserved Accuracy: All original values retained as distribution samples
* Downstream Flexibility: Analytics databases can convert to histograms at query time
* Cost Optimisation: Lower network/storage costs due to reduced event count
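Illustrative arithmetic for the first two benefits, under hypothetical assumptions: one gauge observation per second per series, and a 10-second flush interval matching `interval_ms = 10_000` in the config further down. Keeping only the last value is included purely as a contrast, to show what a single-value aggregation discards compared with retaining every sample. `events_per_series` is a hypothetical helper.

```python
from statistics import mean
from typing import Dict, List

def events_per_series(interval_s: float, sample_every_s: float) -> float:
    """Individual events one series produces during one flush interval."""
    return interval_s / sample_every_s

# Hypothetical workload: one sample per second, 10 s flush interval.
before = events_per_series(10.0, 1.0)  # individual gauge events per series
after = 1                              # one distribution event per series
reduction = before / after

samples: List[float] = [45.2, 47.1, 44.8]  # values from the example in this issue
latest_only = samples[-1]                  # all a keep-last-value aggregation retains

# With every sample kept, summary statistics remain computable downstream.
stats: Dict[str, float] = {
    "min": min(samples),
    "max": max(samples),
    "mean": mean(samples),
    "count": len(samples),
}
print(reduction, latest_only, stats)
```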

**Use Cases**

* Time-series databases that support native distribution types (e.g., ClickHouse, InfluxDB)
* Analytics workflows requiring statistical analysis across measurement windows
* High-cardinality metrics where overall event volume reduction is critical
* Cost-sensitive environments where network/storage efficiency matters

**Example**

Before (3 separate events):
```
gauge_cpu_usage{host="web01"} 45.2
gauge_cpu_usage{host="web01"} 47.1
gauge_cpu_usage{host="web01"} 44.8
```

After (1 distribution event):

```
gauge_cpu_usage{host="web01"} distribution{samples: [45.2, 47.1, 44.8]}
```

This approach is particularly well-suited for analytics databases that can convert distributions into equal-width histogram buckets or other aggregations at query time, providing flexibility for various downstream use cases.
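A minimal sketch of the query-time conversion mentioned above, assuming the samples for one series are available as a plain list. `equal_width_histogram` is a hypothetical helper, not a function from Vector or from any particular database; it splits the range between the smallest and largest sample into equally wide buckets and counts samples per bucket.

```python
from typing import List, Tuple

def equal_width_histogram(
    samples: List[float], num_buckets: int
) -> List[Tuple[float, float, int]]:
    """Return (lower, upper, count) for equal-width buckets spanning the samples."""
    if not samples or num_buckets < 1:
        return []
    lo, hi = min(samples), max(samples)
    if lo == hi:
        # All samples identical: one degenerate bucket holds them all.
        return [(lo, hi, len(samples))]
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for s in samples:
        # Clamp so the maximum sample falls in the last bucket, not one past it.
        idx = min(int((s - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]

# Samples from the example distribution in this issue, split into 2 buckets:
# two samples fall in the lower bucket and one in the upper bucket.
print(equal_width_histogram([45.2, 47.1, 44.8], 2))
```

Because the bucket count is chosen at query time, the same stored samples can be re-bucketed at different resolutions without re-ingesting data.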

We are happy to contribute this, but, as the contribution guidelines suggest, I wanted to raise the idea and seek collaboration before writing the code.

Thanks in advance.

### Vector Config

Based on the existing [TOML example](https://vector.dev/docs/reference/configuration/transforms/aggregate/#example-configurations), with the proposed `distribution` mode:

```toml
[transforms.my_transform_id]
type = "aggregate"
inputs = [ "my-source-or-transform-id" ]
interval_ms = 10_000
mode = "distribution"  # proposed mode, not yet implemented
```

### Vector Logs

_No response_</div>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

vector gauge to distribution aggregation mode #24001

Discussed in #23970

Question

Vector Config

Vector Logs

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

vector gauge to distribution aggregation mode #24001

Description

Discussed in #23970

Question

Vector Config

Vector Logs

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions