Skip to content

Commit

Permalink
Receive: fix thanos_receive_write_{timeseries,samples} stats (#7643)
Browse files Browse the repository at this point in the history
* Revert "Receive: fix stats (#7373)"

This reverts commit 66841fb.

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>

* Receive: fix `thanos_receive_write_{timeseries,samples}` stats

There are two path data can be written to a receiver: through the HTTP
or the gRPC endpoint, and `thanos_receive_write_{timeseries,samples}` only
count the number of timeseries/samples received through the HTTP
endpoint.

So, there is no risk that a sample will be counted twice, once as a
remote write and once as a local write. On the other hand, we still need
to account for the replication factor, and only count local writes is
not enough as there might be no local writes at all (e.g. in RouterOnly
mode).

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>

---------

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>
  • Loading branch information
cincinnat authored Sep 3, 2024
1 parent 1c5e7f1 commit 7465177
Show file tree
Hide file tree
Showing 3 changed files with 27 additions and 17 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#7592](https://github.com/thanos-io/thanos/pull/7592) Ruler: Only increment `thanos_rule_evaluation_with_warnings_total` metric for non PromQL warnings.
- [#7614](https://github.com/thanos-io/thanos/pull/7614) *: fix debug log formatting.
- [#7492](https://github.com/thanos-io/thanos/pull/7492) Compactor: update filtered blocks list before second downsample pass.
- [#7643](https://github.com/thanos-io/thanos/pull/7643) Receive: fix thanos_receive_write_{timeseries,samples} stats
- [#7644](https://github.com/thanos-io/thanos/pull/7644) fix(ui): add null check to find overlapping blocks logic
- [#7679](https://github.com/thanos-io/thanos/pull/7679) Query: respect store.limit.* flags when evaluating queries

Expand Down
2 changes: 1 addition & 1 deletion docs/components/receive.md
Original file line number Diff line number Diff line change
Expand Up @@ -311,7 +311,7 @@ Please see the metric `thanos_receive_forward_delay_seconds` to see if you need

The following formula is used for calculating quorum:

```go mdox-exec="sed -n '990,999p' pkg/receive/handler.go"
```go mdox-exec="sed -n '999,1008p' pkg/receive/handler.go"
func (h *Handler) writeQuorum() int {
// NOTE(GiedriusS): this is here because otherwise RF=2 doesn't make sense as all writes
// would need to succeed all the time. Another way to think about it is when migrating
Expand Down
41 changes: 25 additions & 16 deletions pkg/receive/handler.go
Original file line number Diff line number Diff line change
Expand Up @@ -681,31 +681,40 @@ type remoteWriteParams struct {
alreadyReplicated bool
}

func (h *Handler) gatherWriteStats(localWrites map[endpointReplica]map[string]trackedSeries) tenantRequestStats {
func (h *Handler) gatherWriteStats(rf int, writes ...map[endpointReplica]map[string]trackedSeries) tenantRequestStats {
var stats tenantRequestStats = make(tenantRequestStats)

for er := range localWrites {
for tenant, series := range localWrites[er] {
samples := 0
for _, write := range writes {
for er := range write {
for tenant, series := range write[er] {
samples := 0

for _, ts := range series.timeSeries {
samples += len(ts.Samples)
}
for _, ts := range series.timeSeries {
samples += len(ts.Samples)
}

if st, ok := stats[tenant]; ok {
st.timeseries += len(series.timeSeries)
st.totalSamples += samples
if st, ok := stats[tenant]; ok {
st.timeseries += len(series.timeSeries)
st.totalSamples += samples

stats[tenant] = st
} else {
stats[tenant] = requestStats{
timeseries: len(series.timeSeries),
totalSamples: samples,
stats[tenant] = st
} else {
stats[tenant] = requestStats{
timeseries: len(series.timeSeries),
totalSamples: samples,
}
}
}
}
}

// adjust counters by the replication factor
for tenant, st := range stats {
st.timeseries /= rf
st.totalSamples /= rf
stats[tenant] = st
}

return stats
}

Expand Down Expand Up @@ -736,7 +745,7 @@ func (h *Handler) fanoutForward(ctx context.Context, params remoteWriteParams) (
return stats, err
}

stats = h.gatherWriteStats(localWrites)
stats = h.gatherWriteStats(len(params.replicas), localWrites, remoteWrites)

// Prepare a buffered channel to receive the responses from the local and remote writes. Remote writes will all go
// asynchronously and with this capacity we will never block on writing to the channel.
Expand Down

0 comments on commit 7465177

Please sign in to comment.