Skip to content

Don't track as error an HA tracker CAS operation intentionally aborted #3745

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

pracucci
Copy link
Contributor

What this PR does:
The PR #3668 introduced a little regression causing intentionally aborted CAS operations (by HA tracker) to be tracked as 500 in the cortex_kv_request_duration_seconds metric instead of 200. The reason of this subtle bug is in the introduction of replicasNotMatchError (which can be returned by the CAS() operation function), while the previous version was returning a httpgrpc.Errorf(http.StatusAccepted) which is correctly converted into a 200 error code by pkg/ring/kv/metrics.go.

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Marco Pracucci <marco@pracucci.com>

// GetSumOfHistogramSampleCount returns the sum of samples count of histograms matching the provided metric name
// and optional label matchers. Returns 0 if no metric matches.
func GetSumOfHistogramSampleCount(families []*dto.MetricFamily, metricName string, matchers labels.Selector) uint64 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the unit tests I've added in the distributor I couldn't use the typical testutil.GatherAndCompare() because the histogram tracks the request duration (which is unpredictable). Since we don't have a way to just test the histogram sample count, I've introduced this function to easy it (and future tests too).

Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting and fixing this!

@pracucci pracucci merged commit 75ec1ff into cortexproject:master Jan 26, 2021
@pracucci pracucci deleted the fix-ha-tracker-error-handling branch January 26, 2021 10:39
@gouthamve
Copy link
Contributor

This should be cherry-picked to 1.7.0 branch or not? @khaines @pracucci

@khaines
Copy link
Contributor

khaines commented Jan 26, 2021

@gouthamve I'm of the opinion it should be brought into the 1.7.0 branch.

khaines pushed a commit that referenced this pull request Jan 26, 2021
#3745)

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Ken Haines <khaines@microsoft.com>
@khaines
Copy link
Contributor

khaines commented Jan 26, 2021

#3746 - adds this change to the 1.7 Branch

khaines added a commit that referenced this pull request Jan 26, 2021
#3745) (#3746)

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Ken Haines <khaines@microsoft.com>

Co-authored-by: Marco Pracucci <marco@pracucci.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants