Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of improve client RPC metrics consistency into release/1.17.x #19843

Conversation

hc-github-team-consul-core
Copy link
Contributor

Backport

This PR is auto-generated from #19721 to be assessed for backporting due to the inclusion of the label backport/1.17.

The below text is copied from the body of the original PR.


The client.rpc metric now excludes internal retries for consistency with client.rpc.exceeded and client.rpc.failed. All of these metrics now increment at most once per RPC method call, allowing for accurate calculation of failure / rate limit application occurrence.

Additionally, if an RPC fails because no servers are present, client.rpc.failed is now incremented.

Note: The client.rpc.failed metric used to increment on internal retries as well, but that was changed in this commit (CC @kisunji).

Testing & Reproduction steps

Manual testing conducted by reviewer (@rboyer).

  1. Set up Consul cluster.
  2. Run curl -sL -XPUT 172.17.0.3:8500/v1/catalog/register -d'{"Node":"foo", "Address":"4.4.4.4", "ID":"blah"}'. Note: this should succeed on the first try.
  3. Run curl -sL '172.17.0.3:8500/v1/catalog/nodes?filter=^^^''. Note: this should fail and have several internal retries.
  4. Read relevant metrics: $ curl -sL 172.17.0.3:8500/v1/agent/metrics | jq '.Counters[] | select (.Name | test("^consul\\.client\\.rpc"))'

Expect:

  • consul.client.rpc: 2
  • consul.client.rpc.failed: 1

Actual: Matches expectation

{
  "Name": "consul.client.rpc",
  "Count": 2,
  "Rate": 0.2,
  "Sum": 2,
  "Min": 1,
  "Max": 1,
  "Mean": 1,
  "Stddev": 0,
  "Labels": {}
}
{
  "Name": "consul.client.rpc.failed",
  "Count": 1,
  "Rate": 0.1,
  "Sum": 1,
  "Min": 1,
  "Max": 1,
  "Mean": 1,
  "Stddev": 0,
  "Labels": {
    "server": "651d982bed6b"
  }
}

PR Checklist

  • updated test coverage: no - manual test performed, no existing applicable metrics test to modify
  • external facing docs updated: Will be updated in a separate PR if applicable
  • appropriate backport labels added
  • not a security concern

Additional Notes

Added the pr/no-metrics-test label since the affected metric predates that CI check, currently has no tests, and correct behavior was manually verified as a part of this review.


Overview of commits

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto approved Consul Bot automated PR

@vercel vercel bot temporarily deployed to Preview – consul December 6, 2023 18:28 Inactive
@jkirschner-hashicorp jkirschner-hashicorp merged commit 1a5ace0 into release/1.17.x Dec 6, 2023
88 of 90 checks passed
@jkirschner-hashicorp jkirschner-hashicorp deleted the backport/improve-client-rpc-metrics-consistency/thankfully-vital-duck branch December 6, 2023 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants