Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add better observability to queryReadiness #5946

Merged
merged 9 commits into from
Apr 28, 2022

Conversation

DylanGuedes
Copy link
Contributor

@DylanGuedes DylanGuedes commented Apr 18, 2022

What this PR does / why we need it:
Adds a new query_readiness_duration_seconds metric, that reports query readiness duration of a tablemanager/index gateway instance. We should use it later to report performance against the ring mode.

Adds a new usersToBeQueryReadyForTotal metric, that reports number of users involved in the query readiness operation. We should use it later to correlate number of users with the query readiness duration.

Logs the users involved in the query readiness operation. Will be especially useful for the ring mode so that we can track down the users assigned to each instance.

Logs the duration spent in the query readiness operation.

Which issue(s) this PR fixes:
N/A

Special notes for your reviewer:
N/A

Checklist

  • Documentation added
  • Tests updated
  • Add an entry in the CHANGELOG.md about the changes.

- Add a new `query_readiness_duration_seconds` metric, that reports
  query readiness duration of a tablemanager/index gateway instance. We
  should use it later to report performance against the ring mode
- Add a new `usersToBeQueryReadyForTotal` metric, that reports number of
  users involved in the query readiness operation. We should use it
  later to correlate number of users with the query readiness duration.
@DylanGuedes DylanGuedes marked this pull request as ready for review April 18, 2022 14:33
@DylanGuedes DylanGuedes requested a review from a team as a code owner April 18, 2022 14:33
@DylanGuedes
Copy link
Contributor Author

Is the plan to do this point_up soon?

I'm working on it right now, I'm just not sure how long it will take to have this merged as I'm having some difficulties in the implementation 😅 do you think it makes sense to remove this metric for now and only add it in a more appropriate time?

@cstyan
Copy link
Contributor

cstyan commented Apr 20, 2022

Personally I would err on the side of adding it earlier rather than later. It's not that big of a deal to remove a metric later on (especially given that the metric wouldn't make it into a big release for quite a while). OTOH, it can be a pain in the ass to build and deploy a new image in a live environment just because you don't have a metric you need to debug something during an incident.

@DylanGuedes
Copy link
Contributor Author

DylanGuedes commented Apr 20, 2022

Personally I would err on the side of adding it earlier rather than later. It's not that big of a deal to remove a metric later on (especially given that the metric wouldn't make it into a big release for quite a while). OTOH, it can be a pain in the ass to build and deploy a new image in a live environment just because you don't have a metric you need to debug something during an incident.

I agree! But in the end I decided to remove it just because I already added a log message that will serve a similar purpose (i.e: we can derive/debug how is the balancing going with Loki instead of Prometheus).

Copy link
Contributor

@JordanRushing JordanRushing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

- This is necessary since go-kit doesn't support array type.
@DylanGuedes DylanGuedes changed the title Add more queryReadiness metrics to the IndexGateway Add better observability to queryReadiness Apr 28, 2022
- It is redundant with a recently added log line.
@DylanGuedes
Copy link
Contributor Author

fyi: after some suggestion I decided to remove the queryReadinessDuration metric as it was redundant with one of the new log lines.

Copy link
Collaborator

@slim-bean slim-bean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants