MON-1666: CMO deployment: pass enabled-remote-write #1416
in order to switch telemeter over to Prometheus remote write. Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jan--f The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
/hold
Generally this seems to work. However some wrinkles need to be ironed out.
cc @simonpasquier @ianbillett
Let me repost our DM here for completeness... The default value of the limit_bytes flag is 512k - in prod we set it to 5.1M.
I've noticed from the logs that Prometheus sends metadata by default, but I presume that we don't want this for telemeter. I believe that it should be turned off explicitly in the RemoteWrite spec.
IIUC the
Yeah, makes sense. Tbh I'm not 100% sure yet what the impact is and whether telemeter can actually make use of this metadata. I'll investigate more, but until then let's turn it off.
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
Nice investigation 👍 That limit sounds ample and reasonable.
Looking at the number of sent samples, we're at about 2k samples per minute. Since remote write is configured with a maximum of 10k samples per send and a batch deadline of 1m, the CI runs never reach the 10k limit. "Real" environments might generate more samples (e.g. more OLM operators = more telemetry data), so we may hit the 10k samples per send limit, which means larger requests. I think that we should account for it by increasing the request limit on the telemeter server side (even more than 128k) and/or reducing the number of samples per send.
Agreed, the idea is to set a "reasonable" default in telemeter and deploy that in the staging environment. Then for production the limit will be explicitly set and likely a lot higher. This would be similar to how the
/retest
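To make the arithmetic in the comment above concrete, here is a minimal sketch of how many samples end up in one remote write request for a given ingest rate. It is a simplification under stated assumptions: a single queue shard, a steady sample rate, and a batch that is flushed either when it is full or when the deadline expires, whichever comes first. The function name `samples_per_request` and its parameters are hypothetical and not part of CMO, telemeter, or Prometheus.

```python
def samples_per_request(
    rate_per_minute: float,
    max_samples_per_send: int = 10_000,  # value quoted in the comment above
    batch_deadline_s: float = 60.0,      # 1m batch deadline quoted above
) -> int:
    """Estimate the number of samples in one request.

    Assumes one shard and a constant sample rate (illustrative only).
    """
    # Samples that accumulate before the batch deadline forces a flush.
    accumulated = rate_per_minute * batch_deadline_s / 60.0
    # A batch is sent early once it reaches max_samples_per_send.
    return int(min(max_samples_per_send, accumulated))


if __name__ == "__main__":
    # CI-like rate from the discussion: about 2k samples per minute.
    print(samples_per_request(2_000))
    # A hypothetical busier cluster that fills the batch before the deadline.
    print(samples_per_request(30_000))
```

At the CI rate the deadline triggers the flush, so requests stay well below the 10k cap; only at rates above 10k samples per minute does the cap, and therefore the largest request size, come into play.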
/retest
/retest
@jan--f: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale
/close
@jan--f: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.