MON-1666: CMO deployment: pass enabled-remote-write #1416

jan--f · 2021-10-06T13:23:12Z

Pass enabled-remote-write in the CMO deployment in order to switch telemeter over to Prometheus remote write.

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

I added a CHANGELOG entry for this change.
No user facing changes, so no entry in CHANGELOG was needed.


openshift-ci · 2021-10-06T13:23:35Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jan--f

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

~~OWNERS~~ [jan--f]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

arajkumar · 2021-10-06T17:39:04Z

/retest

jan--f · 2021-10-07T06:46:53Z

/hold

jan--f · 2021-10-07T07:14:31Z

Generally this seems to work. However, some wrinkles still need to be ironed out.

An origin test needs a change.
The current telemeter endpoint rejects (at least most of) the remote write requests with status 413: request too big. We could either tune remote write or relax the endpoint restrictions a bit.

cc @simonpasquier @ianbillett

bill3tt · 2021-10-11T14:26:49Z

Let me repost our DM here for completeness...

The default value of the limit_bytes flag is 512k; in prod we set it to 5.1M.
IIRC we don't actually use the receive endpoint in telemeter at the moment; we use the upload endpoint, which is a telemeter-specific thing. See the /receive panel in this dashboard.
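For illustration, here is a minimal sketch of how a byte limit like limit_bytes turns an oversized remote-write body into the 413 response reported earlier in this thread. It is not telemeter's code: the helper name `receive_status` is hypothetical, and reading "512k" as 512 × 1024 bytes is an assumption. A real server would also have to enforce the limit while reading the body, since a client is not required to send a Content-Length header.

```python
from http import HTTPStatus

# Assumption: "512k" read as 512 KiB; the thread does not give the exact unit.
DEFAULT_LIMIT_BYTES = 512 * 1024


def receive_status(body: bytes, limit_bytes: int = DEFAULT_LIMIT_BYTES) -> HTTPStatus:
    """Hypothetical helper: the status a size-limited receive endpoint returns."""
    # Reject before decoding anything: the body is over the configured limit.
    if len(body) > limit_bytes:
        return HTTPStatus.REQUEST_ENTITY_TOO_LARGE  # 413
    # Any 2xx tells the remote-write sender the batch was accepted;
    # 204 No Content is used here purely for illustration.
    return HTTPStatus.NO_CONTENT


if __name__ == "__main__":
    print(receive_status(b"x" * 70_000).value)                      # 204
    print(receive_status(b"x" * 70_000, limit_bytes=15_000).value)  # 413
```

A request of roughly 70kB passes the 512k default but is rejected under a 15k limit such as the one Simon describes for /api/v1/receive.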

simonpasquier · 2021-10-13T09:37:19Z

I've noticed from the logs that Prometheus sends metadata by default but I presume that we don't want this for telemeter. I believe that it should be turned off explicitly in the RemoteWrite spec.

simonpasquier · 2021-10-13T09:52:19Z

IIUC the /api/v1/receive endpoint has a size limit of 15k, while Prometheus could send up to 10,000 samples per request. That would explain the "request too big" errors.

jan--f · 2021-10-13T12:44:10Z

> I've noticed from the logs that Prometheus sends metadata by default but I presume that we don't want this for telemeter. I believe that it should be turned off explicitly in the RemoteWrite spec.

Yeah, makes sense. Tbh I'm not 100% sure yet what the impact is and whether telemeter can actually make use of this metadata. I'll investigate more, but until then let's turn it off.


jan--f · 2021-10-14T13:58:12Z

Going by Prometheus's prometheus_remote_storage_bytes_total metric, the requests seem to be around 70kB each (we have roughly 500kB over 7 requests and ~413kB over 6 requests).
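The estimate above is a byte total divided by a request count. A small sketch using the totals and request counts quoted in that comment (the helper name is hypothetical) also shows how much headroom the proposed 128kB limit would leave:

```python
def avg_request_kb(total_kb: float, requests: int) -> float:
    """Average request size over a window: kB sent divided by requests sent."""
    if requests <= 0:
        raise ValueError("need at least one request in the window")
    return total_kb / requests


PROPOSED_LIMIT_KB = 128

# (kB sent, requests) pairs as quoted in the comment above.
for total_kb, requests in [(500, 7), (413, 6)]:
    avg = avg_request_kb(total_kb, requests)
    print(f"{total_kb} kB / {requests} requests = {avg:.1f} kB, "
          f"headroom under {PROPOSED_LIMIT_KB} kB: {PROPOSED_LIMIT_KB / avg:.2f}x")
```

Both averages land near 70kB (71.4kB and 68.8kB), so a 128kB limit leaves roughly 1.8x headroom for CI-sized traffic.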

Should we start with a 128kB request limit on the telemeter side?

matej-g · 2021-10-14T15:20:21Z

> Should we start with a 128kB request limit on the telemeter side?

Nice investigation 👍 That limit sounds ample and reasonable.

simonpasquier · 2021-10-15T09:28:35Z

Looking at the number of sent samples, we're at about 2k samples per minute. Since remote write is configured with a maximum of 10k samples per send and a batch deadline of 1m, the CI runs never reach the 10k limit. "Real" environments might generate more samples (e.g. more OLM operators = more telemetry data) and may hit the 10k samples-per-send limit, meaning larger requests. I think we should account for this by increasing the request limit on the telemeter server side (even beyond 128k) and/or reducing the number of samples per send.
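The argument can be made concrete with a back-of-the-envelope model. This is a sketch under loudly stated assumptions: a single remote-write shard, a batch flushed when it reaches max_samples_per_send or when the 1m batch deadline expires, and a hypothetical average of 35 bytes per sample (what ~70kB for ~2k samples would imply if each request carried about one minute of data). None of these figures were measured as such, the 20k samples/min "busy" cluster is invented for illustration, and the function names are illustrative only.

```python
import math

BYTES_PER_SAMPLE = 35.0  # hypothetical: ~70kB / ~2k samples
LIMIT_BYTES = 128_000    # proposed 128kB limit, decimal kB assumed


def samples_per_request(samples_per_min: float, deadline_min: float,
                        max_samples_per_send: int) -> int:
    # Single-shard assumption: a batch is sent when it is full or when the
    # deadline expires, whichever happens first.
    return int(min(samples_per_min * deadline_min, max_samples_per_send))


def request_bytes(samples: int, bytes_per_sample: float) -> float:
    return samples * bytes_per_sample


def max_samples_for_limit(limit_bytes: int, bytes_per_sample: float) -> int:
    # Largest max_samples_per_send whose full batch still fits under the limit.
    return math.floor(limit_bytes / bytes_per_sample)


ci = samples_per_request(2_000, 1, 10_000)     # CI: deadline-bound batches
busy = samples_per_request(20_000, 1, 10_000)  # hypothetical busy cluster: capped
print(ci, request_bytes(ci, BYTES_PER_SAMPLE))      # 2000 70000.0
print(busy, request_bytes(busy, BYTES_PER_SAMPLE))  # 10000 350000.0
print(max_samples_for_limit(LIMIT_BYTES, BYTES_PER_SAMPLE))  # 3657
```

Under these assumptions, a cluster that fills the 10k cap would send requests of about 350kB, well above 128kB, while keeping full batches under 128kB would mean lowering max_samples_per_send to about 3,650. That is the trade-off between the two options in the comment above.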

jan--f · 2021-10-15T14:12:29Z

Agreed. The idea is to set a "reasonable" default in telemeter and deploy that in the staging environment. For production the limit will then be set explicitly, and likely a lot higher. This would be similar to how the upload endpoint is treated (512kB default and 5.1MB in production).

jan--f · 2021-11-10T09:53:15Z

/retest

jan--f · 2021-11-15T13:28:32Z

/retest

jan--f · 2021-11-16T07:42:15Z

/retest

openshift-ci · 2021-11-16T09:21:11Z

@jan--f: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-agnostic-operator | `27fabbc` | link | true | `/test e2e-agnostic-operator` |
| ci/prow/e2e-agnostic | `27fabbc` | link | true | `/test e2e-agnostic` |
| ci/prow/e2e-aws-single-node | `27fabbc` | link | false | `/test e2e-aws-single-node` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-bot · 2022-02-27T11:09:11Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

jan--f · 2022-03-15T12:20:47Z

/close
Needs further research

openshift-ci · 2022-03-15T12:21:17Z

@jan--f: Closed this PR.

In response to this:

/close
Needs further research


CMO deployment: pass enabled-remote-write

a4a1ae6


openshift-ci bot requested review from arajkumar and dgrisonnet October 6, 2021 13:23

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 6, 2021

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 7, 2021

telemeter: don't send metadata when using remote_write

27fabbc

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

jan--f mentioned this pull request Nov 22, 2021

prometheus: adjust telemetry test for remote_write switch openshift/origin#26631

Merged

openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 27, 2022

openshift-ci bot closed this Mar 15, 2022

simonpasquier mentioned this pull request Sep 6, 2022

MON-2807: Use bearer token file for remote write authentication with telemeter #1733

Merged
