Skip to content

Conversation

michaelsproul
Copy link
Member

Issue Addressed

Closes #3514

Proposed Changes

  • Change default monitoring endpoint frequency to 120 seconds to fit with 30k requests/month limit.
  • Allow configuration of the monitoring endpoint frequency using --monitoring-endpoint-frequency N where N is a value in seconds.

@michaelsproul michaelsproul added ready-for-review The code is ready for review low-hanging-fruit Easy to resolve, get it before someone else does! v3.1.2 Release after v3.1.0 (formerly v3.1.1) labels Aug 31, 2022
@winksaville
Copy link
Contributor

Please add some documentation, I searched the "Lighthouse Book" and all I could find was this and that gives no details as to what metric information nor the frequency.

When I say "what metric information" the link would be real helpful in that documentation. But some additional verbiage on "exactly" what subset lighthouse is providing when is needed. Also, it would be nice to clarify when to use '--validator-monitor-autoand--monitoring-endpoint`.

For example, when using --monitoring-endpoint on vn it sends its metrics and "system" metrics. And on the bn it send its metrics and "system" metrics. Thus the monitoring endpoint has redundant "system" metrics and twice as many requests, that seems wasteful. I wonder if adding validator-monitoring-auto would reduce the requests and eliminate the redundant "system" metrics?

Hopefully, this makes sense :)

Copy link
Member

@pawanjay176 pawanjay176 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@pawanjay176
Copy link
Member

@winksaville The monitoring-endpoint is an externally provided service. Lighthouse just confirms to the specifications of the data format described in the link you mentioned. When adding this feature, the assumption was that the monitoring service (e.g. beaconcha.in) would be providing the documentation on how to run this service on the different client implementations https://kb.beaconcha.in/beaconcha.in-explorer/mobile-app-less-than-greater-than-beacon-node

The validator monitor is a Lighthouse specific feature that provides info about specified validators. The documentation we have here https://lighthouse-book.sigmaprime.io/validator-monitoring.html is for that feature.

Happy to add some docs linking to the beaconcha.in service and clarifying that it is different from the lighthouse validator monitor if that would reduce confusion :)

Thus the monitoring endpoint has redundant "system" metrics and twice as many requests, that seems wasteful.

This was done intentionally as we also have setups where the bn and vc are on separate machines. We could add a flag to the vc to prevent redundant sending of system metrics.

@michaelsproul
Copy link
Member Author

.long("monitoring-endpoint-period")
.value_name("SECONDS")
.help("Defines how many seconds to wait between each message sent to \
the monitoring-endpoint. Default: 60s")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like:

.help(format("Defines how many seconds to wait between each message sent to the monitoring-endpoint. Default: {}", monitoring_api::DEFAULT_UPDATE_DURATION))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't easily put a dynamic value here. The problem is that clap requires the arguments to live as long as the entire App struct, and a format!() string only lives as long the scope in which it's defined. We could plumb the dynamic values around (unwieldy) or do what I was doing before where we leak the string. Leaking the string is quite a nasty hack with a small runtime downside, so I'd prefer to just hardcode this and risk it going out of date. Sometimes worse is better.

I'm quite disappointed in clap for this and other reasons, so I'd support a PR in future to move us away from it completely. We're likely to overhaul all our CLI parsing when adding config file support (#3079), and can decide whether to keep clap and leak strings, or migrate to something better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment to update this, string. maybe something like:

the monitoring-endpoint. Default: 60s") // Update when DEFAULT_UPDATE_DURATION changes

Copy link
Contributor

@winksaville winksaville left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM one minor tweak if you like and thanks for adding the note to validator-monitoring.md! And a big thanks for doing this PR.

@@ -16,7 +16,7 @@ use types::*;
pub use types::ProcessType;

/// Duration after which we collect and send metrics to remote endpoint.
pub const UPDATE_DURATION: u64 = 60;
pub const DEFAULT_UPDATE_DURATION: u64 = 60;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment something like:

pub const DEFAULT_UPDATE_DURATION: u64 = 60; // Search for this and update hard-coded strings.

.long("monitoring-endpoint-period")
.value_name("SECONDS")
.help("Defines how many seconds to wait between each message sent to \
the monitoring-endpoint. Default: 60s")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment to update this, string. maybe something like:

the monitoring-endpoint. Default: 60s") // Update when DEFAULT_UPDATE_DURATION changes

@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Sep 5, 2022
@michaelsproul
Copy link
Member Author

bors r+

bors bot pushed a commit that referenced this pull request Sep 5, 2022
## Issue Addressed

Closes #3514

## Proposed Changes

- Change default monitoring endpoint frequency to 120 seconds to fit with 30k requests/month limit.
- Allow configuration of the monitoring endpoint frequency using `--monitoring-endpoint-frequency N` where `N` is a value in seconds.
@bors bors bot changed the title Configurable monitoring endpoint frequency [Merged by Bors] - Configurable monitoring endpoint frequency Sep 5, 2022
@bors bors bot closed this Sep 5, 2022
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
## Issue Addressed

Closes sigp#3514

## Proposed Changes

- Change default monitoring endpoint frequency to 120 seconds to fit with 30k requests/month limit.
- Allow configuration of the monitoring endpoint frequency using `--monitoring-endpoint-frequency N` where `N` is a value in seconds.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
low-hanging-fruit Easy to resolve, get it before someone else does! ready-for-merge This PR is ready to merge. v3.1.2 Release after v3.1.0 (formerly v3.1.1)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants