Opened on May 27, 2022
Here's the end-user flow we'd like:
- Through the console (or perhaps the CLI?), a user can view metrics for some category of information. For example: "Show me the metrics for HTTP endpoint latency," or "Show me metrics for disk/network usage," etc.
- Point to consider: "operator" usage vs. "end-user" usage. Each may see different metrics, so at bare minimum we will want a different set of ACLs, even if the Nexus implementation is mechanically similar.
- Open question: How many endpoints? What query parameters are exposed? What would be useful for the console?
- This should trigger a request to the external Nexus API, which in turn should be able to query ClickHouse.
- Presumably, Nexus will act as an ACL validator and proxy in front of ClickHouse. Hopefully the data will not need much post-processing.
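The request path described above (console or CLI, then the external Nexus API, then ClickHouse, with Nexus checking ACLs along the way) might look roughly like the following. This is a minimal sketch under stated assumptions: the `Role` and `MetricCategory` enums, the `MetricsQuery` parameters, the `metrics_*` table names, and the policy that end users cannot see HTTP endpoint latency are all hypothetical illustrations, not existing Nexus or oximeter APIs or schema.

```rust
/// Hypothetical caller roles; the issue only says operators and end users
/// may see different metrics.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Operator,
    EndUser,
}

/// Hypothetical metric categories, loosely based on the issue's examples.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MetricCategory {
    HttpLatency,
    DiskUsage,
    NetworkUsage,
}

/// Hypothetical query parameters a metrics endpoint might accept.
struct MetricsQuery {
    category: MetricCategory,
    start_time: u64, // Unix seconds, inclusive (assumed)
    end_time: u64,   // Unix seconds, exclusive (assumed)
    limit: u32,
}

#[derive(Debug, PartialEq, Eq)]
enum QueryError {
    Forbidden,
    InvalidTimeRange,
}

/// Assumed ACL policy: operators see everything; end users see only
/// user-facing metrics, not internal HTTP endpoint latency.
fn is_allowed(role: Role, category: MetricCategory) -> bool {
    match (role, category) {
        (Role::Operator, _) => true,
        (Role::EndUser, MetricCategory::HttpLatency) => false,
        (Role::EndUser, _) => true,
    }
}

/// Hypothetical table names; the real oximeter schema may differ.
fn table_for(category: MetricCategory) -> &'static str {
    match category {
        MetricCategory::HttpLatency => "metrics_http_latency",
        MetricCategory::DiskUsage => "metrics_disk_usage",
        MetricCategory::NetworkUsage => "metrics_network_usage",
    }
}

/// Checks the caller's role and the parameters, then builds the SQL that
/// Nexus (acting as a proxy) would forward to ClickHouse.
fn build_query(role: Role, q: &MetricsQuery) -> Result<String, QueryError> {
    if !is_allowed(role, q.category) {
        return Err(QueryError::Forbidden);
    }
    if q.start_time >= q.end_time {
        return Err(QueryError::InvalidTimeRange);
    }
    // Only a fixed table name and typed integers are interpolated, so no
    // caller-supplied string ever reaches the SQL text.
    Ok(format!(
        "SELECT timestamp, value FROM {} \
         WHERE timestamp >= toDateTime({}) AND timestamp < toDateTime({}) \
         ORDER BY timestamp LIMIT {}",
        table_for(q.category),
        q.start_time,
        q.end_time,
        q.limit
    ))
}

fn main() {
    let q = MetricsQuery {
        category: MetricCategory::DiskUsage,
        start_time: 1_653_609_600,
        end_time: 1_653_696_000,
        limit: 100,
    };
    match build_query(Role::EndUser, &q) {
        Ok(sql) => println!("{}", sql),
        Err(e) => println!("rejected: {:?}", e),
    }
}
```

Keeping the ACL check separate from query construction mirrors the point that operator and end-user paths may be mechanically similar but need different permissions. A real implementation would more likely use the ClickHouse client's parameter binding than string formatting.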
What already exists:
- There's machinery around oximeter to collect metrics from services and store them in ClickHouse. Although we should definitely add more metrics here (see crucible#341, "Upstairs disk stats -> Oximeter", for an example), this half of the problem space is out of scope for this issue.
- Since HTTP endpoint latency is already wired up and stored in ClickHouse, it may be an easy "first target". In terms of usefulness, however, user-visible metrics (instance stats, disk/networking metrics, etc.) will be higher-value targets.