Pull metrics out of Clickhouse, expose 'em through Nexus' API

Here's the end-user flow we'd like:

- Through the console (or perhaps the CLI?), a user can view *metrics* for some category of information. For example: "Show me the metrics for HTTP endpoint latency," "Show me metrics for disk/network usage," etc.
  - Point to consider: "operator" usage vs. "end-user" usage -- each may see different metrics. At a bare minimum, we will want a different set of ACLs, even if the Nexus implementation is mechanically similar.
  - Open question: How many endpoints? What query parameters are exposed? What would be useful for the console?
- This should trigger a request to the external Nexus API, which itself should be able to make requests to Clickhouse.
  - Presumably, Nexus will act as an ACL validator + proxy to Clickhouse. Hopefully not *too* much post-processing of data is necessary.
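As a rough illustration of the "ACL validator + proxy" idea, here's a minimal Rust sketch. Everything in it is hypothetical: the `Role` and `MetricCategory` enums, the timeseries names, the `measurements` table and its columns, the query parameters, and the choice to make HTTP latency operator-only are placeholders, not oximeter's actual schema or Nexus' actual API. It only shows the shape: check whether the caller's role may see the requested category, validate the query parameters, then build the query Nexus would forward to Clickhouse.

```rust
/// Who is making the request. Hypothetical: Nexus' real authz model is richer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Operator,
    EndUser,
}

/// Hypothetical metric categories, taken from the examples in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetricCategory {
    HttpLatency,
    DiskUsage,
    NetworkUsage,
}

impl MetricCategory {
    /// Hypothetical timeseries names; not necessarily oximeter's real schema.
    pub fn timeseries_name(self) -> &'static str {
        match self {
            MetricCategory::HttpLatency => "http_service:request_latency",
            MetricCategory::DiskUsage => "disk:usage",
            MetricCategory::NetworkUsage => "network:usage",
        }
    }

    /// Assumed ACL: operators see everything; end users see only
    /// user-facing metrics, not internal HTTP endpoint latency.
    pub fn visible_to(self, role: Role) -> bool {
        match (role, self) {
            (Role::Operator, _) => true,
            (Role::EndUser, MetricCategory::HttpLatency) => false,
            (Role::EndUser, _) => true,
        }
    }
}

/// Hypothetical query parameters the external API might accept.
pub struct MetricsQuery {
    pub category: MetricCategory,
    /// Inclusive start of the time window, in Unix seconds.
    pub start_time_s: u64,
    /// Exclusive end of the time window, in Unix seconds.
    pub end_time_s: u64,
    /// Maximum number of samples to return.
    pub limit: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum QueryError {
    Forbidden,
    InvalidTimeRange,
    InvalidLimit,
}

/// Hypothetical cap so a single request can't pull an unbounded result set.
pub const MAX_LIMIT: u32 = 10_000;

/// ACL check plus parameter validation, then build the SQL Nexus would forward.
/// Interpolating into the SQL string is only acceptable here because every
/// value is a `&'static str` from the match above or an integer; user-supplied
/// strings would need escaping or server-side query parameters.
pub fn build_clickhouse_query(role: Role, q: &MetricsQuery) -> Result<String, QueryError> {
    if !q.category.visible_to(role) {
        return Err(QueryError::Forbidden);
    }
    if q.start_time_s >= q.end_time_s {
        return Err(QueryError::InvalidTimeRange);
    }
    if q.limit == 0 || q.limit > MAX_LIMIT {
        return Err(QueryError::InvalidLimit);
    }
    // `measurements`, `timeseries_name`, `timestamp`, and `datum` are placeholder names.
    Ok(format!(
        "SELECT timestamp, datum FROM measurements \
         WHERE timeseries_name = '{}' \
         AND timestamp >= toDateTime({}) AND timestamp < toDateTime({}) \
         ORDER BY timestamp LIMIT {}",
        q.category.timeseries_name(),
        q.start_time_s,
        q.end_time_s,
        q.limit
    ))
}
```

Keeping the visibility rule next to the category enum means operator-only and end-user metrics can share one Nexus code path and differ only in their ACL, which matches the "mechanically similar" point above. If the raw rows already have the shape the console wants, Nexus could pass them through with little post-processing.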

What already exists:
- There's machinery around *oximeter* to collect metrics from services, and store such information within Clickhouse itself. Although we should definitely add more metrics here (see: https://github.com/oxidecomputer/crucible/issues/341 as an example), this half of the problem space is considered out-of-scope for this issue.
- Since HTTP endpoint latency is already wired up and dumped into Clickhouse, it may be an easy "first target". In terms of usefulness, however, user-visible metrics (instance stats, disk/networking metrics, etc.) will be the high-value targets.

Issue #1131, opened by smklein on May 27, 2022.