Skip to content

Fix regressions introduced by #39 and don't expose broken metrics by default #43

Open

Description

#39 introduces significant semantic regressions compared to the earlier implementation.

In the new implementation, by default:

  • if you simply install the library and fetch metrics via prometheus, system metrics don't work at all because they never get updated
  • if you don't update other metrics at regular intervals, system metrics become stale and don't get updated
  • if you update some other metrics, you end up paying an unexpected performance price because the system metrics are randomly updated - this includes reading and parsing files, system clocks and taking several global locks
  • system metrics are updated too often, if your polling interval in prometheus is higher than 10s, causing an unnecessary processing cost
  • system metrics are not updated often enough, if your polling interval in prometheus is higher than 10s

#41 adds additional workarounds: all the above issues now apply to per-thread metrics as well

Prometheus uses a pull model for metrics - it means that when metrics are fetched via http, they should be updated in that moment - in that model, all metrics that reasonably can be loaded on demand at the time of the poll should be loaded then - this includes memory statistics. It follows naturally that the previous implementation, except for nim gc metrics, was better in all aspects.

Because the system GC metrics come with several caveats and costs that make them unreliable unless used correctly, a better approach would be:

  • the process metrics (process_ and nim_gc_heap_instance_) should be updated from a callback run when metrics are fetched, in prometheus
  • the metrics that are broken should not be exposed at all, by default (nim_gc_mem_*) - instead the user should explicitly enable them, thus acknowledging their faults
  • when used with a push model (statsd etc), the user should explicitly call an update function that pushes out system metrics at a desired interval
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions