Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrade package metrics #758

Merged
merged 74 commits into from
Dec 13, 2024
Merged

Conversation

gzliudan
Copy link
Collaborator

@gzliudan gzliudan commented Dec 10, 2024

Proposed changes

This PR applies the following PR about package metrics from geth:

Types of changes

What types of changes does your code introduce to XDC network?
Put an in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)
  • Regular KTLO or any of the maintaince work. e.g code style
  • CICD Improvement

Impacted Components

Which part of the codebase this PR will touch base on,

Put an in the boxes that apply

  • Consensus
  • Account
  • Network
  • Geth
  • Smart Contract
  • External components
  • Not sure (Please specify below)

Checklist

Put an in the boxes once you have confirmed below actions (or provide reasons on not doing so) that

  • This PR has sufficient test coverage (unit/integration test) OR I have provided reason in the PR description for not having test coverage
  • Provide an end-to-end test plan in the PR description on how to manually test it on the devnet/testnet.
  • Tested the backwards compatibility.
  • Tested with XDC nodes running this version co-exist with those running the previous version.
  • Relevant documentation has been updated as part of this PR
  • N/A

@gzliudan gzliudan changed the title upgrade package metrics [WIP] upgrade package metrics Dec 10, 2024
@gzliudan gzliudan changed the title [WIP] upgrade package metrics upgrade package metrics Dec 10, 2024
gzliudan and others added 27 commits December 13, 2024 14:00
…7025)

Makes the float-gauges lock-free

name                      old time/op  new time/op  delta
CounterFloat64Parallel-8  1.45µs ±10%  0.85µs ± 6%  -41.65%  (p=0.008 n=5+5)

---------

Co-authored-by: Exca-DK <dev@DESKTOP-RI45P4J.localdomain>
Co-authored-by: Martin Holst Swende <martin@swende.se>
…sb) (ethereum#24877)

This chang creates a GaugeInfo metrics type for registering informational (textual) metrics, e.g. geth version number. It also improves the testing for backend-exporters, and uses a shared subpackage in 'internal' to provide sample datasets and ordered registry.

Implements ethereum#21783

---------

Co-authored-by: Martin Holst Swende <martin@swende.se>
This change includes a lot of things, listed below.

The interfaces have been split up into one write-interface and one read-interface, with `Snapshot` being the gateway from write to read. This simplifies the semantics _a lot_.

Example of splitting up an interface into one readonly 'snapshot' part, and one updatable writeonly part:

```golang
type MeterSnapshot interface {
	Count() int64
	Rate1() float64
	Rate5() float64
	Rate15() float64
	RateMean() float64
}

// Meters count events to produce exponentially-weighted moving average rates
// at one-, five-, and fifteen-minutes and a mean rate.
type Meter interface {
	Mark(int64)
	Snapshot() MeterSnapshot
	Stop()
}
```

This PR makes the concurrency model clearer. We have actual meters and snapshot of meters. The `meter` is the thing which can be accessed from the registry, and updates can be made to it.

- For all `meters`, (`Gauge`, `Timer` etc), it is assumed that they are accessed by different threads, making updates. Therefore, all `meters` update-methods (`Inc`, `Add`, `Update`, `Clear` etc) need to be concurrency-safe.
- All `meters` have a `Snapshot()` method. This method is _usually_ called from one thread, a backend-exporter. But it's fully possible to have several exporters simultaneously: therefore this method should also be concurrency-safe.

TLDR: `meter`s are accessible via registry, all their methods must be concurrency-safe.

For all `Snapshot`s, it is assumed that an individual exporter-thread has obtained a `meter` from the registry, and called the `Snapshot` method to obtain a readonly snapshot. This snapshot is _not_ guaranteed to be concurrency-safe. There's no need for a snapshot to be concurrency-safe, since exporters should not share snapshots.

Note, though: that by happenstance a lot of the snapshots _are_ concurrency-safe, being unmutable minimal representations of a value. Only the more complex ones are _not_ threadsafe, those that lazily calculate things like `Variance()`, `Mean()`.

Example of how a background exporter typically works, obtaining the snapshot and sequentially accessing the non-threadsafe methods in it:
```golang
		ms := metric.Snapshot()
                ...
		fields := map[string]interface{}{
			"count":    ms.Count(),
			"max":      ms.Max(),
			"mean":     ms.Mean(),
			"min":      ms.Min(),
			"stddev":   ms.StdDev(),
			"variance": ms.Variance(),
```

TLDR: `snapshots` are not guaranteed to be concurrency-safe (but often are).

I also changed the `Sample` type: previously, it iterated the samples fully every time `Mean()`,`Sum()`, `Min()` or `Max()` was invoked. Since we now have readonly base data, we can just iterate it once, in the constructor, and set all four values at once.

The same thing has been done for runtimehistogram.

Back when ResettingTImer was implemented, as part of ethereum#15910, Anton implemented a `Percentiles` on the new type. However, the method did not conform to the other existing types which also had a `Percentiles`.

1. The existing ones, on input, took `0.5` to mean `50%`. Anton used `50` to mean `50%`.
2. The existing ones returned `float64` outputs, thus interpolating between values. A value-set of `0, 10`, at `50%` would return `5`, whereas Anton's would return either `0` or `10`.

This PR removes the 'new' version, and uses only the 'legacy' percentiles, also for the ResettingTimer type.

The resetting timer snapshot was also defined so that it would expose the internal values. This has been removed, and getters for `Max, Min, Mean` have been added instead.

A lot of types were exported, but do not need to be. This PR unexports quite a lot of them.

metrics: refactor metrics (28035)
The rpc/duration/all meter was in nanoseconds, the individual meter in microseconds.
This PR changes it so both of them use nanoseconds.
* cmd, core, metrics: always report expensive metrics

* core, metrics: report block processing metrics as resetting timer

* metrics: update reporter tests
…hereum#29047)

metrics/influxdb: fix failed cases caused by float64 precision on arm64
This PR modifies how the metrics library handles `Enabled`: previously,
the package `init` decided whether to serve real metrics or just
dummy-types.

This has several drawbacks:
- During pkg init, we need to determine whether metrics are enabled or
not. So we first hacked in a check if certain geth-specific
commandline-flags were enabled. Then we added a similar check for
geth-env-vars. Then we almost added a very elaborate check for
toml-config-file, plus toml parsing.

- Using "real" types and dummy types interchangeably means that
everything is hidden behind interfaces. This has a performance penalty,
and also it just adds a lot of code.

This PR removes the interface stuff, uses concrete types, and allows for
the setting of Enabled to happen later. It is still assumed that
`metrics.Enable()` is invoked early on.

The somewhat 'heavy' operations, such as ticking meters and exp-decay,
now checks the enable-flag to prevent resource leak.

The change may be large, but it's mostly pretty trivial, and from the
last time I gutted the metrics, I ensured that we have fairly good test
coverage.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
@gzliudan gzliudan merged commit 0f8abf5 into XinFinOrg:dev-upgrade Dec 13, 2024
9 checks passed
@gzliudan gzliudan deleted the upgrade-metrics branch December 13, 2024 06:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants