perf: avoid expensive lookup of storage metrics #7495

jakmeier · 2022-08-29T17:15:13Z

Finding the counter of a metric is a relatively expensive operation,
when compared to just updating its value. A performance regression
of about 20% on read perfromance has been observed due to additional
metrics added in #7429.

This commit stores the counters in TrieCacheInne and
TrieCachingStorag to avoid the lookup in hot paths like when
we hit the shard or chunk cache.

Finding the counter of a metric is a relatively expensive operation, when compared to just updating its value. A performance regression of about 20% on read perfromance has been observed due to additional metrics added in near#7429. This commit stores the counters in `TrieCacheInne` and `TrieCachingStorag` to avoid the lookup in hot paths like when we hit the shard or chunk cache.

matklad

LGTM, but maybe consider bunching these up into

struct TrieCacheMetrics {
    ...
}

struct, so that call-site looks like

self.metrics.chunk_cache_misses.inc();

and makes it abundantly clear that we are dealing with metrics, and not with something affecting semantics.

Longarithm

+1 to @matklad. We actually need two groups of metrics - TrieCacheMetrics and TrieCacheInnerMetrics

Longarithm · 2022-08-29T17:29:43Z

core/store/src/trie/trie_storage.rs

@@ -73,6 +75,14 @@ pub struct TrieCacheInner {
    shard_id: u32,
    /// Whether cache is used for view calls execution.
    is_view: bool,
+
+    // Store counteres here to avoid overhead of looking them up on hot paths.


Suggested change

// Store counteres here to avoid overhead of looking them up on hot paths.

// Counters tracking operations happening inside the shard cache. Stored here to avoid overhead of looking them up on hot paths.

Longarithm · 2022-08-29T17:30:37Z

core/store/src/trie/trie_storage.rs

@@ -369,6 +381,16 @@ pub struct TrieCachingStorage {
    pub(crate) db_read_nodes: Cell<u64>,
    /// Counts trie nodes retrieved from the chunk cache.
    pub(crate) mem_read_nodes: Cell<u64>,
+
+    // Store counteres here to avoid overhead of looking them up on hot paths.


Suggested change

// Store counteres here to avoid overhead of looking them up on hot paths.

// Counters tracking events happening while querying shard cache. Stored here to avoid overhead of looking them up on hot paths.

- group metrics together in two new structs - update comments

Longarithm · 2022-08-29T17:41:36Z

Thinking out loud: is it possible here to have wrappers over Lazy<IntCounterVec> and Lazy<IntGaugeVec>?
These wrappers could accept shard_id, is_view as parameters and cache counters internally, so we could avoid adding new fields to trie caches.
Maybe the problem is that we would need to query some hash map again... We could store counters in a vector and index them like shard_id * 2 + int(is_view), but it sounds less reasonable.

jakmeier · 2022-08-29T18:03:00Z

Hm, I see, that does sound kind of nice. But I think you are spot on with the problem that it still requires something like a hash map internally. Doing some manual mapping into a vector seems like a good solution performance wise but it seems a bit fragile to future changes. And not as broadly applicable as I would like it to be.

The current implementation certainly is not the most elegant. But for a fix that we need fast, I prefer blunt and simple code. I would suggest we think about refactoring this in the larger context. There are still more metrics that use the inefficient pattern and maybe we can find a solution that fits them all.

If nobody raises any concern quick enough, I will merge this PR soon. I would like it to be in before the nightly CE job triggers.

matklad

LGTM!

Opened https://nearinc.atlassian.net/browse/ND-203 to track a larger problem.

* perf: avoid expensive lookup of storage metrics Finding the counter of a metric is a relatively expensive operation, when compared to just updating its value. A performance regression of about 20% on read perfromance has been observed due to additional metrics added in #7429. This commit stores the counters in `TrieCacheInne` and `TrieCachingStorag` to avoid the lookup in hot paths like when we hit the shard or chunk cache. * apply reviewer suggestions - group metrics together in two new structs - update comments

jakmeier requested review from matklad and Longarithm August 29, 2022 17:15

jakmeier requested a review from a team as a code owner August 29, 2022 17:15

matklad reviewed Aug 29, 2022

View reviewed changes

Longarithm approved these changes Aug 29, 2022

View reviewed changes

apply reviewer suggestions

5dd4a6b

- group metrics together in two new structs - update comments

matklad approved these changes Aug 29, 2022

View reviewed changes

jakmeier merged commit 1a27088 into near:master Aug 29, 2022

jakmeier deleted the metrics-perf branch August 30, 2022 13:28

jakmeier mentioned this pull request Sep 23, 2022

Time based continuous estimation #7683

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: avoid expensive lookup of storage metrics #7495

perf: avoid expensive lookup of storage metrics #7495

jakmeier commented Aug 29, 2022

matklad left a comment

Longarithm left a comment

Longarithm Aug 29, 2022

Longarithm Aug 29, 2022

Longarithm commented Aug 29, 2022

jakmeier commented Aug 29, 2022

matklad left a comment

	// Store counteres here to avoid overhead of looking them up on hot paths.
	// Counters tracking operations happening inside the shard cache. Stored here to avoid overhead of looking them up on hot paths.

	// Store counteres here to avoid overhead of looking them up on hot paths.
	// Counters tracking events happening while querying shard cache. Stored here to avoid overhead of looking them up on hot paths.

perf: avoid expensive lookup of storage metrics #7495

perf: avoid expensive lookup of storage metrics #7495

Conversation

jakmeier commented Aug 29, 2022

matklad left a comment

Choose a reason for hiding this comment

Longarithm left a comment

Choose a reason for hiding this comment

Longarithm Aug 29, 2022

Choose a reason for hiding this comment

Longarithm Aug 29, 2022

Choose a reason for hiding this comment

Longarithm commented Aug 29, 2022

jakmeier commented Aug 29, 2022

matklad left a comment

Choose a reason for hiding this comment