latest finalized block metrics #12339

Merged

Commits (44 total; changes shown from 42)
All commits by dhaidashenko:

af480ab  Add LatestFinalizedBlock to HeadTracker (Feb 19, 2024)
9da949b  Added LatestFinalizedHead to Head (Feb 19, 2024)
9e580cc  Merge branch 'develop' into feature/BCI-2649-latest-finalized-block (Feb 19, 2024)
112679e  remove unused func (Feb 20, 2024)
334a4d1  fix flakey nil pointer (Feb 20, 2024)
83cf834  improve logs & address lint issue (Feb 20, 2024)
26e8d50  nitpicks (Feb 20, 2024)
962464c  Merge branch 'develop' into feature/BCI-2649-latest-finalized-block (Feb 20, 2024)
b86c872  fixed copy on heads on MarkFinalized (Feb 21, 2024)
00769f0  Merge branch 'feature/BCI-2649-latest-finalized-block' of github.com:… (Feb 21, 2024)
81774b4  error instead of panic (Feb 21, 2024)
c942663  return error instead of panic (Feb 21, 2024)
72a2380  Merge branch 'develop' into feature/BCI-2649-latest-finalized-block (Feb 21, 2024)
a01fb86  nitpicks (Feb 21, 2024)
d7a9d4e  Merge branch 'feature/BCI-2649-latest-finalized-block' of github.com:… (Feb 21, 2024)
faf61d9  Finalized block based history depth (Feb 23, 2024)
89a75b3  simplify trimming (Feb 23, 2024)
f7ab489  nit fixes (Feb 23, 2024)
d9d422c  Merge branch 'develop' into feature/BCI-2649-latest-finalized-block (Feb 23, 2024)
e77f529  fix build issues caused by merge (Feb 23, 2024)
908acf7  regen (Feb 23, 2024)
93b835d  FIx rpc client mock generation (Feb 23, 2024)
2f55403  nit fixes (Feb 26, 2024)
9f26066  Merge branch 'develop' into feature/BCI-2649-latest-finalized-block (Feb 26, 2024)
71a0803  Merge branch 'develop' into feature/BCI-2649-latest-finalized-block (Feb 26, 2024)
35c3302  nit fixes (Feb 26, 2024)
bd1ea1e  update comments (Feb 27, 2024)
6cc4fec  Merge branch 'develop' into feature/BCI-2649-latest-finalized-block (Feb 27, 2024)
83ea5d1  ensure that we trim redundant blocks both in slice and in chain in Heads (Feb 28, 2024)
2d5ae65  nit fix (Feb 28, 2024)
c99cea6  Merge branch 'develop' into feature/BCI-2649-latest-finalized-block (Feb 28, 2024)
f77a8ab  Update common/headtracker/head_tracker.go (Feb 28, 2024)
f7c786f  HeadTracker backfill test with 0 finality depth (Feb 29, 2024)
dee11fc  Merge branch 'feature/BCI-2649-latest-finalized-block' of github.com:… (Feb 29, 2024)
4b27f75  latest finalized block metrics (Mar 7, 2024)
672e09a  changelog & go generate fix (Mar 7, 2024)
4372344  Merge branch 'develop' into feature/BCI-2663-rpc-metrics-for-finalize… (Mar 18, 2024)
abab9f0  move nodeConfig back into the test pkg (Mar 18, 2024)
c32050e  rollback fields renaming (Mar 18, 2024)
3dd3f3c  nit (Mar 18, 2024)
971f35f  changeset (Mar 18, 2024)
63ec90e  removed unused func (Mar 18, 2024)
38871bd  Set default value for FinalizedBlockPollInterval (Mar 21, 2024)
d79c8b1  updated docs (Mar 21, 2024)
5 changes: 5 additions & 0 deletions .changeset/poor-melons-vanish.md
@@ -0,0 +1,5 @@
---
"chainlink": minor
---

Add the `pool_rpc_node_highest_finalized_block` metric that tracks the highest finalized block seen per RPC. If `FinalityTagEnabled = true`, a positive `NodePool.FinalizedBlockPollInterval` is needed to collect the metric. If the finality tag is not enabled, the metric is populated with a calculated latest finalized block based on the latest head and finality depth.
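For reference, the depth-based fallback described above reduces to a one-line calculation. A minimal sketch in Go (the standalone helper below is illustrative, not the PR's actual code):

	// estimateLatestFinalized mirrors the fallback used when
	// FinalityTagEnabled = false: the latest finalized block is
	// estimated as the latest head minus the configured finality
	// depth, floored at zero for young chains.
	func estimateLatestFinalized(latestHead int64, finalityDepth uint32) int64 {
		if finalized := latestHead - int64(finalityDepth); finalized > 0 {
			return finalized
		}
		return 0
	}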
18 changes: 18 additions & 0 deletions common/client/mock_head_test.go

Some generated files are not rendered by default.

28 changes: 28 additions & 0 deletions common/client/mock_node_client_test.go

Some generated files are not rendered by default.

30 changes: 30 additions & 0 deletions common/client/mocks/config.go
@@ -0,0 +1,30 @@
package mocks

import (
	"time"

	commonconfig "github.com/smartcontractkit/chainlink/v2/common/config"
)

type ChainConfig struct {
	IsFinalityTagEnabled   bool
	FinalityDepthVal       uint32
	NoNewHeadsThresholdVal time.Duration
	ChainTypeVal           commonconfig.ChainType
}

func (t ChainConfig) ChainType() commonconfig.ChainType {
	return t.ChainTypeVal
}

func (t ChainConfig) NodeNoNewHeadsThreshold() time.Duration {
	return t.NoNewHeadsThresholdVal
}

func (t ChainConfig) FinalityDepth() uint32 {
	return t.FinalityDepthVal
}

func (t ChainConfig) FinalityTagEnabled() bool {
	return t.IsFinalityTagEnabled
}
34 changes: 22 additions & 12 deletions common/client/node.go
@@ -15,6 +15,7 @@ import (
	"github.com/smartcontractkit/chainlink-common/pkg/logger"
	"github.com/smartcontractkit/chainlink-common/pkg/services"

+	commonconfig "github.com/smartcontractkit/chainlink/v2/common/config"
	"github.com/smartcontractkit/chainlink/v2/common/types"
)

@@ -43,6 +44,14 @@ type NodeConfig interface {
	SelectionMode() string
	SyncThreshold() uint32
	NodeIsSyncingEnabled() bool
+	FinalizedBlockPollInterval() time.Duration
Contributor:

Just wondering, do we need different configs for each type of poll? What if we just reuse the PollInterval for all polling?

Contributor:

On some chains we may also look for new heads via polling rather than via subscription. In that case too, we wouldn't want to poll separately for new heads and new finalized heads; we'd mostly just make a single batch call to get both. That's why I'm thinking: could we group everything that needs polling under the same config and fetch it all in one batch call?

Collaborator:

I have a similar impression; it would be more efficient to batch whenever we can instead of introducing a new ticker that increases pressure on the RPC. AFAIK we already poll the RPC to verify that it's healthy, so we could probably extend that logic to also fetch the latest finalized block. If we manage to bundle it all into a single batch call, we get finality tracking for free.

Maybe reuse the existing <-pollCh?

	for {
		select {
		case <-n.nodeCtx.Done():
			return
		case <-pollCh:

Collaborator Author (@dhaidashenko, Mar 21, 2024):

I'm not in favour of merging all of the polls into a single ticker, as they check different properties of the RPC. Poll checks that the RPC is reachable; this is a super basic check, and we want to do it often and be aggressive with the timeouts: if an RPC needs more than 1s to return its version, it's not healthy, whereas for the finalized block a longer interval seems fine.

Regarding new-heads polling, batching does make sense in that case.
}
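
A side note on the pattern discussed above: keeping the finalized-head poll on its own ticker is cheap to make optional, because a receive from a nil channel blocks forever, so a select case on an unset channel is simply never taken. A self-contained sketch of that idiom (independent of the PR's types):

	package main

	import (
		"fmt"
		"time"
	)

	func main() {
		var finalizedPollInterval time.Duration // zero means disabled

		// Create the ticker only when the interval is positive; this
		// also avoids time.NewTicker panicking on a non-positive
		// interval. When polling is disabled the channel stays nil,
		// so its select case never fires.
		var pollFinalizedHeadCh <-chan time.Time
		if finalizedPollInterval > 0 {
			ticker := time.NewTicker(finalizedPollInterval)
			defer ticker.Stop()
			pollFinalizedHeadCh = ticker.C
		}

		select {
		case <-pollFinalizedHeadCh:
			fmt.Println("poll finalized head")
		case <-time.After(10 * time.Millisecond):
			fmt.Println("finalized-head polling disabled; nothing received")
		}
	}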

+type ChainConfig interface {
Contributor:

Can you help me reason through why this configuration belongs in ChainConfig vs NodeConfig, and whether ChainConfig lives in the right place (vs in MultiNode)?

I'd expect configuration here to be node-specific, with chain details living at a higher level in the abstraction hierarchy, or for node to store a map[string]*ChainConfig to support multiple chain configurations.

Collaborator Author:

FinalizedBlockPollInterval belongs to NodeConfig and defines how often a Node instance should poll for a new finalized block. It's not a property of a chain; it's a property of the component that performs the health assessment of an RPC.

Node is responsible for the health assessment of a single RPC, which serves only one chain, so it does not store a map[string]*ChainConfig; there is no reason for it to be aware of other chains' configurations.

+	NodeNoNewHeadsThreshold() time.Duration
+	FinalityDepth() uint32
+	FinalityTagEnabled() bool
+	ChainType() commonconfig.ChainType
+}

//go:generate mockery --quiet --name Node --structname mockNode --filename "mock_node_test.go" --inpackage --case=underscore
@@ -73,14 +82,14 @@ type node[
	RPC NodeClient[CHAIN_ID, HEAD],
] struct {
	services.StateMachine
-	lfcLog              logger.Logger
-	name                string
-	id                  int32
-	chainID             CHAIN_ID
-	nodePoolCfg         NodeConfig
-	noNewHeadsThreshold time.Duration
-	order               int32
-	chainFamily         string
+	lfcLog      logger.Logger
+	name        string
+	id          int32
+	chainID     CHAIN_ID
+	nodePoolCfg NodeConfig
+	chainCfg    ChainConfig
+	order       int32
+	chainFamily string

	ws   url.URL
	http *url.URL
@@ -90,8 +99,9 @@
	stateMu sync.RWMutex // protects state* fields
	state   nodeState
	// Each node is tracking the last received head number and total difficulty
-	stateLatestBlockNumber     int64
-	stateLatestTotalDifficulty *big.Int
+	stateLatestBlockNumber          int64
+	stateLatestTotalDifficulty      *big.Int
+	stateLatestFinalizedBlockNumber int64

	// nodeCtx is the node lifetime's context
	nodeCtx context.Context
@@ -113,7 +123,7 @@ func NewNode[
	RPC NodeClient[CHAIN_ID, HEAD],
](
	nodeCfg NodeConfig,
-	noNewHeadsThreshold time.Duration,
+	chainCfg ChainConfig,
	lggr logger.Logger,
	wsuri url.URL,
	httpuri *url.URL,
@@ -129,7 +139,7 @@
	n.id = id
	n.chainID = chainID
	n.nodePoolCfg = nodeCfg
-	n.noNewHeadsThreshold = noNewHeadsThreshold
+	n.chainCfg = chainCfg
	n.ws = wsuri
	n.order = nodeOrder
	if httpuri != nil {
45 changes: 42 additions & 3 deletions common/client/node_lifecycle.go
@@ -22,6 +22,10 @@ var (
		Name: "pool_rpc_node_highest_seen_block",
		Help: "The highest seen block for the given RPC node",
	}, []string{"chainID", "nodeName"})
+	promPoolRPCNodeHighestFinalizedBlock = promauto.NewGaugeVec(prometheus.GaugeOpts{
+		Name: "pool_rpc_node_highest_finalized_block",
+		Help: "The highest seen finalized block for the given RPC node",
+	}, []string{"chainID", "nodeName"})
	promPoolRPCNodeNumSeenBlocks = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "pool_rpc_node_num_seen_blocks",
		Help: "The total number of new blocks seen by the given RPC node",
@@ -88,7 +92,7 @@ func (n *node[CHAIN_ID, HEAD, RPC]) aliveLoop() {
		}
	}

-	noNewHeadsTimeoutThreshold := n.noNewHeadsThreshold
+	noNewHeadsTimeoutThreshold := n.chainCfg.NodeNoNewHeadsThreshold()
	pollFailureThreshold := n.nodePoolCfg.PollFailureThreshold()
	pollInterval := n.nodePoolCfg.PollInterval()
@@ -134,6 +138,14 @@ func (n *node[CHAIN_ID, HEAD, RPC]) aliveLoop() {
		lggr.Debug("Polling disabled")
	}

+	var pollFinalizedHeadCh <-chan time.Time
+	if n.nodePoolCfg.FinalizedBlockPollInterval() > 0 {
+		lggr.Debugw("Finalized block polling enabled")
+		pollT := time.NewTicker(n.nodePoolCfg.FinalizedBlockPollInterval())
Contributor:

Won't NewTicker() panic if the parameter is 0? I think the fallback file should choose a default matching Eth mainnet, maybe 5 seconds. Also, config validation should ensure that this value is greater than 0.

Collaborator Author:

> Won't NewTicker() panic if the parameter is 0?

Yes, NewTicker panics if the parameter is 0; that's why we don't initialize it unless the provided value is > 0.

> I think the fallback file should choose a default matching Eth mainnet, maybe 5 seconds.

Sounds good.

> Also, config validation should ensure that this value is greater than 0.

IMHO, it should be possible to disable the check. Other health checks are optional; I don't see why this one should be an exception.

+		defer pollT.Stop()
+		pollFinalizedHeadCh = pollT.C
+	}

	_, highestReceivedBlockNumber, _ := n.StateAndLatest()
	var pollFailures uint32

@@ -201,6 +213,13 @@ func (n *node[CHAIN_ID, HEAD, RPC]) aliveLoop() {
				outOfSyncT.Reset(noNewHeadsTimeoutThreshold)
			}
			n.setLatestReceived(bh.BlockNumber(), bh.BlockDifficulty())
+			if !n.chainCfg.FinalityTagEnabled() {
+				latestFinalizedBN := max(bh.BlockNumber()-int64(n.chainCfg.FinalityDepth()), 0)
+				if latestFinalizedBN > n.stateLatestFinalizedBlockNumber {
+					promPoolRPCNodeHighestFinalizedBlock.WithLabelValues(n.chainID.String(), n.name).Set(float64(latestFinalizedBN))
+					n.stateLatestFinalizedBlockNumber = latestFinalizedBN
+				}
+			}
		case err := <-sub.Err():
			lggr.Errorw("Subscription was terminated", "err", err, "nodeState", n.State())
			n.declareUnreachable()
@@ -214,13 +233,33 @@ func (n *node[CHAIN_ID, HEAD, RPC]) aliveLoop() {
					lggr.Criticalf("RPC endpoint detected out of sync; %s %s", msgCannotDisable, msgDegradedState)
					// We don't necessarily want to wait the full timeout to check again, we should
					// check regularly and log noisily in this state
-					outOfSyncT.Reset(zombieNodeCheckInterval(n.noNewHeadsThreshold))
+					outOfSyncT.Reset(zombieNodeCheckInterval(noNewHeadsTimeoutThreshold))
					continue
				}
			}
			n.declareOutOfSync(func(num int64, td *big.Int) bool { return num < highestReceivedBlockNumber })
			return
+		case <-pollFinalizedHeadCh:
+			ctx, cancel := context.WithTimeout(n.nodeCtx, n.nodePoolCfg.FinalizedBlockPollInterval())
+			latestFinalized, err := n.RPC().LatestFinalizedBlock(ctx)
+			cancel()
+			if err != nil {
+				lggr.Warnw("Failed to fetch latest finalized block", "err", err)
+				continue
+			}
+
+			if !latestFinalized.IsValid() {
+				lggr.Warn("Latest finalized block is not valid")
+				continue
+			}
+
+			latestFinalizedBN := latestFinalized.BlockNumber()
+			if latestFinalizedBN > n.stateLatestFinalizedBlockNumber {
+				promPoolRPCNodeHighestFinalizedBlock.WithLabelValues(n.chainID.String(), n.name).Set(float64(latestFinalizedBN))
+				n.stateLatestFinalizedBlockNumber = latestFinalizedBN
+			}
		}

	}
}

@@ -316,7 +355,7 @@ func (n *node[CHAIN_ID, HEAD, RPC]) outOfSyncLoop(isOutOfSync func(num int64, td *big.Int) bool) {
				return
			}
			lggr.Debugw(msgReceivedBlock, "blockNumber", head.BlockNumber(), "blockDifficulty", head.BlockDifficulty(), "nodeState", n.State())
-		case <-time.After(zombieNodeCheckInterval(n.noNewHeadsThreshold)):
+		case <-time.After(zombieNodeCheckInterval(n.chainCfg.NodeNoNewHeadsThreshold())):
			if n.nLiveNodes != nil {
				if l, _, _ := n.nLiveNodes(); l < 1 {
					lggr.Critical("RPC endpoint is still out of sync, but there are no other available nodes. This RPC node will be forcibly moved back into the live pool in a degraded state")