Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature to TXM to detect and purge stuck transactions #12881

Merged
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
52 commits
Select commit Hold shift + click to select a range
246b04d
Added the EVM stuck tx detector component
amit-momin Apr 18, 2024
d47d88f
Added stuck tx handling in Confirmer
amit-momin Apr 19, 2024
8c713dc
Fixed linting
amit-momin Apr 19, 2024
05c5a11
Fixed toml config decoding
amit-momin Apr 19, 2024
360d9fa
Fixed config tests
amit-momin Apr 19, 2024
0a04fff
Fixed web resolver config tests
amit-momin Apr 19, 2024
d26473c
Fixed config docs test
amit-momin Apr 19, 2024
1541c5a
Added zkEVM overflow detection and added unit tests for the detector
amit-momin Apr 23, 2024
1d4113d
Merge branch 'develop' into BCI-2941-Implement-auto-purge-stuck-trans…
amit-momin Apr 23, 2024
c71ac14
Fixed broken tests after merge
amit-momin Apr 23, 2024
2e46dde
Reverted AutoPurgeConfig validation changes and fixed config tests
amit-momin Apr 23, 2024
8185dc8
Added changeset and fixed config validation logic
amit-momin Apr 23, 2024
20bfb9d
Fixed linting
amit-momin Apr 23, 2024
82e3a5d
Fixed linting
amit-momin Apr 23, 2024
db5264e
Fixed purge attempt builder and added tests
amit-momin Apr 24, 2024
0d5b318
Updated evm.txes contraint to allow non-null nonce if fatal_error
amit-momin Apr 24, 2024
25105a1
Added confirmer test
amit-momin Apr 24, 2024
491f9e3
Adjusted confirmer test to better reflect actual process
amit-momin Apr 24, 2024
338ab26
Fixed linting
amit-momin Apr 24, 2024
dc17b98
Merge branch 'develop' into BCI-2941-Implement-auto-purge-stuck-trans…
amit-momin Apr 25, 2024
468b73b
Added purge block num loading on Confirmer startup
amit-momin Apr 25, 2024
736eb89
Updated EVM tx store mock
amit-momin Apr 25, 2024
b2902c6
Fixed linting and testdata
amit-momin Apr 25, 2024
6eb5dc5
Updated stuck tx fatal error messages
amit-momin Apr 25, 2024
a5663ee
Merge branch 'develop' into BCI-2941-Implement-auto-purge-stuck-trans…
amit-momin Apr 25, 2024
0cea37b
Updated sql migration file sequence
amit-momin Apr 25, 2024
a3cd9b7
Skipped loading purge block num if auto-purge feature disabled and fi…
amit-momin Apr 26, 2024
02306fe
Fixed linting
amit-momin Apr 26, 2024
a13605b
Renamed function and moved log
amit-momin Apr 26, 2024
c2deae5
Merge branch 'develop' into BCI-2941-Implement-auto-purge-stuck-trans…
amit-momin Apr 26, 2024
a5ea4aa
Added stricter config validation
amit-momin Apr 26, 2024
1cfb5cc
Fixed linting
amit-momin Apr 27, 2024
335033a
Merge branch 'develop' into BCI-2941-Implement-auto-purge-stuck-trans…
amit-momin Apr 29, 2024
a5ba3a0
Updated auto-purge feature configs to adhere to config naming standards
amit-momin Apr 29, 2024
7513d36
Fixed config doc test and updated changeset
amit-momin Apr 29, 2024
4b41b4b
Updated Scroll and zkEVM config defaults and linted common config
amit-momin Apr 29, 2024
cc5764d
Updated config doc
amit-momin Apr 29, 2024
b1faa07
Merge branch 'develop' into BCI-2941-Implement-auto-purge-stuck-trans…
amit-momin Apr 30, 2024
d67d79b
Generated config doc and fixed linting
amit-momin Apr 30, 2024
c4062f5
Updated config description for AutoPurge.MinAttempts
amit-momin May 9, 2024
54b9c7c
Merge branch 'develop' into BCI-2941-Implement-auto-purge-stuck-trans…
amit-momin May 9, 2024
d244012
Fixed sql migration conflict
amit-momin May 9, 2024
0915400
Fixed linting
amit-momin May 9, 2024
c95d261
Updated stuck tx detector to use PriceMax config and added comments
amit-momin May 13, 2024
db5e5d9
Merge branch 'develop' into BCI-2941-Implement-auto-purge-stuck-trans…
amit-momin May 13, 2024
31d69ba
Fixed linting
amit-momin May 13, 2024
90ec4a3
Merge branch 'develop' into BCI-2941-Implement-auto-purge-stuck-trans…
amit-momin May 22, 2024
d81da52
Updated DetectionApiUrl config example
amit-momin May 22, 2024
a33460b
Merge branch 'develop' into BCI-2941-Implement-auto-purge-stuck-trans…
amit-momin May 23, 2024
c08504b
Fixed issues from latest merge
amit-momin May 23, 2024
88826a7
Renumbered sql migration file
amit-momin May 23, 2024
e400a36
Fixed testdata
amit-momin May 24, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Fixed config tests
  • Loading branch information
amit-momin committed Apr 19, 2024
commit 360d9fadce403229944c07297a70467884d3c2c0
11 changes: 5 additions & 6 deletions core/chains/evm/config/toml/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -456,18 +456,17 @@ func (a *AutoPurgeConfig) ValidateConfig() (err error) {
return commonconfig.ErrInvalid{Name: "AutoPurgeDetectionApiUrl", Value: a.AutoPurgeDetectionApiUrl.Scheme, Msg: "must be http or https"}
}
}
var errorList []error
if a.AutoPurgeThreshold == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: optional, but if you collapse the null and 0 checks in same line, then code is less, and more readable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept them separate so I could differentiate between ErrMissing and ErrInvalid but I can maybe combine these into just ErrInvalid

errorList = append(errorList, commonconfig.ErrMissing{Name: "AutoPurgeThreshold", Msg: "needs to be set if AutoPurgeStuckTxs is enabled and AutoPurgeDetectionApiUrl not set"})
err = errors.Join(err, commonconfig.ErrMissing{Name: "AutoPurgeThreshold", Msg: "needs to be set if AutoPurgeStuckTxs is enabled and AutoPurgeDetectionApiUrl not set"})
} else if *a.AutoPurgeThreshold == 0 {
errorList = append(errorList, commonconfig.ErrInvalid{Name: "AutoPurgeThreshold", Value: *a.AutoPurgeThreshold, Msg: "cannot be 0 if AutoPurgeStuckTxs is enabled and AutoPurgeDetectionApiUrl not set"})
err = errors.Join(err, commonconfig.ErrInvalid{Name: "AutoPurgeThreshold", Value: *a.AutoPurgeThreshold, Msg: "cannot be 0 if AutoPurgeStuckTxs is enabled and AutoPurgeDetectionApiUrl not set"})
}
if a.AutoPurgeMinAttempts == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also validate here that gas bumping is enabled?

Copy link
Contributor Author

@amit-momin amit-momin Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should for robustness but it could get messy with how we scope these configs. This one being under Transactions but that one being under GasEstimator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the config validations under Chain's so I could add the BumpThreshold check. I also made the validations chain specific since they require different configs based on their detection method.

errorList = append(errorList, commonconfig.ErrMissing{Name: "AutoPurgeMinAttempts", Msg: "needs to be set if AutoPurgeStuckTxs is enabled and AutoPurgeDetectionApiUrl not set"})
err = errors.Join(err, commonconfig.ErrMissing{Name: "AutoPurgeMinAttempts", Msg: "needs to be set if AutoPurgeStuckTxs is enabled and AutoPurgeDetectionApiUrl not set"})
} else if *a.AutoPurgeMinAttempts == 0 {
errorList = append(errorList, commonconfig.ErrInvalid{Name: "AutoPurgeMinAttempts", Value: *a.AutoPurgeMinAttempts, Msg: "cannot be 0 if AutoPurgeStuckTxs is enabled and AutoPurgeDetectionApiUrl not set"})
err = errors.Join(err, commonconfig.ErrInvalid{Name: "AutoPurgeMinAttempts", Value: *a.AutoPurgeMinAttempts, Msg: "cannot be 0 if AutoPurgeStuckTxs is enabled and AutoPurgeDetectionApiUrl not set"})
}
return errors.Join(errorList...)
return
}

func (a *AutoPurgeConfig) setFrom(f *AutoPurgeConfig) {
Expand Down
46 changes: 32 additions & 14 deletions core/chains/evm/txmgr/test_helpers.go
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
package txmgr

import (
"math/big"
"net/url"
"testing"
"time"

Expand All @@ -11,6 +13,7 @@ import (
evmconfig "github.com/smartcontractkit/chainlink/v2/core/chains/evm/config"
"github.com/smartcontractkit/chainlink/v2/core/chains/evm/utils"
"github.com/smartcontractkit/chainlink/v2/core/config"
"github.com/smartcontractkit/chainlink/v2/core/internal/testutils"
)

func ptr[T any](t T) *T { return &t }
Expand Down Expand Up @@ -42,22 +45,28 @@ func (d *TestDatabaseConfig) Listener() config.Listener {

type TestEvmConfig struct {
evmconfig.EVM
MaxInFlight uint32
ReaperInterval time.Duration
ReaperThreshold time.Duration
ResendAfterThreshold time.Duration
BumpThreshold uint64
MaxQueued uint64
MaxInFlight uint32
ReaperInterval time.Duration
ReaperThreshold time.Duration
ResendAfterThreshold time.Duration
BumpThreshold uint64
MaxQueued uint64
AutoPurgeStuckTxs bool
AutoPurgeThreshold uint32
AutoPurgeMinAttempts uint32
AutoPurgeDetectionApiUrl *url.URL
}

func (e *TestEvmConfig) Transactions() evmconfig.Transactions {
return &transactionsConfig{e: e}
return &transactionsConfig{e: e, autoPurge: &autoPurgeConfig{}}
}

func (e *TestEvmConfig) NonceAutoSync() bool { return true }

func (e *TestEvmConfig) FinalityDepth() uint32 { return 42 }

func (e *TestEvmConfig) ChainID() *big.Int { return testutils.FixtureChainID }

type TestGasEstimatorConfig struct {
bumpThreshold uint64
}
Expand Down Expand Up @@ -115,15 +124,24 @@ func (b *TestBlockHistoryConfig) TransactionPercentile() uint16 { return 42

type transactionsConfig struct {
evmconfig.Transactions
e *TestEvmConfig
e *TestEvmConfig
autoPurge evmconfig.AutoPurgeConfig
}

func (*transactionsConfig) ForwardersEnabled() bool { return true }
func (t *transactionsConfig) MaxInFlight() uint32 { return t.e.MaxInFlight }
func (t *transactionsConfig) MaxQueued() uint64 { return t.e.MaxQueued }
func (t *transactionsConfig) ReaperInterval() time.Duration { return t.e.ReaperInterval }
func (t *transactionsConfig) ReaperThreshold() time.Duration { return t.e.ReaperThreshold }
func (t *transactionsConfig) ResendAfterThreshold() time.Duration { return t.e.ResendAfterThreshold }
func (*transactionsConfig) ForwardersEnabled() bool { return true }
func (t *transactionsConfig) MaxInFlight() uint32 { return t.e.MaxInFlight }
func (t *transactionsConfig) MaxQueued() uint64 { return t.e.MaxQueued }
func (t *transactionsConfig) ReaperInterval() time.Duration { return t.e.ReaperInterval }
func (t *transactionsConfig) ReaperThreshold() time.Duration { return t.e.ReaperThreshold }
func (t *transactionsConfig) ResendAfterThreshold() time.Duration { return t.e.ResendAfterThreshold }
func (t *transactionsConfig) AutoPurge() evmconfig.AutoPurgeConfig { return t.autoPurge }

type autoPurgeConfig struct{}

func (a *autoPurgeConfig) AutoPurgeStuckTxs() bool { return false }
func (a *autoPurgeConfig) AutoPurgeThreshold() uint32 { return 0 }
func (a *autoPurgeConfig) AutoPurgeMinAttempts() uint32 { return 0 }
func (a *autoPurgeConfig) AutoPurgeDetectionApiUrl() *url.URL { return &url.URL{} }

type MockConfig struct {
EvmConfig *TestEvmConfig
Expand Down
10 changes: 10 additions & 0 deletions core/config/docs/chains-evm.toml
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,16 @@ ReaperThreshold = '168h' # Default
# ResendAfterThreshold controls how long to wait before re-broadcasting a transaction that has not yet been confirmed.
ResendAfterThreshold = '1m' # Default

[EVM.Transactions.AutoPurge]
# AutoPurgeStuckTxs enables or disables automatically purging transactions that have been idenitified as terminally stuck (will never be included on-chain). This feature is only expected to be used by ZK chains.
AutoPurgeStuckTxs = false # Default
# AutoPurgeDetectionApiUrl configures the base url of a custom endpoint used to identify terminally stuck transactions.
AutoPurgeDetectionApiUrl = 'https://venus.scroll.io' # Example
# AutoPurgeThreshold configures the number of blocks a transaction has to remain unconfirmed before it is evaluated for being terminally stuck. This threshold is only applied if there is no custom API to identify stuck transactions provided by the chain.
AutoPurgeThreshold = 5 # Example
# AutoPurgeMinAttempts cnofigures the minimum number of broadcasted attempts a transaction has to have before it is evaluated further for being terminally stuck. This threshold is only applied if there is no custom API to identify stuck transactions provided by the chain.
AutoPurgeMinAttempts = 3 # Example

[EVM.BalanceMonitor]
# Enabled balance monitoring for all keys.
Enabled = true # Default
Expand Down
20 changes: 19 additions & 1 deletion core/services/chainlink/config_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -567,6 +567,9 @@ func TestConfig_Marshal(t *testing.T) {
ReaperThreshold: &minute,
ResendAfterThreshold: &hour,
ForwardersEnabled: ptr(true),
AutoPurge: evmcfg.AutoPurgeConfig{
AutoPurgeStuckTxs: ptr(false),
},
},

HeadTracker: evmcfg.HeadTracker{
Expand Down Expand Up @@ -968,6 +971,9 @@ ReaperInterval = '1m0s'
ReaperThreshold = '1m0s'
ResendAfterThreshold = '1h0m0s'

[EVM.Transactions.AutoPurge]
AutoPurgeStuckTxs = false

[EVM.BalanceMonitor]
Enabled = true

Expand Down Expand Up @@ -1177,6 +1183,15 @@ func TestConfig_full(t *testing.T) {
got.EVM[c].Nodes[n].Order = ptr(int32(100))
}
}
if got.EVM[c].Transactions.AutoPurge.AutoPurgeThreshold == nil {
got.EVM[c].Transactions.AutoPurge.AutoPurgeThreshold = ptr(uint32(0))
}
if got.EVM[c].Transactions.AutoPurge.AutoPurgeMinAttempts == nil {
got.EVM[c].Transactions.AutoPurge.AutoPurgeMinAttempts = ptr(uint32(0))
}
if got.EVM[c].Transactions.AutoPurge.AutoPurgeDetectionApiUrl == nil {
got.EVM[c].Transactions.AutoPurge.AutoPurgeDetectionApiUrl = new(commoncfg.URL)
}
}

cfgtest.AssertFieldsNotNil(t, got)
Expand Down Expand Up @@ -1221,11 +1236,14 @@ func TestConfig_Validate(t *testing.T) {
- WSURL: missing: required for primary nodes
- HTTPURL: missing: required for all nodes
- 1.HTTPURL: missing: required for all nodes
- 1: 6 errors:
- 1: 7 errors:
- ChainType: invalid value (Foo): must not be set with this chain id
- Nodes: missing: must have at least one node
- ChainType: invalid value (Foo): must be one of arbitrum, celo, gnosis, kroma, metis, optimismBedrock, scroll, wemix, zksync or omitted
- HeadTracker.HistoryDepth: invalid value (30): must be equal to or greater than FinalityDepth
- Transactions.AutoPurge: 2 errors:
- AutoPurgeThreshold: missing: needs to be set if AutoPurgeStuckTxs is enabled and AutoPurgeDetectionApiUrl not set
- AutoPurgeMinAttempts: missing: needs to be set if AutoPurgeStuckTxs is enabled and AutoPurgeDetectionApiUrl not set
- GasEstimator: 2 errors:
- FeeCapDefault: invalid value (101 wei): must be equal to PriceMax (99 wei) since you are using FixedPrice estimation with gas bumping disabled in EIP1559 mode - PriceMax will be used as the FeeCap for transactions instead of FeeCapDefault
- PriceMax: invalid value (1 gwei): must be greater than or equal to PriceDefault
Expand Down
3 changes: 3 additions & 0 deletions core/services/chainlink/testdata/config-full.toml
Original file line number Diff line number Diff line change
Expand Up @@ -287,6 +287,9 @@ ReaperInterval = '1m0s'
ReaperThreshold = '1m0s'
ResendAfterThreshold = '1h0m0s'

[EVM.Transactions.AutoPurge]
AutoPurgeStuckTxs = false

[EVM.BalanceMonitor]
Enabled = true

Expand Down
3 changes: 3 additions & 0 deletions core/services/chainlink/testdata/config-invalid.toml
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,9 @@ ChainID = '1'
ChainType = 'Foo'
FinalityDepth = 32

[EVM.Transactions.AutoPurge]
AutoPurgeStuckTxs = true

[EVM.GasEstimator]
Mode = 'FixedPrice'
BumpThreshold = 0
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -274,6 +274,9 @@ ReaperInterval = '1h0m0s'
ReaperThreshold = '168h0m0s'
ResendAfterThreshold = '1m0s'

[EVM.Transactions.AutoPurge]
AutoPurgeStuckTxs = false

[EVM.BalanceMonitor]
Enabled = true

Expand Down Expand Up @@ -365,6 +368,9 @@ ReaperInterval = '1h0m0s'
ReaperThreshold = '168h0m0s'
ResendAfterThreshold = '1m0s'

[EVM.Transactions.AutoPurge]
AutoPurgeStuckTxs = false

[EVM.BalanceMonitor]
Enabled = true

Expand Down Expand Up @@ -450,6 +456,9 @@ ReaperInterval = '1h0m0s'
ReaperThreshold = '168h0m0s'
ResendAfterThreshold = '1m0s'

[EVM.Transactions.AutoPurge]
AutoPurgeStuckTxs = false

[EVM.BalanceMonitor]
Enabled = true

Expand Down
3 changes: 3 additions & 0 deletions core/web/resolver/testdata/config-full.toml
Original file line number Diff line number Diff line change
Expand Up @@ -287,6 +287,9 @@ ReaperInterval = '1m0s'
ReaperThreshold = '1m0s'
ResendAfterThreshold = '1h0m0s'

[EVM.Transactions.AutoPurge]
AutoPurgeStuckTxs = false

[EVM.BalanceMonitor]
Enabled = true

Expand Down
34 changes: 34 additions & 0 deletions docs/CONFIG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6947,6 +6947,40 @@ ResendAfterThreshold = '1m' # Default
```
ResendAfterThreshold controls how long to wait before re-broadcasting a transaction that has not yet been confirmed.

## EVM.Transactions.AutoPurge
```toml
[EVM.Transactions.AutoPurge]
AutoPurgeStuckTxs = false # Default
AutoPurgeDetectionApiUrl = 'https://venus.scroll.io' # Example
AutoPurgeThreshold = 5 # Example
AutoPurgeMinAttempts = 3 # Example
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since these are scoped under EVM.Transactions.AutoPurge could these be

Enabled = false
DetectionApiUrl = '...'
Threshold = 5
MinAttempts = 3

or what is the convention we prefer for the configs? I am looking at EVM.BalanceMonitor for context, but also see others that are slightly different

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I just looked at the other configs too and agree I shouldn't prepend the feature name. I've updated these in the latest commit.

```


### AutoPurgeStuckTxs
```toml
AutoPurgeStuckTxs = false # Default
```
AutoPurgeStuckTxs enables or disables automatically purging transactions that have been idenitified as terminally stuck (will never be included on-chain). This feature is only expected to be used by ZK chains.

### AutoPurgeDetectionApiUrl
```toml
AutoPurgeDetectionApiUrl = 'https://venus.scroll.io' # Example
```
AutoPurgeDetectionApiUrl configures the base url of a custom endpoint used to identify terminally stuck transactions.

### AutoPurgeThreshold
```toml
AutoPurgeThreshold = 5 # Example
```
AutoPurgeThreshold configures the number of blocks a transaction has to remain unconfirmed before it is evaluated for being terminally stuck. This threshold is only applied if there is no custom API to identify stuck transactions provided by the chain.

### AutoPurgeMinAttempts
```toml
AutoPurgeMinAttempts = 3 # Example
```
AutoPurgeMinAttempts cnofigures the minimum number of broadcasted attempts a transaction has to have before it is evaluated further for being terminally stuck. This threshold is only applied if there is no custom API to identify stuck transactions provided by the chain.

## EVM.BalanceMonitor
```toml
[EVM.BalanceMonitor]
Expand Down
Loading