Excessive resource utilization when ethstats reporting is enabled #3896

nicexe · 2022-03-22T15:13:46Z

Describe the bug
Nethermind is using excessive resources

To Reproduce
Steps to reproduce the behavior:

Setup an AuRa network such xdaichain/posdao-testnet-prepare
Before launching the nodes for the first time (./run_all.sh), segregate the nodes into a different server for each one.
1. Modify each node's docker-compose.yml and update NETHERMIND_NETWORKCONFIG_EXTERNALIP and NETHERMIND_ETHSTATSCONFIG_SERVER environment variables
2. Optionally modify run_all.sh and stop_all.sh to only run/stop the specific node or ethstats for each server
Start nodes
On archive server, deploy staking contract cd .. # go back to the `compose` directory and docker-compose start && docker-compose logs -f
Observe system load. This is the baseline with ethstats reporting enabled
Stop the nodes
Modify each node's docker-compose.yml and update NETHERMIND_ETHSTATSCONFIG_ENABLED to false
Observe system load. This is with ethstats reporting disabled

Screenshots
Here you can see some system metrics with ethstats reporting enabled
Focus on system load average
The only outlier is validator1 which seems to report the wrong information yet has the least resource utilization

Here you can see some system metrics with ethstats reporting disabled

System Information:

Docker host: Ubuntu 20.04
Docker container images: nethermind/nethermind:1.12.6

Additional notes
I cannot really explain the discrepancy between validator1 and the other validators in the ethstats enabled scenario.
All validators have almost identical environment variables set. They only differ on node name, keys and external ip.
Each node lives on it own VM, each on a different host machine, some host machines on a different datacenters.
All VMs have identical specs.

The text was updated successfully, but these errors were encountered:

nicexe · 2022-03-22T15:21:30Z

In the ethstats enabled scenario, validators are starting to struggle to parse/produce blocks on time and this results in lots of benign AuRa reports

nicexe · 2022-03-22T16:49:09Z

Logs in the ethstats disabled scenario:

Logs in the ethstats enabled scenario

nicexe · 2022-03-23T08:01:38Z

I forgot to mention the VM orientation shown in my screenshots:

archive	validator1	validator2
validator3	validator4	validator5

varasev · 2022-03-24T10:53:37Z

@nicexe Did you try that with v1.12.3?

nicexe · 2022-03-24T14:00:29Z

Hi @varasev thanks for the reply.
I just tried 1.12.3 but the nodes cannot start anymore and I'm getting those 2 errors:

2022-03-24 13:55:05.4288|ERROR|13|Step InitializeBlockchainAuRa failed after 151ms Nethermind.Trie.TrieException: Node 0x3adfe602f80cfa36db0ea3b41a64819071037a685399bf9381508d255e8d9f71 is missing from the DB
   at Nethermind.Trie.Pruning.TrieStore.LoadRlp(Keccak keccak, IKeyValueStore keyValueStore) in /src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs:line 331
   at Nethermind.Trie.Pruning.ReadOnlyTrieStore.LoadRlp(Keccak hash) in /src/Nethermind/Nethermind.Trie/Pruning/ReadOnlyTrieStore.cs:line 41
   at Nethermind.Trie.TrieNode.ResolveNode(ITrieNodeResolver tree) in /src/Nethermind/Nethermind.Trie/TrieNode.cs:line 234
   at Nethermind.Trie.PatriciaTree.Run(Span`1 updatePath, Int32 nibblesCount, Byte[] updateValue, Boolean isUpdate, Boolean ignoreMissingDelete, Keccak startRootHash) in /src/Nethermind/Nethermind.Trie/PatriciaTree.cs:line 401
   at Nethermind.Trie.PatriciaTree.Get(Span`1 rawKey, Keccak rootHash) in /src/Nethermind/Nethermind.Trie/PatriciaTree.cs:line 319
   at Nethermind.State.StateTree.Get(Address address, Keccak rootHash) in /src/Nethermind/Nethermind.State/StateTree.cs:line 49
   at Nethermind.State.StateProvider.GetState(Address address) in /src/Nethermind/Nethermind.State/StateProvider.cs:line 659
   at Nethermind.State.StateProvider.GetAndAddToCache(Address address) in /src/Nethermind/Nethermind.State/StateProvider.cs:line 674
   at Nethermind.State.StateProvider.GetThroughCache(Address address) in /src/Nethermind/Nethermind.State/StateProvider.cs:line 695
   at Nethermind.State.StateProvider.GetCodeHash(Address address) in /src/Nethermind/Nethermind.State/StateProvider.cs:line 311
   at Nethermind.State.IReadOnlyStateProvider.IsContract(Address address) in /src/Nethermind/Nethermind.State/IReadOnlyStateProvider.cs:line 39
   at Nethermind.Evm.TransactionProcessing.ReadOnlyTransactionProcessor.IsContractDeployed(Address address) in /src/Nethermind/Nethermind.Evm/TransactionProcessing/ReadOnlyTransactionProcessor.cs:line 55
   at Nethermind.Blockchain.Contracts.Contract.ConstantContract.CallRaw(CallInfo callInfo, IReadOnlyTransactionProcessor readOnlyTransactionProcessor) in /src/Nethermind/Nethermind.Blockchain/Contracts/Contract.ConstantContract.cs:line 111
   at Nethermind.Blockchain.Contracts.Contract.ConstantContract.Call(CallInfo callInfo) in /src/Nethermind/Nethermind.Blockchain/Contracts/Contract.ConstantContract.cs:line 104
   at Nethermind.Blockchain.Contracts.Contract.IConstantContract.Call[T](CallInfo callInfo) in /src/Nethermind/Nethermind.Blockchain/Contracts/Contract.ConstantContract.cs:line 43
   at Nethermind.Blockchain.Contracts.Contract.IConstantContract.Call[T](BlockHeader parentHeader, String functionName, Address sender, Object[] arguments) in /src/Nethermind/Nethermind.Blockchain/Contracts/Contract.ConstantContract.cs:line 52
   at Nethermind.Consensus.AuRa.Contracts.ValidatorContract.GetValidators(BlockHeader parentHeader) in /src/Nethermind/Nethermind.Consensus.AuRa/Contracts/ValidatorContract.cs:line 104
   at Nethermind.Consensus.AuRa.Validators.ContractBasedValidator.LoadValidatorsFromContract(BlockHeader parentHeader) in /src/Nethermind/Nethermind.Consensus.AuRa/Validators/ContractBasedValidator.cs:line 246
   at Nethermind.Consensus.AuRa.Validators.ContractBasedValidator.SetFinalizationManager(IAuRaBlockFinalizationManager finalizationManager, BlockHeader parentHeader) in /src/Nethermind/Nethermind.Consensus.AuRa/Validators/ContractBasedValidator.cs:line 84
   at Nethermind.Consensus.AuRa.Validators.ContractBasedValidator..ctor(IValidatorContract validatorContract, IBlockTree blockTree, IReceiptFinder receiptFinder, IValidatorStore validatorStore, IValidSealerStrategy validSealerStrategy, IAuRaBlockFinalizationManager finalizationManager, BlockHeader parentHeader, ILogManager logManager, Int64 startBlockNumber, Int64 posdaoTransition, Boolean forSealing) in /src/Nethermind/Nethermind.Consensus.AuRa/Validators/ContractBasedValidator.cs:line 71
   at Nethermind.Consensus.AuRa.AuRaValidatorFactory.<CreateValidatorProcessor>g__GetContractBasedValidator|19_2(<>c__DisplayClass19_0& ) in /src/Nethermind/Nethermind.Consensus.AuRa/AuRaValidatorFactory.cs:line 104
   at Nethermind.Consensus.AuRa.AuRaValidatorFactory.CreateValidatorProcessor(Validator validator, BlockHeader parentHeader, Nullable`1 startBlock) in /src/Nethermind/Nethermind.Consensus.AuRa/AuRaValidatorFactory.cs:line 131
   at Nethermind.Consensus.AuRa.Validators.MultiValidator.CreateValidator(Int64 finalizedAtBlockNumber, Validator validatorPrototype, BlockHeader parentHeader) in /src/Nethermind/Nethermind.Consensus.AuRa/Validators/MultiValidator.cs:line 222
   at Nethermind.Consensus.AuRa.Validators.MultiValidator.SetCurrentValidator(Int64 finalizedAtBlockNumber, Validator validatorPrototype, BlockHeader parentHeader) in /src/Nethermind/Nethermind.Consensus.AuRa/Validators/MultiValidator.cs:line 204
   at Nethermind.Consensus.AuRa.Validators.MultiValidator.SetCurrentValidator(KeyValuePair`2 validatorInfo, BlockHeader parentHeader) in /src/Nethermind/Nethermind.Consensus.AuRa/Validators/MultiValidator.cs:line 196
   at Nethermind.Consensus.AuRa.Validators.MultiValidator.InitCurrentValidator(Int64 blockNumber, BlockHeader parentHeader) in /src/Nethermind/Nethermind.Consensus.AuRa/Validators/MultiValidator.cs:line 75
   at Nethermind.Consensus.AuRa.Validators.MultiValidator.SetFinalizationManager(IAuRaBlockFinalizationManager finalizationManager, BlockHeader parentHeader) in /src/Nethermind/Nethermind.Consensus.AuRa/Validators/MultiValidator.cs:line 186
   at Nethermind.Consensus.AuRa.Validators.MultiValidator..ctor(Validator validator, IAuRaValidatorFactory validatorFactory, IBlockTree blockTree, IValidatorStore validatorStore, IAuRaBlockFinalizationManager finalizationManager, BlockHeader parentHeader, ILogManager logManager, Boolean forSealing) in /src/Nethermind/Nethermind.Consensus.AuRa/Validators/MultiValidator.cs:line 66
   at Nethermind.Consensus.AuRa.AuRaValidatorFactory.CreateValidatorProcessor(Validator validator, BlockHeader parentHeader, Nullable`1 startBlock) in /src/Nethermind/Nethermind.Consensus.AuRa/AuRaValidatorFactory.cs:line 145
   at Nethermind.Consensus.AuRa.InitializationSteps.InitializeBlockchainAuRa.CreateAuRaValidator(IBlockProcessor processor, IReadOnlyTxProcessorSource readOnlyTxProcessorSource) in /src/Nethermind/Nethermind.Consensus.AuRa/InitializationSteps/InitializeBlockchainAuRa.cs:line 131
   at Nethermind.Consensus.AuRa.InitializationSteps.InitializeBlockchainAuRa.CreateBlockProcessor() in /src/Nethermind/Nethermind.Consensus.AuRa/InitializationSteps/InitializeBlockchainAuRa.cs:line 94
   at Nethermind.Init.Steps.InitializeBlockchain.InitBlockchain() in /src/Nethermind/Nethermind.Init/Steps/InitializeBlockchain.cs:line 220
   at Nethermind.Init.Steps.InitializeBlockchain.Execute(CancellationToken _) in /src/Nethermind/Nethermind.Init/Steps/InitializeBlockchain.cs:line 69
2022-03-24 13:55:05.4476|ERROR|10|Error during ethereum runner start Nethermind.Init.Steps.StepDependencyException: BlockchainProcessor
   at Nethermind.Init.Steps.StartBlockProcessor.Execute(CancellationToken _) in /src/Nethermind/Nethermind.Init/Steps/StartBlockProcessor.cs:line 37
   at Nethermind.Init.Steps.EthereumStepsManager.RunOneRoundOfInitialization(CancellationToken cancellationToken) in /src/Nethermind/Nethermind.Init/Steps/EthereumStepsManager.cs:line 140
   at Nethermind.Init.Steps.EthereumStepsManager.InitializeAll(CancellationToken cancellationToken) in /src/Nethermind/Nethermind.Init/Steps/EthereumStepsManager.cs:line 108
   at Nethermind.Runner.Ethereum.EthereumRunner.Start(CancellationToken cancellationToken) in /src/Nethermind/Nethermind.Runner/Ethereum/EthereumRunner.cs:line 51

I'll try keeping all nodes to latest except one where I would configure it to use 1.12.3 and sync from 0

nicexe · 2022-03-24T14:48:54Z

@varasev 1.12.3 definitely helped

system load

ethstats

I restored the archive node and validators from 1 to 4 and configured them to use 1.12.6. I removed all data for validator 5 and configured it to use 1.12.3.

After validator 5 synced with the rest of the nodes I let it rest for a few minutes and then checked the system load as seen in the screenshot.

Validator 5 has a system load average of about 1 on a 4 core VM and the other nodes have a system load average of about 8-10 on a 4 core VM each.

Again, the only outlier is validator 1 which has significantly lower CPU utilization and system load average of about 2 on a 4 core VM but it is also the only node reporting invalid stats to the ethstat server.

LukaszRozmej · 2022-03-24T21:52:37Z

Hi @nicexe looking into your issue, if 1.12.3 solves it then there is slight chance that the actual reason is different and related to: #3901

varasev · 2022-03-27T06:50:42Z

@nicexe Can you try to run the setup using the nethermindeth/nethermind:local_txs_broadcasting docker image? It is the build from #3901

LukaszRozmej · 2022-03-28T10:54:54Z

Also pushed #3906

nicexe · 2022-03-28T12:39:07Z

I tried to use nethermindeth/nethermind:local_txs_broadcasting on validator 5. This brings the VM that validator 5 runs on to a system load of about 9 and 100% CPU utilization just like the rest of the nodes as opposed to a system load of about 1 when using 1.12.3

varasev · 2022-03-29T08:28:32Z

Also pushed #3906

@LukaszRozmej Can you please make a docker image for this?

LukaszRozmej · 2022-03-30T07:57:12Z

@nicexe I fail to actually reproduce this:

Would you be able to provide a performance trace from one of the misbehaving nodes? I am inquiring our devops on how to do that on nodes running inside docker as that is more complicated.

@matilote would you be able to provide a docker image + instructions on how to use dottrace here?

LukaszRozmej · 2022-03-31T09:19:16Z

@nicexe can you help us gather data here?

Please use image nethermindeth/nethermind:test
After running it on one or multiple misbehaving nodes please do:
sudo docker exec -it xdaitestnet-validatorX /bin/bash
mkdir dumps
/opt/tools/dottrace attach PID --save-to=dumps
After some time like a minute or two close dottrace with Ctrl+C, it will take the snapshot and save it to dumps folder
Now you need to send us the dump, you can do it different ways. one potential is:
apt update
apt install curl
apt install zip
zip -9 -r perf.zip dump
curl --upload-file ./perf.zip https://transfer.sh/perf.zip
The last one would give you url with the file being available for download.

MarekM25 · 2022-07-08T08:49:05Z

@nicexe we found one potential issue with our ethstats integration. It could be connected with your problem. We will release the fix soon. The fix is merged to master

LukaszRozmej · 2022-07-11T12:22:30Z

#4249

tkstanczak assigned LukaszRozmej Mar 22, 2022

LukaszRozmej closed this as completed Aug 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Excessive resource utilization when ethstats reporting is enabled #3896

Excessive resource utilization when ethstats reporting is enabled #3896

nicexe commented Mar 22, 2022 •

edited

Loading

nicexe commented Mar 22, 2022

nicexe commented Mar 22, 2022

nicexe commented Mar 23, 2022

varasev commented Mar 24, 2022

nicexe commented Mar 24, 2022

nicexe commented Mar 24, 2022

LukaszRozmej commented Mar 24, 2022

varasev commented Mar 27, 2022

LukaszRozmej commented Mar 28, 2022

nicexe commented Mar 28, 2022

varasev commented Mar 29, 2022

LukaszRozmej commented Mar 30, 2022 •

edited

Loading

LukaszRozmej commented Mar 31, 2022 •

edited

Loading

MarekM25 commented Jul 8, 2022 •

edited

Loading

LukaszRozmej commented Jul 11, 2022

Excessive resource utilization when ethstats reporting is enabled #3896

Excessive resource utilization when ethstats reporting is enabled #3896

Comments

nicexe commented Mar 22, 2022 • edited Loading

nicexe commented Mar 22, 2022

nicexe commented Mar 22, 2022

nicexe commented Mar 23, 2022

varasev commented Mar 24, 2022

nicexe commented Mar 24, 2022

nicexe commented Mar 24, 2022

LukaszRozmej commented Mar 24, 2022

varasev commented Mar 27, 2022

LukaszRozmej commented Mar 28, 2022

nicexe commented Mar 28, 2022

varasev commented Mar 29, 2022

LukaszRozmej commented Mar 30, 2022 • edited Loading

LukaszRozmej commented Mar 31, 2022 • edited Loading

MarekM25 commented Jul 8, 2022 • edited Loading

LukaszRozmej commented Jul 11, 2022

nicexe commented Mar 22, 2022 •

edited

Loading

LukaszRozmej commented Mar 30, 2022 •

edited

Loading

LukaszRozmej commented Mar 31, 2022 •

edited

Loading

MarekM25 commented Jul 8, 2022 •

edited

Loading