Periodically resubscribe to chain event logs #922

lalexgap · 2022-10-04T20:04:00Z

Fixes statechannels/go-nitro-testground#116

It looks like geth has an open issue around SubscribeFilterLogs failing to report events after it's been running for some time. This is similar to the behaviour we see in our long running test failure.

Based on some code snippets in that issue, the eth chain service now resubscribes to the event logs every 60 seconds. This avoids the problem by ensuring that we never have a long-running subscription. I chose 60 seconds because 1) that's what they used and 2) it seemed reasonable.

Here is a 5 minute testground run with this change that succeeds! Here is a 10 min run🚀

github-actions · 2022-10-04T20:06:35Z

🧪 Testground Run for `c0fb21f`

🧪 Testground Run for `4f75625`

🧪 Testground Run for `67077f0`

🧪 Testground Run for `08be481`

🧪 Testground Run for `0c41f84`

🧪 Testground Run for `b70ab36`

🧪 Testground Run for `b196845`

kerzhner · 2022-10-04T20:51:59Z

Do you think our issue and ethereum/go-ethereum#23845 are related? The geth issue description talks about missing events after days of running a script, not minutes...

In general, having to resubscribe to events every minute seems a bit wild. Also, is it possible to miss events while a subscription is inactive (after cancellation but before resubscription)?

kerzhner · 2022-10-04T23:30:42Z

client/engine/chainservice/eth_chainservice.go

+	return nil
+}
+
+func (ecs *EthChainService) listenForLogEvents() {


I am probably missing something, but why is the below logic moved to listenForLogEvents?

> Do you think our issue and ethereum/go-ethereum#23845 are related? The geth issue description talks about missing events after days of running a script, not minutes...

Based on our test run behaviour I'm pretty sure that our issue is being caused by the geth issue as:

- Without this change (with a long running test) we see the withdraw transaction submitted but no chain events in the logs
- Without this change tests will fail if they run over ~3 minutes

> In general, having to resubscribe to events every minute seems a bit wild.

Agreed 😔.

> Also, is it possible to miss events while a subscription is inactive (after cancellation but before resubscription)?

That's a good point! I haven't run into that problem when running testground runs. I think we may be ok but I'll investigate.

> Based on our test run behaviour I'm pretty sure that our issue is being caused by the geth issue as:
>
> - Without this change (with a long running test) we see the withdraw transaction submitted but no chain events in the logs
> - Without this change tests will fail if they run over ~3 minutes

For a test with a long runtime, does the test open/close more on-chain channels than a short running test? Or do longer running tests just open/use/close more virtual channels?

We use a websocket to connect to an Ethereum JSON-RPC endpoint. I wonder if there are timeout issues with that websocket...

> For a test with a long runtime, does the test open/close more on-chain channels than a short running test? Or do longer running tests open/use/close just more virtual channels?

Nope, the long running test would create the same number of ledger channels, so we should have the same number of on-chain channels (every participant creates a ledger channel with every hub).

I did a little spelunking through go-ethereum and found this:

    // Config represents the configuration of the filter system.
    type Config struct {
        LogCacheSize int           // maximum number of cached blocks (default: 32)
        Timeout      time.Duration // how long filters stay active (default: 5min)
    }

5 minutes seems roughly in line with the failures we've seen, so that might explain it.

Edit: I've posted my hunch to the issue

Does it make sense to change our resubscription period to something like 4 minutes?

> Does it make sense to change our resubscription period to something like 4 minutes?

Yup, now that we know it's a 5 minute timeout it makes sense to use a larger value.

workaround for ethereum/go-ethereum#23845

kerzhner · 2022-10-06T14:30:55Z

client/engine/chainservice/eth_chainservice.go

@@ -111,25 +116,41 @@ func (ecs *EthChainService) SendTransaction(tx protocols.ChainTransaction) error
 }

 func (ecs *EthChainService) subcribeToEvents() error {
+


Suggested change

kerzhner

Approving this PR, but I am worried about:

- Missing events between unsubscribe and resubscribe.
- Duplicate events if resubscribe has a lookback window for events.
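If duplicates do turn up, one common mitigation is to remember which logs have already been handled and skip repeats. The sketch below is illustrative only and is not part of this PR: `logID`, `deduper`, and `firstTime` are hypothetical names, and keying on block hash, transaction hash, and log index is an assumption about what identifies a log uniquely.

```go
package main

import "fmt"

// logID identifies a single log entry. The fields mirror data an Ethereum
// log carries, but this is a hypothetical stand-in, not go-ethereum's types.Log.
type logID struct {
	BlockHash string
	TxHash    string
	LogIndex  uint
}

// deduper remembers which logs have already been handled.
// It is not safe for concurrent use and never forgets entries.
type deduper struct {
	seen map[logID]struct{}
}

func newDeduper() *deduper {
	return &deduper{seen: make(map[logID]struct{})}
}

// firstTime reports whether id has not been seen before, and records it.
func (d *deduper) firstTime(id logID) bool {
	if _, ok := d.seen[id]; ok {
		return false
	}
	d.seen[id] = struct{}{}
	return true
}

func main() {
	d := newDeduper()
	a := logID{BlockHash: "0xb10c", TxHash: "0xabc", LogIndex: 0}
	b := logID{BlockHash: "0xb10c", TxHash: "0xabc", LogIndex: 1}
	// a arrives twice, for example once before and once after a resubscribe.
	for _, id := range []logID{a, b, a} {
		if d.firstTime(id) {
			fmt.Println("handle", id.TxHash, id.LogIndex)
		}
	}
}
```

For the missing-events concern, a complementary idea is to track the last processed block and, after each resubscribe, run a one-off log query from that block onwards (go-ethereum's client exposes `FilterLogs` for such queries), feeding the results through the same deduplication step. A long-running service would also need to prune the `seen` map, for example by dropping entries for old blocks.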

lalexgap · 2022-10-06T17:56:51Z

> Approving this PR, but I am worried about:
>
> - Missing events between unsubscribe and resubscribe.
> - Duplicate events if resubscribe has a lookback window for events.

As discussed in Slack, the plan is to merge this in and keep an eye out for any duplicate or missing events.

lalexgap changed the title ~~Periodically resubscribe to events~~ Periodically resubscribe to chain event logs Oct 4, 2022

lalexgap requested a review from kerzhner October 4, 2022 20:16

lalexgap force-pushed the eth-resub branch from 4f75625 to 67077f0 Compare October 4, 2022 20:20

lalexgap force-pushed the eth-resub branch from 67077f0 to 08be481 Compare October 4, 2022 21:42

kerzhner reviewed Oct 4, 2022

lalexgap added 3 commits October 5, 2022 11:23

use workaround

b876c8c

workaround for ethereum/go-ethereum#23845

Change the nightly test to run for 10 minutes

f1a7e83

simplify

b70ab36

lalexgap force-pushed the eth-resub branch from 0c41f84 to b70ab36 Compare October 5, 2022 18:23

kerzhner reviewed Oct 6, 2022

kerzhner approved these changes Oct 6, 2022

switch to longer resub time

b196845

lalexgap merged commit c8a154e into main Oct 6, 2022

lalexgap mentioned this pull request Oct 7, 2022

Long running tests seems to stall closing ledgers occasionally statechannels/go-nitro-testground#130

Closed
