-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Periodically resubscribe to chain event logs #922
Conversation
Do you think our issue and ethereum/go-ethereum#23845 are related? The geth issue description talks about missing events after days of running a script, not minutes... In general, having to resubscribe to events every minute seems a bit wild. Also, is it possible to miss events while a subscription is inactive (after cancellation but before resubscription)? |
return nil | ||
} | ||
|
||
func (ecs *EthChainService) listenForLogEvents() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am probably missing something, but why is the below logic moved to listenForLogEvents
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think our issue and ethereum/go-ethereum#23845 are related? The geth issue description talks about missing events after days of running a script, not minutes...
Based on our test run behaviour I'm pretty sure that our issue is being caused by the geth issue as:
- Without this change (with a long running test) we see the withdraw transaction submitted but no chain events in the logs
- Without this change tests will fail if they run over ~3 minutes
In general, having to resubscribe to events every minute seems a bit wild.
Agreed 😔.
Also, is it possible to miss events while a subscription is inactive (after cancellation but before resubscription)?
That's a good point! I haven't run into that problem when running testground runs. I think we may be ok but I'll investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on our test run behaviour I'm pretty sure that our issue is being caused by the geth issue as:
- Without this change (with a long running test) we see the withdraw transaction submitted but no chain events in the logs
- Without this change tests will fail if they run over ~3 minutes
For a test with a long runtime, does the test open/close more on-chain channels than a short running test? Or do longer running tests open/use/close just more virtual channels?
We use a websocket for connecting to an Ethereum JSON rpc endpoint. I wonder if there is some timeout issues with that websocket...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a test with a long runtime, does the test open/close more on-chain channels than a short running test? Or do longer running tests open/use/close just more virtual channels?
Nope, the long running test would create the same amount of ledger channels, so we should have the same amount of on chain channels (every participant creates a ledger channel with every hub).
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a little spelunking through go-ethereum
and found this:
// Config represents the configuration of the filter system.
type Config struct {
LogCacheSize int // maximum number of cached blocks (default: 32)
Timeout time.Duration // how long filters stay active (default: 5min)
}
5 minutes seems roughly in line with the failures we've seen, so that might explain it.
Edit: I've posted my hunch to the issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it makes sense to change our resubscription period to something like 4 minutes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it makes sense to change our resubscription period to something like 4 minutes?
Yup, now that we know it's a 5 minute timeout it makes sense to use a larger value.
@@ -111,25 +116,41 @@ func (ecs *EthChainService) SendTransaction(tx protocols.ChainTransaction) error | |||
} | |||
|
|||
func (ecs *EthChainService) subcribeToEvents() error { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving this PR, but I am worried about:
- Missing events between unsubscribe and resubscribe.
- Duplicate events if resubscribe has a lookback window for events.
As discussed in slack the plan is to merge this in and keep an eye out for any duplicate or missing events |
Fixes statechannels/go-nitro-testground#116
It looks like
geth
has an open issue aroundSubscribeFilterLogs
failing to report events after it's been running for some time. This is similar to the behaviour we see in our long running test failure.Based on some code snippets on the issue the eth chain service now resubscribes to the event logs every 60 seconds. This avoids the problem by ensuring that we don't have long running subscription. I chose the duration of 60 seconds based on 1) that's what they used and 2) and it seemed reasonable.
Here is a 5 minute testground run with this change that succeeds! Here is a 10 min run🚀