Client's SubscribeFilterLogs stops reporting events after some time #23845

Open
beeb opened this issue Nov 2, 2021 · 15 comments

@beeb

beeb commented Nov 2, 2021

System information

go-ethereum v1.10.8
OS & Version: linux

Expected behaviour

When using ethclient.Client.SubscribeFilterLogs, all events matching the filter are caught.

Actual behaviour

After some time, the script stops reporting new events and has to be restarted before it picks up events again.

Steps to reproduce the behaviour

var EthClient *ethclient.Client
// ...
currentBlock, _ := EthClient.BlockNumber(context.Background())
query := ethereum.FilterQuery{
	Addresses: []common.Address{myAddress},
	Topics:    [][]common.Hash{{myTopic}},
	FromBlock: big.NewInt(int64(currentBlock)),
}
logs := make(chan types.Log)
sub, err := EthClient.SubscribeFilterLogs(context.Background(), query, logs)
if err != nil {
	log.Fatal(err)
	return
}

for {
	select {
	case err := <-sub.Err():
		log.Error(err)
		continue
	case vLog := <-logs:
		event, err := MyContract.ParseSomeEvent(vLog)
		if err != nil {
			log.Error(err)
			continue
		}
		// do something here with event

	}
}

Hard for me to give exact reproduction steps. I guess leave a script like the one above running for days and see whether it stops catching new events after a while.

I'm not sure what could cause this issue. I automated a periodic restart of the script with cron to mitigate it, but it still happens that SubscribeFilterLogs stops reporting new events.

@beeb beeb added the type:bug label Nov 2, 2021
@ligi
Member

ligi commented Nov 3, 2021

Any chance you could provide some of the geth log around the time it happens?

@beeb
Author

beeb commented Nov 9, 2021

Hard for me to identify when it starts failing, since I only monitor a few events per day... I might try to narrow it down by manually finding events that were missed and checking their timestamps.

@Vanclief

Vanclief commented Nov 17, 2021

Having the same issue.

Could it be related to a disconnection that the subscription is never notified about? Maybe keep pinging the websocket to avoid the disconnection?
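
A minimal sketch of that keep-alive idea, assuming a periodic cheap RPC call (here eth_blockNumber via ethclient's BlockNumber) is enough to stop the connection from idling out; the helper name and the 30-second interval are just illustrative, not something go-ethereum prescribes:

import (
	"context"
	"log"
	"time"

	"github.com/ethereum/go-ethereum/ethclient"
)

// keepAlive periodically issues a cheap RPC call so the underlying
// connection never sits completely idle. Run it in its own goroutine
// next to the subscription loop.
func keepAlive(ctx context.Context, client *ethclient.Client) {
	ticker := time.NewTicker(30 * time.Second) // interval is an arbitrary guess
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if _, err := client.BlockNumber(ctx); err != nil {
				log.Printf("keep-alive call failed: %v", err)
			}
		}
	}
}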

@beeb
Author

beeb commented Nov 17, 2021

> Having the same issue.
>
> Could it be related to a disconnection that the subscription is never notified about? Maybe keep pinging the websocket to avoid the disconnection?

Not for me, as I'm connected through IPC.

@Vanclief

Weird, I am connecting over an Alchemy/Infura websocket and it will intermittently drop events after some time.

I redeployed my service today with the subscription call moved inside the for loop, so that the subscription is recreated on each iteration, to see whether it would still drop events. I am not sure it holds up in the long term, but so far it hasn't given me an issue.

var EthClient *ethclient.Client
// ...
currentBlock, _ := EthClient.BlockNumber(context.Background())
query := ethereum.FilterQuery{
	Addresses: []common.Address{myAddress},
	Topics:    [][]common.Hash{{myTopic}},
	FromBlock: big.NewInt(int64(currentBlock)),
}
logs := make(chan types.Log)

for {
	sub, err := EthClient.SubscribeFilterLogs(context.Background(), query, logs)
	if err != nil {
		log.Fatal(err)
		return
	}
	select {
	case err := <-sub.Err():
		log.Error(err)
		continue
	case vLog := <-logs:
		event, err := MyContract.ParseSomeEvent(vLog)
		if err != nil {
			log.Error(err)
			continue
		}
		// do something here with event

	}
}
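
One caveat with the loop above: it opens a fresh subscription on every iteration, including after every received log, and never unsubscribes the previous one. A variant that only re-subscribes after sub.Err() fires might look like this (a sketch only, reusing the EthClient, query, logs and MyContract names from the snippets above; the back-off delay is arbitrary):

for {
	sub, err := EthClient.SubscribeFilterLogs(context.Background(), query, logs)
	if err != nil {
		log.Error(err)
		time.Sleep(5 * time.Second) // arbitrary back-off before retrying
		continue
	}

receive:
	for {
		select {
		case err := <-sub.Err():
			// subscription died: log, drop it, then loop around to re-subscribe
			log.Error(err)
			sub.Unsubscribe()
			break receive
		case vLog := <-logs:
			event, err := MyContract.ParseSomeEvent(vLog)
			if err != nil {
				log.Error(err)
				continue
			}
			// do something here with event
			_ = event
		}
	}
}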

@beeb
Author

beeb commented Nov 18, 2021

Thanks @Vanclief, looks like it could be a helpful workaround if the subscription ends up in a bad state somehow after an event is picked up.

I have for now abandoned the project that was using the code above, so I won't be able to test it immediately, but I plan on running it again at some point.

@yelhouti

@Vanclief I am facing a similar issue, and it seems to me that when an error happens, the subscription gets stuck in an infinite loop, emitting a nil error on the channel over and over, so my code just keeps logging that an error happened.

@xiezm

xiezm commented Feb 11, 2022

@yelhouti

@xiezm thanks a lot, this could definitely help.

I still think that this issue should be fixed here.

@Vanclief

@xiezm Thanks for sharing, that definitely looks like the correct way to do it instead of just spamming new ws connections.

lalexgap added a commit to statechannels/go-nitro that referenced this issue Oct 4, 2022
lalexgap added a commit to statechannels/go-nitro that referenced this issue Oct 5, 2022
@lalexgap

lalexgap commented Oct 5, 2022

I did a little spelunking around the go-ethereum code base and found that the filter system's timeout seems to default to 5 minutes:

// Config represents the configuration of the filter system.
type Config struct {
	LogCacheSize int           // maximum number of cached blocks (default: 32)
	Timeout      time.Duration // how long filters stay active (default: 5min)
}

Could that be the underlying cause? The 5-minute duration roughly matches when we stop receiving events.

It looks like LogCacheSize can be set through configuration, but Timeout cannot.

@beeb
Author

beeb commented Oct 12, 2022

@lalexgap the duration was much longer for me, and I think this timeout only triggers if you do not poll the filter within that window. As long as you keep polling the filter, it should stay alive.

lalexgap added a commit to statechannels/go-nitro that referenced this issue Oct 19, 2022
@atulsriv

Any update on this issue? I'm seeing a tcp i/o timeout error after some hours of using the subscription. The sub.Err() case catches the tcp i/o timeout when it happens and exits the application. Do we need some retry logic for these? Has anyone else run into this?

sub, err := e.client.SubscribeFilterLogs(ctx, q, e.logUpdate)
if err != nil {
	return fmt.Errorf("failed to subscribe to filter logs: %w", err)
}
defer sub.Unsubscribe()

logs, err := e.client.FilterLogs(ctx, q)
if err != nil {
	return fmt.Errorf("failed to execute filter query command: %w", err)
}

// replay the historical logs into the same channel the subscription feeds
go func() {
	for _, l := range logs {
		e.logUpdate <- l
	}
}()

for {
	select {
	case err := <-sub.Err():
		return fmt.Errorf("unexpected subscription err: %w", err)
	// ... (remaining cases elided)
	}
}
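
One possible shape for that retry logic, sketched under the assumption that it is fine to keep the same channel and simply re-subscribe whenever the subscription errors out; the function name and the five-second back-off are made up for illustration. The caller would run it in a goroutine and keep reading logs from the channel as before.

import (
	"context"
	"log"
	"time"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/ethclient"
)

// subscribeWithRetry keeps the log subscription alive: whenever it reports an
// error (e.g. a tcp i/o timeout) it unsubscribes and creates a fresh one
// instead of returning the error to the caller.
func subscribeWithRetry(ctx context.Context, client *ethclient.Client, q ethereum.FilterQuery, out chan<- types.Log) error {
	for {
		sub, err := client.SubscribeFilterLogs(ctx, q, out)
		if err != nil {
			// transient dial/subscribe failure: wait a bit and try again
			select {
			case <-ctx.Done():
				return ctx.Err()
			case <-time.After(5 * time.Second): // arbitrary back-off
			}
			continue
		}

		select {
		case <-ctx.Done():
			sub.Unsubscribe()
			return ctx.Err()
		case err := <-sub.Err():
			log.Printf("subscription error, re-subscribing: %v", err)
			sub.Unsubscribe()
		}
	}
}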

@fbac
Contributor

fbac commented Oct 5, 2023

Bumping this one.

In my specific case, I avoid getting i/o timeout errors from the websocket by using a customized websocket dialer and then creating a new ethclient from it.

// This creates a new websocket dialer using "github.com/gorilla/websocket"
dialer := rpc.WithWebsocketDialer(websocket.Dialer{
	HandshakeTimeout: 45 * time.Second,
	NetDial: (&net.Dialer{
		Timeout:   45 * time.Second,
		KeepAlive: 30 * time.Second,
	}).Dial,
})

// Creates an rpc.Client with the customized websocket
client, err := rpc.DialOptions(context.Background(), nodeURL, dialer)
/* ... */

// Create the ethClient with a safe websocket that doesn't time out
ethClient := ethclient.NewClient(client)

Now, the problem is that the subscription somehow "times out" after some time without receiving new events from the contract. In this case I don't expect to receive events daily, only from time to time.
Also, from my observations, this always happens within 24 hours of the last received event.

The subscription never fails or sends an error, it just silently times out. If new events happen after this point, the sub will never receive them.

To reactivate it, the service must be restarted, and it will then receive all the events that happened since the subscription died.

So, at some point in the subscription's lifecycle it dies without the client ever being notified or unsubscribed.
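
Since the dead subscription never reports anything, the only mitigation I can think of is an external liveness check. A sketch, not a confirmed fix: subscribe to new heads on the same client and treat a long silence as a dead connection, then rebuild the client and all subscriptions. The watchHeads name is made up, and the maxSilence threshold is an assumption that depends on the chain's block time.

import (
	"context"
	"errors"
	"time"

	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/ethclient"
)

// watchHeads returns when the connection looks dead: either the new-head
// subscription errors out, or no header arrives within maxSilence. The caller
// is expected to rebuild the client and re-create all subscriptions.
func watchHeads(ctx context.Context, client *ethclient.Client, maxSilence time.Duration) error {
	heads := make(chan *types.Header)
	sub, err := client.SubscribeNewHead(ctx, heads)
	if err != nil {
		return err
	}
	defer sub.Unsubscribe()

	timer := time.NewTimer(maxSilence)
	defer timer.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case err := <-sub.Err():
			return err
		case <-heads:
			// still receiving blocks: reset the silence timer
			if !timer.Stop() {
				<-timer.C
			}
			timer.Reset(maxSilence)
		case <-timer.C:
			return errors.New("no new heads received, assuming the connection is dead")
		}
	}
}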

@SteMak

SteMak commented May 9, 2024

Bumping this.

> The subscription never fails or sends an error, it just silently times out. If new events happen after this point, the sub will never receive them.

I'm running the script against a local hardhat node. An interesting detail: eth_unsubscribe is not called at the moment the connection actually dies; it is only called when I stop my script (I suppose when the websocket is force-closed). So from the node's perspective the connection is still alive and the event data should still be sent to the script.
