
Race condition #10

Closed
LogvinovLeon opened this issue Nov 14, 2017 · 18 comments
Labels
bug refactor required Fixing this issue requires significant design or architectural changes.

Comments

@LogvinovLeon

LogvinovLeon commented Nov 14, 2017

Blockstream fetches logs by block number. If a block reorg happens between the block request and the log request, it will lead to a wrong log history and possibly skipped events.

    B (2)   C (2)
       \     /
        A (1)

Let's say someone calls reconcileBlock with block B.
Blockstream requests logs for block number 2, but gets logs from block C because the reorg happened. Then another reorg happens and some block mined on top of A becomes the new head. Blockstream will not request logs for B because it missed the first reorg, so the user will never receive those logs.

At first sight this seems impractical, since it requires two reorgs with perfect timing, but if we claim this is a 'reliable' block stream then it should not happen.

The solution would be to request logs by hash rather than by number. Unfortunately, nodes don't support that.
We should either ask node implementers to support that, or support returning logs as part of the getBlock request. It already has a boolean flag to return transactions; it might as well include logs.
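A minimal TypeScript sketch of the hazard described above, using illustrative types rather than the library's actual API: the logs returned for a block *number* can carry a different blockHash than the block we hold, and that mismatch is the only signal that the race occurred.

```typescript
// Hypothetical sketch (not the library's API): detect a reorg that raced
// the getLogs call by comparing each log's blockHash to the expected hash.
interface Block { number: number; hash: string; }
interface Log { blockNumber: number; blockHash: string; data: string; }

function logsMatchBlock(expected: Block, logs: Log[]): boolean {
  // Any log carrying a different blockHash means a reorg happened
  // between the block request and the log request.
  return logs.every(log => log.blockHash === expected.hash);
}

// Simulated race: we asked for logs of block 2 while holding block B,
// but the node (now on the reorged chain) returned logs from block C.
const blockB: Block = { number: 2, hash: "0xB" };
const logsFromC: Log[] = [{ blockNumber: 2, blockHash: "0xC", data: "0x01" }];

console.log(logsMatchBlock(blockB, logsFromC)); // false: stale logs detected
```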

@MicahZoltu
Collaborator

On mobile for a week so don't have links handy, but this is a known issue (though it should be better documented) and I believe I created an EIP to get this added to the RPC protocol already.

Unfortunately, the EIP process is painfully slow, so I have no idea how long until this gets addressed. Feel free to submit a PR to update the README with a known issues section. If not, I'll try to remember to update the README when I'm back at a computer next week.

@MicahZoltu
Collaborator

// TODO: validate logs are part of expected block hash

Looking through the source code, it looks like I already had an idea on how to fix this: just check the block hash of the returned block to validate it is correct before processing logs. However, it will require a bit of work to implement, since the current design doesn't allow an easy way of exposing the inconsistency back up the call stack. Perhaps the best option is to just drop the logs if that condition is reached? Will need to research more to see whether the reorg will be handled correctly if we drop the logs here.
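One way to picture the drop-on-mismatch idea: a guard in front of the log callback. This is an illustrative sketch with hypothetical names (`Log`, `processLogs`, `onLogAdded`), not the library's actual code.

```typescript
// Sketch of the TODO above: validate that returned logs belong to the
// expected block hash before announcing them, and drop them otherwise.
interface Log { blockHash: string; data: string; }

function processLogs(
  expectedBlockHash: string,
  logs: Log[],
  onLogAdded: (log: Log) => void,
): number {
  let announced = 0;
  for (const log of logs) {
    if (log.blockHash !== expectedBlockHash) {
      // Mismatch: a reorg raced our getLogs call. Dropping here avoids
      // announcing logs from the wrong fork, at the cost of possibly never
      // announcing them if the original block reorgs back in (see below).
      continue;
    }
    onLogAdded(log);
    announced += 1;
  }
  return announced;
}
```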

@MicahZoltu
Collaborator

I found the EIP: ethereum/EIPs#597

Someone needs to write an implementation for Geth/Parity as a next step.

@MicahZoltu
Collaborator

I took a bit of time to dig into how hard implementing that TODO would be, and remembered why I didn't implement it at the time. The root problem is that the block processor has no mechanism for the log processor to say, "the block you just had me process is invalid, please roll it back," other than throwing an exception. The problem is that by throwing an exception, the block processor will unwind its call stack, and therefore any queued-up block processing. If we are in the middle of dealing with a reorg when this happens, the reorg will be partially announced, but we won't announce the removals, which is arguably just as bad a bug as the one we already have. A comment about this can be seen at

// CONSIDER: the user getting this notification won't have any visibility into the updated block history yet. should we announce new blocks in a `setTimeout`? should we provide block history with new logs? an announcement failure will result in unwinding the stack and returning the original blockHistory; if we are in the process of backfilling we may have already announced previous blocks that won't actually end up in history (they won't get removed if a re-org occurs and may be re-announced). we can't catch errors thrown by the callback because it may be trying to signal to us that the block has become invalid and is un-processable

Thinking 🤔 some more about the issue, perhaps it is "safe" to simply swallow any logs that don't match the expected block hash? This would cause them to not be announced, but that is fine because we know a re-org is about to happen. However, it is also possible that the block will be re-orged back in before we actually witness the first re-org, which would then cause us to never announce the logs for that block.

All of this brings me back to the conclusion I think I came to originally: there is no great solution to this problem given the current architecture, so we need to pick which problem we want to have until someone can sit down and come up with a new architecture, or a good solution to this problem. Is it worse to receive duplicate logs, no logs, log announcements without rollbacks, or block announcements without rollbacks? I believe I settled on the current solution because it is the least code, and therefore the least likely to introduce unwanted side effects; none of the options are particularly great.

@MicahZoltu
Collaborator

EIP accepted as a draft, still needs Geth/Parity implementation: https://github.com/ethereum/EIPs/blob/master/EIPS/eip-234.md

@moodysalem

You can also just get the logs from the transaction receipt of each transaction hash in the block, which you get when you first get the block B. Then aggregate logs from all the receipts to report the logs for the block. It is more requests (mitigated by batch requests) but if any request in the bunch fails you can drop the block and try again to get the block by number, presumably ending up with C since a failure to get a receipt would likely be due to a reorg.
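The receipt-aggregation approach can be sketched like this, with `getReceipt` as an illustrative stand-in for a (batched) eth_getTransactionReceipt RPC call; it is not the library's actual code.

```typescript
// Fetch the receipt for every transaction in the block and concatenate
// their logs. If any receipt fetch fails (likely a reorg), drop the whole
// block by returning null so the caller can refetch the block by number.
interface Receipt { transactionHash: string; logs: string[]; }

async function logsFromReceipts(
  txHashes: string[],
  getReceipt: (hash: string) => Promise<Receipt | null>,
): Promise<string[] | null> {
  const receipts = await Promise.all(txHashes.map(getReceipt));
  // A missing receipt means the block is gone from the node's view.
  if (receipts.some(receipt => receipt === null)) return null;
  return receipts.flatMap(receipt => (receipt as Receipt).logs);
}
```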

@MicahZoltu
Collaborator

Yeah, I considered that but it would result in a ton of data being shipped from the node to the client since you would have to dig into the transaction receipts for every transaction, even those that didn't have any logs. 😢

@moodysalem

It's really not that bad though if you're only fetching them once and using the batch RPC so you only send a single request. I went that route and it works great. BTW https://ethercast.io is live on mainnet and Kovan now if you'd like to try it.

@MicahZoltu
Collaborator

@moodysalem I'm assuming that you are working on a server? My concern is that in a browser or on mobile it doesn't take much to overload the client and/or connection. If you are monitoring for some uncommon event, fetching full receipts for every block is a significant increase in data volume compared to fetching blocks without transactions. It isn't about the absolute size, as that depends on your environment; it is about the relative difference in size.

@MicahZoltu
Collaborator

Re-opening the issue as I think it is very valuable to keep some kind of documentation about this problem, and a place for people to aggregate thoughts on it. Feel free to unsubscribe of course. 😁

@MicahZoltu
Collaborator

I believe the correct architectural change is to make it so when something goes wrong, blockstream fires a rollback for any log/block that it fired since the last known good state. The last known good state is basically anytime the block history or log history promises finalize.

This will require tracking all announced changes, rolling them back in order, and swapping out this.blockHistory and this.logHistory with the last known good (which of course means we need to track LKG).
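The change-tracking part of this could look roughly like the following. All names here (`AnnouncementTracker`, `Change`) are illustrative, not the library's actual API.

```typescript
// Track every announcement made since the last known good (LKG) state.
// On failure, fire matching removals newest-first, then the caller can
// restore the LKG blockHistory/logHistory.
type Change = { kind: "block" | "log"; id: string };

class AnnouncementTracker {
  private sinceLkg: Change[] = [];
  constructor(private readonly announceRemoval: (change: Change) => void) {}

  record(change: Change): void {
    this.sinceLkg.push(change);
  }

  // Called when the block/log history promises finalize successfully.
  markKnownGood(): void {
    this.sinceLkg = [];
  }

  // Called when reconciliation fails: undo announcements in reverse order.
  rollbackToKnownGood(): void {
    for (const change of this.sinceLkg.reverse()) this.announceRemoval(change);
    this.sinceLkg = [];
  }
}
```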

@MicahZoltu
Collaborator

More notes to future implementer (me): Rather than firing block/log callbacks immediately, we can queue them up in a datastructure internal to BlockAndLogStreamer. Then we can wrap reconcileNewBlock in a try/catch and only fire the callbacks after await this.blockHistory returns, because at that point we know that we have successfully synced everything. In the catch block, we would swap this.blockHistory back to last known good. This way we don't have to fire removed announcements because we instead simply wait to fire announcements until we are confident things aren't broken.
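The queue-then-commit idea above can be sketched as a small buffer; again, these names are illustrative rather than the library's actual internals.

```typescript
// Buffer announcements while reconciling; flush them only once the new
// block history is known good. On failure, discard the queue so subscribers
// never see blocks/logs that didn't actually make it into history.
type Announcement = { type: "add" | "remove"; blockHash: string };

class ReconcileBuffer {
  private queue: Announcement[] = [];
  constructor(private readonly emit: (announcement: Announcement) => void) {}

  enqueue(announcement: Announcement): void {
    this.queue.push(announcement);
  }

  // Reconciliation succeeded (e.g. `await this.blockHistory` returned):
  // flush everything in order.
  commit(): void {
    for (const announcement of this.queue) this.emit(announcement);
    this.queue = [];
  }

  // Reconciliation failed: drop queued announcements and let the caller
  // swap this.blockHistory back to the last known good.
  rollback(): void {
    this.queue = [];
  }
}
```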

@MicahZoltu
Collaborator

While #18 helps this issue, we still need EIP-234 to be implemented in all major clients before we can actually fix the issue mentioned here. The remaining unhandled case is when the following sequence occurs:

  1. Block 123 with hash 0xABC is announced to clients
  2. A reorg occurs, causing block 123 to now have hash 0xDEF
  3. The client requests logs for block 123
  4. No logs match the filter in block 123 hash 0xDEF, so the server responds with an empty log array
  5. The client accepts that there are no logs present and continues along
  6. A reorg occurs, causing block 123 to point back to 0xABC
  7. The client is never aware that it didn't receive the logs in block 0xABC

Once EIP-234 is implemented we can ask the server for logs for a specific block hash and get an error if the block is not found, rather than being given logs for an incorrect block.
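With EIP-234, the eth_getLogs filter object accepts a `blockHash` field in place of fromBlock/toBlock, and the node returns an error for an unknown hash rather than silently answering for a different block. A minimal JSON-RPC request builder (the function name and optional `address` handling are illustrative):

```typescript
// Build an eth_getLogs request that filters by block hash per EIP-234.
// blockHash replaces the fromBlock/toBlock range fields.
function buildGetLogsByHash(blockHash: string, address?: string) {
  return {
    jsonrpc: "2.0",
    id: 1,
    method: "eth_getLogs",
    params: [{ blockHash, ...(address ? { address } : {}) }],
  };
}
```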

@nionis

nionis commented Aug 1, 2018

EIP-234 implemented on geth

@MicahZoltu
Collaborator

openethereum/parity-ethereum#9256 implements this on Parity.

@cballou

cballou commented Dec 5, 2018

Now that EIP-234 has been implemented in geth and parity, is it safe to say all the pieces are there to patch this?

@MicahZoltu
Collaborator

Ah. Sorry. Version 7.0.0 was published to NPM a while ago which fully fixes this. I didn't realize this issue was still open.

On mobile now; I'll read over the issue again when at a computer, verify nothing is left, and then close.

@MicahZoltu
Collaborator

Note that 7.0.0 has some other breaking changes in it; in particular, all of the logs for a block are sent in a single callback with an array of logs, rather than in a separate callback per log. It should be pretty easy to fan out with a simple adapter if you prefer the old behavior, but it will require some code changes. Also, the signature of the getLogs dependency that the library expects changed, due to fetching by hash instead of by number.

If you only want the hash change, then version 6.0.1 will get you that, but I do recommend using 7.0.0+ since that is the line that will receive support.
