Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Bad blocks RPC + reporting #9433

Merged
merged 9 commits into from
Sep 8, 2018
Merged

Bad blocks RPC + reporting #9433

merged 9 commits into from
Sep 8, 2018

Conversation

tomusdrw
Copy link
Collaborator

Closes #8761

  • Prints a verbose error message in case bad block is detected.
  • Stores bad blocks in a cache, the cache is limited to take at most 8MBs (should we rather have a count limit?)
  • Exposes bad blocks over RPC attempting to maintain geth's compatibility, but also returns all the same fields we return for regular block RPCs.
  • Adds reason to the RPC response to see internal error message that occurred during import.

@tomusdrw tomusdrw added A0-pleasereview 🤓 Pull request needs code review. M4-core ⛓ Core client code / Rust. M6-rpcapi 📣 RPC API. labels Aug 29, 2018
Copy link
Collaborator

@sorpaas sorpaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still worry a little about the design around badblocks. Using RPC methods doesn't really feel elegant, and in some cases if under attacks, it may not work well.

@holiman would it work if those bad blocks are printed out as tracing info instead?

  • We are not limited to 8mb bad blocks. For sophisticated consensus attackers, they may find vulnerabilities in networking/importing first, and flood the 8mb struct with non-consensus-critical bad blocks, forcing the consensus-critical bad blocks to be dropped out of the 8mb cache.
  • For cases where you actually want to debug bad blocks, you always have direct node access. So it makes this RPC indirection a little bit unnecessary.


impl BadBlocks {
/// Reports given RLP as invalid block.
pub fn report(&self, raw: Bytes, message: String) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra whitespace.

}
},
Err(err) => {
self.bad_blocks.report(raw, format!("{:?}", err));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this configurable? Such as only enable bad blocks reporting when the debug namespace RPC is enabled, or just use another cli flag.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the whole point is to have it running by default, so that anyone (that have seen such message) can be asked to get bad blocks from RPC and then report them for the analysis.

use v1::types::{Block, Bytes, RichBlock, BlockTransactions, Transaction};

/// Debug rpc implementation.
pub struct DebugClient<C> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this mod needs to be enabled in parity/rpc_apis.rs? Or did I missed that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's 100% true. Haven't tested it properly, will add to CLI and actually check if it's queryable through RPC.

@tomusdrw tomusdrw added A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. and removed A0-pleasereview 🤓 Pull request needs code review. labels Aug 29, 2018
@holiman
Copy link
Contributor

holiman commented Aug 29, 2018

The reason to have this functionality is so that we can run one geth node and one parity node in paralell, and detect immediately if a consensus fork has occurred (one rejecting a block that the other has accepted).

The later steps of analyzing the error is not very difficult, although it definitely helps if the client that rejecteded the block also prints out the rlp, for analysis.

We are not limited to 8mb bad blocks. For sophisticated consensus attackers, they may find vulnerabilities in networking/importing first, and flood the 8mb struct with non-consensus-critical bad blocks, forcing the consensus-critical bad blocks to be dropped out of the 8mb cache.

So, of the cases where real-world consensus issues have happened, none of them have been at that level of sophistication -- afiak all of them have been incidental. But even so -- how could they be flooded with non-consensus-critical bad blocks? Those would be rejected due to low difficulty even before going into actual block execution.

Copy link
Collaborator Author

@tomusdrw tomusdrw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the general idea is that logs are easily lost, and accessing over RPC makes it easy for anyone to fetch the details and report back for analysis.

For sophisticated consensus attackers, they may find vulnerabilities in networking/importing first, and flood the 8mb struct with non-consensus-critical bad blocks, forcing the consensus-critical bad blocks to be dropped out of the 8mb cache.

That would require a targeted attack to your particular machine, other bad blocks would most likely be propagated by consensus-affected peers anyway and re-imported in the future, so they would show up in the cache again.

}
},
Err(err) => {
self.bad_blocks.report(raw, format!("{:?}", err));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the whole point is to have it running by default, so that anyone (that have seen such message) can be asked to get bad blocks from RPC and then report them for the analysis.

use v1::types::{Block, Bytes, RichBlock, BlockTransactions, Transaction};

/// Debug rpc implementation.
pub struct DebugClient<C> {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's 100% true. Haven't tested it properly, will add to CLI and actually check if it's queryable through RPC.

@sorpaas
Copy link
Collaborator

sorpaas commented Aug 29, 2018

Another idea is to move this outside of parity-ethereum binary -- create a utility bin crate that reads parity-ethereum's tracing log, collect bad blocks there, and expose debug_badblocks rpc method in that utility bin. In that way we won't need to worry about badblocks struct creating any extra attack vector on the main parity-ethereum client.

@tomusdrw
Copy link
Collaborator Author

Those would be rejected due to low difficulty even before going into actual block execution.

@holiman What's the definition of a "bad block" then? In this PR we are reporting any block that fails the validation (even before execution) - so even low difficulty, invalid seal, etc. blocks would be reported here.

@holiman
Copy link
Contributor

holiman commented Aug 29, 2018

I don't understand how a small cache of blocks can be a potential attack target? Am I missing something?

@holiman
Copy link
Contributor

holiman commented Aug 29, 2018

@holiman What's the definition of a "bad block" then? In this PR we are reporting any block that fails the validation (even before execution) - so even low difficulty, invalid seal, etc. blocks would be reported here.

Actually, I'm not quite certain -- it's possible we do too. However, as you've pointed out -- an attacker needs to specifically target the node, in order to get his bad blocks propagated.

@holiman
Copy link
Contributor

holiman commented Aug 29, 2018

@holiman
Copy link
Contributor

holiman commented Aug 30, 2018

Some more context: we've had 'reporting' of bad blocks now for quite some time. By 'reporting' I mean printing out a warning about a bad block being encountered, and some details, in the log stream.

This has triggered only a couple of times over the last couple of years, and each time it has been very good info. IIRC, that's what made people notice when Parity mined bad blocks on Ropsten due to the consensus-bug with having EIP-86 enabled.

The added ability to actually get the same info over RPC will make it possible to get instant alerts immediately if a fork occurs. If the fork-monitor needs to parse log output for the same info, it will be quite messy.

If you see fit to limit the badblock-cache even more, then please do so, but I'd really love to have this feature in, even if it's somewhat limited.

Right now, our fork monitor (http://mon04.ethdevops.io/forkmon) would immediately detect if parity head contains something that geth rejects, but the opposite scenario would not be detected until later.

@tomusdrw tomusdrw added A0-pleasereview 🤓 Pull request needs code review. and removed A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. labels Aug 30, 2018
@5chdn 5chdn added this to the 2.1 milestone Aug 30, 2018
@5chdn
Copy link
Contributor

5chdn commented Sep 4, 2018

How does this relate to #9207 ?

@tomusdrw
Copy link
Collaborator Author

tomusdrw commented Sep 4, 2018

@5chdn #9207 is around for more than a month and it's nowhere close to be neither correct nor finished - it stores bad_blocks hashes in the Blockchain, and the RPC attempts to retrieve them from the database, but those blocks are bad so they never even get to the db. While I appreciate the contribution, reviewing the PR is pretty much implementing that stuff in comments.

Copy link
Contributor

@andresilva andresilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -647,6 +658,7 @@ impl ApiSet {
public_list
},
ApiSet::SafeContext => {
public_list.insert(Api::Debug);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why enable this only for safe context (given that bad blocks cache is enabled by default)? I understand that the debug namespace might have other methods in the future which are unsafe to expose.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't enable for two reasons:

  1. debug is potentially unsafe
  2. It should never be used by regular dapps
  3. We have this implemented to support detecting forks, I don't want to make it a first-class namespace where we implement all the methods, cause that most likely won't happen - the methods are not going through EIPs, and they are pretty much geth-specific debug RPCs.

pub fn backstore(&self) -> &LruCache<K, V> {
&self.inner
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary newline?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@5chdn 5chdn added A8-looksgood 🦄 Pull request is reviewed well. and removed A0-pleasereview 🤓 Pull request needs code review. labels Sep 6, 2018

/// Recently seen bad blocks.
pub struct BadBlocks {
last_blocks: RwLock<MemoryLruCache<H256, (Unverified, String)>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about to add the raw_block data (Vec) into the tuple also, to be able to save "undecodable" BadBlocks? Is that makes any sense?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I though about it actually. It doesn't allow us to return detailed block in the RPC response though. I think currently it makes sense to have something with the same results as geth to start detecting issues asap.

That said, feel free to log it as a separate issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought about this and checked the geth implementation and it seems they also don't store block rlp's that aren't decodable.

@tomusdrw
Copy link
Collaborator Author

tomusdrw commented Sep 6, 2018

To follow up on @sorpaas comment. We spoke through a different channel about this and agreed that it doesn't necessarily suits the core client.

The idea that came up was to expose structured logs via RPC (we already to that for recent logs, we could add a subscription), where one would be able to subscribe to some particular events happening in the core client.
Using events data we could build a separate binary that would be able to expose this (and similar) RPCs. Event sourcing would also be useful for all other stats and attaching to events at the runtime would also be quite useful for debugging purposes - imagine attaching to a running node to inspect incoming transactions or networking communication (currently it requires restarting with different logging flag).

Copy link
Contributor

@lexfrl lexfrl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but CI is failing..

@5chdn 5chdn merged commit 61bd47c into master Sep 8, 2018
@5chdn 5chdn deleted the td-badblocks branch September 8, 2018 02:04
@5chdn 5chdn mentioned this pull request Sep 8, 2018
@ordian ordian mentioned this pull request Oct 2, 2018
13 tasks
@ordian ordian mentioned this pull request Oct 23, 2018
11 tasks
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A8-looksgood 🦄 Pull request is reviewed well. M4-core ⛓ Core client code / Rust. M6-rpcapi 📣 RPC API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants