-
Notifications
You must be signed in to change notification settings - Fork 619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not panic upon receiving invalid state witness #11109
Comments
The backtrace from the linked zulip thread suggests that the panic happened inside 2024-04-16T20:21:23.545144Z DEBUG chunk_tracing{chunk_hash=HnFSQEoLMEnMXK2pxnnnbv7GkwFobanyrd7JJbNS2Rrj}:new_chunk{shard_id=3}:apply_chunk{shard_id=3}:process_state_update:apply{protocol_version=84 num_transactions=19}:process_receipt{receipt_id=GHhLncT5GM2ksuwVzUqPMkzCp132V7xToQZPfUbKeRgP predecessor=operator.meta-pool.near receiver=lockup-meta-pool.near id=GHhLncT5GM2ksuwVzUqPMkzCp132V7xToQZPfUbKeRgP}:run{code.hash=EXekfV3kpFHHsTi4JUDh2MVLCKS3hpKdPbXMuRirxrvY vm_kind=NearVm}: vm: close time.busy=49.3µs time.idle=3.42µs
thread '<unnamed>' panicked at core/store/src/trie/trie_storage.rs:317:16:
!!!CRASH!!!: MissingTrieValue(TrieMemoryPartialStorage, 5FWvfWAJxH1mbCHuzLGwBfL9EYjH8YWVin6Pmp3H8gdM)
stack backtrace:
0: rust_begin_unwind
1: core::panicking::panic_fmt
2: core::result::unwrap_failed
3: <near_store::trie::trie_storage::TrieMemoryPartialStorage as near_store::trie::trie_storage::TrieStorage>::retrieve_raw_bytes
4: near_store::trie::Trie::internal_retrieve_trie_node
5: near_store::trie::Trie::retrieve_raw_node
6: near_store::trie::Trie::lookup_from_state_column
7: near_store::trie::Trie::get_optimized_ref
8: near_store::trie::Trie::get
9: near_store::trie::update::TrieUpdate::get
10: near_store::get_code
11: node_runtime::actions::execute_function_call
12: node_runtime::Runtime::apply_action
13: node_runtime::Runtime::apply_action_receipt
14: node_runtime::Runtime::apply::{{closure}}
15: node_runtime::Runtime::apply
16: <near_chain::runtime::NightshadeRuntime as near_chain::types::RuntimeAdapter>::apply_chunk
17: near_chain::update_shard::apply_new_chunk
18: core::ops::function::FnOnce::call_once{{vtable.shim}}
19: <rayon_core::job::HeapJob<BODY> as rayon_core::job::Job>::execute
20: rayon_core::registry::WorkerThread::wait_until_cold But looking at nearcore/core/store/src/trie/trie_storage.rs Lines 311 to 322 in 0202ed6
That's pretty confusing 0_O. @staffik you mentioned that the code was at commit |
This was commit from Alex PR: b62b6a3. |
Ah thanks for the link, now I see it 👍
I took a look at it's the same code as on |
I see this line (called through validate_chunk_state_witness --> apply_new_chunk --> apply_chunk may also get to MissingTrieValue: nearcore/chain/chain/src/runtime/mod.rs Line 917 in a1a01b4
|
Ahh ok, so the panic was caused by custom code that was added for debug purposes. The code on |
Err(e) => match e {
Error::StorageError(err) => match &err {
StorageError::FlatStorageBlockNotSupported(_)
| StorageError::MissingTrieValue(..) => Err(err.into()),
_ => panic!("{err}"),
},
_ => Err(e),
}, This won't panic on It seems that most of /// Errors which may occur during working with trie storages, storing
/// trie values (trie nodes and state values) by their hashes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StorageError {
/// Key-value db internal failure
StorageInternalError,
/// Requested trie value by its hash which is missing in storage.
MissingTrieValue(MissingTrieValueContext, CryptoHash),
/// Found trie node which shouldn't be part of state. Raised during
/// validation of state sync parts where incorrect node was passed.
/// TODO (#8997): consider including hash of trie node.
UnexpectedTrieValue,
/// Either invalid state or key-value db is corrupted.
/// For PartialStorage it cannot be corrupted.
/// Error message is unreliable and for debugging purposes only. It's also probably ok to
/// panic in every place that produces this error.
/// We can check if db is corrupted by verifying everything in the state trie.
StorageInconsistentState(String),
/// Flat storage error, meaning that it doesn't support some block anymore.
/// We guarantee that such block cannot become final, thus block processing
/// must resume normally.
FlatStorageBlockNotSupported(String),
/// In-memory trie could not be loaded for some reason.
MemTrieLoadingError(String),
} Maybe we shouldn't panic on |
Idk, it doesn't feel very productive to read the code in hopes of finding of a possible panic. AFAIU the panic that spawned this issue can't happen on I remember that we wanted to fuzz the validation code, maybe that'd be a quicker way to find possible crashes in validation? |
Made an issue about fuzzing: #11132 |
Issue is not valid anymore. Closing it |
Relevant discussion: link
As of now, when a chunk validator receives an invalid state witness (e.g. missing contract code), it will panic and crash. We should change the behavior so it simply logs the issue but move on without panicking.
The text was updated successfully, but these errors were encountered: