-
Notifications
You must be signed in to change notification settings - Fork 679
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: adding cross_contract_calls integration test for the V1->V2 resharding #9402
Conversation
This is what the TestLoop does. It requires of course some sort of a migration to make this test work in that framework. |
chain/chunks/src/logic.rs
Outdated
debug!(target: "chunks", "Reconstructed and decoded chunk {}, encoded length was {}, num txs: {}, I'm {:?}", chunk_hash.0, encoded_chunk.encoded_length(), shard_chunk.transactions().len(), me); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is removed by accident I will revert it
Ah, I see now what you mean, there is an existing framework for doing this in tests. I will have a look but for now I would rather merge this as is and follow up with improvements later. |
@@ -61,4 +61,4 @@ rusty-tags.vi | |||
# Estimator generated files | |||
costs-*.txt | |||
names-to-stats.txt | |||
data_dump_*.bin | |||
data_dump_*.bin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we should not be changing this file
client.epoch_manager.get_epoch_id_from_prev_block(&head.last_block_hash).unwrap(); | ||
let block_producer = | ||
client.epoch_manager.get_block_producer(&epoch_id, height).unwrap(); | ||
let _span = tracing::debug_span!(target: "test", "", client=?block_producer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can directly call .entered()
here? Can add some text to the span as well or remove altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: looking at the changes, PR includes a lot of refactoring which is not necessarily directly related to title of the PR. It may be better to separate refactoring part out as its own PR for better review-ability
let block_producer_client = env.client(&block_producer); | ||
let mut block = block_producer_client.produce_block(height).unwrap().unwrap(); | ||
set_block_protocol_version(&mut block, block_producer.clone(), protocol_version); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just moved into {} block for block var
Generally I agree but I struggle to make that work in github. I couldn't get PRs to be stacked on top of each other, if anyone knows how to do it right I'm all ears. Some of the refactoring is relevant here, like process_tx is replaced some other code in second location. |
5bd6d6c
to
506147b
Compare
I usually just submit the refactoring as a separate PR, with a comment that it's prep for another PR. Usually I don't submit the second PR yet because it requires the changes from the first PR. When I want to submit them both together for better context, I like to submit a dummy PR from the second branch targeting the first PR branch. This way people can review the changes of the second PR in isolation. In any case, changes after reviewing PR0 will mess things up for PR1 and after the squash-merge you will have to rebase + force-push PR1 anyway. So I usually just wait with PR1 until PR0 has landed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks ok :)
Tbh, some of the refactored code could probably use more refactoring. But there could always be more refactoring. Your changes look like an overall improvement to me.
// Test cross contract calls | ||
// This test case tests when there are missing chunks in the produced blocks | ||
// This is to test that all the chunk management logic in sharding split is correct | ||
fn test_shard_layout_upgrade_missing_chunks(p_missing: f64) { | ||
init_test_logger(); | ||
|
||
let resharding_type = ReshardingType::V1; | ||
let genesis_protocol_versino = get_genesis_protocol_version(&resharding_type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let genesis_protocol_versino = get_genesis_protocol_version(&resharding_type); | |
let genesis_protocol_version = get_genesis_protocol_version(&resharding_type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You never know, maybe versino is Spanish for version? ;)
Thanks, good catch, will fix soon.
@@ -18,6 +18,7 @@ rand_hc.workspace = true | |||
serde_json.workspace = true | |||
smart-default.workspace = true | |||
tracing.workspace = true | |||
itertools.workspace = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here: are you actually using itertools
in the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I usually use collect_vec when printing debug logs. How would you feel about leaving that in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking, I don't really like adding dependencies that aren't used in committed code. It's the kind of thing I personally might create a PR, just to remove a dead dependency. So at least a comment why it's in would be useful to prevent that sort of engineers canceling out each others "improvements".
But that said, itertools wouldn't be the worst to have pulled in for no reason, so I won't stop you from adding it if you hate writing collect::<Vec<_>>
that much :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks, I'll leave that in and add a comment.
#[cfg(feature = "protocol_feature_simple_nightshade_v2")] | ||
// TODO(resharding) this test is currently broken, potentially due to lack of | ||
// flat storage support. Once flat storage for resharding is fully implemented | ||
// this test should be revisited, fixed and re-enabled. | ||
#[ignore] | ||
#[test] | ||
fn test_shard_layout_upgrade_cross_contract_calls_v2() { | ||
test_shard_layout_upgrade_cross_contract_calls_impl(ReshardingType::V2); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an issue you could link here where readers can figure out or ask about the progress on fixing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few different issues tracking the flat storage progress but let me link the resharding tracking issue.
506147b
to
8a91db0
Compare
…harding (#9402) The goal of this PR is to refactor the existing cross_contract_calls integration test so that it can be reused for testing the new resharding and do that. I have encountered numerous issues when trying to adapt that test because enabling the new resharding also enables all of other protocol features since simple nightshade. The test wasn't quite ready for it. I managed to fix a few issues but not all. I'm leaving it unfinished for now as I suspect the current issue may be due to lack of flat storage support so it's better to wait for that. I marked the new test as #[ignore]. I would still like to merge in all of the refactoring, fixes and improvements I made so far and I will pick it up again once flat storage is supported in resharding. Some of the new features are: - I refactored the cross_contract_calls so that it can handle either resharding type. - I allowed not all clients to process all transactions but also route them. This was needed only before I set all clients to track all shards but is correct, handles both cases and it more future proof. Some of the challenges were: - The test was failing with missing chunk error. This was because the "Partial Encoded Chunk Forward" messages were not correctly handled in the test infra. I fixed this issue in TestEnv::process_partial_encoded_chunks. There are two parts of the fix: - handle the forwards - keeping looping until all messages are processed - this is needed because one message can trigger sending of another message - The protocol version was not getting upgraded and because of that the resharding was not getting triggered (or at least not deterministically where I wanted it to happen). This was because of unlucky block producer assignment where one block producer (out of 4) does not actually get to produce any blocks in the (rather short) first epoch. When that happens voting for the new protocol version is stuck at 3/4 = 75% votes which is lower than the default threshold. I lowered the threshold in the test to work around that. - The state sync process was getting triggered and failing because of lack of some event loop or other async framework setup. I fixed that by setting all nodes to track all shards. (not an issue in V0->V1 because when there is 1 shard everyone tracks it and apparently the node keep tracking the new shards as well).
The goal of this PR is to refactor the existing cross_contract_calls integration test so that it can be reused for testing the new resharding and do that. I have encountered numerous issues when trying to adapt that test because enabling the new resharding also enables all of other protocol features since simple nightshade. The test wasn't quite ready for it. I managed to fix a few issues but not all. I'm leaving it unfinished for now as I suspect the current issue may be due to lack of flat storage support so it's better to wait for that. I marked the new test as #[ignore]. I would still like to merge in all of the refactoring, fixes and improvements I made so far and I will pick it up again once flat storage is supported in resharding.
Some of the new features are:
Some of the challenges were: