-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat: introduce reth era export #15909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome, this looks pretty good!
I think we should considering merging both import and export into one crate and rename era-import @RomanHodulak
and I believe we're missing tx.commit calls in import
@RomanHodulak
we could also handle those externally but maybe a better idea to commit file after file?
I think merging and renaming the crates fits. Commit per ERA file seems best for this use-case, i.e. the file remains imported even if the process is interupted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, only thing missing really is a rebase and moving a few things around
crates/era-utils/src/export.rs
Outdated
/// Last block to export | ||
pub last_block_number: BlockNumber, | ||
/// Number of blocks per era1 file | ||
pub step: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The amount of blocks per ERA1 file is capped at 8192: https://github.com/eth-clients/e2store-format-specs/blob/main/formats/era1.md
The importer assumes that every file contains 8192 blocks except the last one.
I think the step
should not be configurable and simply set to that constant.
pub step: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I added because of the geth
function that is referred for in the issue :
https://github.com/ethereum/go-ethereum/blob/9b4eab6a29704f55fa7b4b92e296094f0dbcee22/cmd/utils/cmd.go#L403-L405
We can add one better Default otherwise :
impl Default for ExportConfig {
fn default() -> Self {
Self {
dir: PathBuf::new(),
first_block_number: 0,
last_block_number: MAX_BLOCKS_PER_ERA1,
step: MAX_BLOCKS_PER_ERA1,
network: String::new(),
}
}
}
and checked in the cli args it never goes up, just wondering how much for the user experience it can be good to keep it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may I have a final word on it @mattsse @RomanHodulak if we keep it ?
is there an advantage to reduce the number of blocks per file and so have a bigger number of files to export the blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't hurt if we have a setting for this and use the epoch consts as default imo
maybe there's some usecase for dumping them in one big file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool
I have a few nits/suggestions and would appreciate some additional docs, because this is a bit tricky to follow for anyone unfamiliar with era format
crates/era-utils/src/export.rs
Outdated
/// Last block to export | ||
pub last_block_number: BlockNumber, | ||
/// Number of blocks per era1 file | ||
pub step: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't hurt if we have a setting for this and use the epoch consts as default imo
maybe there's some usecase for dumping them in one big file
} | ||
} | ||
#[tokio::test(flavor = "multi_thread", worker_threads = 2)] | ||
async fn test_roundtrip_export_after_import() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is test does some advanced era specific math, I wont be able to debug this, because not obvious what this does.
can we document the purpose of the test and the individual steps involved here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if it is more clear now.
crates/era-utils/tests/it/history.rs
Outdated
let last_block = EXPORT_FIRST_BLOCK + EXPORT_TOTAL_BLOCKS - 1; | ||
let expected_files = EXPORT_TOTAL_BLOCKS.div_ceil(EXPORT_BLOCK_PER_FILE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this requires era specific knowdlege, non obvious what this checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the required knowledge here is that according to spec each ERA1 file can contain up to 8192 blocks?
The last_block
could also be a const
.
…max_blocks_per_file
mod history; | ||
|
||
const fn main() {} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the main
function be kept to make this work as a single binary? cc @mattsse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can keep the main here because this is a single binary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes sorry I will restore it probably some merge conflicts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon looking into this further, I don't think the const fn main() {}
is actually needed or should even be here.
https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html
Then rustc compiles the library with --cfg test. It also injects a generated fn main(), which invokes all functions annotated with #[test]. The result is an executable file which, when run subsequently by Cargo, executes the tests.
https://doc.rust-lang.org/cargo/guide/project-layout.html
If a binary, example, bench, or integration test consists of multiple source files, place a main.rs file along with the extra modules within a subdirectory of the src/bin, examples, benches, or tests directory. The name of the executable will be the directory name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, pending @RomanHodulak
mod history; | ||
|
||
const fn main() {} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can keep the main here because this is a single binary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, @lean-apple. I appreciate you spending time on this.
May I continue on the cli? |
Closes #15495.
Only start introducing the functional logic (not the interface with the cli) :
For a certain number of block set up in the config, it fetches the data (header, block body, receipts) to reconstruct the block history and write them directly into era files.
Still to do :