[EpochSync] Measure performance improvements from garbage collecting headers #10945

Closed
Tracked by #9581
posvyatokum opened this issue Apr 4, 2024 · 1 comment

@posvyatokum
Member

Context

We know that Epoch Sync will speed up decentralised node initialisation by eliminating header sync of the whole network history. We measured this improvement only once, in the Epoch Sync POC experiments: Epoch Sync from genesis took 2 minutes, whereas header sync would have taken >10 hours.

Epoch Sync can also significantly decrease disk usage, because it lets us start garbage collecting block headers. The BlockHeaders column is around 500 GB on mainnet. If we keep only the headers needed for Epoch Sync, we can shrink the column by a factor of 10^3.
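
To put the numbers above side by side, here is a small back-of-the-envelope calculation. It only uses the figures quoted in this issue; reading "500Gb" as gigabytes is an assumption, and since header sync was estimated at more than 10 hours, the speedup is a lower bound. The function and constant names are made up for illustration.

```python
# Figures quoted in this issue. The unit of the column size is assumed to be
# gigabytes, and ">10 hours" is treated as exactly 10 hours (a lower bound).
HEADER_SYNC_HOURS = 10
EPOCH_SYNC_MINUTES = 2
BLOCK_HEADERS_GB = 500
SHRINK_FACTOR = 10 ** 3


def min_sync_speedup(header_sync_hours, epoch_sync_minutes):
    """Lower bound on how many times faster Epoch Sync is than header sync."""
    return header_sync_hours * 60 / epoch_sync_minutes


def column_size_after_gc_gb(size_gb, factor):
    """Estimated column size once only Epoch Sync headers are kept."""
    return size_gb / factor


print(min_sync_speedup(HEADER_SYNC_HOURS, EPOCH_SYNC_MINUTES))    # 300.0
print(column_size_after_gc_gb(BLOCK_HEADERS_GB, SHRINK_FACTOR))   # 0.5
```

So, under these assumptions, initialisation would be at least 300 times faster and the column would shrink to about half a gigabyte.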

Another potential benefit is node performance. We can expect a smaller DB to have lower read latency, although that effect may be limited to the affected columns.
Right now, we don't have any estimates of the potential performance improvement for a node with garbage collected headers.

Task

We want to run a simplified experiment in which the node starts with garbage collected headers and a backup snapshot that preserves the full column. Headers will be read from the main DB, with a fallback to the backup snapshot. The backup is needed to avoid node crashes when a garbage collected header is requested.
If this experiment shows a significant performance improvement, we will prioritise the Epoch Sync project, which is currently on pause.
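
The read path described above could be sketched roughly as follows. This is an illustration only, not nearcore code: `FallbackHeaderStore`, `get_header`, and the dict-backed "databases" are hypothetical stand-ins for the main DB and the backup snapshot. The hit counters are an added assumption, included because knowing how often reads fall through to the backup helps interpret the measurements.

```python
from typing import Dict, Optional


class FallbackHeaderStore:
    """Hypothetical header store: main DB first, backup snapshot second."""

    def __init__(self, main: Dict[str, bytes], backup: Dict[str, bytes]) -> None:
        self.main = main          # stands in for the DB with garbage collected headers
        self.backup = backup      # stands in for the snapshot with the full column
        self.main_hits = 0
        self.backup_hits = 0

    def get_header(self, block_hash: str) -> Optional[bytes]:
        # Fast path: the header survived garbage collection.
        header = self.main.get(block_hash)
        if header is not None:
            self.main_hits += 1
            return header
        # Slow path: fall back to the backup instead of failing.
        header = self.backup.get(block_hash)
        if header is not None:
            self.backup_hits += 1
        return header


# The backup keeps every header; the main DB keeps only the most recent one.
backup = {"h1": b"header-1", "h2": b"header-2", "h3": b"header-3"}
main = {"h3": b"header-3"}
store = FallbackHeaderStore(main, backup)

print(store.get_header("h3"))               # served from the main DB
print(store.get_header("h1"))               # served from the backup snapshot
print(store.main_hits, store.backup_hits)   # 1 1
```

A high share of backup hits would mean that the node still depends on old headers, and each of those reads pays for two lookups instead of one.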

@posvyatokum
Member Author

Created a node that runs on garbage collected headers with a fallback DB.
Here is a small dashboard for this experiment: https://nearinc.grafana.net/goto/uYatIkaSg?orgId=1
Time range in the link starts from the moment the node caught up to the network.

I see some improvement in CPU usage, and a comparable (but slightly higher) block processing time.
The increase in block processing time is counter-intuitive to me, but it can be explained by reads of missing headers from the backup DB.

@gmilescu Overall, CPU usage is not a problem for us at the moment, and block processing time wasn't affected, so I don't think that garbage collecting the headers column will noticeably improve our performance.
