Skip to content

Conversation

mriccobene
Copy link
Member

@mriccobene mriccobene commented Feb 18, 2022

This PR removes some unneeded limitations so the header request rate is improved a lot.

Notes:

  1. this PR uses dense pre-verified hashes, not the sparse one of the main branch. Initial performance tests have shown that this branch behaves like Erigon. Another branch and PR will be created to compare performances with and without dense pre-verified hashes.

  2. the tests of this PR were done with a different version of the db::access_layer, an old version NOT using mdbx_cursor_bind. In a tests spanning three days no errors was raised by mdbx. Differently using the main branch db::access_layer tests breaks frequently with a MDBX_ERROR. I defer to another PR to understand if there is a problem in the downloader, if there is a problem in db::access_layer or in the mdbx implementation.

@codecov
Copy link

codecov bot commented Feb 18, 2022

Codecov Report

Merging #569 (d3789e8) into master (8ee9d5f) will decrease coverage by 0.24%.
The diff coverage is 20.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #569      +/-   ##
==========================================
- Coverage   81.86%   81.61%   -0.25%     
==========================================
  Files         169      169              
  Lines       13729    13767      +38     
==========================================
- Hits        11239    11236       -3     
- Misses       2490     2531      +41     
Impacted Files Coverage Δ
...de/silkworm/downloader/internals/working_chain.cpp 66.15% <12.96%> (-3.16%) ⬇️
.../silkworm/downloader/internals/persisted_chain.cpp 66.98% <100.00%> (-1.21%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8ee9d5f...d3789e8. Read the comment docs.

@mriccobene mriccobene marked this pull request as ready for review February 28, 2022 14:31
@mriccobene mriccobene changed the title [WIP] Improving downloader performances Improving downloader performances Feb 28, 2022
@mriccobene mriccobene force-pushed the downloader_performances branch 2 times, most recently from ca44c57 to 9a74908 Compare February 28, 2022 15:30
@AndreaLanfranchi
Copy link
Contributor

FYI Due to denser preverified hashes it takes about 1 hour to build release on my machine.

@yperbasis
Copy link
Member

FYI Due to denser preverified hashes it takes about 1 hour to build release on my machine.

If I'm not mistaken, genesis_mainnet.cpp doesn't cause long compilation with MSVC, despite also being quite heavy. I wonder if the reason is that genesis_mainnet.cpp uses much simpler constructs (char []) compared with preverified_hashes_mainnet.cpp. Perhaps switching preverified hashes to something like char [] instead of std::set could help with the compilation time.

@AndreaLanfranchi
Copy link
Contributor

If I'm not mistaken, genesis_mainnet.cpp doesn't cause long compilation with MSVC, despite also being quite heavy. I wonder if the reason is that genesis_mainnet.cpp uses much simpler constructs (char []) compared with preverified_hashes_mainnet.cpp. Perhaps switching preverified hashes to something like char [] instead of std::set could help with the compilation time.

You're right.
In an early commit I changed the preverified header generation tool to an array of char : the purpose was to "load at runtime" the set from a basic static preallocated memory area. But it got reverted nevertheless 🤷

@AndreaLanfranchi
Copy link
Contributor

Besides genesis_mainnet is 44300 line with 16 bytes each
Preverified hashes is 73800 lines with 32bytes each into a set (4 times bigger)

SILK_TRACE << "WorkingChain: trying to extend anchor " << anchor->blockHeight
<< " (chain bundle len = " << anchor->chainLength()
<< ", last link = " << anchor->lastLinkHeight << " )";

return {packet, penalties}; // try (again) to extend this anchor
return {std::move(packet), std::move(penalties)}; // try (again) to extend this anchor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::move doesn't make any difference for packet

Copy link
Member

@yperbasis yperbasis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to revert the changes to preverified_hashes_mainnet.cpp in this PR so that we don't ruin our Windows compilation time. Instead, in a separate PR, switch preverified_hashes_mainnet.cpp to something simple as char[] (which will hopefully help the MSVC compiler) and re-apply the hashes.

@mriccobene
Copy link
Member Author

If I'm not mistaken, genesis_mainnet.cpp doesn't cause long compilation with MSVC, despite also being quite heavy. I wonder if the reason is that genesis_mainnet.cpp uses much simpler constructs (char []) compared with preverified_hashes_mainnet.cpp. Perhaps switching preverified hashes to something like char [] instead of std::set could help with the compilation time.

You're right. In an early commit I changed the preverified header generation tool to an array of char : the purpose was to "load at runtime" the set from a basic static preallocated memory area. But it got reverted nevertheless 🤷

Sorry, I was unaware of this. Since we are already partially supporting MSVC due to the lack of 128-bit integer, in the last commit I tried to mitigate the MSVC compilation problem using the preprocessor so that the MSVC sees only the sparse version of the pre-verified set. I think it is better to void entirely its content but I will let you to decide. Please try.

Copy link
Contributor

@AndreaLanfranchi AndreaLanfranchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please merge master into this PR ... it's still off by MDBX version and third_party renamings. A bit of a pain to test it.

@AndreaLanfranchi
Copy link
Contributor

I'm afraid the situation has not improved much : I stopped release build at 46 minutes (still stuck at cmd/download_headers.cpp)

@mriccobene mriccobene force-pushed the downloader_performances branch from 4abfe18 to 282997a Compare March 2, 2022 15:27
@mriccobene
Copy link
Member Author

Can you please merge master into this PR ... it's still off by MDBX version and third_party renamings. A bit of a pain to test it.

Done!

@AndreaLanfranchi
Copy link
Contributor

@mriccobene @canepat @yperbasis

Somehow I've managed to solve the build problem on MSVC reverting the static collection of headers (which was using a set and literals to convert to a UDT) to an array of scalars uint64_t (pretty much like for genesis data). In branch al-preverified-hashes (which is totally based on this PR).
The work is pretty rough but builds ok local Windows and Linux (still need to polish knobs for CI) and probably overlooked something.

Nevertheless I can now build release in ~3 minutes with a preset of 447k preverified hashes (1 every 32 of entire mainnet chain up to 14.3M blocks). Loading of the set at runtime is pretty fast and, considering is loaded only once, even it should be in the magnitude of seconds would be still acceptable.

Besides I tested downloader more than a couple of times and got aborted after few seconds.
Here is the log

  INFO [03-02|16:31:55.988 UTC] [9032] STARTING
Download Headers - Silkworm
   chain-id: 1
   genesis-hash: d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
   hard-forks: 11
   head hash   = 24eb723279d39f7a8301da905c3708a6d26f9c33187a8794cf9d3bdfba07b70b
   head td     = 42774851402042813481032
   head height = 14307084

  INFO [03-02|16:32:01.156 UTC] [9032] [1/16 Headers] Start
  INFO [03-02|16:32:01.157 UTC] [9032] [1/16 Headers] Waiting for headers... from=14'307'084
  INFO [03-02|16:32:01.748 UTC] [2216] Peer 8d1da4c5bc682f8f8ad141f60d2889fa4232144c7d2b29b88f81ca33e586ef9d connected, active 1
  INFO [03-02|16:32:01.750 UTC] [2216] Peer 8d1da4c5bc682f8f8ad141f60d2889fa4232144c7d2b29b88f81ca33e586ef9d disconnected, active 0
  INFO [03-02|16:32:09.650 UTC] [2216] Peer f23ac6da7c02f84a425a47414be12dc2f62172cd16bd4c7e7efa02ebaa045605 connected, active 1
  INFO [03-02|16:32:09.650 UTC] [2216] Peer f23ac6da7c02f84a425a47414be12dc2f62172cd16bd4c7e7efa02ebaa045605 disconnected, active 0
  INFO [03-02|16:32:21.619 UTC] [2216] Peer c845e51a5e470e445ad424f7cb516339237f469ad7b3c903221b5c49ce55863f connected, active 1
  INFO [03-02|16:32:21.620 UTC] [2216] Peer c845e51a5e470e445ad424f7cb516339237f469ad7b3c903221b5c49ce55863f disconnected, active 0
PS C:\Users\Andrea\CMakeBuilds\silkworm\Release\cmd\Release> .\download_headers.exe --chaindata E:\erigon-main\chaindata\
  INFO [03-02|16:42:08.386 UTC] [11340] STARTING
Download Headers - Silkworm
   chain-id: 1
   genesis-hash: d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
   hard-forks: 11
   head hash   = 24eb723279d39f7a8301da905c3708a6d26f9c33187a8794cf9d3bdfba07b70b
   head td     = 42774851402042813481032
   head height = 14307084

  INFO [03-02|16:42:09.140 UTC] [11340] Preverified Hashes            size=447097
  INFO [03-02|16:42:09.202 UTC] [11340] [1/16 Headers] Start
  INFO [03-02|16:42:09.203 UTC] [11340] [1/16 Headers] Waiting for headers... from=14'307'084
  INFO [03-02|16:42:37.263 UTC] [9804] Peer 0c6253589a8d244d57abc1c8ab298e048f24487d6572d7890e0cbb7b70039621 connected, active 1
  INFO [03-02|16:42:37.263 UTC] [9804] Peer 0c6253589a8d244d57abc1c8ab298e048f24487d6572d7890e0cbb7b70039621 disconnected, active 0
  INFO [03-02|16:42:39.203 UTC] [11340] [1/16 Headers] Wrote block headers number=14'307'084 (+0), 0 headers/secs
  INFO [03-02|16:43:09.203 UTC] [11340] [1/16 Headers] Wrote block headers number=14'307'084 (+0), 0 headers/secs
  INFO [03-02|16:43:39.203 UTC] [11340] [1/16 Headers] Wrote block headers number=14'307'084 (+0), 0 headers/secs
  INFO [03-02|16:44:05.040 UTC] [9804] Peer e83a720b2aab2a4b01622d943f2bcb92df92766c8fd14ebc6672d6abfc10a1fc connected, active 1
  INFO [03-02|16:44:09.203 UTC] [11340] [1/16 Headers] Wrote block headers number=14'307'084 (+0), 0 headers/secs
  INFO [03-02|16:44:22.041 UTC] [9804] Peer 39b4103dfb9e90a12d919af5a2086cfca20097fb7b3b1a735737610106220507 connected, active 2
PS C:\Users\Andrea\CMakeBuilds\silkworm\Release\cmd\Release>

Interestingly enough I also get some strange output like this one (after a restart)

  INFO [03-02|17:04:44.288 UTC] [8832] STARTING
Download Headers - Silkworm
   chain-id: 1
   genesis-hash: d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
   hard-forks: 11
   head hash   = 24eb723279d39f7a8301da905c3708a6d26f9c33187a8794cf9d3bdfba07b70b
   head td     = 42774851402042813481032
   head height = 14307084

  INFO [03-02|17:04:44.965 UTC] [8832] Preverified Hashes            size=447097
  INFO [03-02|17:04:45.025 UTC] [8832] [1/16 Headers] Start
  INFO [03-02|17:04:45.026 UTC] [8832] [1/16 Headers] Waiting for headers... from=14'307'084
  INFO [03-02|17:04:49.604 UTC] [116] Peer ce63274a345bac872a6a2df88b3d246f69e52323323e088ec562600224fc22c3 disconnected, active -1
  INFO [03-02|17:05:15.264 UTC] [8832] [1/16 Headers] Wrote block headers number=14'307'084 (+0), 0 headers/secs
  INFO [03-02|17:05:17.869 UTC] [8832] [1/16 Headers] Downloading completed, wrote 1'760 headers, last=14'308'844 duration=32.843s

How is possible the counter of active connections drops to -1 ?

@mriccobene
Copy link
Member Author

@AndreaLanfranchi

The crash is probably due to the bug in the mdbx_cursor_bind, you can see the issue here. However, it happens that the bug occurs only sporadically on the binary compiled with gcc thanks to a special behavior of gcc. In the test case I submitted I managed to turn it into systematic. MSVC probably operates differently and the bug becomes systematic even in the downloader.

The "-1 problem" is a known problem, the downloader receives remote peer connection/disconnection events and use them to estimate the number of remote peers (only to show it to the user) but to do this correctly it need at start time ask the sentry for the current number of peers. Considering that this number is not used (it was useful to me in the debugging) I gave priority to the resolution of the performance problem. I will complete it in a next PR

Thanks for your proposal in the branch al-preverified-hashes. I will review it.

@mriccobene
Copy link
Member Author

@AndreaLanfranchi your proposal in the branch al-preverified-hashes looks fine to me. I propose to squash and merge this PR without changing the pre-verified hashes code only erasing the set (I did the commit). Then I will open a new PR, I will cherry pick from your branch, and we will do a new review focused on that part.

@AndreaLanfranchi AndreaLanfranchi dismissed their stale review March 4, 2022 09:26

Ok then proceed as suggested

mriccobene and others added 8 commits March 4, 2022 12:53
Co-authored-by: Andrew Ashikhmin <34320705+yperbasis@users.noreply.github.com>
Co-authored-by: Andrew Ashikhmin <34320705+yperbasis@users.noreply.github.com>
Co-authored-by: Andrew Ashikhmin <34320705+yperbasis@users.noreply.github.com>
(it will be re-implemented in another PR)
@mriccobene mriccobene force-pushed the downloader_performances branch from ac9579b to d3789e8 Compare March 4, 2022 11:54
@mriccobene mriccobene merged commit 6e26a43 into master Mar 4, 2022
@mriccobene mriccobene deleted the downloader_performances branch March 4, 2022 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants