rlp, trie: faster trie node encoding (#24126) #606

Merged

Conversation

minh-bq
Collaborator

@minh-bq minh-bq commented Oct 18, 2024

commit ethereum/go-ethereum@65ed1a6.

This change speeds up trie hashing and all other activities that require RLP encoding of trie nodes by approximately 20%. The speedup is achieved by avoiding reflection overhead during node encoding.

The interface type trie.node now contains a method 'encode' that works with rlp.EncoderBuffer. Management of EncoderBuffers is left to calling code. trie.hasher, which is pooled to avoid allocations, now maintains an EncoderBuffer. This means memory resources related to trie node encoding are tied to the hasher pool.

This also refactors some functions in the rlp package.
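
The idea described above (each node type writes its own RLP through an `encode` method, and the buffer lives in a pooled hasher) can be illustrated with a minimal, self-contained sketch. This is not the code from the commit: `encBuffer`, `appendHeader`, `encodeNode`, and the two node types below are simplified, hypothetical stand-ins for `rlp.EncoderBuffer` and the trie node types, written only to show the shape of the approach.

```go
package main

import (
	"fmt"
	"sync"
)

// encBuffer is a hypothetical stand-in for rlp.EncoderBuffer:
// a reusable byte slice with helpers for RLP strings and lists.
type encBuffer struct{ buf []byte }

func (w *encBuffer) reset() { w.buf = w.buf[:0] }

// appendHeader appends an RLP length prefix. base is 0x80 for strings
// and 0xc0 for lists; sizes of 56 bytes or more use the long form.
func appendHeader(dst []byte, base byte, size int) []byte {
	if size < 56 {
		return append(dst, base+byte(size))
	}
	var lenBytes []byte
	for s := size; s > 0; s >>= 8 {
		lenBytes = append([]byte{byte(s)}, lenBytes...)
	}
	dst = append(dst, base+55+byte(len(lenBytes)))
	return append(dst, lenBytes...)
}

// writeBytes encodes b as an RLP string.
func (w *encBuffer) writeBytes(b []byte) {
	if len(b) == 1 && b[0] < 0x80 {
		w.buf = append(w.buf, b[0]) // a single small byte encodes as itself
		return
	}
	w.buf = appendHeader(w.buf, 0x80, len(b))
	w.buf = append(w.buf, b...)
}

// list marks the start of a list and returns its offset.
func (w *encBuffer) list() int { return len(w.buf) }

// listEnd inserts the list header in front of everything written since start.
// For brevity this copies the payload; a production buffer would avoid that.
func (w *encBuffer) listEnd(start int) {
	payload := append([]byte(nil), w.buf[start:]...)
	w.buf = appendHeader(w.buf[:start], 0xc0, len(payload))
	w.buf = append(w.buf, payload...)
}

// node encodes itself directly, so no reflection is needed.
type node interface{ encode(w *encBuffer) }

type valueNode []byte

type shortNode struct {
	Key []byte
	Val node
}

func (n valueNode) encode(w *encBuffer) { w.writeBytes(n) }

func (n *shortNode) encode(w *encBuffer) {
	start := w.list()
	w.writeBytes(n.Key)
	n.Val.encode(w)
	w.listEnd(start)
}

// hasher owns an encBuffer; pooling the hasher also pools the buffer.
type hasher struct{ enc encBuffer }

var hasherPool = sync.Pool{New: func() interface{} { return new(hasher) }}

// encodeNode borrows a hasher, encodes n, and returns a copy of the bytes.
func encodeNode(n node) []byte {
	h := hasherPool.Get().(*hasher)
	defer hasherPool.Put(h)
	h.enc.reset()
	n.encode(&h.enc)
	return append([]byte(nil), h.enc.buf...)
}

func main() {
	n := &shortNode{Key: []byte{0x20, 0x01}, Val: valueNode("dog")}
	fmt.Printf("%x\n", encodeNode(n)) // prints c782200183646f67
}
```

Because the buffer is a field of the pooled `hasher`, its backing array is reused across calls, which is the sense in which encoding memory is "tied to the hasher pool" in this sketch.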

goos: linux
goarch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
                          │   old.txt    │               new.txt                │
                          │    sec/op    │    sec/op     vs base                │
DeriveSha200/std_trie-8     725.1µ ± 31%   613.8µ ± 37%        ~ (p=0.481 n=10)
DeriveSha200/stack_trie-8   572.3µ ± 10%   493.1µ ± 13%  -13.85% (p=0.005 n=10)
geomean                     644.2µ         550.1µ        -14.61%

                          │   old.txt    │               new.txt                │
                          │     B/op     │     B/op      vs base                │
DeriveSha200/std_trie-8     287.4Ki ± 0%   283.0Ki ± 0%   -1.53% (p=0.000 n=10)
DeriveSha200/stack_trie-8   56.34Ki ± 0%   42.43Ki ± 0%  -24.69% (p=0.000 n=10)
geomean                     127.2Ki        109.6Ki       -13.88%

                          │   old.txt   │               new.txt               │
                          │  allocs/op  │  allocs/op   vs base                │
DeriveSha200/std_trie-8     2.931k ± 0%   2.917k ± 0%   -0.46% (p=0.000 n=10)
DeriveSha200/stack_trie-8   1.462k ± 0%   1.246k ± 0%  -14.77% (p=0.000 n=10)
geomean                     2.070k        1.907k        -7.90%

                         │   old.txt    │               new.txt                │
                         │    sec/op    │    sec/op     vs base                │
Prove-8                    664.0µ ± 21%   450.2µ ± 27%  -32.20% (p=0.000 n=10)
VerifyProof-8              8.643µ ± 18%   9.009µ ± 33%        ~ (p=0.684 n=10)
VerifyRangeProof10-8       99.18µ ± 25%   67.60µ ± 67%        ~ (p=0.089 n=10)
VerifyRangeProof100-8      496.3µ ± 20%   487.0µ ± 33%        ~ (p=0.739 n=10)
VerifyRangeProof1000-8     5.149m ± 32%   4.095m ± 49%        ~ (p=0.971 n=10)
VerifyRangeProof5000-8     19.79m ± 60%   19.16m ± 28%        ~ (p=0.631 n=10)
VerifyRangeNoProof10-8     499.0µ ± 15%   422.8µ ± 29%  -15.25% (p=0.035 n=10)
VerifyRangeNoProof500-8    1.747m ± 30%   1.417m ± 24%  -18.91% (p=0.023 n=10)
VerifyRangeNoProof1000-8   3.025m ± 29%   2.239m ± 33%  -25.98% (p=0.009 n=10)
geomean                    750.9µ         622.6µ        -17.09%

                     │    old.txt    │               new.txt                │
                     │    sec/op     │    sec/op     vs base                │
HashFixedSize/10-8      60.30µ ± 19%   44.84µ ± 17%  -25.64% (p=0.000 n=10)
HashFixedSize/100-8     205.9µ ± 32%   145.2µ ± 19%  -29.48% (p=0.000 n=10)
HashFixedSize/1K-8     1326.5µ ± 23%   939.2µ ± 25%  -29.20% (p=0.002 n=10)
HashFixedSize/10K-8     14.77m ± 25%   12.74m ± 19%        ~ (p=0.075 n=10)
HashFixedSize/100K-8    135.2m ± 19%   104.1m ± 18%  -23.03% (p=0.003 n=10)
geomean                 2.011m         1.520m        -24.43%

                     │    old.txt    │               new.txt                │
                     │     B/op      │     B/op      vs base                │
HashFixedSize/10-8     11.729Ki ± 0%   9.752Ki ± 0%  -16.85% (p=0.000 n=10)
HashFixedSize/100-8     58.56Ki ± 0%   49.23Ki ± 0%  -15.93% (p=0.000 n=10)
HashFixedSize/1K-8      578.1Ki ± 0%   481.5Ki ± 0%  -16.72% (p=0.000 n=10)
HashFixedSize/10K-8     6.019Mi ± 0%   4.985Mi ± 0%  -17.18% (p=0.000 n=10)
HashFixedSize/100K-8    59.53Mi ± 0%   49.29Mi ± 0%  -17.20% (p=0.000 n=10)
geomean                 683.5Ki        568.8Ki       -16.78%

                     │   old.txt   │              new.txt               │
                     │  allocs/op  │  allocs/op   vs base               │
HashFixedSize/10-8      149.0 ± 0%    142.0 ± 0%  -4.70% (p=0.000 n=10)
HashFixedSize/100-8     772.0 ± 0%    739.0 ± 0%  -4.27% (p=0.000 n=10)
HashFixedSize/1K-8     7.443k ± 0%   7.099k ± 0%  -4.62% (p=0.000 n=10)
HashFixedSize/10K-8    77.09k ± 0%   73.32k ± 0%  -4.89% (p=0.000 n=10)
HashFixedSize/100K-8   767.8k ± 0%   730.5k ± 0%  -4.86% (p=0.000 n=10)
geomean                8.729k        8.321k       -4.67%
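
For readers unfamiliar with benchstat output: the `vs base` column is the relative change from old to new, `~` marks a difference that is not statistically significant, and the `geomean` row is the geometric mean of the column. The sketch below recomputes two of these figures from the rounded `DeriveSha200` sec/op values in the first table. The helper names `pctChange` and `geomean` are made up for this example, and results can differ from benchstat's in the last digit, presumably because benchstat works from unrounded measurements.

```go
package main

import (
	"fmt"
	"math"
)

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

// geomean returns the geometric mean of xs; all values must be positive.
func geomean(xs ...float64) float64 {
	sumLog := 0.0
	for _, x := range xs {
		sumLog += math.Log(x)
	}
	return math.Exp(sumLog / float64(len(xs)))
}

func main() {
	// sec/op values in microseconds, copied from the DeriveSha200 table.
	oldGM := geomean(725.1, 572.3)
	newGM := geomean(613.8, 493.1)

	fmt.Printf("stack_trie change: %.2f%%\n", pctChange(572.3, 493.1))
	fmt.Printf("geomean: %.1fµ -> %.1fµ (%.2f%%)\n", oldGM, newGM, pctChange(oldGM, newGM))
}
```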

@huyngopt1994
Collaborator

This PR LGTM. Could you add a description of the minor refactors in the rlp package and resolve the conflicts before merging?

@minh-bq minh-bq force-pushed the faster-node-encoding branch from 58545c7 to 34727c3 Compare October 29, 2024 06:25
Co-authored-by: Felix Lange <fjl@twurst.com>
@minh-bq minh-bq force-pushed the faster-node-encoding branch from 34727c3 to b67bcf7 Compare October 29, 2024 06:28
@huyngopt1994 huyngopt1994 merged commit 1615821 into axieinfinity:path-base-implementing Oct 29, 2024
1 check passed
huyngopt1994 pushed a commit that referenced this pull request Nov 21, 2024
Co-authored-by: Qian Bin <cola.tin.com@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>