-
Notifications
You must be signed in to change notification settings - Fork 523
performance: Agreement state serialization using msgp #4644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
But I also broke consistent sorting for ledger and others
All tests passing assuming that messageHandle doesn't need to be serialized
2222265 to
e7cfbec
Compare
Codecov Report
@@ Coverage Diff @@
## master #4644 +/- ##
==========================================
+ Coverage 54.42% 54.47% +0.05%
==========================================
Files 403 404 +1
Lines 51931 51994 +63
==========================================
+ Hits 28264 28325 +61
- Misses 21288 21292 +4
+ Partials 2379 2377 -2
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
| // LedgerReader, associated with some round. | ||
| type ConsensusVersionView struct { | ||
| Err serializableError | ||
| _struct struct{} `codec:","` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does the comma do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing other than telling msgp to generate marshaller methods for it. In our fork we treat omitempty as the logical default so you have to include the tag that explicitly leaves out omitempty it if you don't want the behavior.
Nothing before the comma means that field isn't renamed and nothing after it means no special directives like omitempty and others
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it can't be empty I take it, or no tag at all ... OK
jannotti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good, nice testing of before and after.
I notice we rarely (never?) use omitempty. Is that in order to make the encodings the same before and after? I wonder how much space it would save and whether it's worth turning it on once we're confident enough that we don't care about the before/after tests.
| // LedgerReader, associated with some round. | ||
| type ConsensusVersionView struct { | ||
| Err serializableError | ||
| _struct struct{} `codec:","` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not omitempty? Does this serialize the (usually nil, I assume) Err as a 0 length string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We aren't using omitempty to make encodings as close as possible to what they were before. The one exception to where they wouldn't be identical is message struct since we are using msgp.Raw there.
The main reason @cce asked me to remove the omitempty hints is for the structs that contain maps since we could accidentally try to insert into a nil map at runtime which would cause a panic.
I'm happy to add omitemptys on all structs that don't export maps if we want to do that.
I wasn't planning on removing them since there is no noticeable CPU impact as I mentioned and the persistence.go implementation only ever stores a single row in the DB and with WAL enabled on the DB the small amount of IO increase here shouldn't be a concern either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was (and @iansuvak this is how it works right?) that the old reflection code will encode and decode a nil map differently from an empty map (right?) and if we use omitempty, the msgp-generated code will keep encoding and decoding this the same way.
agreement/persistence.go
Outdated
| s.Player = protocol.Encode(&p) | ||
| } | ||
| s.Clock = t.Encode() | ||
| for _, act := range a { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we preallocate s.ActionTypes and s.Actions to be len(a)?
agreement/persistence_test.go
Outdated
|
|
||
| func TestRandomizedEncodingFullDiskState(t *testing.T) { | ||
| partitiontest.PartitionTest(t) | ||
| router, player := randomizeDiskState() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usually randomized tests run a bunch of iterations, maybe wrap this up a loop and make 10K iterations or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did 5k since 10k was taking ~ 20s. I would be pro removing this or shrinking it down post-consensus release
| Quit() | ||
| } | ||
|
|
||
| //msgp:ignore cryptoVoteRequest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity, how did you figure out which types to ignore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything that embeds message really and is not messageEvent. The process of adding the fields was adding the _struct field to the top levels embedded things and then seeing which structs didn't get their interfaces implemented after running make msgp. anything that embeds message got accidentally added and ignoring them didn't introduce any unimplemented method errors.
| continue | ||
| } | ||
| if st.Name() == "messageEvent" && f.Name == "Tail" { | ||
| // Don't try and set the Tail field since it's recursive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, you could have set just one Tail, but this is fine
cce
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you try using the "autopsy" tool to analyze the new format of cadaver files?
|
No I didn't touch cadavers with this change. We should take another look at them at some point though since we are still using them on significant portion of stake |
Summary
This change will replace agreement state serialization used for crash recovery purposes from using reflection to using auto-generated code from the msgp library. It currently supports router and player structs, but not actions since they are polymorphic.
Currently I'm using unbound allocbounds since this isn't exposed to network and isn't an attack vector but happy to take suggestions on
TODO
Implement handling for messageHandle field which is of typeinterface{}and is present onmessageEventsHandle randomized generation for messageHandle which is ofinterface{}type and isn't currently supported.Above weren't implemented. messageHandle field was explicitly unexported instead.
Either add allocbounds or handle codec_tester warningsTest Plan
Pass autogenerated tests incodec_tester.gothat compare encodings/decodings of randomly generated objects usingmsgpandgo-codecTesting done
New and existing tests pass
Benchmarks show improvement using both new randomized encoding benchmark as well as using a crash.db collected from a testnet node instrumented to insert new rows for each persist call instead of overwriting it.
Result stats over 20 benchmark tests of decode:

Same data created using master branch (reflection) was successfully decoded and reencoded using both libraries and shown to be equal. Test not included since it depends on having access to a binary DB file which shouldn't be committed to the repo:

Tests over Scenario1 with tps-multiplier of 6 show the following improvements on pprof CPU flamegraph:
Reflect CPU
Encode: 3.19% 12.52s
Decode: 5.06% 19.88s
msgp CPU
Encode: 1.36% 2.26s
Decode: 0.93% 1.53s
Reflect alloc objects
Encode: 3.91% 36323960
Decode: 4.07% 37835267
msgp alloc objects
Encode: 0.002% 43941
Decode: 0.67% 14930739
Performance Testing post removing omitempty
Result stats over 20 benchmark tests of decode:

Msgp time stayed about the same while reflect time went up a lot with removal of omitempty. Scenario1 testing is similar to above