
Optimize encoding of composites #761

Conversation

@fxamacker (Member) commented Apr 4, 2021

Closes #744
Updates #738

Description

Changes

  • Switch from CBOR map to CBOR array to improve speed and preserve ordering.
  • Remove unnecessary field sorting when decoding.
  • Remove old backwards-compatibility decoding code (decoding type ID).
  • Add tests, including round-trip for decoding old format and encoding new format.

Benchmark comparisons

⚠️ NOTE: "faster" refers to the time delta reported by benchstat, so "50% faster" here means the new code takes half the time, i.e. a 2x speedup.

Encoding speed improved by 44.35% and decoding speed by 34.65% for a 776,155-byte value from production data containing both composites and dictionaries. Memory use also improved. The encoded/decoded data includes Cadence Value types that have not yet been optimized.

Speed gains were smaller, 15-27% for encoding and 16-24% for decoding, for smaller production data that did not include dictionaries. However, these cases will benefit more from optimizing the remaining Value types (outside the scope of this PR).


Big Value with Composites and Dictionaries 🚀🚀

Encoding Comparisons

name                                           old time/op    new time/op    delta
EncodeComposite/870comp_837dict_776155bytes-4    35.2ms ± 1%    19.6ms ± 1%  -44.35%  (p=0.000 n=10+10)

name                                           old alloc/op   new alloc/op   delta
EncodeComposite/870comp_837dict_776155bytes-4    7.50MB ± 5%    4.35MB ± 0%  -41.99%  (p=0.000 n=10+10)

name                                           old allocs/op  new allocs/op  delta
EncodeComposite/870comp_837dict_776155bytes-4      141k ± 0%       76k ± 0%  -46.02%  (p=0.000 n=10+10)

Decoding Comparisons

name                                           old time/op    new time/op    delta
DecodeComposite/870comp_837dict_776155bytes-4    40.3ms ± 1%    26.3ms ± 1%  -34.65%  (p=0.000 n=10+10)

name                                           old alloc/op   new alloc/op   delta
DecodeComposite/870comp_837dict_776155bytes-4    15.0MB ± 0%    10.5MB ± 0%  -29.67%  (p=0.000 n=10+10)

name                                           old allocs/op  new allocs/op  delta
DecodeComposite/870comp_837dict_776155bytes-4      265k ± 0%      216k ± 0%  -18.56%  (p=0.000 n=10+10)
Values with Composites and No Dictionaries 🚀

Encoding Comparisons

name                                           old time/op    new time/op    delta
EncodeComposite/1comp_0dict_139bytes-4           16.0µs ± 1%    13.6µs ± 1%  -15.27%  (p=0.000 n=10+10)
EncodeComposite/2comp_0dict_160bytes-4           20.9µs ± 1%    15.7µs ± 1%  -25.16%  (p=0.000 n=10+10)
EncodeComposite/104comp_0dict_14850bytes-4        910µs ± 1%     657µs ± 1%  -27.78%  (p=0.000 n=9+10)

name                                           old alloc/op   new alloc/op   delta
EncodeComposite/1comp_0dict_139bytes-4           3.79kB ± 1%    2.98kB ± 2%  -21.32%  (p=0.000 n=10+10)
EncodeComposite/2comp_0dict_160bytes-4           4.85kB ± 2%    3.36kB ± 0%  -30.74%  (p=0.000 n=9+10)
EncodeComposite/104comp_0dict_14850bytes-4        247kB ± 1%     188kB ± 0%  -24.16%  (p=0.000 n=8+10)

name                                           old allocs/op  new allocs/op  delta
EncodeComposite/1comp_0dict_139bytes-4             76.0 ± 0%      62.0 ± 0%  -18.42%  (p=0.000 n=10+9)
EncodeComposite/2comp_0dict_160bytes-4              103 ± 0%        73 ± 0%  -29.13%  (p=0.000 n=7+10)
EncodeComposite/104comp_0dict_14850bytes-4        5.03k ± 0%     3.45k ± 0%  -31.28%  (p=0.000 n=8+10)

Decoding Comparisons

name                                           old time/op    new time/op    delta
DecodeComposite/1comp_0dict_139bytes-4           11.6µs ± 1%     9.7µs ± 1%  -16.38%  (p=0.000 n=9+10)
DecodeComposite/2comp_0dict_160bytes-4           16.9µs ± 1%    12.7µs ± 1%  -24.47%  (p=0.000 n=10+9)
DecodeComposite/104comp_0dict_14850bytes-4        757µs ± 1%     569µs ± 1%  -24.90%  (p=0.000 n=9+10)

name                                           old alloc/op   new alloc/op   delta
DecodeComposite/1comp_0dict_139bytes-4           3.90kB ± 0%    3.35kB ± 0%  -13.96%  (p=0.000 n=10+10)
DecodeComposite/2comp_0dict_160bytes-4           5.54kB ± 0%    4.38kB ± 0%  -20.92%  (p=0.000 n=10+10)
DecodeComposite/104comp_0dict_14850bytes-4        296kB ± 0%     238kB ± 0%  -19.61%  (p=0.000 n=10+10)

name                                           old allocs/op  new allocs/op  delta
DecodeComposite/1comp_0dict_139bytes-4             56.0 ± 0%      53.0 ± 0%   -5.36%  (p=0.000 n=10+10)
DecodeComposite/2comp_0dict_160bytes-4             97.0 ± 0%      89.0 ± 0%   -8.25%  (p=0.000 n=10+10)
DecodeComposite/104comp_0dict_14850bytes-4        5.05k ± 0%     4.65k ± 0%   -7.98%  (p=0.000 n=10+10)

Benchmarks used Go 1.15.10 on linux_amd64 using production data.

Special thanks to @turbolent for the state decoding tool in #759 and @ramtinms for the production data in JSONL format!


For contributor use:

  • Targeted PR against feature/storage-optimizations branch
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work
  • Code follows the standards mentioned here
  • Updated relevant documentation
  • Re-reviewed Files changed in the Github PR explorer
  • Added appropriate labels

@turbolent (Member) left a comment

Nice! Just one comment regarding the encoding of fields

Comment on lines 811 to 826

	+	fieldPairs := v.Fields.pairs
	+	fieldLen := len(fieldPairs)

	+	// Sort field names lexicographically.
	+	fieldNames := make([]string, fieldLen)

	+	index := 0
	+	for name := range fieldPairs {
	+		fieldNames[index] = name
	+		index++
	+	}

	-	for pair := v.Fields.Oldest(); pair != nil; pair = pair.Next() {
	-		fieldName := pair.Key
	-		value := pair.Value
	+	sort.Strings(fieldNames)

	+	// Create fields (key and value) array, sorted by field name.
	+	fields := make(cborArray, fieldLen*2)
@turbolent (Member):

I think we can just keep the existing iteration over the ordered map, because that already results in a deterministic encoding. Sorting the fields also works, but I don't think it's necessary.

@fxamacker (Member, Author) commented Apr 5, 2021

I tried the unsorted approach (it's faster and cleaner) but it causes this test to fail:

  • TestInterpretCompositeValueFieldEncodingOrder in interpreter_test.go.

I strongly prefer your suggestion and am hoping the test might be obsolete or unneeded.

This is the code block incorporating the faster and cleaner code you suggested:

	fields := make(cborArray, v.Fields.Len()*2)

	i := 0
	for pair := v.Fields.Oldest(); pair != nil; pair = pair.Next() {
		fieldName := pair.Key
		value := pair.Value

		valuePath := append(path[:], fieldName)

		prepared, err := e.prepare(value, valuePath, deferrals)
		if err != nil {
			return nil, err
		}

		fields[i], fields[i+1] = fieldName, prepared
		i += 2
	}

I suspect TestInterpretCompositeValueFieldEncodingOrder checks that composites initialized with the same entries in different orders encode to identical data.

Test expects:

  • CBOR data ["a", 1, "b", 2, "c", 3] when encoding composite with entries {"a": 1, "b": 2, "c": 3}.
  • CBOR data ["a", 1, "b", 2, "c", 3] when encoding composite with entries {"b": 2, "c": 3, "a": 1}.

Without sorting, this implementation produces:

  • CBOR data ["a", 1, "b", 2, "c", 3] when encoding composite with entries {"a": 1, "b": 2, "c": 3}.
  • CBOR data ["b", 2, "c", 3, "a", 1] when encoding composite with entries {"b": 2, "c": 3, "a": 1}.

Any thoughts or suggestions?

@turbolent (Member):

This test checks that no matter what the order of initialization is, the encoding is always the same. That's not required, though; we only have to ensure the encoding order is always deterministic, and the ordered map for the fields ensures that.

I've adjusted the test case in f2a268b to prepare the expected encodings for all permutations of initializing the fields.


 	valuePath := append(path[:], fieldName)

 	prepared, err := e.prepare(value, valuePath, deferrals)
 	if err != nil {
 		return nil, err
 	}
-	fields[fieldName] = prepared
+	fields[i*2], fields[i*2+1] = fieldName, prepared
@turbolent (Member):

👌

Comment on lines -650 to -652

	-	// Decode all fields in lexicographic order
	-
	-	sort.Strings(fieldNames)
@turbolent (Member):

It's great we don't need this anymore 👍

@turbolent (Member) left a comment

I've switched from sorting to iterating over the ordered map of fields in f2a268b.

I also enabled CI for feature branches, including ones from forks; CI ran and everything is green 👏

@turbolent turbolent merged commit d302379 into onflow:feature/storage-optimizations Apr 6, 2021