-
Notifications
You must be signed in to change notification settings - Fork 145
Add CCF codec implementing CCF Specification (RC1) #2364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is to allow CCF codec to use new StreamEncoder.Close() function provided by latest fxamacker/cbor.
- added NewMeteredUFix64FromUint64 to create metered UFix64 from uint64. - added NewMeteredFix64FromInt64 to create metered Fix64 from int64.
Cadence Compact Format (CCF) is a data format designed for compact, efficient, and deterministic encoding of Cadence external values. CCF obsoletes JSON-Cadence Data Interchange Format (JCDIF) for use cases that do not require JSON. Unlike JCDIF, CCF Specifications explicitly defines requirements for - well-formed encodings - valid encodings - deterministic encodings CCF is hybrid data format. CCF-based messages can be: - fully self-describing or - partially self-describing. Both CCF modes are more compact than JSON-based messages. CCF-based protocols can send Cadence metadata just once for all messages of that type. Malformed data can be detected without Cadence metadata and without creating Cadence objects. CCF codec implements CCF Specification (RC1), which is temporarily at github.com/fxamacker/ccf_draft. The CCF specs will be hosted under github.com/onflow after it is updated, cleaned up, and reformatted.
CCF obsoletes JSON-Cadence Data Interchange Format for use cases
that do not require JSON. Given this, preliminary comparisons are
described here for the CCF codec implementing CCF Specifications (RC1).
PRELIMINARY COMPARISONS
Comparisons used 48,309 events from a single mainnet transaction.
There were 9 event types. To simplify benchmark code, the first event's
value in each of the 9 event types was used.
CCF's partially self-describing mode (aka "detached" mode) would be
even smaller than this (e.g. maybe less than 1/4 the size of JSON when
Flow eventually supports detached mode).
SIZE COMPARISON
Encoding | Num Events | Encoded size | Comments
-------- | ---------- | ------------ | --------
JSON | 48,309 | 13,858,836 | JSON-Cadence Data Interchange
CCF | 48,309 | 6,159,931 | CCF in fully self-describing mode
CCF SPEED AND MEMORY COMPARISONS
This isn't apples to apples comparison. JSON data isn't sorted, etc.
- CCF encoder sorts data to encode event data deterministically.
- CCF decoder also verifies event data is sorted, well-formed, and valid.
- CCF encoding is less than 1/2 size of JSON-Cadence Data Interchange.
ENCODER COMPARISON
48k_events_encode_json.log │ 48k_events_encode_ccf.log
sec/op │ sec/op vs base
89.84m ± 17% 69.28m ± 3% -22.88%
48k_events_encode_json.log │ 48k_events_encode_ccf.log
B/op │ B/op vs base
32.45Mi ± 0% 25.82Mi ± 0% -20.45%
48k_events_encode_json.log │ 48k_events_encode_ccf.log
allocs/op │ allocs/op vs base
756.6k ± 0% 370.4k ± 0% -51.05%
DECODER COMPARISON
48k_events_decode_json.log │ 48k_events_decode_ccf.log
sec/op │ sec/op vs base
646.2m ± 8% 158.3m ± 5% -75.50%
48k_events_decode_json.log │ 48k_events_decode_ccf.log
B/op │ B/op vs base
234.97Mi ± 0% 56.16Mi ± 0% -76.10%
48k_events_decode_json.log │ 48k_events_decode_ccf.log
allocs/op │ allocs/op vs base
4.746M ± 0% 1.288M ± 0% -72.86%
Benchmarked using Go 1.19.6, linux_amd64, i5-13600k. Results are
subject to change because CCF codec reference implementation in Go
has not yet been reviewed or merged into onflow/cadence yet.
Added comment stating that cadence.String and cadence.Character
must be valid UTF-8 and it is the application's responsibility
to provide the CCF encoder with valid UTF-8 strings.
"Valid CCF Encoding Requirements" in CCF Specification states:
"Encoders are not required to check for invalid input items
(e.g. invalid UTF-8 strings, duplicate dictionary keys, etc.)
Applications MUST NOT provide invalid items to encoders."
Cadence Benchstat comparisonThis branch with compared with the base branch onflow:master commit 1541525 Collapsed results for better readability
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Codecov Report
@@ Coverage Diff @@
## master #2364 +/- ##
==========================================
- Coverage 78.58% 78.28% -0.31%
==========================================
Files 315 325 +10
Lines 68892 72307 +3415
==========================================
+ Hits 54137 56603 +2466
- Misses 12946 13627 +681
- Partials 1809 2077 +268
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 3 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
These informal and preliminary benchmarks used commit f911063 in onflow/cadence#2364.
Composite initializer parameters have natual sorting, and shouldn't be changed. Thanks Bastian for spotting this during discussion!
Prior to this change, more initializers were allowed. Currently, Cadence doesn't support more than one initializer and adding this limit to CCF removes the need to sort initializers. Thanks Bastian for suggesting this!
Cadence recently added FunctionType.TypeParameters, so add support for it in CCF codec.
Add support for nullable types: - type bound in FunctionType.TypeParameters - type of RestrictedType - borrow type of CapabilityType - add more tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic work! 👏
I reviewed everything except the test cases again and it's looking great!
Except for some minor suggestions, the only work remaining I can see is adding more tests for the unhappy paths of the decoder, but maybe it's better to get this PR in and add those in a follow-up PR, because this one is already very large.
One part I'm still trying to wrap my head around is the special handling of optional and reference types (needToEncodeRuntimeType, getTypeToEncodeAsCCFInlineType, getOptionalInnerTypeToEncodeAsCCFInlineType). Maybe we can have a follow-up sync around this next week.
Co-authored-by: Bastian Müller <bastian@axiomzen.co>
Cadence recently improved PathValue in PR 2427, so the CCF codec was modified to use the updated PathValue. Also added more tests to encode different types of PathValue.
Co-authored-by: Bastian Müller <bastian@axiomzen.co>
Co-authored-by: Bastian Müller <bastian@axiomzen.co>
Thanks @turbolent for spotting this!
Thanks @turbolent for identifying potential coding changes in the future that could require determinism inside a function when metering is added. "Even though I can't currently see a problem, i.e. the function only has local side-effects, this might not stay true forever (e.g. we are going to add metering)."
Reduce default max elements limit to 20_000_000 for arrays and maps. These limits are large (and can be reduced more if needed): - Current grpc limit is 20 MB and these limits are large enough to support unrealistic CCF message with zero-overhead and elements of 1 byte size using up entire 20 MB grpc limit. - It would typically take many thousands of "normal" CCF-encoded events to get near the 20 MB grpc limit with one transaction. Also added comments explaining the security considerations for having limits. Thanks @turbolent for reminder to document limits!
As a CBOR security consideration, make CCF decoder reject messages containing indefinite length CBOR byte string, text string, arrays, and maps.
turbolent
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recent changes look great! 👌
SupunS
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
turbolent
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic work @fxamacker! 👏👏
This prevents an edge case from encoding more data than necessary. getTypeToEncodeAsCCFInlineType() returns runtime type to be encoded after removing redundant type info that is present in staticType because staticType is already encoded at higher level. This applies to optional type container that is present in both static type and runtime type. This commit unwraps reference type from static type if present and tries again because reference type is only present in static type.
|
@turbolent @SupunS Thanks for reviews, fuzzing, and meetings! After merging this, I'll open followup issue & PR to add API to allow some default CBOR limits to be overridden (e.g. I spotted an edge case where it was possible for encoder to needlessly encode redundant data, related to reference type. Can you take a look at commit 51552f1 to confirm it makes sense before I merge this PR? 🙏 |
Thanks @turbolent for spotting this!
Replaced recursive call with for-loop. Thanks @turbolent for suggesting this!
Description
Updates #2157
Cadence Compact Format (CCF) is a data format designed for compact, efficient, and deterministic encoding of Cadence external values.
CCF obsoletes JSON-Cadence Data Interchange Format (JSON-CDC) for use cases that do not require JSON.
Unlike JSON-CDC, etc. CCF Specifications explicitly defines requirements for:
CCF is hybrid data format. CCF-based messages can be:
Both CCF modes are more compact than JSON-based messages. CCF-based protocols can send Cadence metadata just once for all messages of that type. Malformed data can be detected without Cadence metadata and without creating Cadence objects.
CCF codec implements CCF Specification (RC1), which is temporarily at github.com/fxamacker/ccf_draft. The CCF specs
will be hosted under github.com/onflow after it is updated, cleaned up, and reformatted.
API
CCF codec provides the same API as the existing JSON-Cadence Data Interchange codec.
Given this, integration effort to replace the old codec should be minimal.
PRELIMINARY COMPARISONS
Comparisons used 48,309 events from a single mainnet transaction.
There were 9 event types. To simplify benchmark code, the first event's value in each of the 9 event types was used.
CCF's partially self-describing mode (aka "detached" mode) would be even smaller than this (e.g. maybe less than 1/14 the size of JSON when Flow eventually supports detached mode).
SIZE COMPARISON
CCF SPEED AND MEMORY COMPARISONS
This isn't apples to apples comparison. JSON data isn't sorted, etc.
Benchmarked using Go 1.19.6, linux_amd64, i5-13600k. Results are subject to change because CCF codec reference implementation in Go has not yet been reviewed.
NEXT STEPS
👉 After this PR is merged, there's additional work that should be completed before using or deploying this codec.
Add more tests to handle edge cases (as time allows). Currently,
go test -coverreports 77.3% for CCF codec.Add and run fuzz tests. It would be very unusual for fuzzing to not find any problems with a brand new codec for a new data format. I think Cadence Team is best qualified for this task (i.e. expert in Cadence types & values).
Integration with FVM to use CCF codec for events.
Integration work related to converting CCF encodings to JSON-Cadence Data Interchange.
Add integration tests in relevant projects.
EDIT: updated benchmarks to match commit f911063
masterbranchFiles changedin the Github PR explorer