
POC: zero copy go API #15130
Closed · wants to merge 5 commits

Conversation

@aaronc (Member) commented Feb 22, 2023

Description

This proof of concept arose from team discussions on ADR 054 (#11802) around approach B and a zero-copy message passing approach.

It chooses to implement an approach where all struct elements are encoded at fixed offsets (see the sketch after the pros list below).

This has the pros that:

  • in Rust it could likely be represented directly with structs, without needing as many getters/setters as in Go
  • the performance overhead should be pretty low because offsets don't need to be computed
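A minimal, self-contained sketch of the fixed-offset idea (the field layout and names here are illustrative, not the PR's actual generated code):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Illustrative fixed layout for a two-field message:
//   bytes 0..7  -> amount (uint64)
//   bytes 8..15 -> sequence (uint64)
const (
	amountOffset   = 0
	sequenceOffset = 8
	msgSize        = 16
)

// Msg wraps a shared buffer; getters and setters read and write at fixed
// offsets, so nothing needs to be computed or copied at access time.
type Msg struct{ buf []byte }

func NewMsg() Msg { return Msg{buf: make([]byte, msgSize)} }

func (m Msg) Amount() uint64       { return binary.LittleEndian.Uint64(m.buf[amountOffset:]) }
func (m Msg) SetAmount(v uint64)   { binary.LittleEndian.PutUint64(m.buf[amountOffset:], v) }
func (m Msg) Sequence() uint64     { return binary.LittleEndian.Uint64(m.buf[sequenceOffset:]) }
func (m Msg) SetSequence(v uint64) { binary.LittleEndian.PutUint64(m.buf[sequenceOffset:], v) }

func main() {
	m := NewMsg()
	m.SetAmount(1000)
	m.SetSequence(7)
	// m.buf itself can be handed to another module (or VM) without copying.
	fmt.Println(m.Amount(), m.Sequence())
}
```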

The big downside of this approach is that (unlike something like https://capnproto.org/encoding.html) it would be hard to add new fields to existing structs when the writer has an older schema version than the reader, because the reader would read struct offsets that don't exist in the encoded data. To mitigate this, I can think of two workarounds (a sketch of the first follows the list):

  • encode the largest offset in the struct and use getters/setters in Rust as opposed to struct fields
  • do stateful encoding where the encoder needs to be made aware of the size of structs that readers are expecting
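A rough sketch of the first workaround, assuming the encoder writes the struct's encoded size (its largest known offset) into the first 4 bytes; the layout and names are hypothetical:

```go
package zerocopy

import "encoding/binary"

// Msg wraps a buffer whose first 4 bytes record the size the writer
// actually encoded, i.e. its largest known offset.
type Msg struct{ buf []byte }

// newFieldOffset is an offset that only exists in a newer schema version.
const newFieldOffset = 24

// NewField returns the field value and whether the writer's schema was
// recent enough to contain it; older writers simply report "absent".
func (m Msg) NewField() (uint64, bool) {
	writerSize := binary.LittleEndian.Uint32(m.buf[:4])
	if uint64(writerSize) < newFieldOffset+8 {
		return 0, false // writer encoded with an older, smaller schema
	}
	return binary.LittleEndian.Uint64(m.buf[newFieldOffset:]), true
}
```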

We could explore the Rust side of things to see what approach makes sense if we want to go in this direction.

Another downside of this approach is that it doesn't optimize for storage space, so it is likely not a great on-disk or over-the-wire format without compression (we could look at something like https://capnproto.org/encoding.html#packing). For the time being, I don't think this is a big issue because we're only talking about using this between modules in the same process but in separate VMs, so encoded size is not a concern. But if we can do some sort of packing too, it might be nice to consider this for other use cases.


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

```go
	return ReadArray[Coin](m.ctx, msgSendCoinsOffset)
}

func (m *MsgSend) InitCoins(size int) Array[*Coin] {
```
@aaronc (Member Author) commented on this hunk:

Arrays must be allocated with a fixed size in this approach and can't be appended to (at least not without re-organizing the memory buffer). To change the size, the array needs to be re-initialized and re-populated.
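A usage sketch of that constraint, building on the InitCoins signature from this hunk (this is a fragment, not a self-contained program, and the element accessors are assumed rather than taken from the PR):

```go
// resizeCoins "appends" by allocating a new fixed-size array and
// re-populating it; the previously initialized array is abandoned.
func resizeCoins(msg *MsgSend, newSize int) Array[*Coin] {
	coins := msg.InitCoins(newSize)
	// ... re-populate all newSize elements here (element setters elided,
	// since that part of the API isn't shown in this hunk) ...
	return coins
}
```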

@alexanderbez (Contributor) commented:

What is "around approach B" and why is this needed? What are we trying to solve for here? Passing (encoded?) messages between module/keepers?

@aaronc (Member Author) commented Feb 23, 2023

What is "around approach B" and why is this needed? What are we trying to solve for here? Passing (encoded?) messages between module/keepers?

Approach B is based around message passing between modules, potentially across a VM boundary (i.e. modules written in either Rust or Go, possibly running inside a Wasm VM). We're trying to have a zero-copy solution for the inter-module boundary even if the modules are running in separate VMs and written in different languages.

@kocubinski (Member) commented:

What is "around approach B" and why is this needed? What are we trying to solve for here? Passing (encoded?) messages between module/keepers?

This is an implementation of one of the more interesting problems that must be solved to build and ship message passing performantly. If module APIs are specified by protobuf IDL, and each SDK module implements one separately, the naive solution is for each module to communicate with the others via binary proto. But that is a lot of wasted work if the modules are in the same process and we're in full control of the type domain (we are). Ergo we arrive at zero copy: just passing a memory location around, with a schema derived from the same proto IDL.
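To make "passing a memory location around" concrete, here's a rough sketch of what the hand-off between two in-process modules could look like; the Router interface and the send helper are hypothetical, not the SDK's actual API:

```go
package zerocopy

// Router is a hypothetical inter-module call surface: rather than
// proto-marshaling a message and unmarshaling it on the other side,
// the caller hands over the message's backing buffer directly.
type Router interface {
	Invoke(method string, msg []byte) (resp []byte, err error)
}

// MsgSend wraps a buffer whose layout is derived from the proto IDL;
// only the slice header (pointer + length) crosses the module boundary.
type MsgSend struct{ buf []byte }

// send performs the zero-copy hand-off: no field is copied or re-encoded.
func send(r Router, msg MsgSend) ([]byte, error) {
	return r.Invoke("/cosmos.bank.v1beta1.Msg/Send", msg.buf)
}
```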

@kocubinski (Member) commented:

@alexanderbez There is a lot of ground to cover before this would be used, and lots of refactoring required to give it a place in the codebase, so I can understand the missing context! Hopefully it generates some interest in message passing as a solution in the SDK like it does for me 👏

@yihuang (Collaborator) commented Feb 23, 2023

I like the idea of writing modules in Rust or Wasm, but it seems we are reinventing the wheel of Cap'n Proto or FlatBuffers (https://github.com/google/flatbuffers). FlatBuffers is much simpler than Cap'n Proto.

@aaronc (Member Author) commented Feb 23, 2023

> I like the idea of writing modules in Rust or Wasm, but it seems we are reinventing the wheel of Cap'n Proto or FlatBuffers (https://github.com/google/flatbuffers).

I'd say that's true. I'd prefer to stick with proto files for describing schemas, because adding an additional layer for FlatBuffers or Cap'n Proto would make the dev UX more complex. Although just going straight to FlatBuffers, for instance, isn't out of the question.

We did evaluate Cap'n Proto back in 2020 and considered its ecosystem too immature. The ergonomics of the generated code weren't great, and I think we could improve on that substantially.

@itsdevbear (Contributor) commented:

@aaronc there's almost certainly a way to wire up the code generation via go generate with this approach, which is awesome.
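For example, a go:generate hook could look roughly like this; the generator name and flags below are hypothetical:

```go
// Hypothetical generator invocation: regenerate the fixed-offset accessors
// from the module's proto files whenever `go generate ./...` is run.
//go:generate zerocopy-gen --proto_path=./proto --out=./api
package bank
```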

@tac0turtle (Member) commented:

Going to close this for now, but we will reopen it in the future.

@tac0turtle closed this on Mar 10, 2023