Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Unknown fields #769

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

jasonahills
Copy link

A proof-of-concept implementation allowing users to opt in for preserving unknown fields. There is more cleanup to do on this implementation, but if the approach is promising, it would address the issue for adding support for unknown fields.

This PR adds a prost-examples crate which illustrates the generate_unknown_fields build option.

@jasonahills
Copy link
Author

You can try it out by running the test suite in prost-examples.


impl PartialEq for UnknownField {
fn eq(&self, other: &Self) -> bool {
self.tag == other.tag && self.wire_type == other.wire_type && self.bytes == other.bytes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont actually think this is correct. without knowing the message each field originated from, the fact that they have the same tag, wire type, and bytes doesnt mean they are the same field semantically. you can make the choice that this is how you implement it, but choosing that unknown things are not Eq at all is equally valid.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, and I see why electing not to implement PartialEq could be a reasonable choice. For example, if I received a Foo message and a Bar message, each with a single unknown field at tag 2 with wire type LEN and zero bytes, I wouldn't know if these fields represent the same thing. Indeed, maybe Foo includes an empty string, but Bar includes a default value Baz message. It might be safest to just not compare these unknown fields.

On the other hand, it might be reasonable to say that qua protobuf unknown field, these really are the same thing. An analogy: the semantics of the number 42 are different in different places (e.g. in one struct it might represent a sequential id, but in another it might represent a quantity), but qua number, 42 is 42. Of course, I recognize that there are deficiencies in this analogy, and it is hardly dispositive.

That said, one of the reasons I would like to implement PartialEq for unknown fields is to avoid breaking the PartialEq implementations of structs in generated code. If I update my build.rs script so that my generated structs include unknown fields, I don't want to lose the ability to compare generated structs with each other. I see at least a couple of options here:

  1. Give up the PartialEq impl for these generated structs.
  2. Generate a custom PartialEq impl for the generated struct which does not compare the unknown fields (ie. two structs which vary only in the unknown fields would be considered equal).
  3. Generate a custom PartialEq impl for the generated struct which compares the unknown fields as if there were a PartialEq impl for them (compare tag, wiretype, and bytes), but don't actually provide a PartialEq impl for the UnknownField and UnknownFields structs.
  4. Include a PartialEq impl for UnknownFields and UnknownField.

There may also be some other options which involve generating message-specific unknown fields types (e.g. FooUnknownFields for the Foo struct), or other approaches.

The present PR suggests (4). You have noted that (1) is also a reasonable choice. Between (2) and (3), I would favor (3), since I can easily imagine being surprised to find that Foo with some unknown fields is the "same" as Foo without any unknown fields.

The present PR does not attempt to fully deserialize the unknown fields. That is, if an unknown field contains a message, the present PR would just create a single unknown field struct for that nested message, no matter how big and deep it is; indeed, I don't think we could fully deserialze things, at the risk of mistaking a string for a message (though I could be mistaken about this; using the --decode-raw option with protoc seems to be able to distinguish the two). Because of this, there are some other deficiencies in (3) and (4). In particular, if an unknown field contains a message, the present implementation would consider two otherwise identical messages to be different if their field order over the wire differed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i actually would favour (2), for whats its worth. i feel that Eq on the message containing unknown feels should check the fields we know/care about and assert Eq correctly. the case that for comparing opaque unknown fields can be handled just as easily by comparing their byte repr directly, and is more clear to a user that you are not taking any semantic information into account by doing so (per your analogy, we are asserting that 42 is indeed 42, regardless of context). Doing Eq on a byte buffer containing a key and some bytes is equivalent to your implementation, minus validating that the key is valid.


impl PartialEq for UnknownFields {
fn eq(&self, other: &Self) -> bool {
self.fields == other.fields
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that we treat the unknown fields as different if their order differs.

@jasonahills
Copy link
Author

Hey, I just wanted to follow up on this draft PR. Is the feature desirable? Is the general approach (adding a catch-all list of unknown fields, keeping their values as raw bytes) reasonable? Would it be worth me attempting to refine the PR with an eye toward getting it merged? Thanks!

@LucioFranco
Copy link
Member

Hi sorry for the delay, I think the discussion around uknown fields/extensions etc is being discussed here #674. Since these are quite big changes to prost I'd like to go through a proper design process.

@jasonahills
Copy link
Author

Thanks for getting back to me. A full design process for something like this makes sense. Indeed, this draft PR is mostly given in the spirit of starting a conversation like that.

I should note that the particular problem I am interested in resolving involves unknown fields as such, rather than extensions or custom options, though I can see how the way one deals with one of these problems might inform the other.

I manage a couple of tonic services such that service A sends a message to service C, proxied through service B (there are some additional details about the proxying which make it difficult for service B to just pass the encoded protobuf through to C without inspecting it and even changing it). I would like to be able to add a field to my message and update services A and C to include that field, without having to update service B as well. This new field is not meant to be available for third-party extensions though; it's just part of the new/future versions of my API.

Anyway, thanks again for your response. Please let me know how best I can help with this feature, the design discussion, etc. Thanks!

@QuentinPerez
Copy link
Contributor

QuentinPerez commented Jan 10, 2023

Hey, I just wanted to follow up on this draft PR. Is the feature desirable? Is the general approach (adding a catch-all list of unknown fields, keeping their values as raw bytes) reasonable? Would it be worth me attempting to refine the PR with an eye toward getting it merged? Thanks!

Hello, Just wanted to let you know that I'm very interested in this feature too, we have multiple processes which write in the same local DB (KV) where those one can have different version of the schema.

@LucioFranco
Copy link
Member

@jasonahills Yeah, I think they are all related and the solution we have for extensions should cover your use case as well. I don't have much time to push forward on actually writing this feature but if someone wants to champion the design/implementation I would be more than happy to mentor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants