-
Notifications
You must be signed in to change notification settings - Fork 527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft: Unknown fields #769
base: master
Are you sure you want to change the base?
Conversation
d6ce46f
to
377aaa4
Compare
You can try it out by running the test suite in |
|
||
impl PartialEq for UnknownField { | ||
fn eq(&self, other: &Self) -> bool { | ||
self.tag == other.tag && self.wire_type == other.wire_type && self.bytes == other.bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont actually think this is correct. without knowing the message each field originated from, the fact that they have the same tag, wire type, and bytes doesnt mean they are the same field semantically. you can make the choice that this is how you implement it, but choosing that unknown things are not Eq at all is equally valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, and I see why electing not to implement PartialEq
could be a reasonable choice. For example, if I received a Foo
message and a Bar
message, each with a single unknown field at tag 2
with wire type LEN
and zero bytes, I wouldn't know if these fields represent the same thing. Indeed, maybe Foo
includes an empty string, but Bar
includes a default value Baz
message. It might be safest to just not compare these unknown fields.
On the other hand, it might be reasonable to say that qua protobuf unknown field, these really are the same thing. An analogy: the semantics of the number 42
are different in different places (e.g. in one struct it might represent a sequential id, but in another it might represent a quantity), but qua number, 42
is 42
. Of course, I recognize that there are deficiencies in this analogy, and it is hardly dispositive.
That said, one of the reasons I would like to implement PartialEq
for unknown fields is to avoid breaking the PartialEq
implementations of structs in generated code. If I update my build.rs
script so that my generated structs include unknown fields, I don't want to lose the ability to compare generated structs with each other. I see at least a couple of options here:
- Give up the
PartialEq
impl for these generated structs. - Generate a custom
PartialEq
impl for the generated struct which does not compare the unknown fields (ie. two structs which vary only in the unknown fields would be considered equal). - Generate a custom
PartialEq
impl for the generated struct which compares the unknown fields as if there were aPartialEq
impl for them (compare tag, wiretype, and bytes), but don't actually provide aPartialEq
impl for theUnknownField
andUnknownFields
structs. - Include a
PartialEq
impl forUnknownFields
andUnknownField
.
There may also be some other options which involve generating message-specific unknown fields types (e.g. FooUnknownFields
for the Foo
struct), or other approaches.
The present PR suggests (4). You have noted that (1) is also a reasonable choice. Between (2) and (3), I would favor (3), since I can easily imagine being surprised to find that Foo
with some unknown fields is the "same" as Foo
without any unknown fields.
The present PR does not attempt to fully deserialize the unknown fields. That is, if an unknown field contains a message, the present PR would just create a single unknown field struct for that nested message, no matter how big and deep it is; indeed, I don't think we could fully deserialze things, at the risk of mistaking a string for a message (though I could be mistaken about this; using the --decode-raw
option with protoc
seems to be able to distinguish the two). Because of this, there are some other deficiencies in (3) and (4). In particular, if an unknown field contains a message, the present implementation would consider two otherwise identical messages to be different if their field order over the wire differed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i actually would favour (2), for whats its worth. i feel that Eq on the message containing unknown feels should check the fields we know/care about and assert Eq correctly. the case that for comparing opaque unknown fields can be handled just as easily by comparing their byte repr directly, and is more clear to a user that you are not taking any semantic information into account by doing so (per your analogy, we are asserting that 42 is indeed 42, regardless of context). Doing Eq on a byte buffer containing a key and some bytes is equivalent to your implementation, minus validating that the key is valid.
|
||
impl PartialEq for UnknownFields { | ||
fn eq(&self, other: &Self) -> bool { | ||
self.fields == other.fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that we treat the unknown fields as different if their order differs.
Hey, I just wanted to follow up on this draft PR. Is the feature desirable? Is the general approach (adding a catch-all list of unknown fields, keeping their values as raw bytes) reasonable? Would it be worth me attempting to refine the PR with an eye toward getting it merged? Thanks! |
Hi sorry for the delay, I think the discussion around uknown fields/extensions etc is being discussed here #674. Since these are quite big changes to prost I'd like to go through a proper design process. |
Thanks for getting back to me. A full design process for something like this makes sense. Indeed, this draft PR is mostly given in the spirit of starting a conversation like that. I should note that the particular problem I am interested in resolving involves unknown fields as such, rather than extensions or custom options, though I can see how the way one deals with one of these problems might inform the other. I manage a couple of tonic services such that service A sends a message to service C, proxied through service B (there are some additional details about the proxying which make it difficult for service B to just pass the encoded protobuf through to C without inspecting it and even changing it). I would like to be able to add a field to my message and update services A and C to include that field, without having to update service B as well. This new field is not meant to be available for third-party extensions though; it's just part of the new/future versions of my API. Anyway, thanks again for your response. Please let me know how best I can help with this feature, the design discussion, etc. Thanks! |
Hello, Just wanted to let you know that I'm very interested in this feature too, we have multiple processes which write in the same local DB (KV) where those one can have different version of the schema. |
@jasonahills Yeah, I think they are all related and the solution we have for extensions should cover your use case as well. I don't have much time to push forward on actually writing this feature but if someone wants to champion the design/implementation I would be more than happy to mentor. |
A proof-of-concept implementation allowing users to opt in for preserving unknown fields. There is more cleanup to do on this implementation, but if the approach is promising, it would address the issue for adding support for unknown fields.
This PR adds a
prost-examples
crate which illustrates thegenerate_unknown_fields
build option.