Draft: Unknown fields #769

jasonahills · 2022-11-30T23:21:11Z

A proof-of-concept implementation allowing users to opt in for preserving unknown fields. There is more cleanup to do on this implementation, but if the approach is promising, it would address the issue for adding support for unknown fields.

This PR adds a prost-examples crate which illustrates the generate_unknown_fields build option.

jasonahills · 2022-11-30T23:31:49Z

You can try it out by running the test suite in prost-examples.

VladimirBramstedt · 2022-12-02T12:12:18Z

src/unknown.rs

+
+impl PartialEq for UnknownField {
+    fn eq(&self, other: &Self) -> bool {
+        self.tag == other.tag && self.wire_type == other.wire_type && self.bytes == other.bytes


i dont actually think this is correct. without knowing the message each field originated from, the fact that they have the same tag, wire type, and bytes doesnt mean they are the same field semantically. you can make the choice that this is how you implement it, but choosing that unknown things are not Eq at all is equally valid.

I see your point, and I see why electing not to implement PartialEq could be a reasonable choice. For example, if I received a Foo message and a Bar message, each with a single unknown field at tag 2 with wire type LEN and zero bytes, I wouldn't know if these fields represent the same thing. Indeed, maybe Foo includes an empty string, but Bar includes a default value Baz message. It might be safest to just not compare these unknown fields.

On the other hand, it might be reasonable to say that qua protobuf unknown field, these really are the same thing. An analogy: the semantics of the number 42 are different in different places (e.g. in one struct it might represent a sequential id, but in another it might represent a quantity), but qua number, 42 is 42. Of course, I recognize that there are deficiencies in this analogy, and it is hardly dispositive.

That said, one of the reasons I would like to implement PartialEq for unknown fields is to avoid breaking the PartialEq implementations of structs in generated code. If I update my build.rs script so that my generated structs include unknown fields, I don't want to lose the ability to compare generated structs with each other. I see at least a couple of options here:

Give up the PartialEq impl for these generated structs.

Generate a custom PartialEq impl for the generated struct which does not compare the unknown fields (ie. two structs which vary only in the unknown fields would be considered equal).

Generate a custom PartialEq impl for the generated struct which compares the unknown fields as if there were a PartialEq impl for them (compare tag, wiretype, and bytes), but don't actually provide a PartialEq impl for the UnknownField and UnknownFields structs.

Include a PartialEq impl for UnknownFields and UnknownField.

There may also be some other options which involve generating message-specific unknown fields types (e.g. FooUnknownFields for the Foo struct), or other approaches.

The present PR suggests (4). You have noted that (1) is also a reasonable choice. Between (2) and (3), I would favor (3), since I can easily imagine being surprised to find that Foo with some unknown fields is the "same" as Foo without any unknown fields.

The present PR does not attempt to fully deserialize the unknown fields. That is, if an unknown field contains a message, the present PR would just create a single unknown field struct for that nested message, no matter how big and deep it is; indeed, I don't think we could fully deserialze things, at the risk of mistaking a string for a message (though I could be mistaken about this; using the --decode-raw option with protoc seems to be able to distinguish the two). Because of this, there are some other deficiencies in (3) and (4). In particular, if an unknown field contains a message, the present implementation would consider two otherwise identical messages to be different if their field order over the wire differed.

i actually would favour (2), for whats its worth. i feel that Eq on the message containing unknown feels should check the fields we know/care about and assert Eq correctly. the case that for comparing opaque unknown fields can be handled just as easily by comparing their byte repr directly, and is more clear to a user that you are not taking any semantic information into account by doing so (per your analogy, we are asserting that 42 is indeed 42, regardless of context). Doing Eq on a byte buffer containing a key and some bytes is equivalent to your implementation, minus validating that the key is valid.

jasonahills · 2022-12-02T18:59:36Z

src/unknown.rs

+
+impl PartialEq for UnknownFields {
+    fn eq(&self, other: &Self) -> bool {
+        self.fields == other.fields


This assumes that we treat the unknown fields as different if their order differs.

jasonahills · 2023-01-10T16:14:35Z

Hey, I just wanted to follow up on this draft PR. Is the feature desirable? Is the general approach (adding a catch-all list of unknown fields, keeping their values as raw bytes) reasonable? Would it be worth me attempting to refine the PR with an eye toward getting it merged? Thanks!

LucioFranco · 2023-01-10T16:19:34Z

Hi sorry for the delay, I think the discussion around uknown fields/extensions etc is being discussed here #674. Since these are quite big changes to prost I'd like to go through a proper design process.

jasonahills · 2023-01-10T17:23:45Z

Thanks for getting back to me. A full design process for something like this makes sense. Indeed, this draft PR is mostly given in the spirit of starting a conversation like that.

I should note that the particular problem I am interested in resolving involves unknown fields as such, rather than extensions or custom options, though I can see how the way one deals with one of these problems might inform the other.

I manage a couple of tonic services such that service A sends a message to service C, proxied through service B (there are some additional details about the proxying which make it difficult for service B to just pass the encoded protobuf through to C without inspecting it and even changing it). I would like to be able to add a field to my message and update services A and C to include that field, without having to update service B as well. This new field is not meant to be available for third-party extensions though; it's just part of the new/future versions of my API.

Anyway, thanks again for your response. Please let me know how best I can help with this feature, the design discussion, etc. Thanks!

QuentinPerez · 2023-01-10T17:46:25Z

Hey, I just wanted to follow up on this draft PR. Is the feature desirable? Is the general approach (adding a catch-all list of unknown fields, keeping their values as raw bytes) reasonable? Would it be worth me attempting to refine the PR with an eye toward getting it merged? Thanks!

Hello, Just wanted to let you know that I'm very interested in this feature too, we have multiple processes which write in the same local DB (KV) where those one can have different version of the schema.

LucioFranco · 2023-01-10T17:49:05Z

@jasonahills Yeah, I think they are all related and the solution we have for extensions should cover your use case as well. I don't have much time to push forward on actually writing this feature but if someone wants to champion the design/implementation I would be more than happy to mentor.

jasonahills added 6 commits November 30, 2022 16:23

adds example for testing

40861e4

can generate the code

47838c8

adds unknown fields to derive macro (non-functional)

522fefd

i cannot believe it works

eb820e2

tests with one-of

6a63ace

a little cleanup

377aaa4

jasonahills force-pushed the unknown-fields branch from d6ce46f to 377aaa4 Compare November 30, 2022 23:25

removes unused variables

eadf4d9

formatting changes

50f2484

VladimirBramstedt reviewed Dec 2, 2022

View reviewed changes

jasonahills commented Dec 2, 2022

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Draft: Unknown fields #769

Draft: Unknown fields #769

jasonahills commented Nov 30, 2022

jasonahills commented Nov 30, 2022

VladimirBramstedt Dec 2, 2022

jasonahills Dec 2, 2022

VladimirBramstedt Dec 4, 2022

jasonahills Dec 2, 2022

jasonahills commented Jan 10, 2023

LucioFranco commented Jan 10, 2023

jasonahills commented Jan 10, 2023

QuentinPerez commented Jan 10, 2023 •

edited

Loading

LucioFranco commented Jan 10, 2023

Draft: Unknown fields #769

Are you sure you want to change the base?

Draft: Unknown fields #769

Conversation

jasonahills commented Nov 30, 2022

jasonahills commented Nov 30, 2022

VladimirBramstedt Dec 2, 2022

Choose a reason for hiding this comment

jasonahills Dec 2, 2022

Choose a reason for hiding this comment

VladimirBramstedt Dec 4, 2022

Choose a reason for hiding this comment

jasonahills Dec 2, 2022

Choose a reason for hiding this comment

jasonahills commented Jan 10, 2023

LucioFranco commented Jan 10, 2023

jasonahills commented Jan 10, 2023

QuentinPerez commented Jan 10, 2023 • edited Loading

LucioFranco commented Jan 10, 2023

QuentinPerez commented Jan 10, 2023 •

edited

Loading