Add protobuf format #295
Conversation
@zpencer can you wrap things at 80 columns to be consistent with the other docs we have?
@duglin Done
my OCD thanks you :-)
protobuf-format.md
Outdated
the type safe top level field.

### 2.4 Relation to CloudEvents JSON format:
The fact that the protobuf tooling can also produce JSON in addition to the binary format is a nice feature of that tooling, but because we have a normative JSON event format, this section is superfluous because it's not compliant with the normative JSON event format and does not add any normative value. This specification cannot define a normative JSON event format. This specification can only define a normative protobuf binary format based on the wire specification for the protobuf wire format.
If we only allow JSON payloads then the proto message's standard JSON is the same as the CE normative JSON. Binary data would be a base64-encoded JSON string, and protos would be serialized as binary data, so we would not take advantage of proto's more efficient representations of either.
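As a rough illustration of the size cost under discussion (a generic sketch, not part of the spec; the payload here is arbitrary example data):

```python
import base64

# Binary data carried inside JSON must be base64-encoded, inflating it by
# roughly 4/3; the protobuf wire format can carry the same bytes raw in a
# length-delimited field.
payload = bytes(range(256)) * 4          # 1024 bytes of arbitrary binary data
as_json_string = base64.b64encode(payload)

assert len(payload) == 1024
assert len(as_json_string) == 1368       # ceil(1024/3) * 4, ~33% larger
```

The overhead is before counting JSON quoting and escaping, so the real gap is slightly wider.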
protobuf-format.md
Outdated
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | ||
document are to be interpreted as described in RFC2119. | ||
|
||
## 2. Protobuf format |
An encoding specification must refer to an encoding definition, here the Protobuf wire format, and not to specific tooling. The appropriate part of your documentation is https://developers.google.com/protocol-buffers/docs/encoding
There are alternate implementations of the protobuf wire format that do not use any of the Google bits and those must be able to use this specification exactly as well as the Google tooling, e.g. https://github.com/mgravell/protobuf-net
References to a specific implementation that ties users of the specification to using that implementation are inappropriate for an open protocol standard.
It would be acceptable to provide non-normative references to tooling constructs like the IDL for situations where the wire specification for protobuf were falling short to explain specific constructs you introduce here, but this specification ultimately needs to allow for a clean-room "bytes on wire" implementation solely based on normative specification documents without resorting to existing code.
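The encoding document referenced above is indeed sufficient for a clean-room "bytes on wire" implementation. For instance, a minimal varint decoder (a sketch, not normative text) needs nothing beyond the wire rules:

```python
def read_varint(buf: bytes, pos: int = 0):
    # Base-128 varint per the protobuf encoding doc: 7 payload bits per byte,
    # least-significant group first, MSB set on every continuation byte.
    result = shift = 0
    while True:
        b = buf[pos]
        result |= (b & 0x7F) << shift
        pos += 1
        if not (b & 0x80):
            return result, pos
        shift += 7

# The encoding doc's own example: 150 encodes as 0x96 0x01.
assert read_varint(bytes([0x96, 0x01])) == (150, 2)
```

No Google tooling or generated code is involved; only the wire specification.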
I will change the language to say that the IDL is used only as a concise way to illustrate what happens on the binary level. I'll state that the spec does not mandate the use of any particular protobuf implementation or IDL.
protobuf-format.md
Outdated
### 2.3 Extensions:

`google.protobuf.Struct` ([more info][PROTO_STRUCT]) represents an
You could indeed take this construct and move it up one level and make the entire event a key/value pair map. Since you seem to be okay with doing that for extensions, why not also do that for the standard attributes?
If you're principally worried about wire footprint, you could just use special-prefixed keys like "$0" and "$1" as shorthand for the well-known names.
The complete wire footprint in proto for "$0", including type identifier and length prefix is 4 bytes, literally "x12 x02 x24 x30". We have 8 well-known attributes right now (with at least 1 of them in doubt and only 4 required). If we'd reach into the ASCII control character range instead of resorting to printable chars, I could cut another byte, e.g. x12 x01 x01
If you allow for keys to be either Varint or String, which is distinguishable from the wire encoding, the key footprint for a well-known element shrinks to 2 bytes (type flag + shorthand integer).
When we look at the required attributes, that's between 8 and 16 bytes extra "overhead" compared to pure positional encoding, and that's even within the TLS padding range, i.e. if you save those 8-16 bytes, TLS (the AES 128-bit block cipher inside of it) will pad at least some if not all of those savings away with extra zeroes when you put a single event on the wire.
The effect of all this extra complication may quite literally come out to nothing.
(Mind that I'm not counting whatever the required overhead for the dictionary is because you've got that anyways for the extensions field defined here)
I do realize that this might constrain the value type choice for the standard dictionary entries at the tooling level, but this spec seems to be unconcerned about that for extension values, which appear to be encoded as JSON strings.
At the wire level, the lower three bits of the preamble of any encoded protobuf field actually indicate the wire type, which means that the format factually can encode polymorphic dictionaries that distinguish between numbers and strings, which is sufficient for the value space we have in CloudEvents.
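The byte-level claim above is easy to check by hand-encoding the field (a sketch assuming the key string lands in field number 2 of its enclosing message, which is what yields the quoted bytes):

```python
def tag(field_number: int, wire_type: int) -> bytes:
    # Single-byte tag for small field numbers: (field_number << 3) | wire_type.
    return bytes([(field_number << 3) | wire_type])

LEN = 2  # wire type 2: length-delimited (strings, bytes, sub-messages)

key = b"$0"
encoded = tag(2, LEN) + bytes([len(key)]) + key    # field 2 holding "$0"
assert encoded == bytes([0x12, 0x02, 0x24, 0x30])  # the 4 bytes quoted above
```

One byte of tag, one of length, two of key, confirming the 4-byte footprint figure.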
protobuf-format.md
Outdated
"comExampleExtension1", | ||
Value.newBuilder().setStringValue("value1").build()) | ||
.putFields( | ||
"comExampleExtension2", |
Maybe it would add some clarity if the field name was com_example_extension_2?
If I get https://developers.google.com/protocol-buffers/docs/proto3#json correctly, the snake_case to camelCase conversion is applied to proto field names, but - I guess - the names within a Struct aren't touched, as they are not proto fields?
This was not supposed to imply that the extension is a proto. An extension should be defined in terms of the CE type system, and CE attributes are in lowerCamelCase. I'll update this to be less distracting.
I agree that, per convention, extension attributes should be in camelCase. However, this is not a MUST, and a snake_cased attribute must not be transformed to camelCase.
I guess everything works as expected after I dug into the proto json spec, but it wasn't immediately clear to me why e.g. event_id is turned into eventId, but com_example_extension isn't.
I see what you mean, I think I misunderstood your original comment. In the case of extensions being in the Struct, fields can be arbitrarily cased because it is a map keyed on case sensitive strings.
If we ever introduce a top level attribute that is not lowerCamelCase then we must use the json_name option to override the json field name.
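The distinction can be sketched as follows (an illustrative approximation of the proto3 JSON name rule, not the tooling's actual implementation):

```python
def proto_json_name(field_name: str) -> str:
    # proto3 JSON mapping: snake_case message field names are emitted as
    # lowerCamelCase, unless overridden with the json_name option.
    head, *rest = field_name.split("_")
    return head + "".join(part.capitalize() for part in rest)

assert proto_json_name("event_id") == "eventId"

# Struct entries are map keys, not field names, so they pass through verbatim:
extensions = {"com_example_extension": "value1"}
assert list(extensions) == ["com_example_extension"]
```

So the asymmetry above is by design: only declared proto fields get renamed, never map keys.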
Thanks for the clarification!
protobuf-format.md
Outdated
.build();
```

#### 2.3.1 CloudEvents upgrade process
This feels out of place for this doc. If we need text like this then it should really be in the main spec since this would seem to apply in general, not just for this one serialization, right?
I'll remove this part since it's implied with the new section about forward compatibility.
protobuf-format.md
Outdated
If a field moves from the extensions bag to a top level field, then
the producers and consumers of a CloudEvents system SHOULD coordinate
in the upgrade process:
Two concerns I have:
- the spec has no idea if a new property was in the extension bag or not - so I'm not sure what the first part of the sentence means from a spec perspective. To the spec, all it knows is that we added a new property. Its history is irrelevant.
- the coordination during an upgrade process doesn't seem viable since in a distributed system (especially across the internet) we pretty much have to assume the opposite... zero coordination and expect people to only upgrade at a time that is convenient for them (both sides of the wire).
protobuf-format.md
Outdated
1. Initially, the producers and consumers are using CloudEvents 1.0
   and the extension is expressed in the “extensions” bag.
1. CloudEvents 1.1 is released.
1. The producers write the extension to both the extensions bag and
Aside from the obvious concern about duplication of data, or worse, these two bits of data being different - can someone provide a URL to any spec out there that has a similar requirement? I can't think of any spec (JSON or otherwise) that requires this type of behavior and so I'd like to see how/why they do it. Since I don't think our spec is doing anything unusual I don't understand why our spec needs to have such special rules.
protobuf-format.md
Outdated
| Timestamp | google.protobuf.Timestamp |
| Map       | google.protobuf.Struct    |
| Object    | google.protobuf.Value     |
| Integer   | int32                     |
minor thing, but if you're going to go thru the trouble to align most of the columns, you might as well align all of them :-)
In addition to the technical feedback, I need to point out this submission does not qualify for inclusion into the core standard on the grounds that Protobuf, as it stands, is not a qualifying encoding per our adopted rules, because it's not managed under and not owned by a vendor-neutral organization. We're aware that this hasn't been raised as an issue in some other CNCF projects as of yet.
Here are some reasons why I think Protocol Buffers meets the bar for a qualifying encoding. Protocol buffers is a widely used format that has been open sourced by Google for a very long time now.
I believe these characteristics, and the strong and vibrant ecosystem around protocol buffers implementations and adoption, qualify protobuf as a format for encoding Cloud Events.
@hsaliak I appreciate the argument, but CNCF exists specifically as neutral ground for multiparty collaboration while insulating all participants from a host of legal complications. Donating ProtoBuf to CNCF would resolve this.
When we talked about the language describing a de-facto standard, we specifically talked about not requiring something to be donated to a foundation to be considered a de-facto standard -- that was supposed to be an example, not the rule, and we agreed to later word-smith as needed. Submitted: https://github.com/cloudevents/spec/pull/298/files?utf8=%E2%9C%93&diff=unified&w=1. The main point is to change "of" to "or". At the time, naming Kafka seemed to be a reasonable way to describe something that originated at a company and became more widely used, and now it is being used to introduce a requirement which seems to go against the intent of calling something a "de-facto standard".
There's obviously been a lot of discussion on this already but here are my thoughts...
@ultrasaurus Whether something is a de-facto standard rather than a formal standard is a different concern from who owns the technology. Both concerns do apply for Protobuf. The concern that weighs more heavily is that while Protobuf may be widely used, it is a single-vendor-controlled technology (the current license is irrelevant because the owner can change all terms at any time), and that being the case has legal implications for multivendor standards efforts. Our legal team can talk to yours if that needs further clarification.
The CNCF is not a standards body and does not seek to act like one: Many CNCF projects take dependencies on and/or support things that aren't official standards and/or are non-foundation-owned open-source building blocks. YAML is not a standard, for example. Go is used by several projects. Kubernetes has a .NET client library. I don't think we want to set the precedent that foundation projects can't support widely used non-official-standard/non-foundation-owned specifications, languages, libraries, etc. that meet our dependency licensing requirements. /cc @caniszczyk @monadic

It is reasonable to consider what representation(s) in this specification might be most useful for a binary encoding, such as protocol buffers. Since official documentation, stackoverflow posts, and other documentation sources will likely be in the form of the "official" library interfaces, I think it's useful to include for explanatory purposes in addition to the binary representation. Perhaps one of the more popular non-Google implementations would be useful, also, such as https://github.com/dcodeIO/protobuf.js
cc @kenowens12
Agree with @bgrant0607! Let's look at rewording this and adding flexibility.
@bgrant0607 Another principle in the document you're quoting is that projects are self-governing. I understand that CNCF doesn't want to be a standards body, but this particular WG is rather difficult to distinguish from an OASIS TC in how it operates. We have a rule recognizing de-facto protocol standards emerging out of major foundation OSS projects such as the Apache Kafka protocol. .NET, since you mention it, is run under the umbrella of the .NET Foundation. Go is owned by the "Go authors" per license. YAML is owned by a group of individuals. That's all rather different from a toolset fully controlled by a single major vendor that can change and has changed dramatically between revisions, or that the vendor could walk away from in favor of the next hot thing tomorrow - and history shows that this isn't a far-fetched scenario.

This group has literally been asked to adapt the JSON event format to the particular idiosyncrasies of that toolset, and there's been a fairly direct threat to sabotage the standard if the WG does not comply: "[...] Because proto is an IDL it also defines JSON interfaces as well. We are hoping for a JSON spec for which we can reverse engineer a proto shape to match. Otherwise Google will not be able to keep from releasing an alternate JSON binding. [...]" I don't think we want to set the precedent that foundation projects must bend to the needs of proprietary projects.

If Protobuf were to follow gRPC and be donated into CNCF, and the focus of the Protobuf event format were strictly on the binary wire format, along with explanatory references to toolsets for producing that format as I indicated in my review comment above and which you appear to agree with, my concerns would be addressed.
Thanks for pointing out the comment in the other thread on #277. I read that as an existing technical limitation rather than a threat, but I will look into it. I don't see how contributing protobuf to CNCF would address that specific technical concern, though. I'll have to respond to self-governance vs. our principles later.
@clemensv I need to address your claim that explaining an implication of a proposed change is a "fairly direct threat to sabotage the standard". It's not. You've misunderstood that comment entirely. Here's the substantive paragraph from the comment you linked to:
You seem to have misunderstood the last sentence. That's someone explaining that the change forces proto implementations to support the JSON spec generated by the Proto IDL and a JSON spec matching future versions of CloudEvents. That's not a threat; that's trying to make the spec as widely implementable as possible.
Thanks, @rachelmyers. Since this appears to be getting heated, I'll remind everyone of our code of conduct. To quote from our principles document:

CNCF “specification” materials are not “standards”. It is in future possible that an independent and recognized international standards body takes a CNCF document as “upstream” and evolves it into a standard via (e.g.) the IETF process. The CNCF is morally supportive of independent parties doing this, but does not see this work as its own responsibility. For that matter, to date, no such exercise is in process. In general CNCF specifications will evolve as “living documents” side by side with the CNCF OSS projects that adopt them. This means they are “dynamic and fast-moving” whereas a typical IETF standard is ”static” and not subject to frequent updates & new releases. For the avoidance of doubt: Any written “specification” materials in CNCF shall have the same status as CNCF open source software and project technical docs. That is to say that specifications shall be updated and release versioned according to the conventions of a CNCF open source project.

The CNCF IP policy, which includes whitelisted and blacklisted open-source licenses, is in section 11 of the charter. There is no foundation-level policy about independence of project dependencies. Hopefully that was clear before CloudEvents was inducted into the sandbox.

Coming back to the issue of protocol buffers specifically, yes, I agree some reasonable concerns about the form of the document were raised. However, on the issue of ownership: I don't understand the objective of not having "official" specifications and/or SDK support for widely used formats, or the distinction being made in the YAML case (I thought I'd seen a proposal for an incompatible YAML2, but can't find it right now). Is the desired outcome for fewer formats to be used? Why wouldn't the outcome be fragmentation and inconsistency among the formats used, and/or for adoption to be impeded?
I don't know whether the point raised about the addition of extensions was addressed.
I've asked for some off-line discussions to happen around this. I'd like to ask that we hold off further discussions in this PR until those have happened.
@clemensv , @rachelmyers , @bgrant0607 and I had a chat the other day concerning this PR. The net results were:
I just want to confirm that CNCF/LF legal is engaged with Microsoft legal to address their concerns.
I'll be sending out an update to this PR in the next couple of days.
@zpencer thanks for the update!
Any update on this @zpencer? Fantastic job by the way!
@glerchundi there are some discussions happening internally about how to address the PR feedback. I will post an update to the PR as soon as it is ready. |
@zpencer thanks!
| URI       | string_value (string expression conforming to URI-reference as defined in RFC 3986 §4.1) |
| Timestamp | string_value (string expression as defined in RFC 3339.) |
| Map       | map_value |
| Integer   | int_value |
Maybe it's obvious and not needed, but should "Any" be listed in here just for completeness?
Added a note on Any
thanks
@zpencer While I like how short the doc is, it feels like it ends abruptly when I'm used to seeing some kind of example after the "Definition" section. Maybe it's not needed for someone who knows protobuf, and if so then ignore this comment.
Signed-off-by: Spencer Fang <spencerfang@google.com>
Also added an example using the Java version of the Protocol Buffers library.
protobuf-format.md
Outdated
## 1. Introduction

This specification defines how the [Context
Attributes](spec.md#context-attributes) Attributes defined in the
the 2nd "Attributes" (after the link) is a repeat.
Done
protobuf-format.md
Outdated
Buffers](https://github.com/protocolbuffers/protobuf) includes support
for an interface descriptor language (IDL), and this document makes
use of language level 3 IDL from Protocol Buffers v3.5.0. CloudEvents
systems using Protocol Buffers are not required to use the IDL or any
s/required/mandated/ to avoid the RFC checker... which reminds me, can you modify the Makefile as part of this PR so that the RFC checker is run on this doc too?
Done
I'd like to see this more closely mirror that of the json-format.md specification, e.g...
As a general comment part of me would like to see a formal
@zpencer any chance you can address the open question before Tuesday next week? It would be great if we could get this merged for v0.2 - which we're hoping to have for KubeCon Seattle.
- s/Attributes Atttributes/Attributes/
- s/required/mandated/
- Add protobuf-format.md to Makefile
- add cloudevent.proto

Signed-off-by: Spencer Fang <spencerfang@google.com>
This was pushed too soon. Stand by for the missing section.
Signed-off-by: Spencer Fang <spencerfang@google.com>
I thought about this some more and I'm not sure structured / binary mode really apply for the proto format. If we had ended up with type safe fields in the proto, and some way to embed a proto inside (like with a
Added
protobuf-format.md
Outdated
perform automatic serialization and deserialization. The [Protocol
Buffers specification defines a well-known encoding
format](https://developers.google.com/protocol-buffers/docs/encoding)
which is the basis of this specification. Specifications below may be
s/may/MAY/
What do you mean by "below"? Do you mean later on in this doc? Which specs are you referring to?
Reworded this section. Apologies for the strange wording.
@zpencer - Should there at least be some guidance around contentType(s) so that I could send the payload across other transports such as HTTP, AMQP, etc.? To the best of my knowledge there isn't a registered media-type for a proto object; absent that (if that's still true) we should provide guidance to allow interop.
Signed-off-by: Spencer Fang <spencerfang@google.com>
I believe the closest thing to a proto content type is the expired IETF draft which registers
Approved on the Nov 8 call.
This PR is meant for the spec as it is currently written, where the
extensions is an explicit bag. The focus of this PR is on the protobuf
format itself, not on the extensions proposal.
Signed-off-by: Spencer Fang <spencerfang@google.com>