-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restrict character set (keep case-sensitivity/camelCase), dashes for HTTP case-insensitivity #317
Restrict character set (keep case-sensitivity/camelCase), dashes for HTTP case-insensitivity #317
Conversation
Signed-off-by: Christoph Neijenhuis <christoph.neijenhuis@commercetools.de>
Signed-off-by: Christoph Neijenhuis <christoph.neijenhuis@commercetools.de>
* `eventID` maps to `CE-EventID` | ||
* `cloudEventsVersion` maps to `CE-CloudEventsVersion` | ||
* `eventTime` maps to `CE-Event-Time` | ||
* `eventID` maps to `CE-Event-I-D` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one isn't beautiful - does someone has a good suggestion for mapping abbreviations? Alternatively, we could define an exception for eventID
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning towards special casing well-known abbreviations - e.g. -ID, -URL, -URI... and just listing them in the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@duglin do you have any reservations about the fact that we would have to list every possible well-known abbreviation, since adding them to later versions would be a breaking change? IMHO the Event-I-D
is understandable and "good enough" without introducing cruft around huge lists of "reserved words", which would presumably be in flux at least before v1.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup - that's a real concern and I'm not thrilled with pretty much defining one list for the lifetime of v1.0. So, Event-I-D
might be the best choice if we do head down this path - if nothing else its at least consistent, which I do like :-)
* `eventTime` maps to `CE-EventTime` | ||
* `eventID` maps to `CE-EventID` | ||
* `cloudEventsVersion` maps to `CE-CloudEventsVersion` | ||
* `eventTime` maps to `CE-Event-Time` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CE-Event-Time
is now more inline with the HTTP Convention (e.g. Content-Type
is also separated by a dash) than the previous `CE-EventTime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Someone mentioned on the call the redundancy of the word "event", does it make things a bit cleaner to omit that from properties?
https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e: time
maps to ce-time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we did this, we should probably do it for all properties that have the word "event" in them - like EventID.
However, I kind of like having it there even if it does seem redundant in some cases. Mainly because it avoids the risk of it overlapping with some other property. When we talked about future properties or extensions we always talked about how people should pick good name to avoid potential conflicts. To me that means avoiding really generic terms that could be used for other purposes. I think "Time" would fall into that category since there could be many different "times" added as properties. Keeping "Event" in there (to me) scopes it to a CloudEvent property - where we've (hopefully) clearly defined its semantics/meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I think it is a valid concern that can be discussed, I'm not sure if this PR is the best place to do so - @PerryBirch what do you think about opening a separate issue (or PR) instead?
I think this PR can be approved or rejected independently of that.
spec.md
Outdated
@@ -170,6 +174,9 @@ event consumers can easily access this information without needing to decode | |||
and examine the event data. Such identity attributes can also be used to | |||
help intermediate gateways determine how to route the events. | |||
|
|||
Extension Attributes MUST follow the [Attribute Naming | |||
Convention](spec.md#attribute-naming-convention). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor thing, but I think the "spec.md" is unnecessary in this link reference.
http-transport-binding.md
Outdated
@@ -163,16 +163,23 @@ Except for attributes [explicitly handled in this specification] | |||
HTTP header mapping of well-known CloudEvents attributes is that | |||
each attribute name MUST be prefixed with "CE-". | |||
|
|||
Note: per the [HTTP](https://tools.ietf.org/html/rfc7230#section-3.2) | |||
specification, header names are case-insensitive. Upper-case letters are | |||
prefixed with a dash ("-"). When converting a HTTP Header Name back, the dash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh geez - that's neat/tricky! :-)
http-transport-binding.md
Outdated
Note: per the [HTTP](https://tools.ietf.org/html/rfc7230#section-3.2) | ||
specification, header names are case-insensitive. Upper-case letters are | ||
prefixed with a dash ("-"). When converting a HTTP Header Name back, the dash | ||
converts the following letter back to upper-case, all other letters are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
except the first one, right? since its camelCase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, thanks! 310c06f
CloudEvents attributes use "camelCasing" for the object member names, to aid | ||
integration with common programming languages. | ||
To aid integration with common programming languages, CloudEvents attributes | ||
MUST only contain alpha-numeric characters, and MUST start with a letter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do people need to use underscores or dashes in their names? I think those would be the two most likely problem characters for people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to address this here.
My thinking is that e.g. the Python SDK would want to use the conventional snake_case. So, with the Python SDK, one may want to publish a CloudEvent like this:
ce.event_type = "..."
ce.event_id = "..."
ce.some_extension = "..."
Given that everything is snake_case, it is only natural to use it for an extension name.
In e.g. the JavaScript SDK, I'd like to consume the CloudEvent in camelCase.
console.log(ce.eventType);
console.log(ce.eventId);
console.log(ce.someExtension);
However, the conversion can not be performed without (potential) loss if both someExtension
and some_extension
are valid names. E.g. if the JavaScript SDK sends a CloudEvent with bothsome_extension
and someExtension
, which one does the Python ce.some_extension
refer to?
In a way, disallowing underscores and dashes in the CloudEvents convention allows them to be used in other conventions (e.g. Python or HTTP).
It is quite restrictive though, and it is the main reason I was a bit hesitant to open this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea, but isn't ce.event_id
problematic? If abbreviations are all-capitals in camelCase, the automatic conversion doesn't really work without a huge "reserved words" abbreviation list, as eventID
would become event_i_d
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, the SDK would have to define a special mapping for ID
in general or eventID
specifically.
overall, this is a really interesting approach. I liked how you related it back to HTTP's use of dashes. |
Signed-off-by: Christoph Neijenhuis <christoph.neijenhuis@commercetools.de>
bb9c8bb
to
310c06f
Compare
"CE-", an infix reflecting the map attribute followed by a dash | ||
("-"), and the name of the map entry key, e.g. "CE-attrib-key". | ||
"CE-", an infix reflecting the map attribute followed by a plus | ||
("+"), and the name of the map entry key, e.g. "CE-attrib+key". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps rather than a +, we add a new CEAttrib-
or CEA-
? or maybe CE+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I guess the example is too short.
It is about map attributes, e.g.
{
"aMap": {
"firstAttribute": ...,
"anotherMap": {
"secondAttribute": ...
}
}
}
That would map to:
CE-A-Map+First-Attribute
CE-A-Map+Another-Map+Second-Attribute
E.g. CEAttrib-A-Map-First-Attribute
won't allow me to deconstruct the map.
Does that make sense - or did I not get the point of your comment @n3wscott ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a higher-level question... why are we transforming maps at all? By that I mean, why isn't the value of the property (the map) just put into the HTTP as a string like all other properties? If someone really wanted each nested property as a stand-alone http header then they can create a new top-level extension property.
is:
{
"myPets": {
"dog": "fido",
"cat": "lilly"
}
}
really any better than:
{
"myPet-dog": "fido",
"myPet-cat": "lilly"
}
I'm not suggesting that we ban maps or anything, but if people are using the binary/raw format then there's a good chance that performance is one of their reasons, and adding additional processing to deconstruct maps seems to be contradictory to that goal.
If people really do need the JSON nesting, they're free to do so, but then let's not try to find a way to map all possible JSON objects into HTTP header names - that sounds like a nightmare. People can just parse the JSON/string themselves by putting the map as the HTTP header value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid question, I'd also be fine with that. I can revert my changes to the two lines, as it isn't the main point of this PR, and we could open up an independent PR to address #265 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I wouldn't change anything yet until we hear from others - basing any decision on my warped POV is risky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I lean towards deconstructing maps, and think +
is a good separator, mainly due to fears of passing web server header size limits if serialising to the value. Apache has a default limit of 8K.
Also I'd be hesitant to make assumptions about perf of destructuring into flat key-value pairs vs. serialisation of huge maps into JSON without some benchmarks. With deconstructed maps deserialising doesn't need JSON parsing.
See #321 as alternate proposal |
@clemensv Our proposals have quite a few similarities in the |
On today's call we discussed this topic and narrowed our choices down to this PR and #321. We then decided to start a vote between the two. Please go to #321 to vote regardless of which PR you would like to vote for. Voting members are: Adobe, Alibaba, Confluent, ControlBEAM, Fujitsu, Google, Huawei, IBM, Itau Unibanco, Kyyti Group, Microsoft, NAIC, NATS/Synadia, Nordstrom, Oracle, PayPal, Red Hat, SAP, Serverless, Solace, VMWare, Vlad Ionescu, Renato Gallis (Renato: do you work for Itau or are you independent?) |
Forgot to mention - the vote ends at the start of next weeks' call |
Currently, an (extension) attribute name isn't limited in any way. This poses problems for Transport Bindings and SDKs if they have to work with any string. Protocols may only support a (subset of) ASCII for names, or some programming languages only allow alphanumeric characters.
In particular for HTTP, Headers are case-insensitive, see #177 and the following discussion on the headaches involved with having to deal with any possible string.
With this PR, I'm proposing to limit attribute names to alphanumeric characters, starting with a letter. This should help fulfill the existing goal to "aid integration with common programming languages".
Transport Bindings and SDKs may represent attribute names differently, but must convert them back without loss (see #177 (comment)).
To show how this will help Transport Bindings, I've fixed #177 and #265 in the HTTP Binding.
PS: While GitHub shows the http file first, I suggest to start with the changes in
spec.md
.