Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict character set (keep case-sensitivity/camelCase), dashes for HTTP case-insensitivity #317

Closed

Conversation

cneijenhuis
Copy link
Contributor

Currently, an (extension) attribute name isn't limited in any way. This poses problems for Transport Bindings and SDKs if they have to work with any string. Protocols may only support a (subset of) ASCII for names, or some programming languages only allow alphanumeric characters.

In particular for HTTP, Headers are case-insensitive, see #177 and the following discussion on the headaches involved with having to deal with any possible string.

With this PR, I'm proposing to limit attribute names to alphanumeric characters, starting with a letter. This should help fulfill the existing goal to "aid integration with common programming languages".
Transport Bindings and SDKs may represent attribute names differently, but must convert them back without loss (see #177 (comment)).

To show how this will help Transport Bindings, I've fixed #177 and #265 in the HTTP Binding.

PS: While GitHub shows the http file first, I suggest to start with the changes in spec.md.

Signed-off-by: Christoph Neijenhuis <christoph.neijenhuis@commercetools.de>
Signed-off-by: Christoph Neijenhuis <christoph.neijenhuis@commercetools.de>
* `eventID` maps to `CE-EventID`
* `cloudEventsVersion` maps to `CE-CloudEventsVersion`
* `eventTime` maps to `CE-Event-Time`
* `eventID` maps to `CE-Event-I-D`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one isn't beautiful - does someone has a good suggestion for mapping abbreviations? Alternatively, we could define an exception for eventID.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards special casing well-known abbreviations - e.g. -ID, -URL, -URI... and just listing them in the spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duglin do you have any reservations about the fact that we would have to list every possible well-known abbreviation, since adding them to later versions would be a breaking change? IMHO the Event-I-D is understandable and "good enough" without introducing cruft around huge lists of "reserved words", which would presumably be in flux at least before v1.0.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup - that's a real concern and I'm not thrilled with pretty much defining one list for the lifetime of v1.0. So, Event-I-D might be the best choice if we do head down this path - if nothing else its at least consistent, which I do like :-)

* `eventTime` maps to `CE-EventTime`
* `eventID` maps to `CE-EventID`
* `cloudEventsVersion` maps to `CE-CloudEventsVersion`
* `eventTime` maps to `CE-Event-Time`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CE-Event-Time is now more inline with the HTTP Convention (e.g. Content-Type is also separated by a dash) than the previous `CE-EventTime.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Someone mentioned on the call the redundancy of the word "event", does it make things a bit cleaner to omit that from properties?
https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e: time maps to ce-time

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we did this, we should probably do it for all properties that have the word "event" in them - like EventID.

However, I kind of like having it there even if it does seem redundant in some cases. Mainly because it avoids the risk of it overlapping with some other property. When we talked about future properties or extensions we always talked about how people should pick good name to avoid potential conflicts. To me that means avoiding really generic terms that could be used for other purposes. I think "Time" would fall into that category since there could be many different "times" added as properties. Keeping "Event" in there (to me) scopes it to a CloudEvent property - where we've (hopefully) clearly defined its semantics/meaning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I think it is a valid concern that can be discussed, I'm not sure if this PR is the best place to do so - @PerryBirch what do you think about opening a separate issue (or PR) instead?

I think this PR can be approved or rejected independently of that.

spec.md Outdated
@@ -170,6 +174,9 @@ event consumers can easily access this information without needing to decode
and examine the event data. Such identity attributes can also be used to
help intermediate gateways determine how to route the events.

Extension Attributes MUST follow the [Attribute Naming
Convention](spec.md#attribute-naming-convention).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor thing, but I think the "spec.md" is unnecessary in this link reference.

@@ -163,16 +163,23 @@ Except for attributes [explicitly handled in this specification]
HTTP header mapping of well-known CloudEvents attributes is that
each attribute name MUST be prefixed with "CE-".

Note: per the [HTTP](https://tools.ietf.org/html/rfc7230#section-3.2)
specification, header names are case-insensitive. Upper-case letters are
prefixed with a dash ("-"). When converting a HTTP Header Name back, the dash
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh geez - that's neat/tricky! :-)

Note: per the [HTTP](https://tools.ietf.org/html/rfc7230#section-3.2)
specification, header names are case-insensitive. Upper-case letters are
prefixed with a dash ("-"). When converting a HTTP Header Name back, the dash
converts the following letter back to upper-case, all other letters are
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except the first one, right? since its camelCase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, thanks! 310c06f

CloudEvents attributes use "camelCasing" for the object member names, to aid
integration with common programming languages.
To aid integration with common programming languages, CloudEvents attributes
MUST only contain alpha-numeric characters, and MUST start with a letter.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do people need to use underscores or dashes in their names? I think those would be the two most likely problem characters for people.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to address this here.

My thinking is that e.g. the Python SDK would want to use the conventional snake_case. So, with the Python SDK, one may want to publish a CloudEvent like this:

ce.event_type = "..."
ce.event_id = "..."
ce.some_extension = "..."

Given that everything is snake_case, it is only natural to use it for an extension name.

In e.g. the JavaScript SDK, I'd like to consume the CloudEvent in camelCase.

console.log(ce.eventType);
console.log(ce.eventId);
console.log(ce.someExtension);

However, the conversion can not be performed without (potential) loss if both someExtension and some_extension are valid names. E.g. if the JavaScript SDK sends a CloudEvent with bothsome_extension and someExtension, which one does the Python ce.some_extension refer to?

In a way, disallowing underscores and dashes in the CloudEvents convention allows them to be used in other conventions (e.g. Python or HTTP).

It is quite restrictive though, and it is the main reason I was a bit hesitant to open this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea, but isn't ce.event_id problematic? If abbreviations are all-capitals in camelCase, the automatic conversion doesn't really work without a huge "reserved words" abbreviation list, as eventID would become event_i_d.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, the SDK would have to define a special mapping for ID in general or eventID specifically.

@duglin
Copy link
Collaborator

duglin commented Sep 26, 2018

overall, this is a really interesting approach. I liked how you related it back to HTTP's use of dashes.

Signed-off-by: Christoph Neijenhuis <christoph.neijenhuis@commercetools.de>
@cneijenhuis cneijenhuis force-pushed the attribute-names-alphanumeric branch from bb9c8bb to 310c06f Compare September 26, 2018 10:39
"CE-", an infix reflecting the map attribute followed by a dash
("-"), and the name of the map entry key, e.g. "CE-attrib-key".
"CE-", an infix reflecting the map attribute followed by a plus
("+"), and the name of the map entry key, e.g. "CE-attrib+key".
Copy link
Member

@n3wscott n3wscott Sep 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps rather than a +, we add a new CEAttrib- or CEA-? or maybe CE+

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I guess the example is too short.

It is about map attributes, e.g.

{
  "aMap": {
    "firstAttribute": ...,
    "anotherMap": {
      "secondAttribute": ...
    }
  }
}

That would map to:

CE-A-Map+First-Attribute
CE-A-Map+Another-Map+Second-Attribute

E.g. CEAttrib-A-Map-First-Attribute won't allow me to deconstruct the map.

Does that make sense - or did I not get the point of your comment @n3wscott ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a higher-level question... why are we transforming maps at all? By that I mean, why isn't the value of the property (the map) just put into the HTTP as a string like all other properties? If someone really wanted each nested property as a stand-alone http header then they can create a new top-level extension property.

is:

{
 "myPets": {
    "dog": "fido",
    "cat": "lilly"
  }
}

really any better than:

{
  "myPet-dog": "fido",
  "myPet-cat": "lilly"
}

I'm not suggesting that we ban maps or anything, but if people are using the binary/raw format then there's a good chance that performance is one of their reasons, and adding additional processing to deconstruct maps seems to be contradictory to that goal.

If people really do need the JSON nesting, they're free to do so, but then let's not try to find a way to map all possible JSON objects into HTTP header names - that sounds like a nightmare. People can just parse the JSON/string themselves by putting the map as the HTTP header value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid question, I'd also be fine with that. I can revert my changes to the two lines, as it isn't the main point of this PR, and we could open up an independent PR to address #265 ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I wouldn't change anything yet until we hear from others - basing any decision on my warped POV is risky

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lean towards deconstructing maps, and think + is a good separator, mainly due to fears of passing web server header size limits if serialising to the value. Apache has a default limit of 8K.

Also I'd be hesitant to make assumptions about perf of destructuring into flat key-value pairs vs. serialisation of huge maps into JSON without some benchmarks. With deconstructed maps deserialising doesn't need JSON parsing.

@clemensv
Copy link
Contributor

clemensv commented Oct 2, 2018

See #321 as alternate proposal

@cneijenhuis cneijenhuis changed the title Alphanumeric Attribute Names for better (HTTP) interoperability Restrict character set (keep case-sensitivity/camelCase), dashes for better HTTP case-insensitivity Oct 11, 2018
@cneijenhuis cneijenhuis changed the title Restrict character set (keep case-sensitivity/camelCase), dashes for better HTTP case-insensitivity Restrict character set (keep case-sensitivity/camelCase), dashes for HTTP case-insensitivity Oct 11, 2018
@cneijenhuis
Copy link
Contributor Author

@clemensv Our proposals have quite a few similarities in the spec.md. Is it fine if I copy over parts of your text? I think your text is nicer, and I guess it makes it easier for others to review the differences between the three proposals.

@duglin
Copy link
Collaborator

duglin commented Oct 18, 2018

On today's call we discussed this topic and narrowed our choices down to this PR and #321. We then decided to start a vote between the two. Please go to #321 to vote regardless of which PR you would like to vote for.

Voting members are: Adobe, Alibaba, Confluent, ControlBEAM, Fujitsu, Google, Huawei, IBM, Itau Unibanco, Kyyti Group, Microsoft, NAIC, NATS/Synadia, Nordstrom, Oracle, PayPal, Red Hat, SAP, Serverless, Solace, VMWare, Vlad Ionescu, Renato Gallis (Renato: do you work for Itau or are you independent?)

@duglin
Copy link
Collaborator

duglin commented Oct 18, 2018

Forgot to mention - the vote ends at the start of next weeks' call

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extension names become case insensitive in binary encoding
6 participants