Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restricting character set for attribute names #321

Merged
merged 3 commits into from
Nov 1, 2018

Conversation

clemensv
Copy link
Contributor

@clemensv clemensv commented Oct 2, 2018

At least week's meeting I was asked to create PR as a follow-up to my oral comment on #317

I appreciate the work done on that PR and the attempt to preserve case-sensitivity through an HTTP hop where the actors might perform case-folding for any reason, including semantically pointless aesthetics, due to HTTP being case-insensitive. That PR having to introduce complex rules is an issue with the main spec, and not something the HTTP mapping ought to fix, IMO.

This PR puts a constraint on the attribute name character set, only allowing lower-case letters and digits, with a leading lower-case letter. I am also including the rationale in the spec.

In effect, we are not taking a stance on case-sensitive or case-insensitive handling, but we rather eliminate the problem by only permitting lower-case letters for normatively named attributes.

While we're at it, I'm also adding a wire-footprint-greedy SHOULD clause regarding length.

To make up for the loss of the word separation by case, it would be great to have some kind of word separator available in addition to lower-case letters and digits, but dots and dashes and underscores clash with identifier rules in various languages/protocols. I don't think the result is all that terrible and the wire doesn't care about aesthetics.

If an SDK wants to "right case" identifiers for its clients, it's free to do so. A C# SDK ought to use PascalCase for strongly typed identifiers in order to comply with language conventions, and therefore "eventtime" ought to surface as "EventTime". The rules we make here must avoid that an implementer gets confused about what comes across and goes on the wire.

I have only modified the core spec for the purpose of the initial discussion. Specs that explicitly refer to attribute names will also have to change accordingly, of course, and I will amend the PR if we decide to go this route.

@clemensv clemensv changed the title Attribute names Restricting character set for attribute names Oct 2, 2018
spec.md Outdated
languages. Some of these treat metadata elements as case-sensitive while others
do not, and a single CloudEvent might be routed via multiple hops that involve
a mix of protocols, encodings, and runtimes. Therefore, this specification
limits the available character set of all normatively named attributes such that
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should drop "normatively named" because these rules would apply to extensions too, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clemensv thoughts on this one?

@Tapppi
Copy link
Contributor

Tapppi commented Oct 3, 2018

I understand the need for clarification, and like the simplicity of this approach. However, I think this approach simplifies the spec by shifting the cost to usability. Practically speaking this will force keys to be lowercased everywhere, which will not play well with many programming languages and will feel seriously out of place. I worry about the inertia on adoption this approach might cause.

If an SDK wants to "right case" identifiers for its clients, it's free to do so. A C# SDK ought to use PascalCase for strongly typed identifiers in order to comply with language conventions, and therefore "eventtime" ought to surface as "EventTime". The rules we make here must avoid that an implementer gets confused about what comes across and goes on the wire.

I'm doubtful about the practicality of this approach and think it's a false promise. If the spec mandates all lowercase charactes with no word separator, each possible word would have to be known to the SDK for it to correctly apply word separation. That is impossible, because:

  • extensions
    Imagine an extension property bucket companySpecificLabels with key goodTeamLabel. The SDK would need not only the extension spec but also any possible words present in keys before surfacing the event attributes. EventTime next to companyspecificlabels.goodteamlabel does not sound fun.
  • attributes from new versions of the spec would be lowercased but would then change casing when upgrading SDK, Duglin put it well in this comment:

    Keep in mind that I don't think this is just about extensions, I think all CE properties will need to be encoded the same way as extensions. This is because to a v1.0 consumer, any new v1.1 properties will be viewed as extensions.

@tenczar
Copy link

tenczar commented Oct 4, 2018

I'm doubtful about the practicality of this approach and think it's a false promise. If the spec mandates all lowercase charactes with no word separator, each possible word would have to be known to the SDK for it to correctly apply word separation. That is impossible, because:

  • extensions
    Imagine an extension property bucket companySpecificLabels with key goodTeamLabel. The SDK would need not only the extension spec but also any possible words present in keys before surfacing the event attributes. EventTime next to companyspecificlabels.goodteamlabel does not sound fun.
  • attributes from new versions of the spec would be lowercased but would then change casing when upgrading SDK, Duglin put it well in this comment:

    Keep in mind that I don't think this is just about extensions, I think all CE properties will need to be encoded the same way as extensions. This is because to a v1.0 consumer, any new v1.1 properties will be viewed as extensions.

The point of this change is that on the wire the attributes are always lower case. Period.
For extensions the SDK will not even attempt to apply word separation, you can retrieve extensions by their all lower case name from whatever map-like struct the SDK holds them in.
We also don't need to care when promoting an extension because the wire format is still all lowercase. A new SDK version can provide a helper that can retrieve the extension with a name that matches the language's conventions, but that wrapper is going to grab the name from the all lower case representation.

@Tapppi
Copy link
Contributor

Tapppi commented Oct 4, 2018

The call was very informative. I realise #321 (comment) might not be a problem, other than losing any possibility of casing all identifiers, which is kind of the point anyway but affects some languages.

A new SDK version can provide a helper that can retrieve the extension with a name that matches the language's conventions, but that wrapper is going to grab the name from the all lower case representation.

As I said in the call, my concern is with dynamically typed languages, where an idiomatic way to access the event (dynamic property access, e.g. event.myEventExtension) wouldn't necessarily distinguish between core spec fields and extensions.

For extensions the SDK will not even attempt to apply word separation

To me extensions and core fields being idiomatically cased and not separated would've seemed a natural progression after extensions were moved out of a bag to the top-level, but I I jumped to conclusions 👍

Copy link
Contributor

@cneijenhuis cneijenhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #317 (mainly to get a discussion started), but I also support this PR. Both have their tradeoffs, but they solve the same issue (in different ways).

Words that are acronyms are written in all-caps, e.g. "ID" and "URL".
The CloudEvents specifications define mappings to various protocols and
encodings, and the accompanying CloudEvents SDK targets various runtimes and
languages. Some of these treat metadata elements as case-sensitive while others
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that a rationale is provided. I think it could be extended to include a rationale for ASCII/alpha-numeric characters, and the length limit.

spec.md Outdated
case-sensitivity issues or clashes with the permissible character set for
identifiers in common languages are prevented.

CloudEvents attribute names MUST consist of lower-case letter ('a' to 'z')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/letter/letters

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clemensv please fix this one during your rebase - it should be non-controversial.

"otherValue": 5
"eventid" : "A234-1234-1234",
"eventtime" : "2018-04-05T17:31:00Z",
"comexampleextension1" : "value",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly 20 characters 😉

I'm wondering if, with the 20 character limit, we should "examplify" the use of a reverse URL. Even for this rather short TLD, 10 characters are taken. "commicrosoft" is 12 chars, "comserverless" is 13 chars...

I don't have a good alternative suggestion, though.

@duglin
Copy link
Collaborator

duglin commented Oct 18, 2018

On today's call we discussed this topic and narrowed our choices down to this PR and #317. We then decided to start a vote between the two. Please add a comment to this PR to vote regardless of which PR you would like to vote for (if you did not vote already during today's call). The vote ends at the start of next week's call.

Voting members are: Adobe, Alibaba, Confluent, ControlBEAM, Fujitsu, Google, Huawei, IBM, Itau Unibanco, Microsoft, NAIC, NATS/Synadia, Nordstrom, Oracle, PayPal, Red Hat, SAP, Serverless, Solace, VMWare, Vlad Ionescu, Tapani Moilanen, Renato Gallis (Renato: do you work for Itau or are you independent?)

On today's call, the following companies voted:
MSFT: 321
PayPal: 321
Nordstrom: 321
Google: 321
Itau: 317

@rperelma
Copy link
Contributor

Adobe: 321

@Vlaaaaaaad
Copy link

I'm voting for 321 too.

@ColinSullivan1
Copy link
Contributor

Synadia: 321

@deissnerk
Copy link
Contributor

SAP: 321

@duglin
Copy link
Collaborator

duglin commented Oct 24, 2018

IBM: 321

@NaohiroTamura
Copy link

Fujitsu: 321

@duglin
Copy link
Collaborator

duglin commented Oct 29, 2018

Per the 10/25 call, finally tally is:
321: MSFT, PayPal, Nordstrom, Google, Vlad, Synadia, SAP, IBM, Fujitsu, Adobe
317: Itau,

So, #317 will be closed and @clemensv will rebase this PR - then we'll merge.

Signed-off-by: Clemens Vasters <clemensv@microsoft.com>
All [CloudEvents][CE] attributes with exception of `contentType` and `data`
MUST be individually mapped to and from the AMQP
All [CloudEvents][CE] attributes with exception of `contenttype` and `data`
are individually mapped to and from the AMQP
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you drop the "MUST"?

All [CloudEvents][CE] attributes with exception of `contentType` and `data`
MUST be individually mapped to and from distinct HTTP message headers,
All [CloudEvents][CE] attributes with exception of `contenttype` and `data`
are individually mapped to and from distinct HTTP message headers,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no MUST?

@duglin duglin added the v0.2 label Oct 31, 2018
Clemens Vasters added 2 commits October 31, 2018 17:31
Signed-off-by: Clemens Vasters <clemensv@microsoft.com>
Signed-off-by: Clemens Vasters <clemensv@microsoft.com>
@duglin
Copy link
Collaborator

duglin commented Oct 31, 2018

LGTM

I'd like to get at least one more set of eyes on this to make sure we didn't miss something before we merge it.

Copy link
Contributor

@cneijenhuis cneijenhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

(I still think we should provide better guidance/examples on how to provide descriptive extension names with a limit of 20 characters. The reverse url "wastes" too much characters quickly. But we can do that in a follow-on PR)

@duglin
Copy link
Collaborator

duglin commented Nov 1, 2018

Approved on 11/1 call.

@duglin duglin merged commit 044d6aa into cloudevents:master Nov 1, 2018
@duglin duglin mentioned this pull request Nov 30, 2018
@ottomata
Copy link

ottomata commented Feb 6, 2019

I've just heard of CloudEvents, so am very new to this. At the Wikimedia Foundation, we are undergoing a similar (internal) process to standardize our event schemas. I'd like it if we could conform to some larger standard rather than one we make up for ourselves.

I appreciate the restriction of lower case here; often our events are imported into case-insensitive SQL based systems, and we also have issues with case insensitive HTTP headers too. Lowercasing everything helps. However, @clemensv, you wrote

To make up for the loss of the word separation by case, it would be great to have some kind of word separator available in addition to lower-case letters and digits, but dots and dashes and underscores clash with identifier rules in various languages/protocols.

Where do underscores clash? I'm sure this spec has to be compatible with many more languages and protocols than I'm familiar with, but off the top of my head I don't know of one. I understand that you are trying to just standardize 'wire' protocols here, but I doubt event data will only be used for async service communication. They are often also used for analytics purposes where they are imported into databases and need to be readable by humans. Underscores would make this much saner.

cneijenhuis added a commit to cneijenhuis/spec that referenced this pull request Mar 20, 2019
Signed-off-by: Christoph Neijenhuis <christoph.neijenhuis@commercetools.de>
duglin pushed a commit that referenced this pull request Mar 21, 2019
Extensions follow attribute naming scheme introduced in #321
mikehelmick added a commit to mikehelmick/eventing that referenced this pull request Jun 24, 2019
cloudevents attributes should be handled. See
  cloudevents/spec#321
for context.
mikehelmick added a commit to mikehelmick/eventing that referenced this pull request Jun 24, 2019
cloudevents attributes should be handled. See
  cloudevents/spec#321
for context.
knative-prow-robot pushed a commit to knative/eventing that referenced this pull request Jun 24, 2019
* Fix issue with broker dropping messages with "No TTL Seen"

* Treat the broker TTL attribute as case insensitive, in line with how
cloudevents attributes should be handled. See
  cloudevents/spec#321
for context.

* Remove unnecessary test change.
grantr pushed a commit to grantr/eventing that referenced this pull request Jun 25, 2019
)

* Fix issue with broker dropping messages with "No TTL Seen"

* Treat the broker TTL attribute as case insensitive, in line with how
cloudevents attributes should be handled. See
  cloudevents/spec#321
for context.

* Remove unnecessary test change.
grantr pushed a commit to grantr/eventing that referenced this pull request Jun 25, 2019
* Fix issue with broker dropping messages with "No TTL Seen"

* Treat the broker TTL attribute as case insensitive, in line with how
cloudevents attributes should be handled. See
  cloudevents/spec#321
for context.

* Remove unnecessary test change.
knative-prow-robot pushed a commit to knative/eventing that referenced this pull request Jun 25, 2019
* Fix issue with broker dropping messages with "No TTL Seen"

* Treat the broker TTL attribute as case insensitive, in line with how
cloudevents attributes should be handled. See
  cloudevents/spec#321
for context.

* Remove unnecessary test change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.