Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add semantic conventions for cloud provider-specific resource attributes #1099

Merged
merged 9 commits into from
Nov 25, 2020

Conversation

willarmiros
Copy link
Contributor

Fixes #1098

Changes

This is an implementation of my proposal in #1098 to include a series of resource attributes that are specific to AWS, and not useful as generalized attributes. I believe this proposal is extendable to include attributes from other cloud providers like GCP, Azure, and any other providers involved in OpenTelemetry. I'd appreciate feedback on anything from the location of this within the spec to the naming of attributes.

One thing syntactically I wasn't sure of was how to describe attributes with common prefixes within a group. For example, for aws.ecs.task.arn and aws.ecs.task.family, should I create a new group called aws.ecs.task with 2 attributes (arn and family), or should I simply include task.arn and task.family as attributes within the larger aws.ecs group, as I've done below?

I've also included a new attribute on the cloud group called infrastructure.service. Please see the modified cloud.md for more details. I think that such an attribute would be useful for users who want to quickly identify the type of compute platform a trace/metric is originating from. I've used the syntax that we use in AWS X-Ray, but I am open to modifying it (e.g. removing the cloud provider prefix since that is already known from cloud.provider attribute, or changing the delimiter). Or I can move the attribute to the AWS spec if others feel it doesn't belong in cloud.md.

Related issue: #968

@willarmiros willarmiros requested review from a team October 15, 2020 20:00
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Oct 15, 2020

CLA Check
The committers are authorized under a signed CLA.

@willarmiros
Copy link
Contributor Author

/cc @anuraaga @wangzlei

@@ -13,7 +13,7 @@ groups:
brief: 'Amazon Web Services'
- id: Azure
value: 'azure'
brief: 'Amazon Web Services'
brief: 'Microsoft Azure'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄

@@ -37,3 +37,12 @@ groups:
note: >
In AWS, this is called availability-zone.
examples: ['us-central1-a']
- id: infrastructure.service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably good to separate this into a separate PR since it's broader reaching than AWS-specific conventions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok will separate it into a new PR with your feedback included.

The first entry should generally be a cloud provider, followed by a product
category, then finally a particular piece of compute infrastructure. Each entry
is delimited by a double colon (::).
examples: ['AWS::EC2::Instance', 'Azure::Compute::VM', 'GCP::ComputeEngine::VM']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AWS, etc are already in cloud.provider so this should only need something like ec2, aks, cloud run. Of course we'll want insight from a GCP and Azure member for some examples :)

X-Ray uses these extended names but it's the X-Ray exporter that can format the string correctly but not what we'd require in the semantic convention which should just be a precise semantic string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I sorta figured as much. I'll get rid of the delimiters and just specify the compute platform.

Copy link
Member

@thisthat thisthat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution :)
I would suggest adding a README.md file inside the aws folder that contains the list of AWS related semantic conventions, similar to what we have for trace semantic conventions :)

@@ -0,0 +1,34 @@
groups:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems more related to logging than AWS in the general sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you prefer I move this to somewhere in the logs spec? The loggroup and logstream fields are specific terms to AWS CloudWatch, but if there are more generic terms that would apply to other logging backends but still convey the same meaning, I'd be happy to switch the wording.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with logs, but I guess there will be semantic conventions for it as well!
For the moment, I would suggest changing the id to aws.log to avoid a naming clash with other more general aws semantic conventions.


Attributes that are only applicable to resources from a specific cloud provider. Currently, these
resources can only be defined for providers listed as a valid `cloud.provider` in
[Cloud](./cloud.md). Provider-specific attributes all reside in the `cloud_provider` directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add the list of providers for which we have some semantic convention here

Copy link
Contributor Author

@willarmiros willarmiros Oct 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll include a list of valid cloud providers more explicitly. I'll also include the README in the aws directory :)

@willarmiros
Copy link
Contributor Author

Made an additional change to make the AWS log attributes to be string arrays. As I noted in the spec:

Multiple log groups must be supported for cases like multi-container applications, where a single application has sidecar containers, and each write to their own log group.

@willarmiros
Copy link
Contributor Author

@thisthat I've applied your suggestions, please give a second look when available :)

Copy link
Member

@thisthat thisthat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution :)

specification/resource/semantic_conventions/README.md Outdated Show resolved Hide resolved
@anuraaga
Copy link
Contributor

Multiple log groups must be supported for cases like multi-container applications, where a single application has sidecar containers, and each write to their own log group.

@willarmiros is this true for telemetry though? Won't any single app process (Resource) only have a single log group since each sidecar container is a different process? I might be confused on it but I think it's appropriate for these to be singular, any metric resource can only be reporting to a single log group I think.

Co-authored-by: Giovanni Liva <giovanni.liva@dynatrace.com>
@willarmiros
Copy link
Contributor Author

@anuraaga you're right that each container or process has its own log group, and I suppose I worded it as a feature even though it's more of a limitation. For ECS, the problem is that when calling the Task Metadata Endpoint from the collector container, we have no way of identifying which container is the application container. We just get a list of all containers defined for a task, and we get the container the collector is running on. So we can confidently exclude the collector's log group from the list of log groups, but if there are other sidecar containers we can't distinguish them from the main app container.

Similarly, in EC2, we get the log group data from the agent config file, which can have any number of log groups. Again, we can't distinguish which one is the "right" one. This is why log groups on X-Ray segments are also modeled as an array, even though an X-Ray segment is also only supposed to correspond to one resource.

The only workaround us we get some sort of user input, e.g. in config.yaml. However detector-specific config (e.g. configuration for just the ECS resource detector) doesn't exist and even if we added it, it would still be cumbersome to users IMO. At that point, we might as well just have the user input their log group themselves :)

@willarmiros
Copy link
Contributor Author

@anuraaga when you get a chance a second review would be appreciated :)

Copy link
Contributor

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@github-actions
Copy link

github-actions bot commented Nov 6, 2020

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 6, 2020
@anuraaga
Copy link
Contributor

anuraaga commented Nov 6, 2020

@open-telemetry/specs-approvers Can anyone take a look at this PR? Thanks!

@github-actions github-actions bot removed the Stale label Nov 7, 2020
@tigrannajaryan
Copy link
Member

@open-telemetry/specs-approvers Can anyone take a look at this PR? Thanks!

@anuraaga I believe we need to resolve #1151 which blocks this.

@willarmiros
Copy link
Contributor Author

willarmiros commented Nov 12, 2020

@tigrannajaryan I plan on attending the spec issue scrub tomorrow morning to discuss :) if there's a better time/place to raise this please let me know

@tigrannajaryan
Copy link
Member

@tigrannajaryan I plan on attending the spec issue scrub tomorrow morning to discuss :) if there's a better time/place to raise this please let me know

The spec meeting on Tue may be a better time. Also it would be useful to invite some attention to this issue so that it is discussed offline too. @open-telemetry/specs-approvers please chime in at #1151 if you have thoughts.

@willarmiros
Copy link
Contributor Author

Sounds good. Also to be clear, #1151 blocks #1112 but I don't think it should block this PR. This PR is strictly adding AWS-specific resource fields and adding a place for other cloud-provider-specific attributes to live.

#1112 involves adding a new generic cloud attribute, so understandably in need of more discussion. Regardless I'll plan on attending Tuesday to chat!

| Attribute | Type | Description | Example | Required |
|---|---|---|---|---|
| `aws.ecs.container.arn` | string | The Amazon Resource Name (ARN) of an [ECS container instance](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html). | `arn:aws:ecs:us-west-1:123456789123:container/32624152-9086-4f0e-acae-1a75b14fe4d9` | No |
| `aws.ecs.cluster.arn` | string | The ARN of an [ECS cluster](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/clusters.html). | `arn:aws:ecs:us-west-2:123456789123:cluster/my-cluster` | No |
Copy link

@hossain-rayhan hossain-rayhan Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we get task metadata from ECS Task Metadata Endpoint, it gives us the ClusterName instead of ClusterARN for EC2 Launch Type. Can't we add one for aws.ecs.cluster.name?

Copy link
Contributor Author

@willarmiros willarmiros Nov 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally had it as cluster.name but, since I was populating the ARN for nearly all other fields and consumers could easily derive cluster name from the ARN, I figured recording the cluster ARN made more sense. On Fargate we are provided the cluster ARN by default, and on EC2 launch type, we have the region and account ID from the task ARN, so you can always construct the cluster ARN like I do here.

All this being said, I can still add a new field for the cluster name if you need it. I just wanted to reduce the possibility for redundant info.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess very few customers care about the whole ARN. Most useful info here is the resource name (ClusterName or TaskID). It's also implies that all the exporters should write their logic to extract resource name from ARN. It's not only extracting the resource name, they need to do other checks before that. If resource attribute map does not have the name fields, extract them and apply. The logic is not super complex but extra work and check.
So, in a sense, we are blocking our customers if any specific exporter doesn't come with these logics.

Also, I am kind of convinced with the idea of less redundant info. I am OK to follow the experts opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline: We're ok with this as is for now, and in the future if customers ask for it we can add cluster name, task ID, etc to this spec.

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide specification for cloud-provider-specific resource attributes
7 participants