
Add test data and data types for awsecscontainermetricsreceiver #1044

Merged 5 commits, Sep 21, 2020

Conversation

hossain-rayhan (Contributor)

Description:
This change adds metadata and data types to support the metric conversion logic for awsecscontainermetricsreceiver. The receiver reads the task metadata endpoint response and generates OpenTelemetry metrics from it.

Linking to existing issue
#457

Testing: Unit tests added.

Documentation: README file
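
For context, here is a minimal sketch (not the receiver's actual code) of the flow described above: query the task metadata endpoint and decode the per-container stats. The struct fields, endpoint path, and environment variable below are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// ContainerStats mirrors only a couple of fields for illustration; the real
// receiver defines a richer set of types.
type ContainerStats struct {
	Name string `json:"name"`
	ID   string `json:"id"`
}

func main() {
	// ECS exposes the task metadata endpoint to containers via this env var.
	base := os.Getenv("ECS_CONTAINER_METADATA_URI_V4")

	// The stats endpoint returns a JSON object keyed by container ID.
	resp, err := http.Get(base + "/task/stats")
	if err != nil {
		fmt.Println("fetching task stats failed:", err)
		return
	}
	defer resp.Body.Close()

	stats := map[string]ContainerStats{}
	if err := json.NewDecoder(resp.Body).Decode(&stats); err != nil {
		fmt.Println("decoding task stats failed:", err)
		return
	}

	// The metric conversion logic (coming in the next PRs) would turn these into OT metrics.
	fmt.Printf("got stats for %d containers\n", len(stats))
}
```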

@hossain-rayhan requested a review from a team September 17, 2020 01:02
@hossain-rayhan changed the title from "Add data types to support metrics calculation" to "Add test data and data types for awsecscontainermetricsreceiver" Sep 17, 2020
codecov bot commented Sep 17, 2020

Codecov Report

Merging #1044 into master will increase coverage by 0.04%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1044      +/-   ##
==========================================
+ Coverage   88.81%   88.85%   +0.04%     
==========================================
  Files         250      251       +1     
  Lines       11924    11952      +28     
==========================================
+ Hits        10590    10620      +30     
+ Misses        990      989       -1     
+ Partials      344      343       -1     
Flag Coverage Δ
#integration 75.42% <ø> (ø)
#unit 87.87% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...nermetricsreceiver/awsecscontainermetrics/label.go 100.00% <100.00%> (ø)
receiver/k8sclusterreceiver/watcher.go 97.64% <0.00%> (+2.35%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bff19a2...d7ba9d5.

Comment on lines +17 to +18
// ContainerStats defines the structure for container stats
type ContainerStats struct {


Are this and the other structs for parsing the response from the ECS Task Metadata endpoint?

If so... would it be a bad idea to import the structs from the ECS Agent? The ECS Agent is written in Go and uses Go code to generate the output that you are processing.

I guess importing that code might not be wise... the ECS Agent isn't meant to be used as a library. Still, you could add a link to those structures as a reference, or a link to the ECS Task Metadata docs explaining where all of this comes from.

I think it's here and here:

Looks like they actually mostly just import from Docker, though... would it be a good idea to import the Docker structs?

Contributor

Thanks for noticing this! I agree we should not import from a package that isn't meant to be consumed as a library (the ECS Agent). But definitely reuse the Docker structs where you can; that would be nice.

Contributor Author

That's a good idea. However, our docker-stats JSON structure is a little bit different. It has an extra item (block) for network rate metrics, which is added by the ECS agent. In the future, we will have another block of data for storage read/write metrics. So, creating a simple struct with only the fields we care about might be a straightforward solution.

"networkspersec": {
                "eth0": [
                    {
                        "timestamp": "2020-06-17T21:45:26.381091349Z",
                        "rx_bytes_per_sec": 20,
                        "tx_bytes_per_sec": 25
                    },
                    {
                        "timestamp": "2020-06-17T21:45:26.481091349Z",
                        "rx_bytes_per_sec": 10,
                        "tx_bytes_per_sec": 55
                    }

                ]
          }
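
For illustration only, a small custom struct along these lines could carry both the Docker-style fields and the agent-added rate block; the type and field names here are assumptions, not the PR's actual definitions.

```go
package awsecscontainermetrics

import "time"

// NetworkRateStats models one entry of the extra "networkspersec" block that
// the ECS agent appends to the Docker stats payload (illustrative).
type NetworkRateStats struct {
	Timestamp     time.Time `json:"timestamp"`
	RxBytesPerSec float64   `json:"rx_bytes_per_sec"`
	TxBytesPerSec float64   `json:"tx_bytes_per_sec"`
}

// ContainerStats keeps only the fields the receiver needs, plus the
// agent-specific block that Docker's own types do not carry (illustrative).
type ContainerStats struct {
	Name           string                        `json:"name"`
	ID             string                        `json:"id"`
	NetworksPerSec map[string][]NetworkRateStats `json:"networkspersec"`
}
```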

Contributor

Just to confirm, @hossain-rayhan: you don't intend to import from the Docker API types? I see most of these are pointers, which Docker doesn't use, so I understand that coercing them to match your usage may be undesirable in the long run.

Also, if these are largely taken from the ECS agent or a related library, some attribution would be helpful/appreciated.

Contributor

@anuraaga left a comment

Aside from the issue of sharing structs with upstream libraries where possible, and the one about leaving labels out of this receiver, this generally looks fine to me. Thanks!

metricspb "github.com/census-instrumentation/opencensus-proto/gen-go/metrics/v1"
)

func containerLabelKeysAndValues(cm ContainerMetadata) ([]*metricspb.LabelKey, []*metricspb.LabelValue) {
Contributor

Do we need to add labels in the receiver? We usually add resource information in a processor, such as here

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/master/processor/k8sprocessor/kube/kube.go#L104

I think this would allow the same resource information to be added both to these metrics, and any other metrics / traces received by the collector.

@PettitWesley Sep 17, 2020

This is for ECS, not k8s.

Also, while Task labels could be added by an ECS Metadata processor, I am not sure container labels could be, since by the time you reach the processor you lose the information about which container is which (unless you add labels that identify the container, in which case you might as well just add all of them in the receiver directly).

Also, another point: I am not sure the receiver can emit container-specific metrics without labels that identify which container they came from anyway. If you have 3 containers and collect metrics for each but don't differentiate them with something, then you are actually doing an aggregation across all the containers. At least that is my understanding.

I'll finally add that we've discussed this a bunch internally, and I think Rayhan has discussed it a bit externally as well. This implementation has been thought out.

Contributor Author

I feel like we should keep them as close to the source (producer) as possible. Also, it will help keep the config simple for customers: a customer configures the receiver and is good to go.

I saw some other receivers do the same (like kubeletstatsreceiver, redisreceiver, and statsdreceiver).

Contributor

> This is for ECS, not k8s.

Yeah, but k8s is an example with a presumably very similar pattern :)

> Also, while Task labels could be added by an ECS Metadata processor

From my understanding, it's not a "could be" but a "when will it be" - aren't we planning on adding an ECS processor so the information is added to data published by the application as well, to allow correlation between ECS metrics and app metrics?

> container labels

Understood on the container labels - they do seem important to add in the receiver for those reasons. So yeah, the worst case is that the processor is also enabled and there is a small amount of extra overhead populating the task information twice, but that's not a big deal.

> I'll finally add that we've discussed this a bunch internally, and I think Rayhan has discussed it a bit externally as well. This implementation has been thought out.

This isn't helpful because:

  1. Internal discussions don't provide value when working in OSS. Going forward, it'll be good to move all related design discussions to GitHub, public Slack, or Gitter as well, to facilitate AWS involvement in OpenTelemetry and other OSS.
  2. An implementation being thought out doesn't mean rubber-stamped reviews or allow avoiding probing questions during reviews. That's what code reviews are there for.


> This isn't helpful because

Fair point 👍

> it's not a "could be" but a "when will it be" - aren't we planning on adding an ECS processor so the information is added to data published by the application as well, to allow correlation between ECS metrics and app metrics?

I'm still confused by that, actually. I heard the plan is that the OT SDKs will add the metadata themselves, so there's not a huge need for the processor. I was told that doing it in the SDK is better: from within the container, a call to Task Metadata can tell you which container you are part of, so you can add the container-specific labels. A collector processor would have trouble with that, since once the data reaches the collector you potentially lose the knowledge of which container in the task the application is in.

I was told, though, that there might be some value in having a processor for other types of metrics/traces not from the OT SDKs, like OpenCensus or Prometheus, since those won't have a built-in integration with ECS Metadata.

Contributor Author

> So yeah, the worst case is that the processor is also enabled and there is a small amount of extra overhead populating the task information twice, but that's not a big deal.

I agree. Let's start with the minimum (putting them into the receiver for now). If we feel a strong need to add a processor, we can do that later.

Contributor

Yeah, unfortunately the collector processor can't add Docker container labels, since the OTel collector runs in a different container. But as long as it's a sidecar, it should be able to add the rest of the ECS labels. The current state is that the SDKs add container labels, which are standard since they're just Docker; this can be provided by OTel in a generic way.

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/resource/semantic_conventions/container.md

They don't provide ECS-specific labels, mainly because those aren't defined in the spec and aren't an SDK concern. We can add SDK extensions for ECS, and in the very long term it makes sense to have SDK plugins for ECS labels too, to support running the collector as a service instead of a sidecar. We see in practice that there is overlap between resource detection logic in the collector, like for GCP/EC2, and what's in the SDKs, primarily because the collector as a sidecar provides an approach for underdeveloped or future languages, while SDK implementations are more flexible in terms of deployment.

In the shorter term, if we restrict the conversation to the sidecar as a first iteration, then implementing ECS labels as a processor would let any app use them regardless of language, until ECS plugins can be added for each language too. For example, because Java and JS at least already populate the container info, if the collector has a processor for ECS task-level and above granularity, both languages would get the full set of ECS metadata automatically. So I think it makes sense to have a processor as the first implementation of ECS task metadata enrichment.

Hopefully that provides some overall context. To avoid confusion, though, it's indeed unrelated to this PR, as I realized it also makes sense to add them in the receiver :)


I went through the dockerstats receiver. They add labels to the metrics for each container, close to the source as mentioned by @hossain-rayhan:

  1. ContainerStatsToMetrics:
  2. updateConfiguredResourceLabels:
    func updateConfiguredResourceLabels(md *consumerdata.MetricsData, container *DockerContainer, config *Config) {

Contributor

Right, these are associated with individual container stat requests, so attaching them later would generally be impractical. Host correlation or similar features not tied to metric generation would happen in a relevant processor, though.

metricspb "github.com/census-instrumentation/opencensus-proto/gen-go/metrics/v1"
)

func containerLabelKeysAndValues(cm ContainerMetadata) ([]*metricspb.LabelKey, []*metricspb.LabelValue) {

labelKeys := make([]*metricspb.LabelKey, 0, 3)
labelValues := make([]*metricspb.LabelValue, 0, 3)

labelKeys = append(labelKeys, &metricspb.LabelKey{Key: "ecs.container-name"})
Contributor

Contributor Author

Updated.

@PettitWesley left a comment

LGTM

labelKeys := make([]*metricspb.LabelKey, 0, 3)
labelValues := make([]*metricspb.LabelValue, 0, 3)

labelKeys = append(labelKeys, &metricspb.LabelKey{Key: "container.name"})
Contributor

I've been told using the convention constants directly where applicable is preferred: https://github.com/open-telemetry/opentelemetry-collector/blob/master/translator/conventions/opentelemetry.go

Contributor Author

Good point. I used the default convention. Also, I moved some of my values to a constants file.
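
As a rough sketch of what using the shared convention constant might look like; the ContainerMetadata fields and the non-convention label keys below are assumptions for illustration, not the merged code.

```go
package awsecscontainermetrics

import (
	metricspb "github.com/census-instrumentation/opencensus-proto/gen-go/metrics/v1"
	"go.opentelemetry.io/collector/translator/conventions"
)

// ContainerMetadata is a placeholder for the PR's metadata struct (assumed fields).
type ContainerMetadata struct {
	ContainerName string
	DockerID      string
	DockerName    string
}

func containerLabelKeysAndValues(cm ContainerMetadata) ([]*metricspb.LabelKey, []*metricspb.LabelValue) {
	labelKeys := []*metricspb.LabelKey{
		{Key: conventions.AttributeContainerName}, // shared constant for "container.name"
		{Key: "aws.ecs.docker.name"},              // assumed ECS-specific key
		{Key: "container.id"},                     // assumed; could also come from the conventions package
	}
	labelValues := []*metricspb.LabelValue{
		{Value: cm.ContainerName, HasValue: true},
		{Value: cm.DockerName, HasValue: true},
		{Value: cm.DockerID, HasValue: true},
	}
	return labelKeys, labelValues
}
```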

@@ -0,0 +1,106 @@
{
Contributor

Is this test data being used (or can it be used) anywhere in these changes or the existing structure? It's not immediately clear why it's being introduced.

Contributor Author

Actually, it was a big PR. We later decided to split it into 3 PRs to make the review process faster. In this PR we are mainly adding the structs and test data. The test data is used by our code, which is coming in the next PR.
Yeah, we can debate it, but I think it's fine.

@rmfitzpatrick (Contributor)

Unfamiliar with ECS subtleties, this looks good to me. Just a few nits/questions, mainly around the test data and potentially required attribution(s).

DockerName: "docker-container-1",
}
k, v := containerLabelKeysAndValues(cm)
require.EqualValues(t, 3, len(k))
Contributor

Would having more detailed label content checks be helpful with these cases or will they be prone to change?

Contributor Author

I think it's fine, as we will always have correct values if we reach this point. Also, I made a small change: I moved the hardcoded length to a constants file.

Contributor Author

Additionally, I added some minor checks.
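
For example, a slightly more detailed set of assertions along these lines (hypothetical, reusing the assumed ContainerMetadata sketch above) would catch a renamed or reordered label rather than only a count change.

```go
package awsecscontainermetrics

import (
	"testing"

	"github.com/stretchr/testify/require"
)

func TestContainerLabelKeysAndValues(t *testing.T) {
	// Assumes the ContainerMetadata and containerLabelKeysAndValues sketched earlier.
	cm := ContainerMetadata{
		ContainerName: "container-1",
		DockerID:      "001",
		DockerName:    "docker-container-1",
	}

	k, v := containerLabelKeysAndValues(cm)
	require.Len(t, k, 3)
	require.Len(t, v, 3)

	// Content checks, not just lengths.
	require.Equal(t, "container.name", k[0].Key)
	require.Equal(t, "container-1", v[0].Value)
	require.Equal(t, "docker-container-1", v[1].Value)
}
```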

}

func getTaskIDFromARN(arn string) string {
if !strings.HasPrefix(arn, "arn:aws") || arn == "" {
Contributor

Curious if there's a convention behind having the empty check last, as it would potentially avoid a function call if it were first.

Contributor Author

Good catch. I shuffled their position, thanks.
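
A quick sketch of the reordered check, with the cheap empty-string test first; the ARN splitting below is an assumption about the helper's behavior, not the exact merged implementation.

```go
package awsecscontainermetrics

import "strings"

func getTaskIDFromARN(arn string) string {
	// Cheap empty check first avoids the prefix scan for empty input.
	if arn == "" || !strings.HasPrefix(arn, "arn:aws") {
		return ""
	}
	// A task ARN looks like arn:aws:ecs:<region>:<account>:task/<cluster>/<task-id>;
	// the task ID is the final path segment.
	splits := strings.Split(arn, "/")
	return splits[len(splits)-1]
}
```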

@hossain-rayhan (Contributor Author)

Thanks @rmfitzpatrick for your review. I pushed an update based on your feedback.

@hossain-rayhan (Contributor Author)

@tigrannajaryan Can we get it merged?

@bogdandrutu merged commit ba5fced into open-telemetry:master Sep 21, 2020