[awsemfexporter] Implement metric filtering and dimension setting #1503

Merged

Conversation

kohrapha
Contributor

@kohrapha kohrapha commented Nov 5, 2020

Description

Problem

Currently, the EMF Exporter does not provide any configuration for defining the dimensions of exported metrics or for choosing which metrics should be exported. All incoming metrics are exported with their full label set as the exported dimension set. This causes a few problems, including:

  1. Unwanted metrics/dimensions result in increased cost for the customer.
  2. CloudWatch metrics support at most 10 dimensions per dimension set. By exporting all labels, some metrics end up with dimension sets of more than 10 dimensions.

Solution

This PR solves both issues by introducing a new, optional metric_declarations attribute that allows the user to define the dimensions for exported metrics and to filter metrics by name. Specifically, metric_declarations is a list of metric_declaration entries with the following API spec:

<metric_declaration>
Each metric declaration characterizes a rule to be used to set dimensions for certain incoming metrics, filtered by their metric names.

# List of dimension sets (which are lists of dimension names) to be included
# in exported metrics. If the metric does not contain any of the specified
# dimensions, the metric is dropped (it will only show up in logs).
dimensions: [ - [ - <string> ] ]

# List of regex strings to be matched against metric names to determine which
# metrics should be included with this metric declaration rule.
metric_name_selectors: [ - <string> ]
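
For reference, the spec above corresponds roughly to a configuration struct along these lines. This is a sketch based on the spec; the exact field names and tags in the merged code may differ:

// MetricDeclaration characterizes a rule for setting dimensions on exported
// metrics, filtered by metric name. Field names mirror the YAML spec above.
type MetricDeclaration struct {
	// Dimensions is a list of dimension sets (each a list of label names)
	// to attach to matching metrics.
	Dimensions [][]string `mapstructure:"dimensions"`

	// MetricNameSelectors is a list of regex strings matched against metric
	// names to decide whether this rule applies.
	MetricNameSelectors []string `mapstructure:"metric_name_selectors"`
}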

Example config

exporters:
  awsemf:
    log_group_name: 'awscollector-test'
    region: 'us-west-2'
    log_stream_name: metric-declarations
    dimension_rollup_option: 'NoDimensionRollup'
    metric_declarations:
    - dimensions: [['Service', 'Namespace'], ['pod_name', 'container_name']]
      metric_name_selectors:
      - '^go_memstats_alloc_bytes_total$'
    - dimensions: [['app_kubernetes_io_component', 'Namespace'], ['app_kubernetes_io_name'], ['Invalid', 'Namespace']]
      metric_name_selectors:
      - '^go_goroutines$'
    - dimensions: [['Namespace', 'Service', 'Namespace']]
      metric_name_selectors:
      - '^go_mem.+$'

Note that the metric_declarations attribute is optional. If this section is left out of the config file, the exporter defaults to exporting all labels as a dimension set, as is done currently. Each incoming metric is matched against each metric declaration rule. If it matches at least one of the metric_name_selectors, the corresponding rule is applied. Each dimension set specified by the rule is applied to the exported metric if and only if the metric has all the dimensions in that dimension set.

All of the dimension sets across all metric declarations are aggregated together and applied to the metric. In other words, if a metric matches multiple rules, all valid dimensions specified by those rules will be put into a single list of dimension sets. Additionally, any rolled-up dimensions are also added to this set. De-duplication is applied across the final list of dimension sets.
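
To illustrate the matching and filtering behavior described above, here is a minimal, self-contained Go sketch. The metricDeclaration type and the dimensionsFor helper are simplified stand-ins for illustration, not the exact code added in this PR:

package main

import (
	"fmt"
	"regexp"
)

// metricDeclaration is a simplified stand-in for the exporter's rule type.
type metricDeclaration struct {
	dimensions [][]string
	selectors  []*regexp.Regexp
}

// matches reports whether any selector matches the metric name.
func (d *metricDeclaration) matches(name string) bool {
	for _, re := range d.selectors {
		if re.MatchString(name) {
			return true
		}
	}
	return false
}

// dimensionsFor collects the dimension sets of every matching rule, keeping
// only the sets whose dimensions all exist among the metric's labels.
func dimensionsFor(rules []metricDeclaration, name string, labels map[string]string) [][]string {
	var out [][]string
	for _, rule := range rules {
		if !rule.matches(name) {
			continue
		}
	nextSet:
		for _, set := range rule.dimensions {
			for _, dim := range set {
				if _, ok := labels[dim]; !ok {
					continue nextSet // drop sets with a missing dimension
				}
			}
			out = append(out, set)
		}
	}
	return out
}

func main() {
	rules := []metricDeclaration{{
		dimensions: [][]string{{"Service", "Namespace"}, {"pod_name", "container_name"}},
		selectors:  []*regexp.Regexp{regexp.MustCompile(`^go_memstats_alloc_bytes_total$`)},
	}}
	labels := map[string]string{"Service": "my-svc", "Namespace": "eks-aoc"}
	// Prints [[Service Namespace]]; the pod_name/container_name set is dropped
	// because those labels are missing from the metric.
	fmt.Println(dimensionsFor(rules, "go_memstats_alloc_bytes_total", labels))
}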

De-duplication functionality
We also help the user perform de-duplication of dimension sets. Two dimension sets are considered duplicates if they contain the same set of unique dimensions (order is irrelevant). There are three cases of duplicated dimensions (see the sketch after this list):

  1. Within a single dimension set, i.e. ['dim1', 'dim2', 'dim1'] -> ['dim1', 'dim2']
  2. Within a single metric declaration rule, i.e. [['dim1', 'dim2'], ['dim2', 'dim1']] -> [['dim1', 'dim2']]
  3. Across metric declarations, i.e. if a metric matches multiple metric declaration rules. In this case, case 2 is applied since we aggregate all dimension sets into one list.
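
A minimal sketch of one way to implement this de-duplication, by normalizing each dimension set to its sorted unique names and using the joined names as a key. This is an illustration of the idea only, not necessarily the exact implementation in the PR:

package main

import (
	"fmt"
	"sort"
	"strings"
)

// dedupDimensionSets drops dimension sets that contain the same set of unique
// dimension names, regardless of ordering or repeats within a set. Each
// surviving set is returned with its names de-duplicated and sorted.
func dedupDimensionSets(dimensionSets [][]string) [][]string {
	seen := make(map[string]bool)
	var out [][]string
	for _, set := range dimensionSets {
		// Case 1: remove duplicates within a single dimension set.
		unique := make(map[string]bool)
		for _, dim := range set {
			unique[dim] = true
		}
		names := make([]string, 0, len(unique))
		for dim := range unique {
			names = append(names, dim)
		}
		sort.Strings(names)
		// Cases 2 and 3: use the sorted, joined names as a key so duplicate
		// sets within a rule or across rules are only kept once.
		key := strings.Join(names, ",")
		if seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, names)
	}
	return out
}

func main() {
	sets := [][]string{{"dim1", "dim2", "dim1"}, {"dim2", "dim1"}}
	fmt.Println(dedupDimensionSets(sets)) // [[dim1 dim2]]
}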

Documentation

The documentation for the AWS EMF Exporter was updated to include the new metric_declarations attribute and its API spec.

Testing

  • Unit tests were added for each new function and ran successfully
  • Manual E2E testing was done and logs/metrics were successfully exported to the CloudWatch backend. Specifically, given the config defined above, we got the following CloudWatch logs:
  1. 1 matching rule (w/ 1 unmatched dimension), go_goroutines:
{
    "Namespace": "eks-aoc",
    "Service": "my-nginx-ingress-nginx-controller-metrics",
    "_aws": {
        "CloudWatchMetrics": [
            {
                "Namespace": "metric-declarations",
                "Dimensions": [
                    [
                        "Namespace",
                        "app_kubernetes_io_component"
                    ],
                    [
                        "app_kubernetes_io_name"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "go_goroutines",
                        "Unit": ""
                    }
                ]
            }
        ],
        "Timestamp": 1604431600752
    },
    "app_kubernetes_io_component": "controller",
    "app_kubernetes_io_instance": "my-nginx",
    "app_kubernetes_io_managed_by": "Helm",
    "app_kubernetes_io_name": "ingress-nginx",
    "app_kubernetes_io_version": "0.40.2",
    "container_name": "controller",
    "go_goroutines": 93,
    "helm_sh_chart": "ingress-nginx-3.7.1",
    "kubernetes_node": "ip-192-168-69-5.us-west-2.compute.internal",
    "pod_name": "my-nginx-ingress-nginx-controller-77d5fd6977-mlc8k"
}
  2. 1 matching rule (w/ dedup), go_memstats_alloc_bytes:
{
    "Namespace": "eks-aoc",
    "Service": "my-nginx-ingress-nginx-controller-metrics",
    "_aws": {
        "CloudWatchMetrics": [
            {
                "Namespace": "metric-declarations",
                "Dimensions": [
                    [
                        "Namespace",
                        "Service"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "go_memstats_alloc_bytes",
                        "Unit": "By"
                    }
                ]
            }
        ],
        "Timestamp": 1604431600752
    },
    "app_kubernetes_io_component": "controller",
    "app_kubernetes_io_instance": "my-nginx",
    "app_kubernetes_io_managed_by": "Helm",
    "app_kubernetes_io_name": "ingress-nginx",
    "app_kubernetes_io_version": "0.40.2",
    "container_name": "controller",
    "go_memstats_alloc_bytes": 6938768,
    "helm_sh_chart": "ingress-nginx-3.7.1",
    "kubernetes_node": "ip-192-168-69-5.us-west-2.compute.internal",
    "pod_name": "my-nginx-ingress-nginx-controller-77d5fd6977-mlc8k"
}
  3. 2 matching rules (w/ dedup), go_memstats_alloc_bytes_total:
{
    "Namespace": "eks-aoc",
    "Service": "my-nginx-ingress-nginx-controller-metrics",
    "_aws": {
        "CloudWatchMetrics": [
            {
                "Namespace": "metric-declarations",
                "Dimensions": [
                    [
                        "Namespace",
                        "Service"
                    ],
                    [
                        "container_name",
                        "pod_name"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "go_memstats_alloc_bytes_total",
                        "Unit": ""
                    }
                ]
            }
        ],
        "Timestamp": 1604431600752
    },
    "app_kubernetes_io_component": "controller",
    "app_kubernetes_io_instance": "my-nginx",
    "app_kubernetes_io_managed_by": "Helm",
    "app_kubernetes_io_name": "ingress-nginx",
    "app_kubernetes_io_version": "0.40.2",
    "container_name": "controller",
    "go_memstats_alloc_bytes_total": 0,
    "helm_sh_chart": "ingress-nginx-3.7.1",
    "kubernetes_node": "ip-192-168-69-5.us-west-2.compute.internal",
    "pod_name": "my-nginx-ingress-nginx-controller-77d5fd6977-mlc8k"
}
  4. No matching rule, go_info:
{
    "Namespace": "eks-aoc",
    "Service": "my-nginx-ingress-nginx-controller-metrics",
    "app_kubernetes_io_component": "controller",
    "app_kubernetes_io_instance": "my-nginx",
    "app_kubernetes_io_managed_by": "Helm",
    "app_kubernetes_io_name": "ingress-nginx",
    "app_kubernetes_io_version": "0.40.2",
    "container_name": "controller",
    "go_info": 1,
    "helm_sh_chart": "ingress-nginx-3.7.1",
    "kubernetes_node": "ip-192-168-69-5.us-west-2.compute.internal",
    "pod_name": "my-nginx-ingress-nginx-controller-77d5fd6977-mlc8k",
    "version": "go1.15.2"
}

@codecov

codecov bot commented Nov 5, 2020

Codecov Report

Merging #1503 (2c31065) into master (8109852) will increase coverage by 0.04%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1503      +/-   ##
==========================================
+ Coverage   89.24%   89.29%   +0.04%     
==========================================
  Files         354      355       +1     
  Lines       17417    17501      +84     
==========================================
+ Hits        15544    15627      +83     
- Misses       1395     1397       +2     
+ Partials      478      477       -1     
Flag          Coverage Δ
integration   71.00% <ø> (ø)
unit          87.93% <100.00%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                                   Coverage Δ
exporter/awsemfexporter/emf_exporter.go          98.07% <100.00%> (+0.16%) ⬆️
exporter/awsemfexporter/factory.go               100.00% <100.00%> (ø)
exporter/awsemfexporter/metric_declaration.go    100.00% <100.00%> (ø)
exporter/awsemfexporter/metric_translator.go     94.76% <100.00%> (+0.46%) ⬆️
receiver/prometheusexecreceiver/receiver.go      85.83% <0.00%> (-2.50%) ⬇️
receiver/k8sclusterreceiver/watcher.go           97.64% <0.00%> (+2.35%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8109852...2c31065.

@kohrapha
Contributor Author

kohrapha commented Nov 5, 2020

cc: @mxiamxia @hdj630

@kohrapha
Contributor Author

kohrapha commented Nov 9, 2020

@anuraaga @james-bebbington could you help review this PR? Thanks :)

exporter/awsemfexporter/emf_exporter.go (outdated review thread, resolved)
func (m *MetricDeclaration) Init(logger *zap.Logger) (err error) {
	// Return error if no metric name selectors are defined
	if len(m.MetricNameSelectors) == 0 {
		return errors.New("invalid metric declaration: no metric name selectors defined")
Contributor

If a user inputs an incorrect config, is this the log message they will see? Is it enough information to fix the config issue?

Contributor Author

They would see the message "Dropped metric declaration. Error: invalid metric declaration: no metric name selectors defined.". I think this is sufficient, as it tells the user that a metric declaration has been dropped because no metric name selectors were defined. I'm open to suggestions on how to make this better though.

	for _, dimSet := range m.Dimensions {
		concatenatedDims := strings.Join(dimSet, ",")
		if len(dimSet) > 10 {
			logger.Warn("Dropped dimension set: > 10 dimensions specified.", zap.String("dimensions", concatenatedDims))
Contributor

Just to confirm, these warnings only happen on startup? Even so, we might use debug logging; I think the collector generally leans towards fewer log messages.

Contributor Author

That's right, this only happens on startup. In this case, would it be ok to use Warn since it's only once on startup? I feel that this information is useful for the user to fix in case they don't enable Debug level logging.

@anuraaga
Contributor

Just some small comments, thanks

@bogdandrutu
Member

@kohrapha I see that you need to make some small updates, ping me when things are done.

@kohrapha
Contributor Author

@bogdandrutu Made the changes in response to @anuraaga's comments and rebased on the new changes from master. We're good to merge, and if there are any strong opinions on the other comments, we can make the changes afterwards.

@bogdandrutu bogdandrutu merged commit 6b97426 into open-telemetry:master Nov 13, 2020
shaochengwang pushed a commit to shaochengwang/opentelemetry-collector-contrib that referenced this pull request Nov 13, 2020
…en-telemetry#1503)

* Create MetricDeclaration struct

* Implement dimension dedup logic when adding rolledup dimensions (open-telemetry#6)

* Implement dimension dedup logic when adding rolledup dimensions

* Remove duplicated dimensions in metric declaration

* Create benchmark for filtering with metric declarations and use assertCwMeasurementEqual for tests

* Move helper test code to the top of file

* Update dimRollup test

* Aggregate dimensions and perform dedup

* Minor code change

* Fix test failure from merging new changes
ljmsc referenced this pull request in ljmsc/opentelemetry-collector-contrib Feb 21, 2022
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.35.2 to 1.36.0.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](golangci/golangci-lint@v1.35.2...v1.36.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>