
Adds whitelisting of docker labels [#1730] #1735

Closed
wants to merge 8 commits

Conversation

cirocosta

Hey,

Going after #1730, this PR adds two flags:

  • -docker_no_labels: forces no retrieval of labels
  • -docker_label_whitelist: whitelists labels to be retrieved when docker_no_labels is set

The first flag is useful not only for a proper implementation of the second, but also for those who don't need labels at all yet still receive them in push-based systems (like InfluxDB).

Do you think it makes sense to keep them separate? I wondered whether we could force docker_no_labels via docker_label_whitelist="" (an empty slice), but that didn't feel quite right (I prefer the explicitness of having both).

Also, should I add an integration test for this behavior? Should I add a handler test?

Thanks!


Without docker_no_labels, we expect to receive all the labels:

container_tasks_state{container_label_com_docker_compose_config_hash="92efcf747964251a5581db4f85ed470d74c4f7362294627a351bf0b1b5ff1964",container_label_com_docker_compose_container_number="1",container_label_com_docker_compose_oneoff="False",container_label_com_docker_compose_project="influx",container_label_com_docker_compose_service="influxdb",container_label_com_docker_compose_version="1.15.0",id="/docker/a0d5ca740222dec2d0d17b04b77aa14a0c123b6a8d272c38d9e5c8896f76df74",image="influxdb:alpine",name="influx_influxdb_1",state="uninterruptible"} 0

With docker_no_labels, we expect zero labels:

container_tasks_state{id="/docker/a0d5ca740222dec2d0d17b04b77aa14a0c123b6a8d272c38d9e5c8896f76df74",image="influxdb:alpine",name="influx_influxdb_1",state="iowaiting"} 0

With docker_no_labels and docker_label_metadata_whitelist=com.docker.compose.service, we expect to receive only the whitelisted label:

container_tasks_state{container_label_com_docker_compose_service="influxdb",id="/docker/a0d5ca740222dec2d0d17b04b77aa14a0c123b6a8d272c38d9e5c8896f76df74",image="influxdb:alpine",name="influx_influxdb_1",state="iowaiting"} 0

With docker_label_metadata_whitelist but without docker_no_labels, we expect to receive all labels:

container_tasks_state{container_label_com_docker_compose_config_hash="92efcf747964251a5581db4f85ed470d74c4f7362294627a351bf0b1b5ff1964",container_label_com_docker_compose_container_number="1",container_label_com_docker_compose_oneoff="False",container_label_com_docker_compose_project="influx",container_label_com_docker_compose_service="influxdb",container_label_com_docker_compose_version="1.15.0",id="/docker/a0d5ca740222dec2d0d17b04b77aa14a0c123b6a8d272c38d9e5c8896f76df74",image="influxdb:alpine",name="influx_influxdb_1",state="uninterruptible"} 0
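For illustration, the behavior shown in the four cases above could be sketched roughly as follows (a minimal Go sketch with made-up names such as filterLabels; it is not the PR's actual code):

```go
package main

import "fmt"

// filterLabels sketches the intended semantics: when noLabels is not set,
// every label passes through and the whitelist is ignored; when noLabels
// is set, only whitelisted keys survive (possibly none).
func filterLabels(labels map[string]string, noLabels bool, whitelist []string) map[string]string {
	if !noLabels {
		return labels
	}
	out := map[string]string{}
	for _, key := range whitelist {
		if value, ok := labels[key]; ok {
			out[key] = value
		}
	}
	return out
}

func main() {
	labels := map[string]string{
		"com.docker.compose.service": "influxdb",
		"com.docker.compose.project": "influx",
	}
	// Matches the third case above: only the whitelisted label remains.
	fmt.Println(filterLabels(labels, true, []string{"com.docker.compose.service"}))
}
```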

@k8s-ci-robot
Collaborator

Hi @cirocosta. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cirocosta changed the title from "Adds whitelisting of docker labels" to "Adds whitelisting of docker labels [#1730]" on Aug 28, 2017
@dashpole
Collaborator

/ok-to-test

@dashpole
Collaborator

It seems that Prometheus has a similar problem: #1704
I don't think it would be very hard to generalize this to other runtimes (rkt), although you don't need to implement it here.
WDYT about naming them enforce_label_whitelist (default=false) and label_whitelist (default=[])?

@dashpole
Collaborator

If we only had one flag, how would we differentiate between "send all labels" and "send no labels"? I think having two is fine.

@dashpole
Collaborator

In docker/handler.go:GetSpec(), we should check whether we are enforcing the whitelist and whether "restartcount" is in the whitelist before adding "restartcount" as a label.
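A minimal sketch of that check (the names here, such as shouldAddRestartCount and enforceWhitelist, are made up for illustration; the actual handler code in docker/handler.go differs):

```go
package docker // illustrative package name

// shouldAddRestartCount reports whether the synthetic "restartcount" label
// should be added: always when the whitelist is not enforced, otherwise only
// if "restartcount" itself is whitelisted.
func shouldAddRestartCount(enforceWhitelist bool, whitelist []string) bool {
	if !enforceWhitelist {
		return true
	}
	for _, key := range whitelist {
		if key == "restartcount" {
			return true
		}
	}
	return false
}
```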

@cirocosta
Author

cirocosta commented Aug 29, 2017

Oh, I totally missed restartcount, nice catch!

About the naming (label_whitelist, ...), I like the suggestion!

However, I think it only makes sense if we include rkt in the PR 🤔 Going to add the check for restartcount and then check the rkt code 👍

Thx!


(PR not ready yet)

Ciro S. Costa added 8 commits September 8, 2017 15:19
This commit adds two extra flags to 'docker':
  - docker_label_metadata_whitelist:
    indicates a set of labels to be retrieved from the containers
    when 'no_labels' is specified.
  - docker_no_labels:
    forces no fetching of docker container labels.
As 'noLabels' is meant to apply to all labels,
'restartcount' (which is also a label) must be checked
against the whitelisted set in that case.
@cirocosta
Author

Hey @dashpole, sorry for the delay; I only found time today. Could you please take a look once again?

I also updated the 'docs' document related to runtime configs; do you think the format is appropriate?

I'm having some problems manually testing the rkt part; it's my first time using it, so I'm not sure whether I'm doing something completely wrong 🤔

I ran a container using

 rkt run --insecure-options=image docker://nginx:alpine --user-label=foo=bar

and also started the api-service:

rkt api-service --listen=localhost:15441
api-service: API service starting...
api-service: Listening on localhost:15441
api-service: API service running

Then, when I start cAdvisor, the api-service shows:

2017/09/08 15:47:08 grpc: Server.Serve failed to create ServerTransport:  connection error: desc = "transport: write tcp 127.0.0.1:15441->127.0.0.1:49326: write: broken pipe"

which leads to errors in the cAdvisor logs:

I0908 15:47:09.007274   17800 manager.go:320] Recovery completed
I0908 15:47:09.007400   17800 rkt.go:56] starting detectRktContainers thread
I0908 15:47:09.076635   17800 cadvisor.go:159] Starting cAdvisor version: v0.27.0.19+33f84c4934366a-dirty-33f84c4 on port 8080
W0908 15:47:19.014873   17800 handler.go:105] couldn't find app machine-rkt\x2d2d9c1b0f\x2d5004\x2d44f8\x2d9716\x2d6938106bd2c4.scope in pod
E0908 15:47:19.014928   17800 helpers.go:131] ReadFile failed, couldn't read /var/lib/rkt/pods/run/2d9c1b0f-5004-44f8-9716-6938106bd2c4/appsinfo/machine-rkt\x2d2d9c1b0f\x2d5004\x2d44f8\x2d9716\x2d6938106bd2c4.scope/treeStoreID to get upper dir: open /var/lib/rkt/pods/run/2d9c1b0f-5004-44f8-9716-6938106bd2c4/appsinfo/machine-rkt\x2d2d9c1b0f\x2d5004\x2d44f8\x2d9716\x2d6938106bd2c4.scope/treeStoreID: no such file or directory
W0908 15:47:19.016649   17800 manager.go:1117] Failed to process watch event {EventType:0 Name:/user.slice/user-0.slice/session-1.scope/machine-rkt\x2d2d9c1b0f\x2d5004\x2d44f8\x2d9716\x2d6938106bd2c4.scope/system.slice/nginx.service WatchSource:1}: this should be impossible!, new handler failing, but factory allowed, name = /user.slice/user-0.slice/session-1.scope/machine-rkt\x2d2d9c1b0f\x2d5004\x2d44f8\x2d9716\x2d6938106bd2c4.scope/system.slice/nginx.service
I0908 15:47:19.016666   17800 container.go:471] Failed to update stats for container "/user.slice/user-0.slice/session-1.scope/machine-rkt\x2d2d9c1b0f\x2d5004\x2d44f8\x2d9716\x2d6938106bd2c4.scope": stat failed on  with error: no such file or directory, continuing to push stats

Am I doing something wrong here?

Thx!

Collaborator

@dashpole dashpole left a comment


One nit, and don't worry about the rkt implementation for now.

// retrieve desired labels
for _, exposedLabel := range labelWhitelist {
    labelValue, present := ctnr.Config.Labels[exposedLabel]
    if !present {
Collaborator


nit: condense this

if labelValue, present := ctnr.Config.Labels[exposedLabel]; present {
    handler.labels[exposedLabel] = labelValue
}
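Applied in context, the condensed loop would then read roughly as follows (a sketch based on the snippet above, not the final committed code):

```go
// retrieve desired labels
for _, exposedLabel := range labelWhitelist {
	if labelValue, present := ctnr.Config.Labels[exposedLabel]; present {
		handler.labels[exposedLabel] = labelValue
	}
}
```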

@@ -53,6 +53,8 @@ import (

var globalHousekeepingInterval = flag.Duration("global_housekeeping_interval", 1*time.Minute, "Interval between global housekeepings")
var logCadvisorUsage = flag.Bool("log_cadvisor_usage", false, "Whether to log the usage of the cAdvisor container")
var enforceLabelWhitelist = flag.Bool("enforce_label_whitelist", false, "enforces label whitelisting")
var labelWhitelist = flag.String("label_whitelist", "", "comma-separated list of label keys that are allowed to be collected for docker and rkt containers")
Collaborator


don't specify the runtime here: ... "collected for containers"
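Since -label_whitelist is declared as a plain comma-separated string, it would presumably be split into a slice before being handed to the container factories; a hypothetical sketch (the helper name parseWhitelist is made up, not the PR's code):

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

var labelWhitelist = flag.String("label_whitelist", "",
	"comma-separated list of label keys that are allowed to be collected for containers")

// parseWhitelist splits the flag value into trimmed label keys;
// an empty flag yields an empty whitelist.
func parseWhitelist(raw string) []string {
	if raw == "" {
		return nil
	}
	keys := strings.Split(raw, ",")
	for i := range keys {
		keys[i] = strings.TrimSpace(keys[i])
	}
	return keys
}

func main() {
	flag.Parse()
	fmt.Println(parseWhitelist(*labelWhitelist))
}
```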

@@ -39,6 +39,16 @@ From [glog](https://github.com/golang/glog) here are some flags we find useful:
--vmodule=: comma-separated list of pattern=N settings for file-filtered logging
```

## Labels

Both Docker and Rkt labels support restricting the amount of labels collected by `cadvisor`.
Collaborator


I would omit runtimes here as well: " cadvisor can restrict the set of labels it collects from containers. This can be useful in ensuring metrics have a consistent set of labels."

@dashpole
Collaborator

dashpole commented Sep 19, 2017

I am not sure. I would not worry about the rkt integration (don't make any changes there). Once this is merged, I'll open an issue to track that these flags do not have rkt or cri-o support.

@fbuecklers

fbuecklers commented Feb 10, 2018

Any chance of this making it into master?

The patch is awesome; we used it to monitor our Swarm cluster.
It improved the following metrics on our cAdvisor instances:

  • Memory consumption: 250MB -> 80MB
  • CPU usage: 20% -> 0.5-1%
  • /metrics response time: 20s -> ~3s
  • /metrics response size: 25MB -> 3MB

@chrisboulton

@dashpole, @cirocosta - I'd be super interested in these changes as well, because we rely heavily on labels on our containers for a bunch of other reasons that aren't really applicable to Prometheus monitoring. If it's just a matter of addressing those nitpicks (and resolving the recent merge conflicts), I'd be happy to get that done, sign the CLA, and contribute the changes back. Was that all you were looking to address here, or are there other recent changes that should be considered?

@dashpole
Collaborator

Closing in favor of #1984.

@dashpole dashpole closed this Aug 15, 2018