Refactor kubelet metricsets to share response from endpoint #25782

Merged: 8 commits into elastic:master on May 28, 2021

ChrsMark
Member

@ChrsMark ChrsMark commented May 19, 2021

What does this PR do?

Follow-up to #25640. This PR changes how the kubernetes module handles metricsets that collect metrics from the kubelet's API and share the same target endpoint.
Metricsets affected:

  • system
  • pod
  • node
  • volume
  • container

Why is it important?

It improves the performance of the module by avoiding fetching the same content multiple times.

How to test this PR locally

  1. Deploy Metricbeat on k8s with https://github.com/elastic/beats/blob/master/deploy/kubernetes/metricbeat-kubernetes.yaml using the proper Docker image (e.g. a BC or a snapshot). Modify the configmaps accordingly using the configs below.
  2. state_* config using leaderelection:
metricbeat.autodiscover:
  providers:
    - type: kubernetes
      scope: cluster
      node: ${NODE_NAME}
      unique: true
      templates:
        - config:
            - module: kubernetes
              hosts: ["kube-state-metrics:8080"]
              period: 10s
              add_metadata: true
              metricsets:
                - state_node
                - state_deployment
                - state_daemonset
                - state_replicaset
                - state_pod
            - module: kubernetes
              hosts: ["kube-state-metrics:8080"]
              period: 10s
              add_metadata: true
              metricsets:
                - state_container
                - state_cronjob
                - state_service
                - state_resourcequota
                - state_statefulset
  3. Config for the kubelet metricsets:
- module: kubernetes
  metricsets:
    - container
    - volume
  period: 10s
  host: ${NODE_NAME}
  hosts: ["https://${NODE_NAME}:10250"]
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  ssl.verification_mode: "none" 
- module: kubernetes
  metricsets:
    - node
    - system
    - pod
  period: 10s
  host: ${NODE_NAME}
  hosts: ["https://${NODE_NAME}:10250"]
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  ssl.verification_mode: "none"
  4. Verify that all metricsets are being populated (as in the screenshot below) and that k8s metadata is properly attached to the events.

Related issues

Closes #24869

Screenshots

Screenshot 2021-05-20 at 1 47 18 PM

Signed-off-by: chrismark <chrismarkou92@gmail.com>
@ChrsMark ChrsMark added the Team:Integrations, v7.14.0, and kubernetes labels May 19, 2021
@ChrsMark ChrsMark requested a review from jsoriano May 19, 2021 09:46
@ChrsMark ChrsMark self-assigned this May 19, 2021
@botelastic botelastic bot added and then removed the needs_team label May 19, 2021
@elasticmachine
Collaborator

elasticmachine commented May 19, 2021

💔 Build Failed

Build stats

  • Build Cause: Started by user Chris Mark

  • Start Time: 2021-05-27T20:16:45.596+0000

  • Duration: 2 min 6 sec

  • Commit: 8cfa7ed

Steps errors 1

Google Storage Download
  • Took 0 min 0 sec
  • Description: Exception while performing download

Log output

[2021-05-27T20:16:45.596Z] Started by user Chris Mark
[2021-05-27T20:16:45.596Z] Restarted from build #8, stage Packaging
[2021-05-27T20:16:45.665Z] Connecting to https://api.github.com using 72677/****** (Jenkins - beats-ci)
[2021-05-27T20:16:46.350Z] Connecting to https://api.github.com to check permissions of obtain list of ChrsMark for elastic/beats
[2021-05-27T20:16:46.873Z] Obtained Jenkinsfile from 8cfa7ed2bf21eecba362ab61953a368f948b7666+2ee21d95aef89af7f7e7aef8d07f679a24d690b4 (6c4bb2c85c17c89fba8e542fc55a372d726e9a6c)
[2021-05-27T20:16:46.981Z] Copying 17 artifact(s) from #8
[2021-05-27T20:16:47.170Z] Resume disabled by user, switching to high-performance, low-durability mode.
[2021-05-27T20:17:03.562Z] Still waiting to schedule task
[2021-05-27T20:17:03.562Z] All nodes of label ‘ubuntu-18&&immutable’ are offline
[2021-05-27T20:18:01.801Z] Running on beats-ci-immutable-ubuntu-1804-1622146619733136934 in /var/lib/jenkins/workspace/Beats_beats_PR-25782
[2021-05-27T20:18:01.959Z] [INFO] Override default checkout
[2021-05-27T20:18:02.054Z] Sleeping for 10 sec
[2021-05-27T20:18:12.248Z] The recommended git tool is: git
[2021-05-27T20:18:17.267Z] using credential f6c7695a-671e-4f4f-a331-acdce44ff9ba
[2021-05-27T20:18:17.274Z] Wiping out workspace first.
[2021-05-27T20:18:17.350Z] Cloning the remote Git repository
[2021-05-27T20:18:17.350Z] Using shallow clone with depth 10
[2021-05-27T20:18:17.350Z] Avoid fetching tags
[2021-05-27T20:18:17.371Z] Cloning repository git@github.com:elastic/beats.git
[2021-05-27T20:18:17.458Z]  > git init /var/lib/jenkins/workspace/Beats_beats_PR-25782 # timeout=10
[2021-05-27T20:18:17.528Z] Fetching upstream changes from git@github.com:elastic/beats.git
[2021-05-27T20:18:17.528Z]  > git --version # timeout=10
[2021-05-27T20:18:17.532Z]  > git --version # 'git version 2.17.1'
[2021-05-27T20:18:17.533Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2021-05-27T20:18:17.601Z]  > git fetch --no-tags --progress -- git@github.com:elastic/beats.git +refs/heads/*:refs/remotes/origin/* # timeout=15
[2021-05-27T20:18:39.433Z] Cleaning workspace
[2021-05-27T20:18:39.451Z] Using shallow fetch with depth 10
[2021-05-27T20:18:39.451Z] Pruning obsolete local branches
[2021-05-27T20:18:40.428Z] Merging remotes/origin/master commit 2ee21d95aef89af7f7e7aef8d07f679a24d690b4 into PR head commit 8cfa7ed2bf21eecba362ab61953a368f948b7666
[2021-05-27T20:18:39.407Z]  > git config remote.origin.url git@github.com:elastic/beats.git # timeout=10
[2021-05-27T20:18:39.417Z]  > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
[2021-05-27T20:18:39.426Z]  > git config remote.origin.url git@github.com:elastic/beats.git # timeout=10
[2021-05-27T20:18:39.435Z]  > git rev-parse --verify HEAD # timeout=10
[2021-05-27T20:18:39.440Z] No valid HEAD. Skipping the resetting
[2021-05-27T20:18:39.441Z]  > git clean -fdx # timeout=10
[2021-05-27T20:18:39.457Z] Fetching upstream changes from git@github.com:elastic/beats.git
[2021-05-27T20:18:39.457Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2021-05-27T20:18:39.462Z]  > git fetch --no-tags --progress --prune -- git@github.com:elastic/beats.git +refs/pull/25782/head:refs/remotes/origin/PR-25782 +refs/heads/master:refs/remotes/origin/master # timeout=15
[2021-05-27T20:18:40.483Z]  > git config core.sparsecheckout # timeout=10
[2021-05-27T20:18:40.487Z]  > git checkout -f 8cfa7ed2bf21eecba362ab61953a368f948b7666 # timeout=15
[2021-05-27T20:18:42.393Z] Merge succeeded, producing 99aa2f03ee0859c46e5ec37c04db68fcf9c24c1c
[2021-05-27T20:18:42.394Z] Checking out Revision 99aa2f03ee0859c46e5ec37c04db68fcf9c24c1c (PR-25782)
[2021-05-27T20:18:42.123Z]  > git remote # timeout=10
[2021-05-27T20:18:42.128Z]  > git config --get remote.origin.url # timeout=10
[2021-05-27T20:18:42.134Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2021-05-27T20:18:42.139Z]  > git merge 2ee21d95aef89af7f7e7aef8d07f679a24d690b4 # timeout=10
[2021-05-27T20:18:42.386Z]  > git rev-parse HEAD^{commit} # timeout=10
[2021-05-27T20:18:42.397Z]  > git config core.sparsecheckout # timeout=10
[2021-05-27T20:18:42.402Z]  > git checkout -f 99aa2f03ee0859c46e5ec37c04db68fcf9c24c1c # timeout=15
[2021-05-27T20:18:47.176Z] Commit message: "Merge commit '2ee21d95aef89af7f7e7aef8d07f679a24d690b4' into HEAD"
[2021-05-27T20:18:47.185Z] First time build. Skipping changelog.
[2021-05-27T20:18:47.185Z] Cleaning workspace
[2021-05-27T20:18:48.233Z] Timeout set to expire in 3 hr 0 min
[2021-05-27T20:18:48.273Z] The timestamps step is unnecessary when timestamps are enabled for all Pipeline builds.
[2021-05-27T20:18:48.369Z] Stage "Checkout" skipped due to this build restarting at stage "Packaging"
[2021-05-27T20:18:48.446Z] Stage "Lint" skipped due to this build restarting at stage "Packaging"
[2021-05-27T20:18:47.179Z]  > git rev-list --no-walk 60f99ea72217ccaedc72ce5b36627044a359c500 # timeout=10
[2021-05-27T20:18:47.187Z]  > git rev-parse --verify HEAD # timeout=10
[2021-05-27T20:18:47.191Z] Resetting working tree
[2021-05-27T20:18:47.192Z]  > git reset --hard # timeout=10
[2021-05-27T20:18:47.335Z]  > git clean -fdx # timeout=10
[2021-05-27T20:18:48.528Z] Stage "Build&Test" skipped due to this build restarting at stage "Packaging"
[2021-05-27T20:18:48.608Z] Stage "Extended" skipped due to this build restarting at stage "Packaging"
[2021-05-27T20:18:49.437Z] [INFO] unstashV2: JOB_GCS_BUCKET is set. bucket param got precedency instead.
[2021-05-27T20:18:49.475Z] [INFO] unstashV2: JOB_GCS_CREDENTIALS is set. credentialsId param got precedency instead.
[2021-05-27T20:18:49.639Z] Stage "Packaging-Pipeline" skipped due to earlier failure(s)
[2021-05-27T20:18:49.751Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-25782/src/github.com/elastic/beats
[2021-05-27T20:18:50.352Z] Running on worker-395930 in /var/lib/jenkins/workspace/Beats_beats_PR-25782
[2021-05-27T20:18:50.509Z] [INFO] getVaultSecret: Getting secrets
[2021-05-27T20:18:50.563Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2021-05-27T20:18:52.998Z] + chmod 755 generate-build-data.sh
[2021-05-27T20:18:52.998Z] + ./generate-build-data.sh https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-25782/ https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-25782/runs/9 FAILURE 126002
[2021-05-27T20:18:52.998Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-25782/runs/9/steps/?limit=10000 -o steps-info.json
[2021-05-27T20:18:54.447Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-25782/runs/9/tests/?status=FAILED -o tests-errors.json
[2021-05-27T20:18:55.266Z] Retry 1/3 exited 22, retrying in 1 seconds...
[2021-05-27T20:18:56.713Z] Retry 2/3 exited 22, retrying in 2 seconds...

❕ Flaky test report

No test was executed to be analysed.

Signed-off-by: chrismark <chrismarkou92@gmail.com>
Signed-off-by: chrismark <chrismarkou92@gmail.com>
@ChrsMark ChrsMark marked this pull request as ready for review May 20, 2021 09:34
@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

Signed-off-by: chrismark <chrismarkou92@gmail.com>
Signed-off-by: chrismark <chrismarkou92@gmail.com>
Signed-off-by: chrismark <chrismarkou92@gmail.com>
Member

@jsoriano jsoriano left a comment

Looks good, only some nitpicking.

CHANGELOG.next.asciidoc (outdated, resolved)
Comment on lines 42 to 43
GetSharedFamilies(prometheus p.Prometheus) ([]*dto.MetricFamily, error)
GetSharedKubeletStats(http *helper.HTTP) ([]byte, error)
Member

Nit: I now think that the response being shared is an implementation detail; the interface and its consumers don't care whether the response is shared or cached.

Suggested change
- GetSharedFamilies(prometheus p.Prometheus) ([]*dto.MetricFamily, error)
- GetSharedKubeletStats(http *helper.HTTP) ([]byte, error)
+ GetFamilies(prometheus p.Prometheus) ([]*dto.MetricFamily, error)
+ GetKubeletStats(http *helper.HTTP) ([]byte, error)
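
To illustrate the point about naming, here is a minimal sketch in which a consumer depends only on a `GetKubeletStats` method and cannot tell whether the implementation behind it caches. The signature is simplified to take a plain endpoint string, and `statsProvider`, `directProvider`, `memoProvider`, and `responseSize` are hypothetical names, not the module's real types.

```go
package main

import "fmt"

// statsProvider is a simplified, hypothetical stand-in for the reviewed
// interface. Nothing in the method name says whether results are shared.
type statsProvider interface {
	GetKubeletStats(endpoint string) ([]byte, error)
}

// directProvider pretends to hit the endpoint on every call.
type directProvider struct{ calls int }

func (d *directProvider) GetKubeletStats(endpoint string) ([]byte, error) {
	d.calls++
	return []byte(`{"pods":[]}`), nil
}

// memoProvider wraps another provider and remembers responses per endpoint.
// It is not safe for concurrent use; this sketch is single-threaded.
type memoProvider struct {
	inner statsProvider
	cache map[string][]byte
}

func (m *memoProvider) GetKubeletStats(endpoint string) ([]byte, error) {
	if body, ok := m.cache[endpoint]; ok {
		return body, nil
	}
	body, err := m.inner.GetKubeletStats(endpoint)
	if err != nil {
		return nil, err
	}
	m.cache[endpoint] = body
	return body, nil
}

// responseSize plays the role of a metricset: it only sees the interface.
func responseSize(p statsProvider, endpoint string) (int, error) {
	body, err := p.GetKubeletStats(endpoint)
	if err != nil {
		return 0, err
	}
	return len(body), nil
}

func main() {
	direct := &directProvider{}
	memo := &memoProvider{inner: direct, cache: map[string][]byte{}}

	// The consumer code is identical for both providers.
	for i := 0; i < 3; i++ {
		if _, err := responseSize(memo, "stats"); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("underlying calls:", direct.calls) // prints "underlying calls: 1"
}
```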

ChrsMark and others added 2 commits May 27, 2021 15:53
Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>
Signed-off-by: chrismark <chrismarkou92@gmail.com>
@ChrsMark
Member Author

/package

@ChrsMark
Member Author

Failing tests seem to be unrelated (hitting issue reported at #25956). Merging this one.

@ChrsMark ChrsMark merged commit a39dd00 into elastic:master May 28, 2021
ChrsMark added a commit to ChrsMark/beats that referenced this pull request May 28, 2021
ChrsMark added a commit that referenced this pull request May 28, 2021
Labels
kubernetes, Team:Integrations, v7.14.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor kubernetes metricsets to use single calls per endpoint
3 participants