Kubernetes Metrics Overhaul #1206

Closed
brancz opened this issue Aug 7, 2019 · 61 comments · Fixed by #1935
Labels
sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. stage/beta Denotes an issue tracking an enhancement targeted for Beta status tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team

Comments

@brancz
Member

brancz commented Aug 7, 2019

Enhancement Description

This is a cleanup, so there are no stability milestones involved. However, to avoid breaking users abruptly, SIG Instrumentation is making a best effort to announce these changes in stages, as follows:

  1. Alpha release target 1.16
  • Stability framework is in place with metric verification/validation running
    in CI.
  • Metrics which are deprecated in the metrics overhaul are marked as deprecated,
    which can be overridden in a binary through a command line flag.
  • No metrics can be marked as stable.
  2. Beta release target 1.17
  • All previously marked deprecated metrics will be removed from the codebase.
  • Metrics can be marked as stable.
  3. Stable release target 1.18
  • First release cycle in which stable metrics may be deprecated as per the new stability guidelines.

@logicalhan @serathius @piosz @ehashman

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 7, 2019
@brancz brancz added the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Aug 7, 2019
@k8s-ci-robot k8s-ci-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 7, 2019
@kacole2

kacole2 commented Aug 7, 2019

@brancz the graduation criteria in the KEP need to be more detailed about what makes it move between stages. The current criteria are very vague.

@lachie83 @mrbobbytables @rbitia @mariantalla @evillgenius75

@ehashman
Member

ehashman commented Aug 7, 2019

@kacole2 the graduation criteria section is vague because this KEP is basically a collection of individual tasks that are graduated once completed.

As @brancz summarized in the first comment, everything has landed except for the full deprecation of the inconsistent labels, which is proposed for the 1.16 release, and removal of deprecated metrics targeted for 1.17. Once those are complete the KEP will be fully implemented. Does it need to be updated to say as much?

@kacole2

kacole2 commented Aug 12, 2019

/milestone v1.16
/stage alpha

@k8s-ci-robot k8s-ci-robot added the stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status label Aug 12, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Aug 12, 2019
@kacole2 kacole2 added the tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team label Aug 12, 2019
@kacole2

kacole2 commented Aug 12, 2019

@brancz @ehashman is there a particular stage this should be labeled with? I added alpha, but I'm not sure if this is beta or moving to GA.

@brancz
Member Author

brancz commented Aug 12, 2019

This is “just” a large-scale cleanup that spans multiple releases; there’s not really any stability level attached to it. Not sure how to answer that question.

@brancz
Member Author

brancz commented Aug 12, 2019

I talked to some folks: we're merging #1209 into this issue, making this the umbrella issue (the two are actually already related, as the current issue description says; we'll update the issue and comment here again).

@mrbobbytables
Member

This is somewhat of a personal opinion, but I'd consider this "beta", with the plan to fully deprecate in 1.17 as "stable".

The alpha/beta/stable designation doesn't always align with the effort being done =/ For these I tend to think of it like this:
Alpha == work just beginning, long road ahead -- possibly several releases.
Beta == work in progress, most effort being done here and may span multiple cycles.
Stable == work wrapping up.

@kacole2

kacole2 commented Aug 13, 2019

@brancz it looks like there is some disconnect over at #1209. Can we find a time to gather all relevant parties to get this sorted out?

@lachie83

@brancz
Member Author

brancz commented Aug 19, 2019

Sorry for taking a bit. I edited the issue to reflect alpha/beta/stable timelines and tasks.

@simplytunde

@brancz I am the 1.16 Docs Lead. We need a placeholder PR against k/website (dev-1.16 branch) for this enhancement before Friday, Aug 23rd. Let me know how I can help make this happen, or if docs are not required.

@ehashman
Member

I'm happy to jump on the doc PR this week if needed. Where do the docs need to be updated, @simplytunde? Do they just need to reflect the timeline/information included in the description of this issue?

@simplytunde

@ehashman I do not have enough context on this to make a decision on where/what docs need to be updated. Let's bring it up at SIG Instrumentation.

@kacole2

kacole2 commented Aug 26, 2019

@ehashman @brancz code freeze for 1.16 is on Thursday 8/29. Looks like all PRs listed have been merged for Alpha. If there are any more that need to be tracked, please let me know!

@ehashman
Member

ehashman commented Aug 26, 2019

@kacole2 we have one more PR coming for showing hidden metrics (as defined in the metrics stability KEP), per

Metrics which are deprecated in the metrics overhaul are marked as deprecated, which can be overridden in a binary through a command line flag

I'm working on that right now, should be able to have it up before code freeze. I think everything else is merged.

Edit: WIP PR link: kubernetes/kubernetes#81970
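The command-line override discussed here eventually shipped in Kubernetes components as `--show-hidden-metrics-for-version`, whose documented rule is that only the previous minor version, written as `<major>.<minor>`, is accepted. The following is a minimal Python sketch of that validation rule; `validate_show_hidden` is a hypothetical helper, not code taken from kubernetes/kubernetes#81970 or from component-base.

```python
import re
from typing import Tuple

# Accepts strings such as "1.16"; patch versions and "v" prefixes are rejected.
_MAJOR_MINOR = re.compile(r"(\d+)\.(\d+)")


def validate_show_hidden(flag_value: str, binary_version: str) -> Tuple[int, int]:
    """Return (major, minor) if flag_value names the binary's previous minor release.

    Hypothetical helper modeled on the documented rule for
    --show-hidden-metrics-for-version; raises ValueError otherwise.
    """
    match = _MAJOR_MINOR.fullmatch(flag_value)
    if match is None:
        raise ValueError(f"expected <major>.<minor>, got {flag_value!r}")
    requested = (int(match.group(1)), int(match.group(2)))

    # The binary version may carry a patch component, e.g. "1.17.0".
    parts = binary_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    previous = (major, minor - 1)
    if requested != previous:
        raise ValueError(
            f"only the previous minor version {previous[0]}.{previous[1]} is allowed, "
            f"got {flag_value}"
        )
    return requested


if __name__ == "__main__":
    # A 1.17 binary accepts "1.16" and nothing else.
    print(validate_show_hidden("1.16", "1.17.0"))
```

Accepting only the previous minor version means the override stops working after one more upgrade, so operators notice metrics that are about to be removed instead of pinning an old view indefinitely.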

@logicalhan
Member

logicalhan commented Aug 26, 2019

These also need to be merged:

  1. Kubelet migration
  2. Apiserver migration
  3. Controller-manager migration
  4. Kube-proxy migration
  5. Scheduler migration

@RainbowMango
Member

@brancz @ehashman @logicalhan
I'd like to join the task (metric overhaul, stability, validation, etc).

Kubernetes 1.17 will remove the metrics marked as deprecated in 1.14. As a stretch goal, if the metrics stability framework is in place, then in Kubernetes 1.17 the metrics will only be turned off by default through the stability framework. Should this not be available, then the metrics will be removed.

I guess we can start this task after 1.16 release. Where can I find the list of deprecated metrics?

@RainbowMango
Member

/assign

@brancz
Member Author

brancz commented Sep 10, 2019

There won’t be removals, as the framework components landed in 1.16 and the flags are in progress. That means the metrics will just be turned off by default in 1.17 and only truly removed in 1.18.

I would recommend joining the SIG Instrumentation Slack channel and/or the SIG meetings to get involved! :)
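The schedule described in this comment (deprecation marked by the framework in 1.16, metrics hidden by default in 1.17, removal from the code in 1.18) can be modeled as a small lookup. The following is a minimal Python sketch, not the component-base implementation: `Visibility`, `major_minor`, and `metric_visibility` are hypothetical names, and it assumes a metric deprecated in minor release N is exposed with a deprecation notice in N, hidden in N+1 unless a show-hidden override naming N is passed, and gone by N+2.

```python
from enum import Enum
from typing import Optional, Tuple


class Visibility(Enum):
    """Hypothetical states a deprecated metric passes through (sketch only)."""
    SHOWN_DEPRECATED = "exposed, with a deprecation notice"
    HIDDEN = "hidden by default"
    SHOWN_VIA_FLAG = "hidden by default, re-exposed by a flag"
    REMOVED = "deleted from the codebase"


def major_minor(version: str) -> Tuple[int, int]:
    """Parse '1.16' or '1.16.0' into (1, 16)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def metric_visibility(
    deprecated_in: str,
    current: str,
    show_hidden_for: Optional[str] = None,
) -> Visibility:
    """Model the N / N+1 / N+2 schedule discussed in this issue.

    Assumption: deprecation is marked in release N, the metric is hidden in
    N+1 (re-exposable only with an override naming N), and removed by N+2.
    """
    dep_major, dep_minor = major_minor(deprecated_in)
    cur_major, cur_minor = major_minor(current)
    if dep_major != cur_major:
        raise ValueError("this sketch only models releases within one major version")
    age = cur_minor - dep_minor  # minor releases elapsed since deprecation
    if age < 0:
        raise ValueError("metric is not deprecated yet in this release")
    if age == 0:
        return Visibility.SHOWN_DEPRECATED
    if age == 1:
        # The override only helps in the release right after deprecation.
        if show_hidden_for is not None and major_minor(show_hidden_for) == (dep_major, dep_minor):
            return Visibility.SHOWN_VIA_FLAG
        return Visibility.HIDDEN
    return Visibility.REMOVED


if __name__ == "__main__":
    for release in ("1.16", "1.17", "1.18"):
        print(release, metric_visibility("1.16.0", release).value)
    print("1.17 + override", metric_visibility("1.16.0", "1.17", show_hidden_for="1.16").value)
```

Treating 1.16 rather than 1.14 as N reflects the comment above: the metrics were announced as deprecated earlier, but the framework only began marking them in 1.16.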

@RainbowMango
Member

Yeah, I know the metrics stability framework is in place now.
Thanks @brancz, I will try to join the SIG meeting.

@mrbobbytables
Member

Hey there @brancz @ehashman -- 1.17 Enhancements lead here. I know it's still kind of fuzzy what each stage defines 😬 but I wanted to check in and see if you think this Enhancement will be graduating to alpha/beta/stable in 1.17?

The current release schedule is:

  • Monday, September 23 - Release Cycle Begins
  • Tuesday, October 15, EOD PST - Enhancements Freeze
  • Thursday, November 14, EOD PST - Code Freeze
  • Tuesday, November 19 - Docs must be completed and reviewed
  • Monday, December 9 - Kubernetes 1.17.0 Released

Thanks!

/milestone clear

@jeremyrickard jeremyrickard added tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team and removed tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team labels Jan 15, 2020
@jeremyrickard
Contributor

@RainbowMango @brancz could we please have the KEP for this updated with Test Plan info? It looks like we didn't do that in the 1.17 time frame and we should have. I'm going to remove this from the milestone for now, but you can file an exception request and we can add this back in. The KEP just needs to have test data added to it.

@jeremyrickard
Contributor

/milestone clear

@k8s-ci-robot k8s-ci-robot removed this from the v1.18 milestone Jan 29, 2020
@jeremyrickard jeremyrickard added tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team and removed tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team labels Jan 29, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 28, 2020
@palnabarun
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 28, 2020
@palnabarun
Member

Hi @RainbowMango -- 1.19 Enhancements Lead here, I wanted to check in if you think this enhancement would graduate in 1.19?

In order to have this part of the release:

  1. The KEP PR must be merged in an implementable state
  2. The KEP must have test plans
  3. The KEP must have graduation criteria.

The current release schedule is:

  • Monday, April 13: Week 1 - Release cycle begins
  • Tuesday, May 19: Week 6 - Enhancements Freeze
  • Thursday, June 25: Week 11 - Code Freeze
  • Thursday, July 9: Week 14 - Docs must be completed and reviewed
  • Tuesday, August 4: Week 17 - Kubernetes v1.19.0 released

@RainbowMango
Member

  • The KEP PR must be merged in an implementable state
  • The KEP must have test plans
  • The KEP must have graduation criteria.

@palnabarun It seems the enhancement needs to supply test plans. What exactly are they? Is there any documentation about this?

@palnabarun
Member

Hi @RainbowMango, thank you for the update.

For the test plans, you can have a look at this KEP template for the exact requirement: https://raw.githubusercontent.com/kubernetes/enhancements/master/keps/NNNN-kep-template/README.md

Also, one quick question, which graduation stage would you be targeting in 1.19?

@palnabarun
Member

@RainbowMango -- pinging back as a reminder of the above. 🙂

@palnabarun
Member

Hi @RainbowMango,

Tomorrow, Tuesday May 19 EOD Pacific Time is Enhancements Freeze

Will this enhancement be part of the 1.19 release cycle?

@RainbowMango
Member

Will this enhancement be part of the 1.19 release cycle?

The remaining legacy changes in this KEP will not introduce a user-facing change, so I guess you can ignore this KEP.

@palnabarun
Member

@RainbowMango -- Thanks for the update. I have updated the tracking sheet accordingly. 👍

@ehashman
Member

/assign

@ehashman
Member

This work was basically completed in the 1.17 release. I'll update the KEP as needed in order to close out this issue.
