1.0 TODO List #489

Closed
2 of 3 tasks
mdlayher opened this issue Mar 2, 2017 · 26 comments

Comments

@mdlayher
Contributor

mdlayher commented Mar 2, 2017

Feel free to edit this and add more points. Filing per @SuperQ's request. This can probably be turned into a GitHub milestone at some point too.

@brian-brazil
Contributor

Fix the naming scheme of the collectors where we decide the names rather than creating them mechanically (e.g. node_cpu should be node_cpu_seconds_total). This would be a major breaking change.
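
To make the proposed rename concrete, here is a minimal Go sketch. It assumes a hypothetical helper, `counterName`, which is not part of node_exporter; it only illustrates the convention of giving a counter a unit suffix followed by `_total`.

```go
package main

import "fmt"

// counterName is a hypothetical helper, not node_exporter code. It appends
// the unit and the `_total` suffix that Prometheus naming conventions
// expect for counters.
func counterName(base, unit string) string {
	return base + "_" + unit + "_total"
}

func main() {
	// The legacy name from the comment above and its proposed replacement.
	fmt.Println(counterName("node_cpu", "seconds")) // prints "node_cpu_seconds_total"
}
```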

@discordianfish
Member

Fortunately we're pre-1.0 :). But yeah, we need some graceful way to deprecate these. Maybe some legacy flag which makes the node-exporter expose both for some time? Or just provide recording rules or relabeling?
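
A rough sketch of the "legacy flag" idea, in Go and using only the standard library. Everything here is an assumption for illustration: the `expose` function, the `legacyNames` map, and the `legacy` switch are hypothetical and do not reflect how node_exporter is implemented, and the sample value is made up. Only the `node_cpu` / `node_cpu_seconds_total` pair comes from this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyNames maps new metric names to the names they replace.
// Only this one pair comes from the thread; the map itself is illustrative.
var legacyNames = map[string]string{
	"node_cpu_seconds_total": "node_cpu",
}

// expose renders one sample in the Prometheus text format. When legacy is
// true (a hypothetical opt-in flag), the same sample is also emitted under
// its old name so existing dashboards keep working during a migration.
func expose(name, labels string, value float64, legacy bool) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s{%s} %g\n", name, labels, value)
	if old, ok := legacyNames[name]; ok && legacy {
		fmt.Fprintf(&b, "%s{%s} %g\n", old, labels, value)
	}
	return b.String()
}

func main() {
	// With the legacy switch on, both the new and the old name are printed.
	fmt.Print(expose("node_cpu_seconds_total", `cpu="0",mode="idle"`, 1234.5, true))
}
```

The cost of this approach is that every renamed series is exposed twice while the switch is on, which is part of why the replies below lean towards a clean break.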

@brian-brazil
Contributor

For such a large change I don't think a graceful approach is practical. We can update the console/grafana templates to work with both.

@gouthamve
Member

Also move the flags to -- instead of - (and CLI to https://github.com/alecthomas/kingpin?)

@mjtrangoni
Contributor

I think the TODO list could be moved to the Projects section and be organized as a real list of issues.

@mdlayher
Contributor Author

Agreed.

@SuperQ
Member

SuperQ commented Mar 9, 2018

I had a chat with @discordianfish, he would like to not release 0.16.0 with the big breaking changes for the metric names. He proposes we release the big breaking metric name change as 1.0.0-beta.0. What do people think?

@brian-brazil
Contributor

I think it's risky to go straight to 1.0 when big changes have just been made; it doesn't leave us a lot of wiggle room if we find problems over the next while. We didn't have big changes in Prometheus 1.0 either.

@discordianfish
Member

@brian-brazil There weren't any big breaking changes either, no?
The new storage was introduced as 2.0 because it's breaking. Similar concerns here. Ideally we would backport the fixes without the naming changes for a 0.16.0 release, but nobody has time for this, so the next best thing is imo to use a 1.0.0-beta.0. People can use this, but it's clear that there were changes that make it important to read the change log.

@SuperQ
Member

SuperQ commented Mar 9, 2018

@brian-brazil That was my thinking as well. The main worry @discordianfish has is that people will just blindly upgrade from 0.15 to 0.16 without reading the breaking changes notes. The idea is that "1.0" will make more people notice. But I'm not sure that's going to really work.

@SuperQ
Member

SuperQ commented Mar 9, 2018

@discordianfish My current plan was to make the new release 0.16.0-rc.0 and not go directly to release.

@brian-brazil
Contributor

But I'm not sure that's going to really work.

I can see where he's coming from, but I don't think it's going to reduce confusion much.

@discordianfish
Member

@SuperQ but with a 0.16.0-rc.0 there will be a 0.16.0 eventually..

I don't know, the whole point of doing the metric name changes now was because we wanted to release 1.0 next. I understand it to be best practice to bump the major version on such significant changes. Yes, you can say 0.x comes with no guarantees whatsoever, but this will hurt people, and it will hurt them much more if we do it between 0.15 and 0.16 than between 0.15 and 1.0. We have a giant user base and should do whatever we reasonably can to not break them.

I don't want to block this, so if you both want to go with 0.16.0 I'll accept that but I really think going with the 1.0-beta is safer. I can't really see a downside there either.

@matthiasr
Contributor

I'm not a fan of releasing any significant changes in a 1.0 – it just reinforces that a .0 can't be trusted, instead of "this is ready for production". To me, and in my understanding of SemVer, 1.0 is the point of stability, not of big changes. I'd rather release an 0.16.0 and maybe even an 0.16.1 or 0.17 before re-releasing essentially that version as 1.0.

IIRC that's also what we did with Prometheus 1.0. The comparison with 2.0 is wrong IMO, post-1.0 the reasons for major releases are different than the reason to do 1.0.

@matthiasr
Contributor

Why remove the end-to-end tests? Or do you mean re-implement them in Go? I think there is value in running the actual production binary and checking that it starts and roughly does what we expect before releasing it.

@discordianfish
Member

Well okay then, if I'm the only one believing this bump will help let's go with 0.16.0 then.

@SuperQ
Member

SuperQ commented Mar 9, 2018

❤️ Thanks for all the comments.

@stapelberg

I don't know, the whole point of doing the metric name changes now was because we wanted to release 1.0 next. I understand it to be best practice to bump the major version on such significant changes. Yes, you can say 0.x comes with no guarantees whatsoever, but this will hurt people, and it will hurt them much more if we do it between 0.15 and 0.16 than between 0.15 and 1.0. We have a giant user base and should do whatever we reasonably can to not break them.

As someone who just spent an hour fixing broken dashboards and unifying node-exporter versions across my fleet, I would go even further and say that we should have released 1.0 and then only changed the metric names in a 2.0 release.

I’m really looking forward to breaking changes requiring a major version bump, and I think we are well past the point where a 1.0 makes sense (from an adoption perspective).

Thanks for considering,

@discordianfish
Member

@stapelberg Guess that ship has sailed.. But we shouldn't have to break anything before releasing 1.0.

Since we're at it, does anyone have time to look into #478? I might have time to look into #66, which I think would be great to have before 1.0.

@james-powis

james-powis commented Oct 10, 2018

Tossing in my 2 cents: the change from node_cpu to node_cpu_blah between 0.15.2 and 0.16.0 makes the whole Grafana import functionality useless (not your yard, I get it), breaks many of the blog posts (primarily those using Helm), etc. Keeping node_cpu and exposing a more specifically named metric in parallel for some time would allow for migration. While this is not 1.x code yet, maintaining deprecation standards will keep you from alienating the user base or, worse yet, conditioning them to never upgrade no matter the reasoning.

@SuperQ
Member

SuperQ commented Oct 10, 2018

@james-powis We had to fix those naming problems eventually, better before 1.0 than after. This instability is why we have not declared 1.0 in the first place. There are still a few minor cleanups that are in progress, but now that we have automated testing in place, the naming changes shouldn't cause too much trouble.

@discordianfish
Member

I'd also add that you shouldn't depend on 'latest' tags etc, so if there is a helm chart which isn't pinned to a specific version that should be fixed.

@discordianfish
Member

I don't think #478 should block this. I think we should just get 1.0 out asap.
@SuperQ Do you still have something that really needs to be done before 1.0?

@SuperQ
Member

SuperQ commented Dec 19, 2018

I want to split cpufreq from the CPU collector. It's a bit of a problematic collector due to its interactions with the kernel.

I think that's about it.

What about TLS? I guess adding it isn't a breaking change, so it could be in 1.1

@matthiasr
Contributor

I would even say TLS should happen after 1.0 – I prefer if a 1.0 is rock solid and proven before it is even released, i.e. essentially a re-branding of the previous release, and without any new and potentially unstable features.

@discordianfish
Member

Ok, I've updated the issue description accordingly..


9 participants