1.0 TODO List #489
Fix the naming scheme of the collectors where we decide the names rather than creating them mechanically (e.g. node_cpu should be node_cpu_seconds_total). This would be a major breaking change. |
Fortunately we're pre 1.0 :). But yeah, we need some graceful way to deprecate these. Maybe some legacy flag which makes the node-exporter expose both for some time? Or just provide recording rules or relabeling? |
For such a large change I don't think a graceful approach is practical. We can update the console/grafana templates to work with both. |
Also move the flags to |
I think the TODO list could be moved to the Projects section and be organized as a real list of issues. |
Agreed. |
I had a chat with @discordianfish, he would like to not release |
I think it's risky to go straight to 1.0 when big changes have just been made; it doesn't leave us a lot of wiggle room if we find problems over the next while. We didn't have big changes in Prometheus 1.0 either. |
@brian-brazil There weren't any big breaking changes either, no? |
@brian-brazil That was my thinking as well. The main worry @discordianfish has is that people will just blindly upgrade from 0.15 to 0.16 without reading the breaking changes notes. The idea is that "1.0" will make more people notice. But I'm not sure that's going to really work. |
@discordianfish My current plan was to make the new release |
I can see where he's coming from, but I don't think it's going to reduce confusion much. |
@SuperQ but with a 0.16.0-rc.0 there will be a 0.16.0 eventually. I don't know, the whole point of doing the metric name changes now was because we wanted to release 1.0 next. I understand it as best practice to bump a major version on such significant changes. Yes, you can say 0.x comes with no guarantees whatsoever, but this will hurt people, and it will hurt them much more if we do it between 0.15 and 0.16 than between 0.15 and 1.0. We have a giant user base and should do whatever we reasonably can to not break them. I don't want to block this, so if you both want to go with 0.16.0 I'll accept that, but I really think going with the 1.0-beta is safer. I can't really see a downside there either. |
I'm not a fan of releasing any significant changes in a 1.0 – it just reinforces that a .0 can't be trusted, instead of "this is ready for production". To me, and in my understanding of SemVer, 1.0 is the point of stability, not of big changes. I'd rather release a 0.16.0 and maybe even a 0.16.1 or 0.17 before re-releasing essentially that version as 1.0. IIRC that's also what we did with Prometheus 1.0. The comparison with 2.0 is wrong IMO; post-1.0, the reasons for major releases are different from the reason to do 1.0. |
Why remove the end-to-end tests? Or do you mean re-implement them in Go? I think there is value in running the actual production binary and checking that it starts and roughly does what we expect before releasing it. |
Well okay then, if I'm the only one believing this bump will help let's go with 0.16.0 then. |
❤️ Thanks for all the comments. |
As someone who just spent an hour fixing broken dashboards and unifying node-exporter versions across my fleet, I would go even further and say that we should have released 1.0 and then only change the metric names in a 2.0 release. I’m really looking forward to breaking changes requiring a major version bump, and I think we are well past the point where a 1.0 makes sense (from an adoption perspective). Thanks for considering, |
@stapelberg Guess that ship has sailed. But we shouldn't have to break anything before releasing 1.0. Since we're at it, does anyone have time to look into #478? I might have time to look into #66, which I think would be great to have before 1.0. |
Tossing in my 2 cents: the change from node_cpu to node_cpu_blah between 0.15.2 and 0.16.0 really makes the whole Grafana import functionality useless (not your yard, I get it), many of the blogs (primarily using helm) are broken, etc. Keeping node_cpu and having another, more specifically named metric in parallel for some time would allow for migration. While this is not 1.x code yet, maintaining deprecation standards will keep from alienating the user base or, worse yet, conditioning them to never upgrade no matter the reasoning. |
@james-powis We had to fix those naming problems eventually, better before 1.0 than after. This instability is why we have not declared 1.0 in the first place. There are still a few minor cleanups that are in progress, but now that we have automated testing in place, the naming changes shouldn't cause too much trouble. |
I'd also add that you shouldn't depend on 'latest' tags etc, so if there is a helm chart which isn't pinned to a specific version that should be fixed. |
I want to split cpufreq from the CPU collector. It's a bit of a problematic collector due to its interactions with the kernel. I think that's about it. What about TLS? I guess adding it isn't a breaking change, so it could be in 1.1. |
I would even say TLS should happen after 1.0 – I prefer if a 1.0 is rock solid and proven before it is even released, i.e. essentially a re-branding of the previous release, and without any new and potentially unstable features. |
Ok, I've updated the issue description accordingly. |
Feel free to edit this and add more points. Filing per @SuperQ's request. This can probably be turned into a GitHub milestone at some point too.