Dask versioning to switch to CalVer #100

Closed
mrocklin opened this issue Oct 6, 2020 · 46 comments

Comments

@mrocklin
Member

mrocklin commented Oct 6, 2020

Dask is planning to switch to Calendar versioning (calver)

This has been discussed in several issues (like #93 ) and was met with fairly unanimous consent in the last community meeting. If there are major objections we're still happy to hear them, but at this point I think that the default choice is to move to CalVer rather than wait-and-see.

This issue is here to serve as a general announcement, and for logistics about the move. I would expect this change to happen within the next month.

@mrocklin
Member Author

mrocklin commented Oct 6, 2020

I suggest that we use the versioning scheme YY.MM.DD, so if we were to release today it would be version

>>> dask.__version__
"20.10.06"
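
As a rough illustration of the YY.MM.DD proposal, the version string can be derived from the release date alone. This is only a sketch: `date_version` and its `four_digit_year` flag are hypothetical names, not part of Dask or of any release tooling.

```python
from datetime import date


def date_version(day: date, four_digit_year: bool = False) -> str:
    """Format a release date as zero-padded YY.MM.DD (or YYYY.MM.DD).

    Hypothetical helper, for illustration only.
    """
    year = day.year if four_digit_year else day.year % 100
    return f"{year:02d}.{day.month:02d}.{day.day:02d}"


print(date_version(date(2020, 10, 6)))                        # 20.10.06
print(date_version(date(2020, 10, 6), four_digit_year=True))  # 2020.10.06
```

Because the result depends only on the date, two releases on the same day would get the same string.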

@jakirkham
Member

jakirkham commented Oct 6, 2020

What about doing YYYY.MM.DD? This is typically what I see projects using CalVer do. For example certifi

@djhoese

djhoese commented Oct 6, 2020

What about a patch release in the same day?

@jcrist
Member

jcrist commented Oct 6, 2020

Whatever we choose we'll want to make sure it's pep440 compatible. Rules here: https://www.python.org/dev/peps/pep-0440/#public-version-identifiers

@jrbourbeau
Member

Yeah, following up on @djhoese's point, it doesn't happen often, but we might publish a release, discover a critical bug and want to publish a fixed release the same day. What about YY.MM.XX where XX is a zero-padded increment for the releases that month? This seems similar to what Ubuntu and Twisted do.

@jcrist
Member

jcrist commented Oct 6, 2020

> What about YY.MM.XX where XX is a zero-padded increment for the releases that month?

That sounds good to me, but makes it a bit harder to automate the versioning (we'd have to check some central registry to see what the last release was or something). Using a date is much simpler, since the version is based on the date alone. Speaking of this, we'd need to transition from versioneer to some other solution, preferably one that prevents duplicating version info between setup.py and __init__.py.

@gforsyth

gforsyth commented Oct 6, 2020

potential small issue (that we can figure a way around), I don't think you can have a zero-pad between . since pep440 uses int() to parse some stuff.

Could do YYYYMMXX / YYMMXX (which is admittedly a bit harder to read)

@jcrist
Member

jcrist commented Oct 6, 2020

Seems to work fine for me:

In [3]: from distutils.version import StrictVersion

In [4]: StrictVersion("20.01.01")
Out[4]: StrictVersion ('20.1.1')

We also don't need to zero pad at all, since the . delimiter already handles splitting between year/month/point release.

@gforsyth

gforsyth commented Oct 6, 2020

> In [4]: StrictVersion("20.01.01")
> Out[4]: StrictVersion ('20.1.1')

Sorry, I wasn't clear. I meant the zero-pad won't survive pep440, which is probably not a huge deal.

@martindurant
Member

If it doesn't cause a problem, I would prefer zero-padding, so that string sort and number sort do the same thing.

@mrocklin
Member Author

mrocklin commented Oct 6, 2020

> What about doing YYYY.MM.DD? This is typically what I see projects using CalVer do. For example certifi

Some examples of YY.MM.DD include Ubuntu and pip. I think it comes down to whether we want users to identify our version as a date, or hide the calendar nature of the versioning.

@mrocklin
Member Author

mrocklin commented Oct 6, 2020

> What about a patch release in the same day?

We can always bend time and travel to the future by a day if we need to.

@jcrist
Member

jcrist commented Oct 6, 2020

> If it doesn't cause a problem, I would prefer zero-padding, so that string sort and number sort do the same thing.

That's what using StrictVersion/LooseVersion is for. Even our existing versioning scheme doesn't allow for string sorting of versions. I'd prefer to not zero-pad if possible, as the zeros don't add anything meaningful, and don't follow pep440.
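
To make the string-sort versus number-sort point concrete, here is a small dependency-free sketch. It compares versions as tuples of integers instead of using a real version parser; `release_key` is a hypothetical helper and only handles plain dotted numbers.

```python
def release_key(version: str) -> tuple:
    """Turn '2020.9.0' into (2020, 9, 0) so comparison is numeric.

    Hypothetical helper; assumes purely numeric, dot-separated parts.
    """
    return tuple(int(part) for part in version.split("."))


unpadded = ["2020.9.0", "2020.10.0", "2020.10.1"]
padded = ["2020.09.0", "2020.10.0", "2020.10.1"]

print(sorted(unpadded))                   # ['2020.10.0', '2020.10.1', '2020.9.0']
print(sorted(unpadded, key=release_key))  # ['2020.9.0', '2020.10.0', '2020.10.1']
print(sorted(padded) == sorted(padded, key=release_key))  # True
```

Padding only the month keeps the two orders in agreement while the last component stays below 10; "2020.10.10" still sorts before "2020.10.2" as a string unless that component is padded as well.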

@martindurant
Member

> I'd prefer to not zero-pad if possible, as the zeros don't add anything meaningful, and don't follow pep440.

I don't insist, and that's good enough reason not to... but you do occasionally end up dealing with git tag strings or pip installable blob files

@pradyunsg

puts on "I know waaaay too much about the Python packaging standards" hat

20.01.01 / 2020.01.01 are valid PEP 440 versions.

>>> from packaging.version import Version
>>> v = Version("2020.01.01")

The fact that it does not error out here means that it's a valid PEP 440 version. The only thing that you may want to worry about here is that the "canonical" form of that version isn't the same as the version itself:

>>> from packaging.utils import canonicalize_version
>>> str(v)
'2020.1.1'
>>> canonicalize_version("2020.01.01")
'2020.1.1'

However, note that things will still work even if there's zero-padding:

>>> Version("2020.01.01") == Version("2020.1.1")
True

Overall, the choice between 0Y/0M/0D vs YY/MM/DD is purely an aesthetic one -- either will work with basically every Python Packaging tool that follows the standards.


Oh, and... https://pypi.org/project/calver/ is a thing you might want to use, in case you're using setuptools. :)


PS: Please don't use distutils for... well... anything. pip, PyPI and most of the other Python packaging uses packaging.version for handling versions -- it's the reference/main implementation of PEP 440 and friends. At the cost of using a slightly grim analogy, distutils is brain-dead and we're gonna turn off the life support soon.

@jrbourbeau
Member

jrbourbeau commented Oct 6, 2020

> but makes it a bit harder to automate the versioning (we'd have to check some central registry to see what the last release was or something). Using a date is much simpler, since the version is based on the date alone.

Making our version a strict function of the date would be great, but I'm personally okay just having the releaser check PyPI and bump the version number accordingly. We already do this today as we need to know what the minor version is to bump. As long as we update our release procedure to clearly state what needs to be done, I don't see it being a large increase in the maintenance burden. That's just my opinion though, I'm happy to hear from others

@jcrist
Member

jcrist commented Oct 6, 2020

> Making our version a strict function of the date would be great, but I'm personally okay just having the releaser check PyPI and bump the version number accordingly.

That's fine with me, I brain-lapsed and missed that all of this is still compatible with versioneer, we're just changing our tagging scheme. Need more coffee.

I'm mildly against the zero-padded version since it's not canonical, but could go either way here. Up to whoever does the work I guess.

@jakirkham
Member

jakirkham commented Oct 6, 2020

> > What about doing YYYY.MM.DD? This is typically what I see projects using CalVer do. For example certifi
>
> Some examples of YY.MM.DD include Ubuntu and pip. I think it comes down to whether we want users to identify our version as a date, or hide the calendar nature of the versioning.

I think part of the value of having YYYY.MM.DD is that it helps align our brains to what is happening with the date. This YY.MM.DD looks kind of like this MM.DD.YY or this DD.MM.YY (European format) (depending on the numbers involved), which means there is a chance someone accidentally confuses these and makes a mistake during the release. Having YYYY.MM.DD makes it easier to recognize the year comes first as there are no 4 digit months or dates 😉

@jrbourbeau
Member

> I think it comes down to whether we want users to identify our version as a date, or hide the calendar nature of the versioning

It might be nice to have the calendar nature of our version more clear to users. For example, that would help better set user expectations about us not following semver (xref #93).

@jrbourbeau
Member

Okay, so it seems like YYYY.MM.X, where X is an incrementing identifier for releases in the same month, would allow us to release as often as we'd like (e.g. multiple releases on the same day) and is PEP 440 compliant.

What should X be at the start of a new month? 1 or 0? I'll suggest we 0-index our incrementing identifier
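
The YYYY.MM.X scheme with a 0-indexed counter could be computed along these lines. This is a minimal sketch under stated assumptions: `next_version`, `pad_month` and `start` are hypothetical names, the list of existing releases is passed in by hand (for example, copied from PyPI by the releaser), and only plain numeric versions are handled.

```python
from datetime import date


def next_version(existing, today, pad_month=False, start=0):
    """Return the next YYYY.MM.X version for the month containing `today`.

    Hypothetical helper: `existing` is a list of already-published versions
    made of purely numeric, dot-separated parts.
    """
    used = []
    for version in existing:
        parts = tuple(int(part) for part in version.split("."))
        # Only count releases made in the same year and month.
        if len(parts) == 3 and parts[:2] == (today.year, today.month):
            used.append(parts[2])
    counter = max(used) + 1 if used else start
    month = f"{today.month:02d}" if pad_month else str(today.month)
    return f"{today.year}.{month}.{counter}"


released = ["2.30.0", "2020.9.0", "2020.10.0", "2020.10.1"]
print(next_version(released, date(2020, 10, 6)))  # 2020.10.2
print(next_version(released, date(2020, 11, 2)))  # 2020.11.0
```

Two releases on the same day simply get consecutive counters, which is the property the date-only scheme lacks.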

@jcrist
Member

jcrist commented Oct 6, 2020

> What should X be at the start of a new month? 1 or 0? I'll suggest we 0-index our incrementing identifier

That makes sense to me.

I'd also suggest we do M instead of MM (no zero pad). Slight preference for always using the canonical versions.

@jsignell
Member

jsignell commented Oct 6, 2020

Do we want to talk about what we'll call the "blessed" version? I imagine that'd be called YYYY.MM with no X

@martindurant
Member

> what we'll call the "blessed" version

Yes, that was the idea, but all releases in that month will appear newer, regardless of when in the month we "bless".

@jcrist
Member

jcrist commented Oct 6, 2020

> Yes, that was the idea, but all releases in that month will appear newer, regardless of when in the month we "bless".

Per the logic in packaging.version, YYYY.MM == YYYY.MM.0:

 Version("2020.10.0") == Version("2020.10")

IMO we want our versions to always be sortable in a way that corresponds to their dev history (if release B is after release A in git history, the release version B should also sort to after release version A). I'm not sure if adding a "blessed" indicator on a version is possible per pep440. This seems like something we could solve with a docs page, perhaps a table of "blessed" release version numbers with dates. Keep it simple.

@jakirkham
Member

Or we could reserve the last day of the month for the blessed version. Though maybe we are drifting away from days now.

@jcrist
Member

jcrist commented Oct 6, 2020

> Or we could reserve the last day of the month for the blessed version.

That still fails the sortable property if there are releases in the same month that happen afterwards.

@jakirkham
Member

In conda-forge where we do have a couple of cases of releasing something more than once a day, we just include additional details about the time (HH.MM.SS). I doubt we need to get that detailed here, but it is an option. Not sure how that works with PEP440 though.
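
On the PEP 440 question: the release segment in PEP 440 is a dot-separated series of non-negative integers with no fixed length, so a date plus time still fits the grammar. The sketch below builds such a version and compares it with plain integer tuples; `timestamp_version` and `release_key` are hypothetical helpers, and the tuple comparison is a stand-in for a real PEP 440 parser.

```python
from datetime import datetime


def timestamp_version(moment: datetime) -> str:
    """Format a datetime as YYYY.M.D.H.M.S without zero padding (hypothetical)."""
    fields = (moment.year, moment.month, moment.day,
              moment.hour, moment.minute, moment.second)
    return ".".join(str(field) for field in fields)


def release_key(version: str) -> tuple:
    """Compare versions numerically, component by component (hypothetical)."""
    return tuple(int(part) for part in version.split("."))


morning = timestamp_version(datetime(2020, 10, 6, 9, 15, 0))
afternoon = timestamp_version(datetime(2020, 10, 6, 14, 30, 5))

print(morning)    # 2020.10.6.9.15.0
print(afternoon)  # 2020.10.6.14.30.5
print(release_key(morning) < release_key(afternoon))  # True
```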

@jsignell
Member

jsignell commented Oct 6, 2020

> Version("2020.10.0") == Version("2020.10")

Can we just 1-index the regular releases then?

@jcrist
Member

jcrist commented Oct 6, 2020

> Can we just 1-index the regular releases then?

This still doesn't match the sorting requirement, as the blessed version would sort less than all other versions released that month.

I'm really against any versioning scheme where sorting the versions doesn't also match the git history. Otherwise it makes it hard to check for the presence of certain features/bug fixes based on versions. If A < B < C, then I'd expect a feature/bugfix added in B to also be present in C. Making a "blessed" version that doesn't follow this pattern breaks this assumption and makes it harder to gate features/workarounds based on version number.
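
The sorting concern can be shown with a short sketch of version gating. It assumes a hypothetical "blessed" tag `2020.10` created after `2020.10.2`, and uses plain tuples padded with zeros to mimic the `2020.10 == 2020.10.0` behaviour shown earlier; `release_key` and `has_fix` are illustrative names only.

```python
def release_key(version: str, width: int = 3) -> tuple:
    """Parse 'YYYY.M.X' into ints, padding missing trailing parts with 0.

    Hypothetical helper that mimics 2020.10 comparing equal to 2020.10.0.
    """
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def has_fix(installed: str, fixed_in: str) -> bool:
    """Gate a workaround on the installed version (hypothetical helper)."""
    return release_key(installed) >= release_key(fixed_in)


blessed = "2020.10"  # assumption: tagged after 2020.10.2 in git history

print(release_key(blessed) == release_key("2020.10.0"))  # True
print(has_fix("2020.10.2", fixed_in="2020.10.1"))        # True
print(has_fix(blessed, fixed_in="2020.10.1"))            # False
```

The last line is the failure mode described above: the blessed release contains the fix but sorts before the version that introduced it.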

@jsignell
Member

M

@mrocklin
Member Author

I voted for MM, but I'd also love to learn more about the downsides of using MM before making that decision.

@consideRatio

I now voted for MM because I understand it to not have zero padding and M to have zero padding. Zero padding breaks SemVer2 specification, and breaking that means Helm charts need another version when publishing as Helm3 requires SemVer2 compatible versions.

@jacobtomlinson
Member

@consideRatio isn't MM the zero padded option?

@consideRatio

Ah, then I prefer the non-zero-padded option, and I assume you are right that it is represented by M. But, only because Helm charts require it =/

About Helm chart's constraints

I know of three Helm charts part of the dask organizations, all of these are required to have a non-zero-padded version in their Chart.yaml files.

  • dask-gateway
  • dask (A Jupyter server + a dask cluster in k8s)
  • daskhub (JupyterHub + Dask-Gateway: a jupyter server per user, allowed to create individual dask clusters)
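
For context on the constraint, SemVer 2.0.0 requires the MAJOR.MINOR.PATCH numbers to have no leading zeros. A rough check, plus one possible way to derive a chart-friendly version from a zero-padded one, might look like this. The regex covers only the three-number core (no pre-release or build metadata), and `is_semver_core` and `to_semver` are hypothetical names.

```python
import re

# A numeric identifier without leading zeros: "0", or a digit 1-9 followed by digits.
_NUM = r"(?:0|[1-9]\d*)"
SEMVER_CORE = re.compile(rf"{_NUM}\.{_NUM}\.{_NUM}")


def is_semver_core(version: str) -> bool:
    """True if `version` is a bare MAJOR.MINOR.PATCH with no leading zeros."""
    return SEMVER_CORE.fullmatch(version) is not None


def to_semver(calver: str) -> str:
    """Strip zero padding from each dot-separated component (hypothetical)."""
    return ".".join(str(int(part)) for part in calver.split("."))


print(is_semver_core("2020.10.0"))  # True
print(is_semver_core("2020.01.0"))  # False
print(to_semver("2020.01.0"))       # 2020.1.0
```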

@jacobtomlinson
Member

Thanks for sharing, I wasn't aware of this limitation in Helm.

I guess given that zero-padded is the most popular option we will just have to make the charts outliers which use the non-zero-padded version.

@consideRatio

@jacobtomlinson does it remain the most popular given this information? I for example would also vote for zero padded unless it were for this. Does it merit a recount?

@jacobtomlinson
Member

IMO the Helm charts service a small part of the community, so probably shouldn't dictate large decisions like this. But happy to hear from others if they feel differently.

@martindurant
Member

Anyone who wishes to change their vote could comment here, but I wouldn't suggest calling for a new vote. I still think padding is better, because it ensures numeric ordering and text ordering agree and is a closer match to scientific/ISO formats.

@hristog

hristog commented Apr 3, 2021

Hi, I've noticed that dask-ml is still following its original versioning scheme. Are there any plans or discussions, regarding updating it to CalVer as well?

@jrbourbeau
Member

I don't think there's a hard rule about sub-projects switching to CalVer (though some projects like dask-cloudprovider and dask-kubernetes have done so). I recommend opening up an issue in dask-ml about CalVer
