Enhancement to the version sorting (versorted) output #13

Closed
tdruez opened this issue Jul 28, 2014 · 7 comments
@tdruez

tdruez commented Jul 28, 2014

Hi Seth, thanks for the great work on this lib!
I'd like your opinion on something regarding the version sorting.

From https://github.com/SethMMorton/natsort/blob/master/test_natsort/test_natsort.py#L211

>>> a = ['1.9.9a', '1.11', '1.9.9b', '1.11.4', '1.10.1']

Let's add the '1.11a' version to the list.

>>> a = ['1.9.9a', '1.11', '1.9.9b', '1.11.4', '1.10.1', '1.11a']
>>> natsorted(a)
['1.9.9a', '1.9.9b', '1.10.1', '1.11', '1.11.4', '1.11a']

I think the '1.11a' should be sorted before the '1.11.4'.
What's your take on this? Thanks.

@SethMMorton
Owner

SethMMorton commented Jul 29, 2014

What output would you expect from the following:

>>> a = ['1.11', '1.11.4', '1.11a', '1.11.0', '1.11a.0', '1.11a.4']

I agree that '1.11a' would go before '1.11.4' in some cases, but I am not sure everyone would agree with this 100% of the time. The problem is that versioning for pre-releases is not overly strict, so you can get cases like this where the pattern doesn't match up.

We can investigate why the ordering is the way it is using the output of the function generated by natsort_keygen:

>>> from natsort import natsorted, natsort_keygen
>>> nsk = natsort_keygen()
>>> [nsk(x) for x in natsorted(a)]
[(u'', 1, '.', 11),
 (u'', 1, '.', 11, '.', 0),
 (u'', 1, '.', 11, '.', 4),
 (u'', 1, '.', 11, 'a'),
 (u'', 1, '.', 11, 'a.', 0),
 (u'', 1, '.', 11, 'a.', 4)]

Since 'a' comes after '.' in the ASCII table, the '.' is put first. If you wanted to reverse this, you could replace '.' in your strings with something that comes at the end of the ASCII table (such as '~'):

>>> natsorted(a, key=lambda x: x.replace('.', '~'))
['1.11', '1.11a', '1.11a.0', '1.11a.4', '1.11.0', '1.11.4']
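
The effect of the replacement can be reproduced without natsort. The following is a minimal sketch using only the standard library; `natural_key` is a hypothetical stand-in written for illustration and is not natsort's actual implementation, although it splits strings into text and number runs in a similar way.

```python
import re

def natural_key(s):
    # Hypothetical helper: split into alternating text and digit runs,
    # e.g. '1.11a' -> ['', '1', '.', '11', 'a']. With a capturing group,
    # re.split puts the digit runs at the odd indices.
    parts = re.split(r'(\d+)', s)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))

a = ['1.11', '1.11.4', '1.11a', '1.11.0', '1.11a.0', '1.11a.4']

# Default: '.' (ASCII 46) sorts before 'a' (ASCII 97).
default_order = sorted(a, key=natural_key)

# After the replacement: '~' (ASCII 126) sorts after 'a'.
tilde_order = sorted(a, key=lambda x: natural_key(x.replace('.', '~')))

print(default_order)
print(tilde_order)
```

The replacement only affects the sort key, so the original strings are returned unchanged.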

Would this work for your use case? Is this common enough that you think it should be added to the natsort API?

@tdruez
Author

tdruez commented Jul 29, 2014

Thanks for the detailed explanation.
I'm able to get the result I was looking for thanks to your suggestion of replacing the '.' with a '~'.
My use case was actually:

>>> a = ['1.2', '1.2rc1', '1.2beta2', '1.2beta', '1.2alpha', '1.2.1', '1.1', '1.3']
>>> natsorted(a, key=lambda x: x.replace('.', '~'), reverse=True)
['1.3', '1.2.1', '1.2rc1', '1.2beta2', '1.2beta', '1.2alpha', '1.2', '1.1']

I don't think it needs to be added to the API unless other people express a need for it too.
Your solution is good enough for me.
Thanks :)

@tdruez tdruez closed this as completed Jul 29, 2014
@SethMMorton
Owner

I'm glad I could help! But there is one thing I am confused about: the results you show look like they were sorted in reverse order. Did you actually use the reverse keyword but not include it in your example? If not, it would be a bug.

@tdruez
Author

tdruez commented Jul 30, 2014

Good catch, I do use the reverse.
I've edited my example to avoid further confusion.

@cel4

cel4 commented Aug 24, 2014

Sorry for jumping in here, but I have a problem related to the one @tdruez described in this issue, and it probably does not make sense to open a new issue for it.

> ['1.9.9a', '1.9.9b', '1.10.1', '1.11', '1.11.4', '1.11a']
>
> I think the '1.11a' should be sorted before the '1.11.4'.
> What's your take on this? Thanks.

I have the same problem, but with release candidates: I would like to add 1.11rc1 to the list. I would expect to get 1.11a1 < 1.11b1 < 1.11rc1 < 1.11. I'm pretty surprised that @tdruez suggested 1.11a should be after 1.11.

@SethMMorton
Owner

SethMMorton commented Aug 24, 2014

If '1.11' were '1.11.0' instead, this would work as expected (assuming you do the '~' trick I suggested). The sorting algorithm doesn't actually comprehend the input as version numbers, but rather separates out the numbers for you so that things ascend properly. What is happening is that each of the four versions you suggest has '1.11' at the front, so the one with no trailing characters is placed first. Imagine that we replaced '1.11' with 'and', and you will see what I mean: and < anda1 < andb1 < andrc1
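
The 'and' analogy can be checked with plain string sorting. This is only an illustration of the shared-prefix behaviour described here, using the built-in `sorted` rather than natsort; the variable names are made up for the example.

```python
versions = ['1.11rc1', '1.11', '1.11b1', '1.11a1']

# Swap the shared '1.11' prefix for the word 'and' to make the
# prefix rule easier to see.
stand_ins = [v.replace('1.11', 'and') for v in versions]

# A string that is a prefix of another always sorts first.
ordered = sorted(stand_ins)
print(ordered)  # ['and', 'anda1', 'andb1', 'andrc1']
```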

To remedy this, you can try something bold like this:

>>> natsorted(['1.11', '1.11rc1', '1.11a1', '1.11b1'], key=lambda x : x+'z')
['1.11a1', '1.11b1', '1.11rc1', '1.11']

This will tack on the 'z' character to each version, so that you will be sorting ['1.11z', '1.11rc1z', '1.11a1z', '1.11b1z'] instead. 'z' comes after any of 'a', 'b', or 'rc', so '1.11' ends up last. If you also need to do the '~' trick I suggested above, you could use the key lambda x: x.replace('.', '~') + 'z'
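
As a rough check of the combined key, here is a stdlib-only sketch. `natural_key` and `version_key` are hypothetical helpers written for this example, not part of natsort, and the version list is made up; with natsort itself, the same lambda would be passed as `key` to `natsorted`.

```python
import re

def natural_key(s):
    # Hypothetical stand-in for a natural-sort key: digit runs (odd
    # indices after re.split with a capturing group) become ints.
    parts = re.split(r'(\d+)', s)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))

def version_key(v):
    # Both tricks together: '~' makes dotted sub-versions sort after
    # letter suffixes, and the trailing 'z' makes a bare release sort
    # after its alpha/beta/rc pre-releases ('z' < '~' in ASCII).
    return natural_key(v.replace('.', '~') + 'z')

versions = ['1.11', '1.11rc1', '1.11a1', '1.11b1', '1.11.4', '1.10.1']
ordered = sorted(versions, key=version_key)
print(ordered)
# ['1.10.1', '1.11a1', '1.11b1', '1.11rc1', '1.11', '1.11.4']
```

This relies on pre-release tags starting with a lowercase letter that sorts before 'z'; other naming schemes would need a different key.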

If for some reason this does not work, let me know why and I can try and suggest other ways.

@cel4

cel4 commented Aug 24, 2014

Thanks, the adding-z hack seems to work for me 👍
