-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
descending
arg looks backwards in top_k
#15236
Comments
Confusing in general. I would expect the |
you can pass multiple values to
|
I see. Seems like a single function called |
Yup...which leaves open the question of "what to do with |
A somewhat more explicit but verbose would be a import polars as pl
df = pl.DataFrame({
"a": [1, 1, 2, 2, 3, 3],
"b": [2, 1, 2, 1, 2, 1],
})
# (a=3, b=2)
df.top_k(k=1, by=pl.all())
# (a=3, b=1)
df.top_k(k=1, by=pl.all(), sort_method=["ascending", "descending"])
# (a=1, b=1)
df.bottom_k(k=1, by=pl.all())
# (a=1, b=2)
df.bottom_k(k=1, by=pl.all(), sort_method="descending", "ascending") |
And actually, just renaming |
I don't think it can just be renamed, because then the output of I feel like suggesting Current behaviour: In [46]: import polars as pl
...: df = pl.DataFrame({'a': [1,3,2]})
...: df.top_k(k=3, by='a', descending=False)
Out[46]:
shape: (3, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 3 │
│ 2 │
│ 1 │
└─────┘ Proposed new api: In [46]: import polars as pl
...: df = pl.DataFrame({'a': [1,3,2]})
...: df.top_k(k=3, by='a', find_smallest=False)
Out[46]:
shape: (3, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 3 │
│ 2 │
│ 1 │
└─────┘ |
I ran across this yesterday in a different context - it looks to me like the underlying Rust code is doing it backwards. Here's where the Python code calls into the Rust, if I understand it correctly: polars/py-polars/polars/lazyframe/frame.py Line 1550 in 31df06d
To me it seems like the Python code reflects the Rust API and is faithfully passing I don't like this behavior but it feels like the Python and Rust code should be kept consistent, so my temptation would be to look in to addressing the problem upstream of the Python bindings. I'm personally not actually using Python at all except as an example (we're looking at the Scala JNI bindings to the Rust backend) but my coworkers would want to use the Python bindings. It would be excellent for us to be able to be on the same page with respect to the API shape. |
We will rename the parameter to |
Checks
Reproducible example
This came up here: https://discord.com/channels/908022250106667068/1082941945216782397/1220375752428753027
Log output
Issue description
If I read
df.top_k(k=1, by='a', descending=False)
, I expect it to mean "get the top 1 smallest element". However, it actually finds the top 1 largest element (contradictingdescending=False
?)Conversely,
df.top_k(k=1, by='a', descending=True)
finds the smallest elementExpected behavior
Not sure, it requires more discussion. But for a start, there should be an issue about it :)
Polars used to have
reverse
arguments for sorting functions, but they were changed todescending
. But in this case,reverse
does feel quite naturalI don't really know what to suggest, but comments are very welcome
Installed versions
The text was updated successfully, but these errors were encountered: