-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Return largest non-NaN value for max()
on sorted float arrays if it exists instead of NaN
#15060
Conversation
@@ -182,10 +206,10 @@ where | |||
} | |||
|
|||
fn mean(&self) -> Option<f64> { | |||
if self.is_empty() || self.null_count() == self.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive-by, the is_empty()
call is redundant
/// Get a slice of the non-null values of a sorted array. | ||
/// # Safety | ||
/// The array is sorted and has at least one non-null value. | ||
pub unsafe fn slice_sorted_non_null_and_offset<T>(ca: &ChunkedArray<T>) -> (usize, ChunkedArray<T>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this for now because binary search does not yet work for arrays sorted with nulls last (#15045)
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #15060 +/- ##
==========================================
+ Coverage 81.01% 81.05% +0.03%
==========================================
Files 1338 1340 +2
Lines 173595 173822 +227
Branches 2461 2456 -5
==========================================
+ Hits 140635 140884 +249
+ Misses 32490 32469 -21
+ Partials 470 469 -1 ☔ View full report in Codecov by Sentry. |
max()
on sorted float arrays if they existmax()
on sorted float arrays if it exists instead of NaN
}; | ||
use crate::prelude::*; | ||
|
||
macro_rules! impl_sorted_float_arg_max { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make this generic? I want to use macro's only for dispatch whenever possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gave it a try, it's now implemented on ChunkedArray<T: PolarsFloatType>
instead of using a macro, but let me know if you meant in another way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap, much better. Thank you! :)
Fixes #12931.
This PR uses a binary search to find non-NaN values for the sorted max/arg_max on floats (if they exist). In the case where all non-null values are NaNs, then the result will be NaN (note the result for all-null is still null).
Part of the binary search code was refactored and moved from
polars-ops
topolars-core
to support this.