Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Return largest non-NaN value for max() on sorted float arrays if it exists instead of NaN #15060

Merged
merged 11 commits into from
Mar 15, 2024

Conversation

nameexhaustion
Copy link
Collaborator

@nameexhaustion nameexhaustion commented Mar 14, 2024

Fixes #12931.

This PR uses a binary search to find non-NaN values for the sorted max/arg_max on floats (if they exist). In the case where all non-null values are NaNs, then the result will be NaN (note the result for all-null is still null).

Part of the binary search code was refactored and moved from polars-ops to polars-core to support this.

@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Mar 14, 2024
@@ -182,10 +206,10 @@ where
}

fn mean(&self) -> Option<f64> {
if self.is_empty() || self.null_count() == self.len() {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive-by, the is_empty() call is redundant

/// Get a slice of the non-null values of a sorted array.
/// # Safety
/// The array is sorted and has at least one non-null value.
pub unsafe fn slice_sorted_non_null_and_offset<T>(ca: &ChunkedArray<T>) -> (usize, ChunkedArray<T>)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this for now because binary search does not yet work for arrays sorted with nulls last (#15045)

Copy link

codecov bot commented Mar 14, 2024

Codecov Report

Attention: Patch coverage is 88.65979% with 22 lines in your changes are missing coverage. Please review.

Project coverage is 81.05%. Comparing base (82932c6) to head (705b4f6).
Report is 18 commits behind head on main.

❗ Current head 705b4f6 differs from pull request most recent head ab4dc4c. Consider uploading reports for the commit ab4dc4c to get more accurate results

Files Patch % Lines
...polars-core/src/chunked_array/ops/aggregate/mod.rs 54.54% 15 Missing ⚠️
...polars-core/src/chunked_array/ops/search_sorted.rs 95.00% 4 Missing ⚠️
crates/polars-ops/src/series/ops/search_sorted.rs 71.42% 2 Missing ⚠️
crates/polars-ops/src/series/ops/arg_min_max.rs 95.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15060      +/-   ##
==========================================
+ Coverage   81.01%   81.05%   +0.03%     
==========================================
  Files        1338     1340       +2     
  Lines      173595   173822     +227     
  Branches     2461     2456       -5     
==========================================
+ Hits       140635   140884     +249     
+ Misses      32490    32469      -21     
+ Partials      470      469       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nameexhaustion nameexhaustion marked this pull request as ready for review March 14, 2024 15:16
@nameexhaustion nameexhaustion changed the title fix: Return non-NaN values for max() on sorted float arrays if they exist fix: Return largest non-NaN value for max() on sorted float arrays if it exists instead of NaN Mar 14, 2024
@nameexhaustion nameexhaustion marked this pull request as draft March 15, 2024 01:12
@nameexhaustion nameexhaustion marked this pull request as ready for review March 15, 2024 05:58
};
use crate::prelude::*;

macro_rules! impl_sorted_float_arg_max {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this generic? I want to use macro's only for dispatch whenever possible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave it a try, it's now implemented on ChunkedArray<T: PolarsFloatType> instead of using a macro, but let me know if you meant in another way

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, much better. Thank you! :)

@ritchie46 ritchie46 merged commit ec04150 into pola-rs:main Mar 15, 2024
23 checks passed
@nameexhaustion nameexhaustion deleted the float-min-max branch March 15, 2024 11:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

min/max incorrectly can return nan on sorted arrays
2 participants