refactor(rust!): Refactor AnyValue supertype logic #15280

stinodego · 2024-03-25T12:06:51Z

Changes

Rename the existing public util any_values_to_dtype to any_values_to_supertype_and_n_dtypes and move it to the polars_core::utils module (this is the breaking part).
Add two public utils based on existing functionality:
- dtypes_to_supertype for inferring the supertype of multiple dtypes
- any_values_to_supertype for inferring the supertype of a collection of AnyValues
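To illustrate how the two new utils relate, here is a minimal sketch in which the value-level helper dispatches to the dtype-level one. Everything in it is hypothetical: ToyDType, ToyValue and the toy_ functions are stand-ins and are not the polars_core types or signatures. In this toy lattice every pair of dtypes has a supertype (the wider one), so the sketch returns a plain value; a real implementation may have to report that no supertype exists.

```rust
// Hypothetical stand-in for a dtype enum; the derive order defines "wider".
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum ToyDType {
    Null,
    Int32,
    Int64,
    Float64,
}

// Hypothetical stand-in for a dynamically typed value.
#[derive(Clone, Copy, Debug)]
enum ToyValue {
    Null,
    Int32(i32),
    Int64(i64),
    Float64(f64),
}

impl ToyValue {
    // Map a value to the dtype that describes it.
    fn dtype(&self) -> ToyDType {
        match self {
            ToyValue::Null => ToyDType::Null,
            ToyValue::Int32(_) => ToyDType::Int32,
            ToyValue::Int64(_) => ToyDType::Int64,
            ToyValue::Float64(_) => ToyDType::Float64,
        }
    }
}

// Supertype of many dtypes: fold pairwise, starting from Null.
// In this toy lattice the supertype of two dtypes is simply the wider one.
fn toy_dtypes_to_supertype<I>(dtypes: I) -> ToyDType
where
    I: IntoIterator<Item = ToyDType>,
{
    dtypes
        .into_iter()
        .fold(ToyDType::Null, |acc, dt| acc.max(dt))
}

// Supertype of many values: map each value to its dtype and reuse the helper.
fn toy_any_values_to_supertype(values: &[ToyValue]) -> ToyDType {
    toy_dtypes_to_supertype(values.iter().map(ToyValue::dtype))
}
```

Keeping the pairwise fold in one place means the value-level function only has to describe how a value maps to a dtype.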

ritchie46 · 2024-03-25T12:51:24Z

crates/polars-core/src/utils/any_value.rs

+{
+    let mut supertype = DataType::Null;
+    let mut dtypes = PlHashSet::<DataType>::new();
+    for av in values {


Just for context: the IndexSet version might perform better, because its supertype loop is smaller. Now you have a branch on every one of the N iterations, and we expect N to be much bigger than K, where N is the number of elements and K is the number of unique datatypes.

I didn't realize this type of loop would be slower - I figured the collect to the IndexSet would have to do something similar (check whether the dtype hash is already in the set before adding it). Any recommendations on where I can read up on this kind of stuff?

The benefit of this implementation is that it can early exit if there is no supertype (before going through all N values), but I guess that's relatively rare so if it makes the loop slower it's not worth it.
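The two loop shapes under discussion can be sketched side by side. This is a toy under stated assumptions, not the code from this PR: ToyType and toy_supertype are hypothetical, and std's HashSet stands in for both PlHashSet and IndexSet. The first function branches on every one of the N values and can exit early; the second collects the unique dtypes first and then runs the supertype logic over only K of them.

```rust
use std::collections::HashSet;

// Hypothetical dtype enum for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum ToyType {
    Null,
    Int32,
    Int64,
    Float64,
    Bool,
}

// Toy rule: Null yields the other type, numerics widen,
// and Bool has no supertype with any numeric type.
fn toy_supertype(a: ToyType, b: ToyType) -> Option<ToyType> {
    use ToyType::*;
    match (a, b) {
        (x, y) if x == y => Some(x),
        (Null, x) | (x, Null) => Some(x),
        (Bool, _) | (_, Bool) => None,
        (Float64, _) | (_, Float64) => Some(Float64),
        (Int64, _) | (_, Int64) => Some(Int64),
        _ => None, // not reached: equal pairs are handled by the first arm
    }
}

// Shape 1: a single pass over all N values with a branch per value.
// Returns early (None) as soon as no supertype exists.
fn supertype_single_pass(values: &[ToyType]) -> Option<(ToyType, usize)> {
    let mut supertype = ToyType::Null;
    let mut seen = HashSet::new();
    for &dt in values {
        if seen.insert(dt) {
            supertype = toy_supertype(supertype, dt)?;
        }
    }
    Some((supertype, seen.len()))
}

// Shape 2: collect the K unique dtypes first, then fold over only those.
// Every value is hashed before any failure can be detected.
fn supertype_collect_then_fold(values: &[ToyType]) -> Option<(ToyType, usize)> {
    let unique: HashSet<ToyType> = values.iter().copied().collect();
    let mut supertype = ToyType::Null;
    for &dt in &unique {
        supertype = toy_supertype(supertype, dt)?;
    }
    Some((supertype, unique.len()))
}
```

Both shapes hash all N values in the success case, so which one is faster depends on the hash table cost and is best settled by measuring.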

The IndexSet is a different way of storing the values: they live inline in a vec instead of in the hash slots, and the hash slots store indexes into that vec. Can elaborate Thursday. ^^

but I guess that's relatively rare so if it makes the loop slower

I am not saying that it does :P. It is tough to tell. Though often errors in tight loops are not good. But I think the runtime is in the hashtable more than the branches here.
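The storage layout described above (values inline in a vec, hash slots holding indexes into it) can be sketched with a toy set. ToyIndexSet is hypothetical and uses simple linear probing; the real indexmap crate uses a different internal hash table, so treat this only as an illustration of the layout.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy index set: values are stored contiguously in insertion order,
// and the hash slots only hold indexes into that vec.
struct ToyIndexSet<T> {
    entries: Vec<T>,
    slots: Vec<Option<usize>>,
}

impl<T: Hash + Eq> ToyIndexSet<T> {
    fn new() -> Self {
        // Slot count stays a power of two so that `& mask` picks a slot.
        Self { entries: Vec::new(), slots: vec![None; 8] }
    }

    fn hash_of(value: &T) -> usize {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        hasher.finish() as usize
    }

    // Returns true if the value was newly inserted.
    fn insert(&mut self, value: T) -> bool {
        // Keep the table at most half full so probing always finds a free slot.
        if (self.entries.len() + 1) * 2 > self.slots.len() {
            self.grow();
        }
        let mask = self.slots.len() - 1;
        let mut pos = Self::hash_of(&value) & mask;
        loop {
            let slot = self.slots[pos];
            match slot {
                Some(i) if self.entries[i] == value => return false,
                Some(_) => pos = (pos + 1) & mask, // collision: probe next slot
                None => {
                    self.slots[pos] = Some(self.entries.len());
                    self.entries.push(value);
                    return true;
                }
            }
        }
    }

    // Double the slot table and re-insert the indexes; entries do not move.
    fn grow(&mut self) {
        let new_len = self.slots.len() * 2;
        let mask = new_len - 1;
        let mut slots: Vec<Option<usize>> = vec![None; new_len];
        for (i, value) in self.entries.iter().enumerate() {
            let mut pos = Self::hash_of(value) & mask;
            while slots[pos].is_some() {
                pos = (pos + 1) & mask;
            }
            slots[pos] = Some(i);
        }
        self.slots = slots;
    }

    // Iterating the unique values is a plain slice walk in insertion order.
    fn as_slice(&self) -> &[T] {
        &self.entries
    }
}
```

Because the unique values sit in one contiguous slice, a follow-up loop over them touches only K elements and never has to skip empty hash slots.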

ritchie46 · 2024-03-25T12:52:22Z

py-polars/src/conversion/any_value.rs

@@ -419,3 +420,12 @@ pub(crate) fn py_object_to_any_value(ob: &PyAny, strict: bool) -> PyResult<AnyVa
        })
    })
 }
+
+fn any_values_to_supertype_and_n_types(values: &[AnyValue]) -> PolarsResult<(DataType, usize)> {


Can you make this generic? Then fn any_values_to_supertype can dispatch to this one and just get the DataType out.

I solved it slightly differently (extract the logic and re-use).

codecov · 2024-03-25T13:10:36Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.32%. Comparing base (53f5536) to head (aad4c59).
Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #15280      +/-   ##
==========================================
- Coverage   81.32%   81.32%   -0.01%     
==========================================
  Files        1359     1360       +1     
  Lines      176072   176070       -2     
  Branches     2526     2526              
==========================================
- Hits       143193   143188       -5     
- Misses      32396    32399       +3     
  Partials      483      483


ritchie46

Clean second pass. 👍

Add dtypes_to_supertype util

b59ff02

github-actions bot added labels breaking rust (Change that breaks backwards compatibility for the Rust crate), internal (An internal refactor or improvement), rust (Related to Rust Polars) Mar 25, 2024

stinodego added 5 commits March 25, 2024 13:12

Add any_values_to_supertype util

e5f1511

Use util

3aeddc5

Use util

1aa2946

Drive-by

013553a

Move util

e721659

stinodego force-pushed the supertype-dup branch from 8b337c0 to e721659 Compare March 25, 2024 12:12

stinodego marked this pull request as ready for review March 25, 2024 12:19

stinodego requested review from ritchie46, c-peters, alexander-beedie, MarcoGorelli and orlp as code owners March 25, 2024 12:19

ritchie46 reviewed Mar 25, 2024


stinodego marked this pull request as draft March 25, 2024 13:41

Refactor anyvalue utils

aad4c59

stinodego marked this pull request as ready for review March 25, 2024 14:08

ritchie46 approved these changes Mar 25, 2024


ritchie46 merged commit 705b148 into main Mar 25, 2024
24 checks passed

ritchie46 deleted the supertype-dup branch March 25, 2024 19:39
