Skip to content

🕳️ handle gaps in x #20

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jan 28, 2023
Merged

🕳️ handle gaps in x #20

merged 10 commits into from
Jan 28, 2023

Conversation

jvdd
Copy link
Member

@jvdd jvdd commented Jan 20, 2023

The current implementation crashed when there are gaps in the x (i.e., an equidistant x-bin that does not contain any datapoints).

To overcome this issue we now handle gaps

  • in the searchsorted functionality (aka the equidistant binning) by returning None when the bin is empty
    -> the output type of the Iterator is thus now Option<(usize, usize)>
  • in the M4 and MinMax generic code we check if the bin_size is <= 4 for M4 and 2 for minmax and then add all the indices
    -> the othter case is the default implementation (as was before)

Code updates:

  • handle gaps for MinMax
  • test this
  • handle gaps for M4
  • test this
  • add Python tests

TODO:

  • check performance impact

Benches on the gaps branch ⬇️

minmax_scalx_50M_f32    time:   [47.923 ms 48.396 ms 49.009 ms]
minmax_simdx_50M_f32    time:   [18.414 ms 18.586 ms 18.924 ms]
minmax_scalx_p_50M_f32  time:   [5.0812 ms 5.1216 ms 5.1748 ms]
minmax_simdx_p_50M_f32  time:   [4.0378 ms 4.0979 ms 4.1694 ms]

m4_scalx_50M_f32        time:   [46.977 ms 47.279 ms 47.669 ms]
m4_simdx_50M_f32        time:   [18.329 ms 18.441 ms 18.659 ms]
m4_scalx_p_50M_f32      time:   [7.3339 ms 7.4294 ms 7.5635 ms]
m4_simdx_p_50M_f32      time:   [4.8553 ms 4.8932 ms 4.9480 ms]

mlttb_scalx_50M_f32     time:   [66.573 ms 66.716 ms 66.881 ms]
mlttb_simdx_50M_f32     time:   [39.012 ms 39.743 ms 40.736 ms]
mlttb_scalx_p_50M_f32   time:   [13.433 ms 13.916 ms 14.530 ms]
mlttb_simdx_p_50M_f32   time:   [12.676 ms 12.772 ms 12.875 ms]

Benches on the main branch ⬇️

minmax_scalx_50M_f32    time:   [50.004 ms 51.015 ms 52.223 ms]
minmax_simdx_50M_f32    time:   [22.209 ms 22.425 ms 22.749 ms]
minmax_scalx_p_50M_f32  time:   [4.7367 ms 4.7549 ms 4.7757 ms]
minmax_simdx_p_50M_f32  time:   [3.8919 ms 3.9095 ms 3.9282 ms]

m4_scalx_50M_f32        time:   [49.127 ms 49.451 ms 49.871 ms]
m4_simdx_50M_f32        time:   [18.848 ms 18.865 ms 18.883 ms]
m4_scalx_p_50M_f32      time:   [7.2405 ms 7.2528 ms 7.2660 ms]
m4_simdx_p_50M_f32      time:   [4.9203 ms 4.9338 ms 4.9486 ms]

mlttb_scalx_50M_f32     time:   [66.978 ms 67.068 ms 67.170 ms]
mlttb_simdx_50M_f32     time:   [42.044 ms 43.382 ms 45.161 ms]
mlttb_scalx_p_50M_f32   time:   [13.511 ms 14.040 ms 14.689 ms]
mlttb_simdx_p_50M_f32   time:   [13.285 ms 13.666 ms 14.202 ms]

=> no real statistical differences :)


Some remarks 🤔
➡️ The most beautiful & flexible code update would be to alter the get_equidistant_bins to not return an (start, end) when the start & end are the same value (i.e., an empty bin).
-> I might revert the first commit & look into smth like that performed this in PR #25

Copy link
Member Author

@jvdd jvdd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🙃

@jvdd jvdd merged commit aeeb8ad into main Jan 28, 2023
@jvdd jvdd deleted the gaps branch January 29, 2023 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant