[Math] Fix bug and warning in recently-introduced TMath::ModeHalfSample
#19578
Conversation
math/mathcore/inc/TMath.h
```diff
@@ -1463,9 +1463,11 @@ template <typename T> Double_t TMath::ModeHalfSample(Long64_t n, const T *a, con
 const auto vstop = values.end();
 const auto viter = std::lower_bound(vstart, vstop, value);
 const auto vidx = std::distance(vstart, viter);
-if (viter != vstop && w) {
+if (viter != vstop && *viter == value) {
```
Suggested change:

```diff
-if (viter != vstop && *viter == value) {
+if (w && viter != vstop && *viter == value) {
```
Thanks.
However, I think the deduplication must only be done if the weights vary.
Why? The Bickel algorithm expects all values to have the same weight of 1.
The deduplication is just a fallback for when the user passes varying weights; in that case we fall back to computing the mode of the sample.
Otherwise, we apply the Bickel algorithm with duplicates kept.
Oh! Okay, I see. This was not explicit in the code, which is why I was confused.
I'll add a comment explaining that.
Maybe, to make it clearer, we should have a different code path for `w == nullptr`, where the sorting of the values is just done by `std::sort`, the weights vector is initialized with ones, and that's that? Then the potentially expensive `vector::insert` logic to keep the weights in sync can be skipped.
I have updated the PR.
It was explained here: https://github.com/root-project/root/pull/19578/files#diff-3a68352e5e4ab7cb364b4bc37f3ca1419f97eb64f9821f8173624f676d5fb201R1438
and https://github.com/root-project/root/pull/19578/files#diff-3a68352e5e4ab7cb364b4bc37f3ca1419f97eb64f9821f8173624f676d5fb201R1454
though there is a mistake in this comment: https://github.com/root-project/root/pull/19578/files#diff-3a68352e5e4ab7cb364b4bc37f3ca1419f97eb64f9821f8173624f676d5fb201R1504
since that part is also reached if elements are duplicated.
Separate code paths might be more efficient, but the code may become more duplicated / less compact. And you wouldn't need a weights vector initialized to 1 either.
The function de-duplicates the input values, but the logic was wrong: `std::lower_bound` returns a valid iterator even if there is no exact match, since it just looks for the first element that is not less than the searched value. Therefore, we need an additional value check.

Also fixes the following compiler warning, which relates to doing `data() + 1`, which is not valid if the vector is empty:

```
In member function ‘void std::__new_allocator<_Tp>::deallocate(_Tp*, size_type) [with _Tp = double]’,
    inlined from ‘static void std::allocator_traits<std::allocator<_CharT> >::deallocate(allocator_type&, pointer, size_type) [with _Tp = double]’ at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/alloc_traits.h:550:23,
    inlined from ‘void std::_Vector_base<_Tp, _Alloc>::_M_deallocate(pointer, std::size_t) [with _Tp = double; _Alloc = std::allocator<double>]’ at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/stl_vector.h:392:19,
    inlined from ‘void std::_Vector_base<_Tp, _Alloc>::_M_deallocate(pointer, std::size_t) [with _Tp = double; _Alloc = std::allocator<double>]’ at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/stl_vector.h:388:7,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::~_Vector_base() [with _Tp = double; _Alloc = std::allocator<double>]’ at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/stl_vector.h:371:15,
    inlined from ‘std::vector<_Tp, _Alloc>::~vector() [with _Tp = double; _Alloc = std::allocator<double>]’ at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/stl_vector.h:751:7,
    inlined from ‘Double_t TMath::ModeHalfSample(Long64_t, const T*, const Double_t*) [with T = short int]’ at /home/rembserj/code/root/root_build/include/TMath.h:1540:1:
/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/new_allocator.h:172:26: warning: ‘void operator delete(void*, std::size_t)’ called on a pointer to an unallocated object
‘18446744073709551608’ [-Wfree-nonheap-object]
```
```
weights.insert(weights.begin() + vidx, weight);
if (w == nullptr) {
   values.assign(a, a + n);
   weights.assign(n, 1.0);
```
```diff
@@ -1483,7 +1493,7 @@ template <typename T> Double_t TMath::ModeHalfSample(Long64_t n, const T *a, con
 const double d1_0 = values[1] - values[0];
 const double d2_1 = values[2] - values[1];
 if (d2_1 < d1_0)
-   return TMath::Mean(2, values.data() + 1, weights.data() + 1);
+   return TMath::Mean(std::next(values.begin()), values.end(), std::next(weights.begin()));
```
If the goal is optimization, then this should maybe also go into separate code paths: one with no weights variable, and one with.
1480: same comment, could be separate.
1505: this is a "fallback jump" for very rare cases. In principle we were about to compute the normal mode of the sample because the weights were varying. But if, after deduplicating and summing, the weights end up being all equal, then we switch to the Bickel algorithm.
Please also remove "are unique" in the comment in 1512, thanks!
Test Results
20 files, 20 suites, 3d 2h 7m 9s ⏱️
For more details on these failures, see this check.
Results for commit c3dceb9.
Follows up on d0e79a5.