Skip to content

[Math] Fix bug and warning in recently-introduced TMath::ModeHalfSample #19578

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

guitargeek
Copy link
Contributor

@guitargeek guitargeek commented Aug 8, 2025

Follows up on d0e79a5.

The funciton de-duplicates the input values, but the logic was wrong. std::lower_bound returns a hit even if there is no exact match, it just looks for the first element that is larger. Therefore, we need an additional value check.

Also, fixes the following compiler warning that relates to doing data() + 1 which is not valid if the vector would be empty:

In member function ‘void std::__new_allocator<_Tp>::deallocate(_Tp*, size_type) [with _Tp = double]’,
    inlined from ‘static void std::allocator_traits<std::allocator<_CharT> >::deallocate(allocator_type&, pointer, size_type) [with _Tp = double]’ at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/alloc_traits.h:550:23,
    inlined from ‘void std::_Vector_base<_Tp, _Alloc>::_M_deallocate(pointer, std::size_t) [with _Tp = double; _Alloc = std::allocator<double>]’ at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/stl_vector.h:392:19,
    inlined from ‘void std::_Vector_base<_Tp, _Alloc>::_M_deallocate(pointer, std::size_t) [with _Tp = double; _Alloc = std::allocator<double>]’ at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/stl_vector.h:388:7,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::~_Vector_base() [with _Tp = double; _Alloc = std::allocator<double>]’ at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/stl_vector.h:371:15,
    inlined from ‘std::vector<_Tp, _Alloc>::~vector() [with _Tp = double; _Alloc = std::allocator<double>]’ at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/stl_vector.h:751:7,
    inlined from ‘Double_t TMath::ModeHalfSample(Long64_t, const T*, const Double_t*) [with T = short int]’ at /home/rembserj/code/root/root_build/include/TMath.h:1540:1:
/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/new_allocator.h:172:26: warning: ‘void operator delete(void*, std::size_t)’ called on a pointer to an unallocated object ‘18446744073709551608’ [-Wfree-nonheap-object]

@guitargeek guitargeek requested a review from hageboeck August 8, 2025 08:46
@guitargeek guitargeek self-assigned this Aug 8, 2025
@guitargeek guitargeek requested a review from lmoneta as a code owner August 8, 2025 08:46
@@ -1463,9 +1463,11 @@ template <typename T> Double_t TMath::ModeHalfSample(Long64_t n, const T *a, con
const auto vstop = values.end();
const auto viter = std::lower_bound(vstart, vstop, value);
const auto vidx = std::distance(vstart, viter);
if (viter != vstop && w) {
if (viter != vstop && *viter == value) {
Copy link
Collaborator

@ferdymercury ferdymercury Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (viter != vstop && *viter == value) {
if (w && viter != vstop && *viter == value) {

Thanks.
However: I think that deduplication must be only done if weights vary.
Why? The Bickel algorithm expects all values have the same weight of 1.
The deduplication is just a fallback when the user passes varying weights, then we fall back to getting the Mode of the sample.
Otherwise, we apply the Bickel algorithm with duplicates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Okay I see. This was not explicit in the code, which is why I was confused.

I'll add some comment explaining that.

Maybe, to make it more clear, we should have a different code path for w == nullptr, where the sorting of the values is just done by std::sort, the weights vector is initialized with ones, and that's that? Then the postentially expensive vector::insert logic to keep the weights in sync can be skipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was explained here: https://github.com/root-project/root/pull/19578/files#diff-3a68352e5e4ab7cb364b4bc37f3ca1419f97eb64f9821f8173624f676d5fb201R1438
and https://github.com/root-project/root/pull/19578/files#diff-3a68352e5e4ab7cb364b4bc37f3ca1419f97eb64f9821f8173624f676d5fb201R1454

though there is a mistake in this comment: https://github.com/root-project/root/pull/19578/files#diff-3a68352e5e4ab7cb364b4bc37f3ca1419f97eb64f9821f8173624f676d5fb201R1504
since that part is reached also if elements are duplicated

Separate code paths might be more efficient, but the code maybe more duplicated / less compact. And you wouldn't need a weights vector initialized to 1 either.

The funciton de-duplicates the input values, but the logic was wrong.
`std::lower_bound` returns a hit even if there is no exact match, it
just looks for the first element that is larger. Therefore, we need an
additional value check.

Also, fixes the following compiler warning that relates to doing
`data() + 1` which is not valid if the vector would be empty:
```
In member function ‘void std::__new_allocator<_Tp>::deallocate(_Tp*, size_type) [with _Tp = double]’,
    inlined from ‘static void std::allocator_traits<std::allocator<_CharT> >::deallocate(allocator_type&, pointer, size_type) [with _Tp = double]’ at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/alloc_traits.h:550:23,
    inlined from ‘void std::_Vector_base<_Tp, _Alloc>::_M_deallocate(pointer, std::size_t) [with _Tp = double; _Alloc = std::allocator<double>]’ at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/stl_vector.h:392:19,
    inlined from ‘void std::_Vector_base<_Tp, _Alloc>::_M_deallocate(pointer, std::size_t) [with _Tp = double; _Alloc = std::allocator<double>]’ at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/stl_vector.h:388:7,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::~_Vector_base() [with _Tp = double; _Alloc = std::allocator<double>]’ at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/stl_vector.h:371:15,
    inlined from ‘std::vector<_Tp, _Alloc>::~vector() [with _Tp = double; _Alloc = std::allocator<double>]’ at /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/stl_vector.h:751:7,
    inlined from ‘Double_t TMath::ModeHalfSample(Long64_t, const T*, const Double_t*) [with T = short int]’ at /home/rembserj/code/root/root_build/include/TMath.h:1540:1:
/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-14.3.0/include/c++/14.3.0/bits/new_allocator.h:172:26: warning: ‘void operator delete(void*, std::size_t)’ called on a pointer to an unallocated object ‘18446744073709551608’ [-Wfree-nonheap-object]

```
weights.insert(weights.begin() + vidx, weight);
if (w == nullptr) {
values.assign(a, a + n);
weights.assign(n, 1.0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
weights.assign(n, 1.0);

@@ -1483,7 +1493,7 @@ template <typename T> Double_t TMath::ModeHalfSample(Long64_t n, const T *a, con
const double d1_0 = values[1] - values[0];
const double d2_1 = values[2] - values[1];
if (d2_1 < d1_0)
return TMath::Mean(2, values.data() + 1, weights.data() + 1);
return TMath::Mean(std::next(values.begin()), values.end(), std::next(weights.begin()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the goal is optimization, then this should maybe also go in a separate codepath, one no weights variable, and one with.

Copy link
Collaborator

@ferdymercury ferdymercury Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1480: same comment, could be separate.
1505: this is a "fallback-jump" for very rare cases. In principle we were about to get the normal mode of the sample because weights were varying. But if after deduplicating and summing, weights end up being all equal, then we switch to Bickel algorithm.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also remove "are unique" in the comment in 1512, thanks!

Copy link

github-actions bot commented Aug 8, 2025

Test Results

    20 files      20 suites   3d 2h 7m 9s ⏱️
 3 249 tests  3 244 ✅ 0 💤 5 ❌
63 573 runs  63 567 ✅ 0 💤 6 ❌

For more details on these failures, see this check.

Results for commit c3dceb9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants