health_monitor_backend: dangling reference fix#29909
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a potential dangling-iterator access in health_monitor_backend::fill_aggregate_with_offline_partitions by changing the inner iteration strategy so iterator stability can be checked before advancing the inner iterator.
Changes:
- Replace
ssx::async_for_each_counteroverassignment_setwith a manual iterator loop usingssx::async_while_counter. - Add an explicit
it.check()call inside the inner loop body prior to dereference/increment of the inner iterator.
CI test resultstest results on build#82154
|
The inner loop within fill_aggregate_with_offline_partitions would use an iterator pointing to potentially invalid memory. Although the first few statements in the inner lambda check whether the outer iterator was stable for this duration, invalid memory was already read in iterating against the inner iterator. This change swaps 'for' for 'while', such that outer iterator stability can be checked BEFORE incrementing the inner iterator.
605f632 to
092097e
Compare
| @@ -1136,12 +1132,17 @@ ss::future<> health_monitor_backend::fill_aggregate_with_offline_partitions( | |||
| ++it) { | |||
| const auto& topic = it->first; | |||
There was a problem hiding this comment.
Do we also need to it.check() here?
There was a problem hiding this comment.
I was thinking the same.. my understanding is it is not needed because there is no scheduling point between cond() and f().. the logic of async_while() is something like..
....
if (cond()) {
f()
}
...
co_await maybe_yield()
There was a problem hiding this comment.
Ah nevermind, I see the stable iterator validates when incrementing
There was a problem hiding this comment.
Yea this is a slightly different portion of code; I also scrutinized L1144 but I agree it seems safe
There was a problem hiding this comment.
so the idea here is that it.check is also protecting the inner iterators over assignment set?
|
/backport v25.3.x |
|
/backport v25.2.x |
|
/backport v25.1.x |
The inner loop within fill_aggregate_with_offline_partitions would use an iterator pointing to potentially invalid memory. Although the first few statements in the inner lambda check whether the outer iterator was stable for this duration, invalid memory was already read in iterating against the inner iterator.
This change swaps 'for' for 'while', such that outer iterator stability can be checked BEFORE incrementing the inner iterator.
Backports Required
Release Notes
Bug Fixes