Conversation
When multiple worker threads simultaneously call uv_timer_start/stop on timers belonging to the same event loop, they concurrently modify the shared heap data structure without synchronization. This causes segmentation faults in pmproxy. This fix - removes the unsafe calls uv_timer_start() and uv_update_time() from worker threads - uses thread-safe time tracking uv_hrtime() - moves timer logic to the event loop thread
When expanding a delta indom into a full indom, the function copies name pointers by reference (shallow copy) from the previous full indom into the new namelist. This creates a mismatch when two different memory allocation schemes coexist in the same hashindom chain leading to free() of invalid pointers causing SIGABRT. This fix is replacing shallow copies of pointers with strdup() and sets PMLID_NAMES to ensure the expanded namelist is an independent allocation that can be safely freed.
|
fedora-rawhide and ubuntu2204 pipelines failed due to issues starting the container - not related to the proposed fix. However the changes here are quite significant in the way how context timers are handled, so I would appreciate some review from @kmcdonell or @natoscott . Thanks guys in advance. |
natoscott
left a comment
There was a problem hiding this comment.
I'll defer to @kmcdonell comment on the libpcp namelist change, but the libpcp_web changes make sense. We may need a similar hrtimer transition in pmproxy/src too?
kmcdonell
left a comment
There was a problem hiding this comment.
@kurik Thanks for the triage here. I'll defer to @natoscott for the libpcp_web changes, but with one small improvement I think the libpcp ones are OK.
Do you have a reproducer for this? I'm guessing not, it is just the corpses from pmproxy that are scattered on the ground.
I've managed to build a new QA app and test (qa/src/undelta.c et al, commit incoming) that dies spectacularly with the unmodified logmeta.c, but passes with (my version) of the changes in place.
I'm a little surprised that this has not bitten before, but I think the reason is that the common code path below pmNewContext() that most archive consuming tools are using ends up allocating the components of all the __pmLogInDom structs in a way that avoids the problem you've identified, pmproxy on the other hand is reading metadata records directly from disk for the "fail -f" mode of ingestion and is probably building the __pmLogInDom structs in a slightly different way.
There was a problem hiding this comment.
I suspect the proposed changes are fixing one problem and introducing a potential memory leak.
On review of the code, I believe __pmLogUndeltaInDom() should honour the PMLID_NAMES bit in the alloc field of the delta __pmLogInDom.
So a slightly different change for the last strdup() would be along the lines:
if (tidp->alloc & PMLID_NAMES)
namelist[k] = didp->namelist[j];
else
namelist[k] = strdup(didp->namelist[j]);
Hand craft some __pmLogInDom structures and then let __pmLogUndeltaInDom() chew on 'em. qa/1664 runs undelta, and qa/1655 is the valgrind dual. Also related to https://github.com/performancecopilot/pcp/pull/2525/changes. Fails spectacularly without the fix from PR performancecopilot#2525.
The fix comes from an analysis of a bunch of core files I am collecting from my test runs. I do not have a "reliable" reproducer, but this Fedora gating test reliably reproduces it. The test starts 10 containers in the background where each container calls set of REST API endpoints in a cycle (work as a
Thanks for this @kmcdonell . I am going to rebuild PCP with this code change and will test it. |
|
@kmcdonell I reworked the patch with you suggestion, tested it for a day and it seems to work well. I do not observe any memory leaks. Can you please review it once more ? Thanks in advance. |
|
Hey @kurik, |
|
Thanks a lot @kmcdonell . I would be happy if those crashes will be fixed, it does not need to be fixed by my code :-) |
In light of the excellent triage from Jan Kurik over at performancecopilot#2525 I've refactored the core indom replication part of __pmLogUndeltaInDom() - remove duplicated code - only one place for each of delete instance, add instance from previous full indom and add instance from delta indom - rename tidp to fidp - this points to the previous full indom, not some "temporary" as implied by "t" - clean up the flow of control - add comments to explain what's going on - ensure the reconstructed indom is fully malloc'd so it cannot contain pointers to (instance names in particular) data malloc'd someplace else, or worse in the buffer used to read physical indom records from disk - free any space previously allocated to the delta indom
|
@kurik I've committed my new logmeta.c in commit bf0dfb6 over in my tree. It looks solid in my QA testing and I've attached the new source file to this PR. Suggest you take it for a spin, and if it looks OK, you can close this PR and I'll make the same change to the libpcp3 version, commit that and push both. Thanks for you on-going assistance on this one, and your initial triage that pin pointed the root cause. |
Hand craft some __pmLogInDom structures and then let __pmLogUndeltaInDom() chew on 'em. qa/1664 runs undelta, and qa/1655 is the valgrind dual. Also related to https://github.com/performancecopilot/pcp/pull/2525/changes. Fails spectacularly without the fix from PR performancecopilot#2525.
|
@kmcdonell The patch in bf0dfb6 looks mostly good. However I have some comments :-) During testing I observed some core dumps of the Trying to understand the patch, it seems to me like the code starting on line Line 1203 in bf0dfb6 numinst is then overcounted or undercounted . This leads to heap corruption and unrelated free() is then called during the context destruction. Changing Line 1312 in bf0dfb6 didp->numinst = numinst; to didp->numinst = k; should fix this issue with the unrelated free().
There is also a next problem, causing another Line 1266 in bf0dfb6 Line 1285 in bf0dfb6 Line 1260 in bf0dfb6 Line 1208 in bf0dfb6 numinst to be too small. I guess the simplest fix for this will be to remove lines Line 1207 in bf0dfb6 Line 1208 in bf0dfb6 At last there is also one typo, I guess, where on line Line 1288 in bf0dfb6 fidp->namelist[j] should be fidp->namelist[i] (change the index j -> i).
With all three proposed fixes above implemented in the code, I do not observe memory leaks nor crashes. |
|
@kurik Jeez, it is time I was put out to pasture. Sorry for my sloppiness here ... let me undertake some penance and when I've digested your changes and if I have no further questions I'll simply take your version, commit this and push it and close this PR so we can all move onto something more productive. |
Over at performancecopilot#2525 it is becoming evident that pmproxy maybe feeding __pmLogUndeltaInDom() delta indoms with an incomplete history of earlier indoms, so the undelta'ing is either wrong or incomplete. Checks added in the last round of logmeta.c changes were detecting problems but the workaround (skipping) logic was wrong, leading to assert()s for Jan, although it passes QA for me (because we have nothing in our QA suite that accurately exercises pmproxy in this scenario). After digesting Jan's latest comments, I made more changes for qa/src/undelta.c and qa/1655 was able to reproduce the assert()s and then rework the logmeta.c code. With these changes QA check -g indom -g pmproxy -g archive -g archive_v3 are all passing, which is as much testing as I can do in the QA Farm.
|
@kurik Sigh. We start with all the instances from the full indom then add 1 for each new instance and subtract 1 for each deleted instance. This is correct for normal libpcp uses, but it seems not for pmproxy [because it is "tailing" the end of the archive it maybe is chewing on incomplete metadata]. Now it starts to get murky .... any chance you still have pmproxy.log? If so, I'd expect to see some But the fact that you hit the assert makes me suspect the Botch case is happening ... this is again a "problem" only with pmproxy because it is not processing the whole archive as expected by the libpcp code [so there maybe a deeper, darker problem here for pmproxy]. I've updated qa/src/undelta.c to simulate the two Botch cases, and sure enough Kaboom, qa/1655 hits the assert() at line 1266. I added the extra integrity checks in the earlier commit(s) "to be sure, to be sure" not expecting 'em to catch anything, but looks like pmproxy is tripping these integrity checks and then the remaining count and skip logic in logmeta.c was incorrect, leading to the assert()s. New commit 0473ab3 in my repo addresses the above by making over-estimating the size of the undelta'd indom, fixing the accounting of instances to be more robust in the presence of unexpected delta indom data. Thanks once again, let's give it another try. |
Unfortunately, I do not have the
So far, I was not able to reproduce the problem with the 0473ab3 commit. All seem to work well with this version of I have not run the full QA testsuite, assuming you did. If not let me know and I can run it on Monday. |
|
All good @kurik. I've pushed that commit, plus another couple to And yes, I've been running full QA with this fix in the QA Farm across multiple platforms now. I'm going to close this PR, as the real commits are elsewhere, ending with fed1b80 Thanks once again, for your assistance on this one. |
Fix concurrent call of uv_timer's causing pmproxy crashes
When multiple worker threads simultaneously call uv_timer_start/stop on
timers belonging to the same event loop, they concurrently modify the
shared heap data structure without synchronization. This causes segmentation
faults in pmproxy.
This fix:
Fix of expansion of delta indom causing pmproxy crashes
When expanding a delta indom into a full indom, the function copies name
pointers by reference (shallow copy) from the previous full indom into
the new namelist. This creates a mismatch when two different memory
allocation schemes coexist in the same hashindom chain leading to free()
of invalid pointers causing SIGABRT.
This fix: is replacing shallow copies of pointers with strdup() and sets
PMLID_NAMES to ensure the expanded namelist is an independent allocation
that can be safely freed.