Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
We have a long-running program that we want to run under sanitizers for
extra error checking and under faketime to speed up the clock. This
program hangs after a while. Backtraces suggest that the hangs occur
because of recursion in the memory allocator, which apparently locks a
non-recursive mutex.
Specifically, what we see is that due to our use of FAKETIME_NO_CACHE=1,
libfaketime wants to reload the config file inside a (random) call to
clock_gettime(). libfaketime then uses fopen() / fclose() for reading
the config files. These function allocate / free a buffer for reading
data and specifically the call to free() that happens inside fclose()
ends up calling clock_gettime() again. At this point, libfaketime locks
up because it has a time_mutex that is locked and none-recursive.
Sadly, I did not manage to come up with a stand-alone reproducer for
this problem. Also, the above description is from memory after half a
week of vacation, so it might be (partially) wrong.
More information can be found here:
This commit first adds a test case. This new test uses the already
existing libmallocintercept.so to cause calls to clock_gettime() during
memory allocation routines. The difference to the already existing
version is that additionally FAKETIME_NO_CACHE and
FAKETIME_TIMESTAMP_FILE are set. This new test hung with its last output
suggesting a recursive call to malloc:
Called malloc() from libmallocintercept...successfully
Called free() on from libmallocintercept...successfully
Called malloc() from libmallocintercept...Called malloc() from libmallocintercept...
Sadly, gdb was unable to provide a meaningful backtrace for this hang.
Next, I replaced the use of fopen()/fgets()/fgets() with
open()/read()/close(). This code no longer reads the config file
line-by-line, but instead it reads all of it at once and then "filters
out" the result (ignores comment lines, removes end of line markers).
I tried to keep the behaviour of the code the same, but I know at least
one difference: Previously, the config file was read line-by-line and
lines that began with a comment character were immediately ignored. The
new code only reads the config once and then removes comment lines.
Since the buffer that is used contains only 256 characters, it is
possible that config files that were previously parsed correctly now
longer parse. A specific example: if a file begins with 500 '#'
characters in its first line and then a timestamp in the second line,
the old code was able to parse this file while the new code would only
see an empty file.
After this change, the new test no longer hangs. Sadly, I do not
actually know its effect on the "actual bug" that I wanted to fix, but
since there are no longer any calls to fclose(), there cannot be any
hangs inside fclose().
Signed-off-by: Uli Schlachter psychon@znc.in
CC @LtdSauce FYI