Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
We have a long-running program that we want to run under sanitizers for extra error checking and under faketime to speed up the clock. This program hangs after a while. Backtraces suggest that the hangs occur because of recursion in the memory allocator, which apparently locks a non-recursive mutex. Specifically, what we see is that due to our use of FAKETIME_NO_CACHE=1, libfaketime wants to reload the config file inside a (random) call to clock_gettime(). libfaketime then uses fopen() / fclose() for reading the config files. These function allocate / free a buffer for reading data and specifically the call to free() that happens inside fclose() ends up calling clock_gettime() again. At this point, libfaketime locks up because it has a time_mutex that is locked and none-recursive. Sadly, I did not manage to come up with a stand-alone reproducer for this problem. Also, the above description is from memory after half a week of vacation, so it might be (partially) wrong. More information can be found here: - wolfcw#365 (comment) - wolfcw#365 (comment) This commit first adds a test case. This new test uses the already existing libmallocintercept.so to cause calls to clock_gettime() during memory allocation routines. The difference to the already existing version is that additionally FAKETIME_NO_CACHE and FAKETIME_TIMESTAMP_FILE are set. This new test hung with its last output suggesting a recursive call to malloc: Called malloc() from libmallocintercept...successfully Called free() on from libmallocintercept...successfully Called malloc() from libmallocintercept...Called malloc() from libmallocintercept... Sadly, gdb was unable to provide a meaningful backtrace for this hang. Next, I replaced the use of fopen()/fgets()/fgets() with open()/read()/close(). This code no longer reads the config file line-by-line, but instead it reads all of it at once and then "filters out" the result (ignores comment lines, removes end of line markers). I tried to keep the behaviour of the code the same, but I know at least one difference: Previously, the config file was read line-by-line and lines that began with a comment character were immediately ignored. The new code only reads the config once and then removes comment lines. Since the buffer that is used contains only 256 characters, it is possible that config files that were previously parsed correctly now longer parse. A specific example: if a file begins with 500 '#' characters in its first line and then a timestamp in the second line, the old code was able to parse this file while the new code would only see an empty file. After this change, the new test no longer hangs. Sadly, I do not actually know its effect on the "actual bug" that I wanted to fix, but since there are no longer any calls to fclose(), there cannot be any hangs inside fclose(). Signed-off-by: Uli Schlachter <psychon@znc.in>
- Loading branch information