
Handle memory allocation failures gracefully #155

Merged
merged 22 commits into UpstandingHackers:master from fix-alloc-failures
Dec 6, 2015

Conversation

pesco

@pesco pesco commented Oct 30, 2015

I'm opening this as an early request for comments.

Most code in hammer is written as if memory allocations could never fail. In particular, the result of h_arena_malloc isn't checked anywhere. This only works if the OS will always overcommit (e.g. Linux sysctl vm.overcommit_memory=1) and there are no resource limits in place (e.g. ulimit -v).

Issue #133 reported basically the same issue but focused on the system_allocator. PR #136 purports to fix it but ignores the CF backends and adds more potential NULL pointers to arena allocator code. It did add some longjmp-based "exception" handling to RVM compilation, though.

I propose to extend the longjmp approach to arena allocations such that h_arena_malloc never returns on failure but either performs a longjmp or exits. A longjmp target is set up and attached to the HArena data structure in a top-level function and low-level code can stay as it is. I've implemented this for the regex backend to demonstrate.

In addition, unchecked call sites of h_new, mm__->alloc, h_new_arena, and the like have to be identified and NULL handling added where appropriate.

@pesco
Author

pesco commented Oct 31, 2015

Some Travis builds fail on the following warning:

$ gcc --version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
[...]
gcc -o build/opt/src/backends/regex.o -c -std=gnu99 -Wall -Wextra -Werror -Wno-unused-parameter -Wno-attributes -Wno-unused-variable -O3 build/opt/src/backends/regex.c
build/opt/src/backends/regex.c: In function 'h_rvm_run__m':
build/opt/src/backends/regex.c:55:12: error: variable 'heads_n' might be clobbered by 'longjmp' or 'vfork' [-Werror=clobbered]
build/opt/src/backends/regex.c:56:6: error: variable 'heads_p' might be clobbered by 'longjmp' or 'vfork' [-Werror=clobbered]

This does not appear on my workstation with GCC 5.2.1 and it only seems to happen when the BINDINGS environment variable is set. My guess is that something in the bindings pre-install process (where some apt-get install calls happen) silently upgrades or somehow changes GCC.

The cause of the warning (which I can't reproduce) seems to be that the variables heads_n and heads_p get swapped in one code path. So I don't think it actually constitutes a problem (swapping is not really clobbering).

PS: The C++ binding seems to use GCC even on the Clang build variant; is that supposed to happen?

@pesco
Author

pesco commented Nov 14, 2015

My guess is that something in the bindings pre-install process (where some apt-get install calls happen) silently upgrades or somehow changes GCC.

No, it happens with gcc -O3. For some reason, the Travis builds with bindings are optimized while BINDINGS=none makes a debug build. The problem doesn't appear with clang -O3.

I reproduced the issue with GCC 4.2 on OpenBSD. It doesn't appear with GCC 5.2 on Linux.

@abiggerhammer
Member

Oh hey, it's building cleanly now. Are we done here, or is there more still to be done with the CF backends?

@pesco
Author

pesco commented Nov 16, 2015

There is still work to do: the CF backends, but especially Packrat (we need that sorted for DNP3), and lastly the NULL checks on any allocs that are not arena allocs.

I'm looking for an all-clear to go on with this relatively invasive change. I did it for regex at first precisely because that's the code I've never touched before (=> maximum invasion).

@abiggerhammer
Member

Full speed ahead!

Rationale: If memory allocation fails in the inner parse and we
longjump up the stack, the temporary arena will be missed and leak.

NB: This change means that any allocations done by the continuation
(in the form of new parsers, probably) will persist for the
lifetime of the parse result. Beware of wasting too much memory
this way! The bind continuation should generally keep dynamic
allocations to a minimum.
add h_alloc() which calls errx() on failure and use it for all basic allocation

Rationale: "Basic allocation" refers to things outside of parsing proper,
mostly initialization. If such allocations fail, the system is globally
memory-starved, from which it will likely not recover by returning failure.
In this case, terminating the process is in fact the most robust strategy as
it may mean the difference between a permanent hang and a temporary crash.
@pesco
Author

pesco commented Nov 30, 2015

So, longjmps in the other backends are still TODO but I have switched all non-arena allocation to the crash-and-burn strategy of error handling. Here is what I wrote in the commit message:

add h_alloc() which calls errx() on failure and use it for all basic allocation

Rationale: "Basic allocation" refers to things outside of parsing proper,
mostly initialization. If such allocations fail, the system is globally
memory-starved from which it will likely not recover by returning failure.
In this case, terminating the process is in fact the most robust strategy as
it may mean the difference between a permanent hang and a temporary crash.

So the idea is that parsers, anything else that gets built with h_new, initial creation of arenas with h_new_arena, etc. terminate the process on failure, while arena-based allocations during the parse (using h_arena_malloc) make h_parse return NULL.

NB: We characterized the latter during our last conference call as the user getting to dynamically discover the bounds of his input language as dictated by memory.

@pesco
Author

pesco commented Nov 30, 2015

The above commits should make h_parse return NULL for all backends when it runs out of memory during the parse. I haven't gotten around to actually triggering it yet, though, because I fail at ulimit or something.

The iterative variants are still TODO as they are not quite as trivial (but should be easy enough).

@pesco
Author

pesco commented Dec 2, 2015

All done, request merge.

Did the test properly this time around (with a mock allocator that selectively returns NULL).
I'd still appreciate a comment on the rationale for 3fc56a0 which I quoted above.

@@ -530,8 +537,6 @@ static HCountedArray *llk_parse_chunk_(HLLkState *s, const HParser* parser,
return seq;

no_parse:
Member
Best to clear the exception handler here; that way we can maintain the invariant that returning to the exception handler is always valid.

Author

Good catch. See d28f182.

@thequux
Member

thequux commented Dec 6, 2015

LGTM

thequux pushed a commit that referenced this pull request Dec 6, 2015
Handle memory allocation failures gracefully
@thequux thequux merged commit 41b890c into UpstandingHackers:master Dec 6, 2015
@pesco pesco deleted the fix-alloc-failures branch December 7, 2015 17:12