-
Notifications
You must be signed in to change notification settings - Fork 706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove unnecessary memory allocation limit checks and rename max-allocation related functions #1137
Merged
micahsnyder
merged 12 commits into
Cisco-Talos:main
from
micahsnyder:remove-unnecessary-cli_-allocations
Mar 15, 2024
Merged
Remove unnecessary memory allocation limit checks and rename max-allocation related functions #1137
micahsnyder
merged 12 commits into
Cisco-Talos:main
from
micahsnyder:remove-unnecessary-cli_-allocations
Mar 15, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
micahsnyder
force-pushed
the
remove-unnecessary-cli_-allocations
branch
from
January 10, 2024 15:51
c4eac44
to
81449c5
Compare
micahsnyder
force-pushed
the
remove-unnecessary-cli_-allocations
branch
from
February 28, 2024 21:02
81449c5
to
1d09996
Compare
ragusaa
suggested changes
Mar 7, 2024
ragusaa
approved these changes
Mar 12, 2024
There are a large number of allocations for fix sized buffers using the `cli_malloc` and `cli_calloc` calls that check if the requested size is larger than our allocation threshold for allocations based on untrusted input. These allocations will *always* be higher than the threshold, so the extra stack frame and check for these calls is a waste of CPU. This commit replaces needless calls with A -> B: - cli_malloc -> malloc - cli_calloc -> calloc - CLI_MALLOC -> MALLOC - CLI_CALLOC -> CALLOC I also noticed that our MPOOL_MALLOC / MPOOL_CALLOC are not limited by the max-allocation threshold, when MMAP is found/enabled. But the alternative was set to cli_malloc / cli_calloc when disabled. I changed those as well. I didn't change the cli_realloc/2 calls because our version of realloc not only implements a threshold but also stabilizes the undefined behavior in realloc to protect against accidental double-free's. It may be worth implementing a cli_realloc that doesn't have the threshold built-in, however, so as to allow reallocaitons for things like buffers for loading signatures, which aren't subject to the same concern as allocations for scanning possible malware. There was one case in mbox.c where I changed MALLOC -> CLI_MALLOC, because it appears to be allocating based on untrusted input.
We have some special functions to wrap malloc, calloc, and realloc to make sure we don't allocate more than some limit, similar to the max-filesize and max-scansize limits. Our wrappers are really only needed when allocating memory for scans based on untrusted user input, where a scan file could have bytes that claim you need to allocate some ridiculous amount of memory. Right now they're named: - cli_malloc - cli_calloc - cli_realloc - cli_realloc2 ... and these names do not convey their purpose This commit renames them to: - cli_max_malloc - cli_max_calloc - cli_max_realloc - cli_max_realloc2 The realloc ones also have an additional feature in that they will not free your pointer if you try to realloc to 0 bytes. Freeing the memory is undefined by the C spec, and only done with some realloc implementations, so this stabilizes on the behavior of not doing that, which should prevent accidental double-free's. So for the case where you may want to realloc and do not need to have a maximum, this commit adds the following functions: - cli_safer_realloc - cli_safer_realloc2 These are used for the MPOOL_REALLOC and MPOOL_REALLOC2 macros when MPOOL is disabled (e.g. because mmap-support is not found), so as to match the behavior in the mpool_realloc/2 functions that do not make use of the allocation-limit.
Should be safe to allocate a smaller string than one we already have.
The cli_max_malloc, cli_max_calloc, and cli_max_realloc functions provide a way to protect against allocating too much memory when the size of the allocation is derived from the untrusted input. Specifically, we worry about values in the file being scanned being manipulated to exhaust the RAM and crash the application. There is no need to check the limits if the size of the allocation is fixed, or if the size of the allocation is necessary for signature loading, or the general operation of the applications. E.g. checking the max-allocation limit for the size of a hash, or for the size of the scan recursion stack, is a complete waste of time. Although we significantly increased the max-allocation limit in a recent release, it is best not to check an allocation if the allocation will be safe. It would be a waste of time. I am also hopeful that if we can reduce the number allocations that require a limit-check to those that require it for the safe scan of a file, then eventually we can store the limit in the scan- context, and make it configurable.
Some sort of code merge way-back-when resulted in two identical max-allocation checks. I removed the noisy ones.
A code merge resulted in a duplicate copy of the CLI_STRDUP macro. Also fixed formatting.
Bytecode signature's are able to allocate buffers, but should probably adhere to clamav's max allocation limit. This adds a check to make sure they don't accidentally alloc too much based on untrusted user input.
Allocations for bytecode signatures to work need not check against the memory allocation limit, as bytecode signatures are considered trusted user input. You may note that I did not remove allocation limits from the bytecode API functions that may be called by the signatures such as adding json objects, hashsets, lzma and bz2 decompressors, etc. This is because it is likely that a bytecode signature may call them more times based on the structure of the file being scanned - particularly for the json objects.
Variables like the number of signature parts are considered trusted user input and so allocations based on those values need not check the memory allocation limit. Specifically for the allocation of the normalized buffer in cli_scanscript, I determined that the size of SCANBUFF is fixed and so safe, and the maxpatlen comes from the signature load and is therefore also trusted, so we do not need to check the allocation limit.
We add the _OR_GOTO_DONE suffix to the macros that go to done if the allocation fails. This makes it obvious what is different about the macro versus the equivalent function, and that error handling is built-in. Renamed the cli_strdup to safer_strdup to make it obvious that it exists because it is safer than regular strdup. Regular strdup doesn't have the NULL check before trying to dup, and so may result in a NULL-deref crash. Also remove unused STRDUP (_OR_GOTO_DONE) macro, since the one with the NULL-check is preferred.
micahsnyder
force-pushed
the
remove-unnecessary-cli_-allocations
branch
from
March 15, 2024 14:39
1d09996
to
b3c9a56
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The goal of the ClamAV's functions like "cli_malloc" and "cli_realloc" are to check that the size of the allocation is not unreasonable. But we only need to do that check when the size is derived from untrusted user input (aka may vary based on values in the possibly malicious file that we're scanning).
There are many calls to allocate memory where the size of the allocation is either known at compile time or else is derived from trusted user input. Examples of variable but trusted allocation sizes include:
This PR seeks to eliminate those unnecessary max-allocation checks. This is mostly because checking the limits in these cases is a waste of CPU. But it also sets groundwork for a future improvement where we may pass in the memory allocation limit as a parameter, and make it generally configurable for the scanning service. This would be meaningful for anyone wishing to crank up the max-filesize to scan large archives that decompress to very large amounts of data.
Further, this PR renames the max-allocation functions and related functions and macros so it's obvious what they do. I suspect that the reason the likes of "cli_calloc" and "cli_realloc" were so widely used for fixed size allocations is that the developers knew we had a special wrapper for allocating memory and didn't know when they should be used.
What's also drawing attention to is the functions that wrap allocations but do NOT check the allocation-limit. These are
cli_strdup
,cli_realloc
, andcli_realloc2
. These functions are safer than the originalstrdup
andrealloc
functions from the C "standard library".strdup
, our wrapper adds a NULL check to prevent accidental NULL-deref crashes.realloc
, the "standard library" one is not standard. Some variations may free the pointer if size == 0. Some do not. Ours wrapper does not. In this way, we know the behavior will be consistent. Our version also has two variants. The first,cli_realloc
does not free the pointer on failure. The second,cli_realloc2
, does free the pointer on failure. That second behavior is generally less desirable because you must remember to set the pointer to NULL beforegoto done;
or else you risk a double-free. I have not removed the latter variant simply because it's a lot of work, so there will still be two variants. In this PR, I improve the documentation and I rename those two so you can easily tell the difference.Also, for both the
strdup
andrealloc
wrappers, we now have a "cli_safer_" prefix on them when they do not check the the allocation limit. When we do check the allocation limit, we instead use a "cli_max_" prefix.Finally, for the macros which wrap the allocation-related functions, I've added a "_OR_GOTO_DONE" suffix to the macro names. This makes it clear what is special about the macros, and that error handling is built int. If the allocation fails, then we
goto done;
.In summary, the functions and macros are now: