Unable to load massive databases #1118

Open
SkewedZeppelin opened this issue Dec 17, 2023 · 3 comments

SkewedZeppelin commented Dec 17, 2023

Describe the bug

Program aborts with "Can't allocate memory" when attempting to load ~45 million signatures.

How to reproduce the problem

  • Give clamscan a big database
  • It aborts
  • System has approx 40GB+ free RAM during and after the abort

Additional

Running clamav-1.0.4-1.fc39
I checked the manpage and conf to see if this was a limit that could be changed but didn't see anything particular.
It appears to be a limit of the allocator itself: https://github.com/Cisco-Talos/clamav/blob/main/libclamav/mpool.c#L363

clamscan -i -r -d production-clam
LibClamAV Error: mpool_malloc(): Attempt to allocate 134217728 bytes. Please report to https://github.com/Cisco-Talos/clamav/issues
LibClamAV Error: hm_addhash_bin: failed to grow hash array to 8388608 entries
LibClamAV Error: cli_loadhash: Malformed hash string at line 4317848
LibClamAV Error: cli_loadhash: Problem parsing database at line 4317848
LibClamAV Error: Can't load production-clam/malware-virusshare.hdb: Can't allocate memory
LibClamAV Error: cli_loaddbdir: error loading database production-clam/malware-virusshare.hdb
ERROR: Can't allocate memory

----------- SCAN SUMMARY -----------
Known viruses: 4824845
Engine version: 1.0.4
Scanned directories: 0
Scanned files: 0
Infected files: 0
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 14.121 sec (0 m 14 s)
Start Date: 2023:12:17 02:46:00
End Date:   2023:12:17 02:46:14

I also checked line 4317848 to make sure it is actually correct. I suspect that message is just collateral from the previous failure.
Running with just one database gives the following, which likely confirms that:

LibClamAV Error: hm_addhash_bin: failed to grow hash array to 8388608 entries
LibClamAV Error: cli_loadhash: Malformed hash string at line 8388608
LibClamAV Error: cli_loadhash: Problem parsing database at line 8388608
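The numbers in the two logs line up if each entry in the hash array is a raw 16-byte MD5 digest. That is an assumption about the allocation, not something confirmed in this thread. Under it, growing the array to 8388608 entries requests exactly the 134217728 bytes that mpool_malloc() rejects. A quick check of the arithmetic:

```python
# Assumption: one hash-array entry is a raw MD5 digest (16 bytes).
MD5_DIGEST_BYTES = 16

entries = 8_388_608      # from "failed to grow hash array to 8388608 entries"
requested = 134_217_728  # from "mpool_malloc(): Attempt to allocate 134217728 bytes"

computed = entries * MD5_DIGEST_BYTES

# Both values are exact powers of two: 2**23 entries and 2**27 bytes.
print(computed == requested)
print(entries == 2**23, requested == 2**27)
```

If the assumption holds, the failure happens when the hash array grows to 2**23 entries, which would fit the single-database run stopping at line 8388608.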

Script

I used this script to convert the VirusShare hashes to .hdb format:

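#CC0
import sys

# Convert a hash list (one hash per line, "#" comment lines skipped) to .hdb entries.
with open(sys.argv[1], "r") as database:
	for line in database:
		if not line.startswith("#"):
			# Append wildcard size "*", name "-", and minimum functionality level 73.
			print(line.rstrip('\n') + ':*:-:73')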
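Since the loader also reports "Malformed hash string", it may be worth ruling out bad input before blaming the allocator. The sketch below assumes every entry has at least three colon-separated fields and that the first field is a hex digest of 32, 40, or 64 characters (MD5, SHA-1, or SHA-256). `find_malformed` is a hypothetical helper name, not part of ClamAV.

```python
import re

HEX_FIELD = re.compile(r"^[0-9a-fA-F]+$")
# Assumption: only MD5 (32), SHA-1 (40) and SHA-256 (64) hex digests are expected.
VALID_HEX_LENGTHS = (32, 40, 64)

def find_malformed(lines):
    """Yield (line_number, line) for entries whose hash field looks wrong."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        fields = line.split(":")
        hash_ok = (
            len(fields) >= 3
            and len(fields[0]) in VALID_HEX_LENGTHS
            and HEX_FIELD.match(fields[0]) is not None
        )
        if not hash_ok:
            yield number, line

# Example with in-memory lines; pass an open file object to check a real database.
sample = [
    "d41d8cd98f00b204e9800998ecf8427e:*:-:73\n",
    "not-a-hash:*:-:73\n",
]
for number, line in find_malformed(sample):
    print(f"{number}: {line}")
```

An empty result for the whole file would support the suspicion that the "Malformed hash string" message is a side effect of the failed allocation and not a data problem.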

The resulting files:

   4084786 malware-combined.hsb
    740059 malware-malwarebazaar.hsb
  41527270 malware-virusshare.hdb
  46352115 total

Edit: after searching again, #522 and #537 look similar but different. Feel free to close this if it really is a dupe of one of those. Thanks.


HydraDragonAntivirus commented Dec 17, 2023

This happens to me too when I try to load massive YARA rule sets. You can try removing duplicates. I also have a bigger database. I fixed this by avoiding ClamAV.
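Removing duplicates can be sketched as a single streaming pass. Whether it shrinks this particular database enough is not established in the thread. The sketch assumes the hash is the first colon-separated field and compares hashes case-insensitively. It keeps every hash it has seen in memory, which for tens of millions of entries can take several gigabytes. `dedupe_hash_lines` is a hypothetical helper name.

```python
def dedupe_hash_lines(lines):
    """Yield each signature line once, keyed on its lower-cased hash field."""
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and "#" comment lines.
        if not line or line.startswith("#"):
            continue
        key = line.split(":", 1)[0].lower()
        if key not in seen:
            seen.add(key)
            yield line

# Example with in-memory lines; pass an open file object for a real database.
sample = [
    "D41D8CD98F00B204E9800998ECF8427E:*:-:73\n",
    "d41d8cd98f00b204e9800998ecf8427e:*:-:73\n",
    "0cc175b9c0f1b6a831c399e269772661:*:-:73\n",
]
for line in dedupe_hash_lines(sample):
    print(line)
```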

micahsnyder (Contributor) commented

It's a limit that is not currently configurable and would be difficult to change (I started working on that once before).

The limit is intended to protect against excessively large allocations triggered by scanning untrusted files. But the same limits are enforced on allocations used for loading signatures.

I feel we could remove the limit for database allocations.

micahsnyder (Contributor) commented

The Yara part of this issue may be fixed by #1137

Specifically, by this change:
https://github.com/Cisco-Talos/clamav/pull/1137/files#diff-93e5f6b6685c1f9e2b5883e3700b29d9e7fce6a989cc9cc954ef58111e57bc30R455-R457

As for the original issue with this message:

mpool_malloc(): Attempt to allocate 134217728 bytes. Please report to
... I'm not sure. That probably needs a change here to extend this array:

134217728,
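To illustrate why extending the array could matter: a pool allocator of this kind typically rounds each request up to the nearest entry in a fixed table of fragment sizes, so a request larger than the last entry has no bucket to land in. The following is a toy model under that assumption, not ClamAV's code. The table values and the name `pick_size_class` are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical table, far shorter than a real one; values are illustrative only.
SIZE_CLASSES = [16, 64, 256, 1024, 4096]

def pick_size_class(size, classes=SIZE_CLASSES):
    """Return the smallest class >= size, or None if no class is large enough."""
    if size <= 0:
        return None
    index = bisect_left(classes, size)
    return classes[index] if index < len(classes) else None

print(pick_size_class(100))    # rounds up to the 256 class
print(pick_size_class(4096))   # exactly the largest class
print(pick_size_class(4097))   # None: the request exceeds the table
print(pick_size_class(4097, SIZE_CLASSES + [16384]))  # a longer table accepts it
```

In this model the failure does not depend on how much RAM is free, which would match the report of an abort with 40GB+ available.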
