
--mem-file-size should possibly apply per-invocation or per-file, not per-process #4907

Open
solardiz opened this issue Dec 2, 2021 · 3 comments
Labels
enhancement RFC / discussion Help or comments wanted

Comments

solardiz (Member) commented Dec 2, 2021

When running with a huge wordlist and a high --fork count, and with otherwise default settings, startup takes ages and a lot of RAM is consumed. This appears to be because our default for --mem-file-size on 64-bit is 2 GB and we apply this limit per-process. For example, with a 60 GB wordlist and 32 forks, the entire wordlist would currently be read into RAM on startup, right? Worse, on a HDD the 32 processes would compete for disk seeks back and forth, right?

To remedy this, I think the limit should apply per-invocation or per-file, not per-process. For example, if we split a job across several machines with --node, it could be fine for each machine to have its own full limit on loading its portion of wordlist into memory. However, we shouldn't want a full limit per each virtual node within each physical machine's node numbers' range created there with --fork. If that's unreasonably complicated to implement and/or document, which I think it may be, then perhaps we should simply apply the limit per-file.

solardiz (Member, author) commented Dec 2, 2021

Also, at least until we fix this issue properly, I think we should lower the default limit to, say, 150 MB (just enough for RockYou). I've just set this issue's milestone at least for this one change.

magnumripper (Member) commented Dec 3, 2021

> When running with a huge wordlist and a high --fork count, and with otherwise default settings, startup takes ages and a lot of RAM is consumed. This appears to be because our default for --mem-file-size on 64-bit is 2 GB and we apply this limit per-process. For example, with a 60 GB wordlist and 32 forks, the entire wordlist would currently be read into RAM on startup, right? Worse, on a HDD the 32 processes would compete for disk seeks back and forth, right?

Perhaps we could add some kind of wordlist_init() (or, more generally, mode_init()) called before forking, where setup like that would happen.

Anyways, most of Jumbo's wordlist.c is crappy; we should re-write it almost from scratch, and while at it we might want to completely separate "loopback mode" from wordlist mode, from a code perspective. Many of the problems are caused by organic growth:

1. First, we got the memory caching of the whole wordlist, with pointers to each word. It was a great speedup.
2. Only later did we get fork and related features.
3. Then we got MPI, making a mess of that for MPI builds.
4. Finally we got mmap. This is great for MPI and fork because it shares the mmap among nodes on the same host, but IIRC we kept some memory caching of pointers-to-words for some minor last boost.

Perhaps we should never bother with that pointer-to-word caching at all. When we run mmap'ed, the boost should be negligible compared to the bugginess of that code - and when we don't have mmap at all (is that really happening anywhere?) we might be in a rare enough environment that we should just revert to the good old "john proper" code.

solardiz added a commit to solardiz/john that referenced this issue Jan 11, 2022
With our current implementation, this limit applies per-process, which
inflated the 2 GiB to unreasonably high values when high "--fork" counts
were used.

Works around openwall#4907
solardiz added a commit that referenced this issue Jan 11, 2022
With our current implementation, this limit applies per-process, which
inflated the 2 GiB to unreasonably high values when high "--fork" counts
were used.

Works around #4907
@solardiz solardiz removed this from the Definitely 1.9.0-jumbo-2 milestone Jan 11, 2022
solardiz (Member, author) commented

Removed milestone now that this issue is worked around by lowering the default.
