Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify/speedup "--show" and "--make-charset" #4430

Open
solardiz opened this issue Nov 1, 2020 · 4 comments
Open

Simplify/speedup "--show" and "--make-charset" #4430

solardiz opened this issue Nov 1, 2020 · 4 comments
Labels

Comments

@solardiz
Copy link
Member

solardiz commented Nov 1, 2020

As I pointed out in #4388 (comment), jumbo's usage of split() on --show is much heavier than core's, which is both good (allowing for #4428) and bad (slow). #4429 is a start at optimizing it, however we can do more.

Perhaps most importantly, we can optimize the scanning of the formats list in ldr_show_pot_line() to cache and probe the previous line's format's valid() first.

Also, I think right after merging #4429 (or adding to it) we can also make ldr_cracked_hash() case-sensitive.

Further, why do we(?) even bother to build the db->cracked_hash table when invoked with --make-charset? We should probably skip that.

Finally, given that #4429 obsoletes FMT_SPLIT_UNIFIES_CASE, should we drop that flag from everywhere, keep it around in case we need it again later, or rename/repurpose it (e.g., to just FMT_SPLIT_UNIFIES, so that we'd set it in more formats yet be able to skip split() in ldr_show_pot_line() for most formats)?

@solardiz
Copy link
Member Author

solardiz commented Nov 2, 2020

Copying another observation of mine from #4388 to here:

Another detail I notice is that jumbo would skip pot lines which no format recognizes. In core, basic string matching works even for unrecognized formats, except when running with explicit "--format". Perhaps this difference is unintentional (looks like it got introduced in 0ceedb2 without being documented), and is easy to fix.

which magnum confirmed there:

Yes, definitely unintentional.

So I think should track it here, and fix it, and test the fix.

@solardiz
Copy link
Member Author

solardiz commented Nov 2, 2020

rename/repurpose it (e.g., to just FMT_SPLIT_UNIFIES, so that we'd set it in more formats yet be able to skip split() in ldr_show_pot_line() for most formats)

I now realize that we'd first incur the overhead of valid() anyway, and then it's just a matter of checking the flag vs. calling fmt_default_split() in the typical case, so we won't achieve much speedup by having the flag except for pot lines with formats that have non-trivial split() yet don't set the flag (because their non-trivial split() actually splits things, and does not unify). Do we even have any formats like this? We have actual splitting in descrypt and LM, but in both of those we also unify, so we'd set the flag, so they are not examples where the flag would be beneficial. Any others? If not, or if this is indeed very rare, then maybe the flag isn't worth having.

@magnumripper
Copy link
Member

magnumripper commented Nov 2, 2020

Oh, right. And the real enemy for performance is prepare() although I think I fixed the very worst ones. There's really no good way of avoiding that function.

However, I just noticed a thing. We have code like this in loader:

			prepared = (*format)->methods.prepare(fields, *format);
		if (prepared)
			valid = (*format)->methods.valid(prepared, *format);
		else
			valid = 0;

This instance could be changed:

			prepared = alt->methods.prepare(fields, alt);
- 			if (alt->methods.valid(prepared, alt)) {
+ 			if (prepared && alt->methods.valid(prepared, alt)) {

Third place:

		prepared = alt->methods.prepare(fields, alt);
		if (!prepared)
			continue;

But our typical prepare() looks like this (here from nt2_fmt_plug.c):

static char *prepare(char *split_fields[10], struct fmt_main *self)
{
	static char out[33 + TAG_LENGTH + 1];

	if (!valid(split_fields[1], self)) {
		if (split_fields[3] && strlen(split_fields[3]) == 32) {
			sprintf(out, "%s%s", FORMAT_TAG, split_fields[3]);
			if (valid(out, self))
				return out;
		}
	}
	return split_fields[1];
}

It will never return NULL! I'm pretty sure that's a very typical implementation... and this means

  • A line that was good in the first place will call valid() twice (one from prepare() and one explicitly from loader).
  • A line that become good after prepare() will call valid() three times (twice from prepare() and one explicitly from loader).
  • A line that simply isn't this format will also call valid() three times (twice from prepare() and one explicitly from loader).

Thinking millions of hashes, this is a waste. We could at least improve this function so we return NULL when we already know it's not by any stretch a hash for this format.

 static char *prepare(char *split_fields[10], struct fmt_main *self)
 {
 	static char out[33 + TAG_LENGTH + 1];
 
 	if (!valid(split_fields[1], self)) {
 		if (split_fields[3] && strlen(split_fields[3]) == 32) {
 			sprintf(out, "%s%s", FORMAT_TAG, split_fields[3]);
 			if (valid(out, self))
 				return out;
+ 			else
+ 				return NULL;
 		}
 	}
 	return split_fields[1];
 }

@solardiz
Copy link
Member Author

Perhaps most importantly, we can optimize the scanning of the formats list in ldr_show_pot_line() to cache and probe the previous line's format's valid() first.

Implemented in #4726.

Further, why do we(?) even bother to build the db->cracked_hash table when invoked with --make-charset? We should probably skip that.

Looks like we already skip that due to the return; here:

		if (db->options->flags & DB_PLAINTEXTS) {
			list_add(db->plaintexts, line);
			return;
		}

Another detail I notice is that jumbo would skip pot lines which no format recognizes. In core, basic string matching works even for unrecognized formats, except when running with explicit "--format". Perhaps this difference is unintentional (looks like it got introduced in 0ceedb2 without being documented), and is easy to fix.

which magnum confirmed there:

Yes, definitely unintentional.

So I think should track it here, and fix it, and test the fix.

This is fixed with #4726.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants