Simplify/speedup "--show" and "--make-charset" #4430

solardiz · 2020-11-01T19:27:15Z

As I pointed out in #4388 (comment), jumbo's usage of split() on --show is much heavier than core's, which is both good (allowing for #4428) and bad (slow). #4429 is a start at optimizing it, however we can do more.

Perhaps most importantly, we can optimize the scanning of the formats list in ldr_show_pot_line() to cache and probe the previous line's format's valid() first.

Also, I think right after merging #4429 (or adding to it) we can also make ldr_cracked_hash() case-sensitive.

Further, why do we(?) even bother to build the db->cracked_hash table when invoked with --make-charset? We should probably skip that.

Finally, given that #4429 obsoletes FMT_SPLIT_UNIFIES_CASE, should we drop that flag from everywhere, keep it around in case we need it again later, or rename/repurpose it (e.g., to just FMT_SPLIT_UNIFIES, so that we'd set it in more formats yet be able to skip split() in ldr_show_pot_line() for most formats)?

The text was updated successfully, but these errors were encountered:

solardiz · 2020-11-02T19:42:11Z

Copying another observation of mine from #4388 to here:

Another detail I notice is that jumbo would skip pot lines which no format recognizes. In core, basic string matching works even for unrecognized formats, except when running with explicit "--format". Perhaps this difference is unintentional (looks like it got introduced in 0ceedb2 without being documented), and is easy to fix.

which magnum confirmed there:

Yes, definitely unintentional.

So I think should track it here, and fix it, and test the fix.

solardiz · 2020-11-02T19:47:34Z

rename/repurpose it (e.g., to just FMT_SPLIT_UNIFIES, so that we'd set it in more formats yet be able to skip split() in ldr_show_pot_line() for most formats)

I now realize that we'd first incur the overhead of valid() anyway, and then it's just a matter of checking the flag vs. calling fmt_default_split() in the typical case, so we won't achieve much speedup by having the flag except for pot lines with formats that have non-trivial split() yet don't set the flag (because their non-trivial split() actually splits things, and does not unify). Do we even have any formats like this? We have actual splitting in descrypt and LM, but in both of those we also unify, so we'd set the flag, so they are not examples where the flag would be beneficial. Any others? If not, or if this is indeed very rare, then maybe the flag isn't worth having.

magnumripper · 2020-11-02T20:55:52Z

Oh, right. And the real enemy for performance is prepare() although I think I fixed the very worst ones. There's really no good way of avoiding that function.

However, I just noticed a thing. We have code like this in loader:

			prepared = (*format)->methods.prepare(fields, *format);
		if (prepared)
			valid = (*format)->methods.valid(prepared, *format);
		else
			valid = 0;

This instance could be changed:

			prepared = alt->methods.prepare(fields, alt);
- 			if (alt->methods.valid(prepared, alt)) {
+ 			if (prepared && alt->methods.valid(prepared, alt)) {

Third place:

		prepared = alt->methods.prepare(fields, alt);
		if (!prepared)
			continue;

But our typical prepare() looks like this (here from nt2_fmt_plug.c):

static char *prepare(char *split_fields[10], struct fmt_main *self)
{
	static char out[33 + TAG_LENGTH + 1];

	if (!valid(split_fields[1], self)) {
		if (split_fields[3] && strlen(split_fields[3]) == 32) {
			sprintf(out, "%s%s", FORMAT_TAG, split_fields[3]);
			if (valid(out, self))
				return out;
		}
	}
	return split_fields[1];
}

It will never return NULL! I'm pretty sure that's a very typical implementation... and this means

A line that was good in the first place will call valid() twice (one from prepare() and one explicitly from loader).
A line that become good after prepare() will call valid() three times (twice from prepare() and one explicitly from loader).
A line that simply isn't this format will also call valid() three times (twice from prepare() and one explicitly from loader).

Thinking millions of hashes, this is a waste. We could at least improve this function so we return NULL when we already know it's not by any stretch a hash for this format.

 static char *prepare(char *split_fields[10], struct fmt_main *self)
 {
 	static char out[33 + TAG_LENGTH + 1];
 
 	if (!valid(split_fields[1], self)) {
 		if (split_fields[3] && strlen(split_fields[3]) == 32) {
 			sprintf(out, "%s%s", FORMAT_TAG, split_fields[3]);
 			if (valid(out, self))
 				return out;
+ 			else
+ 				return NULL;
 		}
 	}
 	return split_fields[1];
 }

solardiz · 2021-05-12T11:29:50Z

Perhaps most importantly, we can optimize the scanning of the formats list in ldr_show_pot_line() to cache and probe the previous line's format's valid() first.

Implemented in #4726.

Further, why do we(?) even bother to build the db->cracked_hash table when invoked with --make-charset? We should probably skip that.

Looks like we already skip that due to the return; here:

		if (db->options->flags & DB_PLAINTEXTS) {
			list_add(db->plaintexts, line);
			return;
		}

Another detail I notice is that jumbo would skip pot lines which no format recognizes. In core, basic string matching works even for unrecognized formats, except when running with explicit "--format". Perhaps this difference is unintentional (looks like it got introduced in 0ceedb2 without being documented), and is easy to fix.

which magnum confirmed there:

Yes, definitely unintentional.

So I think should track it here, and fix it, and test the fix.

This is fixed with #4726.

solardiz added maintenance/cleanup RFC / discussion Help or comments wanted labels Nov 1, 2020

solardiz added this to the Potentially 1.9.0-jumbo-2 material milestone Nov 1, 2020

magnumripper mentioned this issue Nov 3, 2020

Review all formats' prepare functions #4436

Open

solardiz mentioned this issue May 11, 2021

Bugfix for --make-charset with --format #4726

Merged

solardiz mentioned this issue Apr 20, 2024

Broken lower-case mssql hash validation #5461

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify/speedup "--show" and "--make-charset" #4430

Simplify/speedup "--show" and "--make-charset" #4430

solardiz commented Nov 1, 2020

solardiz commented Nov 2, 2020

solardiz commented Nov 2, 2020

magnumripper commented Nov 2, 2020 •

edited

Loading

solardiz commented May 12, 2021

Simplify/speedup "--show" and "--make-charset" #4430

Simplify/speedup "--show" and "--make-charset" #4430

Comments

solardiz commented Nov 1, 2020

solardiz commented Nov 2, 2020

solardiz commented Nov 2, 2020

magnumripper commented Nov 2, 2020 • edited Loading

solardiz commented May 12, 2021

magnumripper commented Nov 2, 2020 •

edited

Loading