Review default incremental charsets provided with John #5220
That's a reasonable request. The current charsets were made from the Rockyou dataset (with dupes) damn near as-is as that was the best set in the wild at the time. Creating new charsets is easy as pie if we can come up with a good dataset to train from. LinkedIn is a bit newer but may be too old as well. How about the HIBP stuff? Any other suggestions?

Also, there's a caveat: If we replace the existing charsets, older jobs can't be resumed. There are ways to handle that though: For example, we can create the new charsets with different names (eg.
FWIW, the usual "at the start and end respectively" would not satisfy our
That depends on your reason to run it against such a system's password hashes. A possible reason is to ensure the policy is in fact being enforced, and to detect cases where it is not. For that, you need to not skip/postpone testing of weaker passwords.
That "top 100k" has a comment saying it corresponds to HIBP (not specifying which version). Those passwords are only moderately different from RockYou, which our charset files were generated from. So perhaps by applying the same filter to RockYou before generating your charset file, you'll achieve similar or better results (can be better because the full of RockYou is longer than 100k). Overall, there's definitely room for improvement here, but I'm not sure exactly what we should do. For my own runs, I am using a charset file generated from (IIRC) 30x RockYou + HIBP v7 (giving higher weight to passwords that are in RockYou). This does perform moderately better than our released files in my testing - but with differences on the order of +10%, not many times like you had for the policy-enforced passwords. I was also thinking of (and experimented with) possibly making improvements to the code in JtR before releasing new charset files, but we don't really have to. |
@solardiz I was actually halfway through writing a response when you posted, so that was good timing. Completely agree that the capital start/number end isn't a good way to make passwords - but it meets the Windows password complexity rules (8 chars, and at least 3 out of uppercase/lowercase/number/symbol/Unicode), and seems to be the most common way that people do that in my experience. I also take your point about the current incremental mode sometimes being useful to identify passwords that aren't following the policies.

HIBP is probably the biggest (public) dataset, although it's rather biased towards web applications. And of course there's an inherent bias in it that it comes from sites that have been compromised and generally using weaker hashing algorithms (if they hash at all). But then I suppose that hashes from public-facing web applications with weaker security is probably one of the main things that John is used for.

Testing charsets built from HIBP is a little tricky though, because obviously you can't test them against the HIBP hashes, and most existing public dumps are likely to have been incorporated into HIBP already. So I'm not really sure what the best approach would be, other than trying them out on newer dumps as they emerge and seeing if they perform well. Or people testing them privately and providing feedback. The types of passwords in these dumps are so different to the ones I usually see in AD, so testing against those doesn't seem very useful. There might also be significant differences depending on how many hashes you take from HIBP (1 million was just an arbitrary number), and how many of those have been cracked when you make the charset.

As a test, I generated a charset from (most of) the top 1 million hashes from HIBP v8, and ran it against the same Active Directory dump as above. The

Using

I think a lot of that probably comes down to the number of matching passwords: rockyou had ~380k passwords that match the

In my case, rockyou actually seems better than my sample from HIBP with

But I think the biggest thing is that creating a new incremental mode based on the

It might even be nice if that incremental mode was the default for NT hashes (I know that LM has its own default filter) - but that's probably adding a lot of complexity and opening a whole can of worms about the default modes for every hash type, so perhaps better not to go there. And it's easy enough to just tell people to use
Regarding Windows password policies, maybe you can get some of your client companies to deploy our

Yes, out-of-sample testing of charsets generated from HIBP is tricky. However, it is possible to deliberately exclude a small random portion of HIBP from the training set, to use it as a test set. It is also possible e.g. to generate using v7 and test against what's new in v8. In the latter case, however, such testing results would be biased toward more complex passwords, which might not match real-world use cases.

1 million from HIBP may perform worse than RockYou simply because it's smaller. In my case, it was ~450M of not-too-difficult-to-crack HIBP v7 mixed with a similar number of repeats from RockYou (also favoring 3+ hits, which is a ~1.1M sub-list).

Sure, we could provide a pregenerated
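Regarding the hold-out idea above, a rough sketch in shell (file names are placeholders and the 1% split is arbitrary):

shuf hibp-cracked-plains.txt > shuffled.txt
total=$(wc -l < shuffled.txt)
head -n $(( total / 100 )) shuffled.txt > test-set.txt              # held-out 1% for evaluation
tail -n +$(( total / 100 + 1 )) shuffled.txt > training-set.txt     # the rest for charset generation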
End of thread on john-users, where I described my experiments last year: https://www.openwall.com/lists/john-users/2021/05/05/3
I'm seeing more clients using Azure AD Password Protection (although I think that's more blacklist-based than complexity), but it's certainly one that I'll bear in mind. People tend to be pretty twitchy about installing things on their Domain Controllers though, so it might be a tough sell.
I wonder if naming it something like "mixedalnum" would avoid the issue (and be a little more consistent with the current naming)? It also makes it a bit more obvious what it does, because unless you know the external modes then it's not immediately obvious what

That thread was very interesting reading, and I hadn't really thought about trying to weight the different sources when generating the charsets (although I suppose you get this automatically when you give john a hash file to generate from). The point about using it in conjunction with wordlists is also something that I'd been thinking about: incremental mode is usually fairly late in my cracking process, so having it generate a load of candidates that I've already tried with wordlists+rules has limited value. Not really sure if much can be done about that, but comparing the post-rockyou performance does seem like a useful metric when evaluating different charsets.
BTW, something we haven't yet tried is generating from our new
BTW,
"mixedalnum" wouldn't be any clearer to me, but maybe. |
I just tried out charsets based on the new

To recap, I've used three main charsets.
Results for the
The first AD dump shows a clear win for the new (

The second AD dump has very few accounts that don't meet the Windows password complexity rules, so almost all of the cracked hashes are in the "lowercase-numbers-symbol" format. The numbers for it are so low that it's probably not a very useful datapoint. And for the
For both datasets, the
@rbsec Thanks. You can also try generating from more (many million) of HIBP, or e.g. from https://github.com/rarecoil/hashes.org-list
Maybe it would perform better for you with deduplication.
Right, so I've gone away and done some more (and slightly more rigorous) testing. Incremental mode was run for 50G candidates (~20 mins) with the following charsets:
Combining anything with the biggest lists (HIBP50, HIBP100M and Hashesorg) didn't achieve much, because those lists are so much bigger than what they're combined with that the results are pretty much the same as them on their own.

ASCII Charsets

Results for the
None of the

Policy Charsets

For the
The Hashes.org charset performed pretty badly, and the size of the dump made it a bit awkward to deal with (generating the charset took a long time and John got up to about 20GB of RAM doing so). The HIBP 50M was the best here, cracking ~20% more than the

It's worth noting that I only had ~70% of the HIBP top 50m hashes cracked, and after the

Complex Charsets

I also tried making some "complex" versions of the charsets that were (roughly) aligned to Windows password complexity. Effectively extending the
The results for this were a bit of a mixed bag - some performed better than the

Thoughts

The best results for both the policy and complex charsets were with the HIBP 50M, although results may be improved by cracking more of this than the ~70% that I did. The actual sweet spot could be anywhere between the 10M and 100M version that I tried - but I don't think there's much value in trying to finesse this much with only two data points.

I think there's a good argument for including some kind of complex/policy charset with John, as it's a significant improvement over the current ones for this kind of use-case. The HIBP 50M Complex set was the best from my testing - but it may be better to try and align to the existing

It would be good to have some more testing of the

I've attached the HIBP 50M charsets I generated to this post - happy to share any others that people want to do their own testing on.
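For anyone wanting to test an attached .chr file, it needs an [Incremental:...] section in john.conf pointing at it - roughly like the sketch below, where the section name, file name and length settings are placeholders rather than anything shipped with John:

[Incremental:HIBP50MComplex]
File = $JOHN/hibp50m-complex.chr
MinLen = 8
MaxLen = 13
CharCount = 95

It can then be run with --incremental=HIBP50MComplex.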
These are interesting results, @rbsec! I just realized that our default

This also makes me wonder what results you'd achieve by simply locking our default

For actual charsets we might generate for distribution with JtR, we should of course exclude the length 8 check from whatever filter we'd use.
A workaround is to combine with many repeats of the smaller list(s). This basically gives higher weights to patterns seen in the smaller lists, but then uses the larger list to provide fallback patterns when a specific combination of preceding characters is not seen in the smaller lists.
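A rough sketch of that in shell - the file names and the 30x repeat count are only illustrative, and the "fake pot" step (prefixing each plaintext with a colon) is an assumption about the pot loader that's worth verifying:

for i in $(seq 30); do cat rockyou-plains.txt; done > training.txt   # heavy weight for the smaller list
cat hibp-v7-plains.txt >> training.txt                               # larger list provides fallback patterns
sed 's/^/:/' training.txt > training.pot                             # assumption: ":plaintext" pot-style lines are accepted
./john --pot=training.pot --make-charset=weighted.chr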
I'm not sure this would improve the results - it could also hurt them - but could be worth trying. You can possibly "crack" many more of HIBP by using the hashes.org list as a wordlist.
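A sketch of that idea, assuming the HIBP NTLM "ordered by count" download and hypothetical file names:

head -n 50000000 pwned-passwords-ntlm-ordered-by-count.txt | cut -d: -f1 > hibp-top50m.txt   # strip the prevalence counts
./john --format=NT --wordlist=hashes.org-list.txt hibp-top50m.txt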
Yes, it looks so.
Yes, we should probably revise it to make it more than just an example.
You can try this - the same as your grep?

[List.External:PolicyMod]
int mask[0x100];
void init()
{
int c;
mask[0] = 0x100;
c = 1;
while (c < 0x100)
mask[c++] = 4;
c = 'a';
while (c <= 'z')
mask[c++] = 1;
c = 'A';
while (c <= 'Z')
mask[c++] = 2;
}
void filter()
{
int i, seen;
/* This loop ends when we see NUL (sets 0x100) */
i = seen = 0;
while ((seen |= mask[word[i++]]) < 0x100)
continue;
/*
* We should have seen at least one character of each type (which "add up"
* to 7) and then a NUL (adds 0x100).
*/
if (seen != 0x107)
word = 0; // Does not conform to policy
}
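A quick way to sanity-check a filter like this (the wordlist name is just an example) is to run it over a wordlist in stdout mode and eyeball what survives:

./john --wordlist=rockyou.txt --external=PolicyMod --stdout | head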
Strangely the version of

/*
* We should have seen at least one character of each type (which "add up"
* to 7) and then a NUL (adds 0x100), but not any other characters (would
 * add 0x200). The length must be at least 6
*/
if (seen != 0x107 || i < 6)
word = 0; // Does not conform to policy
}

The git history of the file seems to show that the 8 char limit has been there for years, so I'm not 100% sure where this came from. Maybe I edited it years ago and forgot. I think limiting to 8 chars is probably bad for a charset, although increasing that minimum length to 8 might be good for the complex one - I'll give it a go and see if it makes any difference.

In terms of the filter itself, the one you posted doesn't quite match what I was doing with grep, because I wanted to include symbols as well. I modified my existing policy to require uppercase, lowercase and either a number or an ASCII symbol (may not be very efficient):

[List.External:Policy]
int mask[0x100];
void init()
{
int c;
mask[0] = 0x100;
c = 1;
while (c < 0x100)
mask[c++] = 0x200;
c = 'a';
while (c <= 'z')
mask[c++] = 1;
c = 'A';
while (c <= 'Z')
mask[c++] = 2;
c = ' ';
while (c <= '@')
mask[c++] = 4;
c = '[';
while (c <= '`')
mask[c++] = 4;
c = '{';
while (c <= '~')
mask[c++] = 4;
}
void filter()
{
int i, seen;
/*
* This loop ends when we see NUL (sets 0x100) or a disallowed character
* (sets 0x200).
*/
i = -1; seen = 0;
while ((seen |= mask[word[++i]]) < 0x100)
continue;
/*
* We should have seen at least one character of each type (which "add up"
* to 7) and then a NUL (adds 0x100), but not any other characters (would
 * add 0x200). The length must be at least 6
*/
if (seen != 0x107 || i < 6)
word = 0; // Does not conform to policy
}

It's not quite matching the Windows policy (which would allow things like
That's a good shout - I'll try that and regenerate them and see if there's much of a difference.
Using the Hashes.org passwords as a wordlist was pretty effective at cracking more of HIBP, and (on top of the existing pot) cracked between 96% and 99% depending on the size. This gave me the following pots to build charsets from (new Hashes.org ones with the "HO" suffix, previous ones included for comparison):
However, the results were not great. In every case the new charsets performed worse than the previous ones (on two AD hash dumps) - so although cracking this way meant that there were a lot more plaintexts, the charsets generated from them were less effective.

ASCII Charsets
Policy Charsets

This was the same
Complex Charsets

This was the complex policy above - so required uppercase, lowercase and either numbers or ASCII symbols:
I don't recommend that - I think possible length limits belong to usage of a charset, not to its generation.
You probably misread my code. What it does is actually very similar to what yours does, just in a simpler way. The difference is yours rejects passwords with non-ASCII characters in them, whereas mine treats non-ASCII the same as digits or symbols. As to handling of digits and symbols, our filters are the same.
We can write a filter that would match that policy more closely. We can do 3 of 4 for ASCII fairly easily. We can also try 3 of 5 treating any non-ASCII as a 5th category, although that isn't the same as what the Windows policy description says (theirs is far trickier, maybe @magnumripper would want to help there).
I'm not surprised, but this was worth trying. This also means that your previous 70% wasn't necessarily optimal - maybe the threshold is different. We could also try giving greater weight (more repeats) to passwords that were easier to crack.
These two do:

[List.External:Policy3of4]
int mask[0x100];
void init()
{
int c;
mask[0] = 0x100; // NUL
c = 1;
while (c < 0x80)
mask[c++] = 8; // Special (overridden below for alpha-numeric)
while (c < 0x100)
mask[c++] = 0x200; // 8-bit is disallowed
c = 'a';
while (c <= 'z')
mask[c++] = 1; // Lowercase
c = 'A';
while (c <= 'Z')
mask[c++] = 2; // Uppercase
c = '0';
while (c <= '9')
mask[c++] = 4; // Digits
}
void filter()
{
int i, seen, classes;
/*
* This loop ends when we see NUL (sets 0x100) or a disallowed character
* (sets 0x200).
*/
i = seen = classes = 0;
while ((seen |= mask[word[i++]]) < 0x100)
continue;
if (seen < 0x200) { // No disallowed characters
while (seen &= seen - 1) // Count character classes
classes++;
}
/*
* We should have seen at least one character of at least 3 of the 4 allowed
* classes, but not any disallowed characters.
*/
if (classes < 3)
word = 0; // Does not conform to policy
}
[List.External:Policy3of5]
int mask[0x100];
void init()
{
int c;
mask[0] = 0x100; // NUL
c = 1;
while (c < 0x80)
mask[c++] = 8; // Special (overridden below for alpha-numeric)
while (c < 0x100)
mask[c++] = 0x10; // 8-bit
c = 'a';
while (c <= 'z')
mask[c++] = 1; // Lowercase
c = 'A';
while (c <= 'Z')
mask[c++] = 2; // Uppercase
c = '0';
while (c <= '9')
mask[c++] = 4; // Digits
}
void filter()
{
int i, seen, classes;
// This loop ends when we see NUL (sets 0x100)
i = seen = classes = 0;
while ((seen |= mask[word[i++]]) < 0x100)
continue;
while (seen &= seen - 1) // Count character classes
classes++;
// We should have seen at least one character of at least 3 of the 5 classes
if (classes < 3)
word = 0; // Does not conform to policy
}

I think we should actually replace the current
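For reference, these drop in as ordinary external filters, so they can be combined with other modes - e.g. restricting an incremental run, or pre-filtering a wordlist (file names below are hypothetical):

./john --incremental=ASCII --external=Policy3of4 --format=NT ad-hashes.txt
./john --wordlist=rockyou.txt --external=Policy3of5 --stdout | head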
You're right, that's my mistake - apologies. I'll give the

As you say, the Windows policy is a lot more complicated, but if we have a fast external that's 99% correct then that's still a very useful thing to have. There may be some value to having a more accurate one, but if it comes at the cost of adding a lot of complexity (and performance?) then it's perhaps not a priority.
I'd agree with this, and that the length restriction can be stripped out (assuming there's not a significant performance hit from doing so). I don't really have a strong opinion of which one is better - because in practical terms they're largely identical for how I'd use them.
I'm sure it's not optimal, especially as the 50m and the 70% are both pretty arbitrary. But I don't think that there's much value in trying lots of variants to get a more optimal one for the two hash dumps I'm testing on, because it would likely not be optimal for other cases. But without a large public dataset of "complex" hashes to test against, I'm not really sure how we can better test it. The other thing I dislike about it is that it's not very reproducible - so the exact charsets generated will vary depending on exactly how that 70% is cracked. But unless new charsets are being frequently made and tested against each other I guess that's not such a problem.
I agree. When using it for charset generation, we should keep in mind that the resulting incremental mode won't follow the same rules anyway - it will just favor such candidates over others to some extent. BTW, for that reason I am thinking of maybe calling these charset files and incremental modes
They do say that naming things is one of the hardest problems in computing... I can see the benefit of a name like

In some ways something like

As an aside, I was curious how close the charsets would be to the externals. From a quick test with the HIBP 50M
So although most candidates do match, there's still a pretty significant number that don't.
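A rough way to measure that kind of match rate (the mode name, filter name and sample size are all just placeholders):

./john --incremental=HIBP50MComplex --stdout | head -n 10000000 > candidates.txt
./john --wordlist=candidates.txt --external=Policy3of4 --stdout | wc -l   # how many candidates pass the filter
wc -l < candidates.txt                                                    # out of how many were generated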
I think the 3of4 and 3of5 as posted are good enough: I could add support for "character classes" in external but it opens up the proverbial can of worms: Are we talking UTF-8 here, or some legacy codepage? Or even worse, some mix of them (which is probably the most common case). Then we're better off just treating any 8-bit as one category.
The ascii.chr file that's used by default for incremental mode hasn't been changed since at least 2013 (apart from an accidental change that was subsequently reverted), and doesn't reflect many of the common patterns seen in passwords on newer systems.
In my experience, the majority of systems now enforce some kind of password complexity rules (for better or worse), and the most common way that users adapt to these rules is usually by sticking a capital letter and a number into their password (usually at the start and end respectively). However, based on a sample of the first 4 million candidates generated from ascii.chr, the vast majority of the generated candidates don't include any uppercase characters, and the majority of the passwords are also less than 8 characters long, which is a common minimum length.
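A rough way to reproduce that kind of sampling (the default incremental mode is named ASCII in current john.conf; the sample size is arbitrary):

./john --incremental=ASCII --stdout | head -n 4000000 > sample.txt
grep -c '[A-Z]' sample.txt               # candidates containing at least one uppercase letter
awk 'length($0) < 8' sample.txt | wc -l  # candidates shorter than 8 characters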
This means that the out-of-the-box incremental mode is very ineffective on any system that has password complexity requirements.
As a test, I ran the default ascii.chr incremental mode for 30 minutes against a recent Active Directory dump containing ~56k unique hashes - it cracked 446 of them.

By comparison, a charset generated from just the ~1,100 mixed-alpha-numeric passwords in the NCSC Top 100K Common Passwords cracked 3,560 hashes in the same time. And a charset generated from ~20,000 previously broken NT hashes from other unrelated Active Directory domains cracked 7,001 hashes.
I'm not saying that 1,100 words from a common password list is a good way to build a character set - but although crude, it seemed to be actually pretty effective.
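For context, the "mixed-alpha-numeric" selection was roughly along these lines (the NCSC file name and the exact pattern are approximations):

grep '[a-z]' PwnedPasswordsTop100k.txt | grep '[A-Z]' | grep '[0-9]' > mixed-alnum.txt   # keep entries with lower, upper and digit
wc -l mixed-alnum.txt                                                                    # roughly 1,100 entries in this case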
My usage of John is heavily focused on English-speaking enterprise environments (and often Active Directory), which have different password policies and patterns from other systems, so I appreciate that this is not representative of how other people are using it. Perhaps the current incremental charsets are effective for most users, and the way I use John just makes me a bit of an outlier. And if that's the case, I'm quite happy generating my own.
But I think that it's perhaps worth revisiting the charsets included with John and considering:

Is ascii.chr still a good default in 2022?