Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added dynamic-compiler RDP format when script builds incorrectly #3568

Merged
merged 9 commits into from
Dec 30, 2018
Merged

added dynamic-compiler RDP format when script builds incorrectly #3568

merged 9 commits into from
Dec 30, 2018

Conversation

jfoug
Copy link
Collaborator

@jfoug jfoug commented Dec 27, 2018

Here is working version. This simply detects a failed script at linkage time, emits a warning, sets a flag. Then within dynamic, if that flag lists script failure, then a RDP function within the dynamic compiler code is called, on each password to actually perform the work, since the dynamic code would not work.

Note, the dynamic compiler can still be optimized to handle more expressions. But at some point of expression complexity, the dynamic format CAN NOT be used successfully. That is the point where the RDP parsing of the expression will take over and complete the work for the expression.

This work is for #3562

@magnumripper
Copy link
Member

Cool! I will try this out and give feedback, just don't hold your breath cause I've got all sorts of things going on around here!

@jfoug
Copy link
Collaborator Author

jfoug commented Dec 27, 2018

Looks like circle-CI is now working again , yea!

@jfoug
Copy link
Collaborator Author

jfoug commented Dec 27, 2018

It was actually not that bad (other than figuring out how to test the script generated properly). The 'format' itself, is simply a structure to marshal data, and then the already built RDP code within dynamic_compiler. The code looks deceptively a lot like dynamic (where it actually runs the script), but it is very different. But the final running simply boils down to an array of function pointers which are called one after another. The biggest difference, is in this new code, there simply is a stack which has variables pushed onto it, the stack is popped (possibly multiple times) to get the variables, then operation is done, and the results are pushed back on to the stack. in dynamic, I simply have 2 variables. But here, since the stack is involved, it can handle near unlimited count of variables, but at a slower rate, since each gets marshaled into and out of the stack for usage/storage.

@kholia
Copy link
Member

kholia commented Dec 27, 2018

I will test this with some known dynamic expressions which were failing earlier.

This is awesome 👍

@@ -2432,6 +2479,9 @@ char *dynamic_compile_split(char *ct) {
int dynamic_assign_script_to_format(DC_HANDLE H, struct fmt_main *pFmt) {
int i;

/* assume comipler success. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numerous small errors (mostly signed/unsigned char*), but this one also. All fixed, and are in 2nd push.
The new format should handle any failing expressions. It has no optimizations, and is slow due to marshal code at each step, but it is fast enough to get the job done, AND if the expression IS a common expression (i.e. used a lot), then we can see about working them into a real format, whether it be with a dynamic script, or a fat format.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo still here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be fixed next time I check things in.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 39ad0fe

@jfoug
Copy link
Collaborator Author

jfoug commented Dec 27, 2018

NOTE, even though it is passing the checks, do not merge it yet. There are still issues. I am getting get_key(0) failures on my AVX2 builds. There is some settings that I have to make, so that the SIMD buffers are not used. I have not figured all of that out yet.
Also, I would like to get the 'name' of the algorithm used done properly. Currently it is this:

Testing: dynamic=md5($s.sha1($s).md5($p)) [256/256 AVX2 8x1]... FAILED (get_key(0))

The 256/256 AVX2 8x1 is totally wrong.

But I think both of those issues are trivial, and should easily be fixable from within the dynamic_compiler linkage function (where we determine that the RDP code is needed).

@magnumripper
Copy link
Member

magnumripper commented Dec 27, 2018

BTW I have some faint memory of .pot file issues with this format. We should look at them too if possible.

@jfoug
Copy link
Collaborator Author

jfoug commented Dec 27, 2018

14fd1fb should be much closer to working. The code was failing when MD5_X2 was set (that is usual). This is why it was working like a champ on my VC test system. There, MD5_X2 is not set, so it was cracking all. But with X2 set, the only every other crypt key was being used, and the 2nd usage overwrote the first (not what we want at all).
I also added the algorithm name of [Dynamic RDP] so that we know it is not using SIMD or any other enhancements, and is using the Recursive Decent Parser code to perform the hash logic.

@jfoug
Copy link
Collaborator Author

jfoug commented Dec 27, 2018

Here are examples showing the speed difference. The reason the 2nd one works, is because the compiler optimizes out the sha1($s) into the 'salt' (i.e. it is computed only at load time). But this shows the difference in speed.

NOTE, I still see a bug, where the 2nd one (which actually does a dynamic script), still lists that it will use the RDP. This should be a trivial bug to find.

$ ../run/john -test=3 -form='dynamic=md5(sha1($s.$p).md5($p))'
cmp_one() failed. This format will FAIL and needs the Slower dyna-compiler format
This expression will use the RDP dynamic compiler format.
Benchmarking: dynamic=md5(sha1($s.$p).md5($p)) [Dynamic RDP]... DONE
Many salts:     439907 c/s real, 465553 c/s virtual
Only one salt:  463917 c/s real, 473644 c/s virtual

$ ../run/john -test=3 -form='dynamic=md5(sha1($s).md5($p))'
This expression will use the RDP dynamic compiler format.
Benchmarking: dynamic=md5(sha1($s).md5($p)) [256/256 AVX2 8x3]... DONE
Many salts:     7019K c/s real, 7726K c/s virtual
Only one salt:  2573K c/s real, 3089K c/s virtual

@kholia
Copy link
Member

kholia commented Dec 28, 2018

BTW I have some faint memory of .pot file issues with this format. We should look at them too if possible.

Unprintable characters in pot file after cracking a dynamic_25 hash (#2437) perhaps?

@jfoug
Copy link
Collaborator Author

jfoug commented Dec 28, 2018 via email

@magnumripper
Copy link
Member

magnumripper commented Dec 28, 2018

Here's some strangeness (prior to this PR):

$ echo '900150983cd24fb0d6963f7d28e17f72' > test.in
$ ../run/john test.in -form:dynamic='md5($p)'
Loaded 1 password hash (dynamic=md5($p) [256/256 AVX2 8x3])
Proceeding with single, rules:Wordlist
Press 'q' or Ctrl-C to abort, almost any other key for status
Proceeding with wordlist:../run/password.lst, rules:Wordlist
abc              (?)
1g 0:00:00:00 DONE 2/3 (2018-12-28 16:11) 100.0g/s 336000p/s 336000c/s 336000C/s 123456..Geronimo
Use the "--show --format=dynamic=md5($p)" options to display all of the cracked passwords reliably
Session completed

$ cat ../run/john.pot 
$dynamic_6000$900150983cd24fb0d6963f7d28e17f72:abc

Dynamic 6000? Where did that come from?

$ ../run/john test.in -show
0 password hashes cracked, 2 left

$ ../run/john test.in -show -form:dynamic='md5($p)'
?:abc

1 password hash cracked, 0 left

Copy link
Member

@magnumripper magnumripper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Visual review

src/dynamic_compiler.c Show resolved Hide resolved
@@ -2432,6 +2479,9 @@ char *dynamic_compile_split(char *ct) {
int dynamic_assign_script_to_format(DC_HANDLE H, struct fmt_main *pFmt) {
int i;

/* assume comipler success. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo still here

@magnumripper
Copy link
Member

Using this branch

$ cat test.in
900150983cd24fb0d6963f7d28e17f72

$ ../run/john test.in -form:'dynamic=md5($p)'
cmp_one() failed. This format will FAIL and needs the Slower dyna-compiler format
This expression will use the RDP dynamic compiler format.
Using default input encoding: UTF-8
Loaded 1 password hash (dynamic=md5($p) [Dynamic RDP])
Warning: no OpenMP support for this hash type, consider --fork=8
Self test failed (cmp_all(1))

@jfoug
Copy link
Collaborator Author

jfoug commented Dec 28, 2018

Dynamic 6000? Where did that come from?

dynamic_6000 is the 'hidden' dynamic format value used in the internal script for dyna compiler. BUT I am not sure why (or how) it made its way into the .pot file. The dynamic=md5($p) string is what 'should' have made it into the .pot file. NOW, this 'is' a format which loads data from the internal 'library', so that may cause different (incorrect) behavior. I will investigate why this is happening and fix it.

@jfoug
Copy link
Collaborator Author

jfoug commented Dec 28, 2018

NOTE, I am adding a new flag, ,rcp If that switch is there, then the format will use the RDP format, even IF the dynamic compiler 'could' build a proper script. This flag will most likely be used mostly by me, as I spend some time getting the RDP code working faster, but you never know. The ,debug flag was added also as a helping debug tool for me, but it sounds like it gets used by others.

I have thought of a few optimizations I could add to the RDP format for some formats. One big thing, is to parse and find common expressions, eliminating wasted hash calls. Those can not be added to the dynamic script, but within the RDP, since we are not limited to just having the 2 variables of dynamic. So something like md5(md5($p).md5($s.$p).md5($p)) we could store off the md5($p) into a 'local' variable, pre-hash the salt, and then simply do:
append_p
md5
store results into t1
append $s
append $p
md5
store results in t2
append t1
append t2
append t3
md5 (and return results)

Now that I have RDP working well, I will work to clean things up, and we can get this new format checked in and then I will start to work on optimizations. I am not 100% sure we need to wait on an optimized version prior to jumbo2. If things get speed improved prior to jumbo2, good, if not, I vote it is not a show stopper.

@jfoug
Copy link
Collaborator Author

jfoug commented Dec 29, 2018 via email

…ded for dyna-lib. expanded buffers in the compiler.
@jfoug
Copy link
Collaborator Author

jfoug commented Dec 29, 2018

Using this branch

$ cat test.in
900150983cd24fb0d6963f7d28e17f72

$ ../run/john test.in -form:'dynamic=md5($p)'
cmp_one() failed. This format will FAIL and needs the Slower dyna-compiler format
This expression will use the RDP dynamic compiler format.
Using default input encoding: UTF-8
Loaded 1 password hash (dynamic=md5($p) [Dynamic RDP])
Warning: no OpenMP support for this hash type, consider --fork=8
Self test failed (cmp_all(1))

This is fixed in latest push. Something changed in core john startup, and common_init() was not being called, but 'only' for the hashes in the dynamic-compiler-lib. The md5($p) is one of them in the compiler lib. Yes, I also found this when adding (and testing) the ,rdp switch. That switch 'forces' usage of the RDP code. So I tried it on some compiler-lib formats, only to find they were failing badly. The problem was the binary() function was returning a null'd out buffer, since the atoi16 stuff was not initialized. Thus the results were never matching a null binary buffer.

@jfoug
Copy link
Collaborator Author

jfoug commented Dec 29, 2018

Fixed the john.pot output of things from the dynamic-compiler-lib (such as md5($p) ) when run on 'bare' hashes. The prepare within dynamic-compiler was not handling raw hashes properly here, because it did not know the raw hash length. I had to add that to the compiler lib structure. They all are 16 bytes now (all are MD5 hashes), but I did not want to force ONLY md5 hashes to be available in the lib. So the length of the hash had to be added to the lib structure. This was not an easy bug to find. The worst thing, is that 'most' of the functionality did work even with the dynamic_6000 signature being used. However, all formats within the lib would have 'appeared' to have cracked the hash when using the -show function, even if that hash was not right. Now, the proper @dynamic=md5($s)@01234...abc:password signature will be output into the john.pot file, so there is no mis-interpretation in the -show command.

@magnumripper
Copy link
Member

I think we can merge this now, right? Remaining issues are either of other scope or can be done later.

@magnumripper
Copy link
Member

This is a terrific improvement! 👍

@magnumripper magnumripper merged commit 8a81ca4 into openwall:bleeding-jumbo Dec 30, 2018
@jfoug
Copy link
Collaborator Author

jfoug commented Dec 30, 2018

Yes, this was a very good place to merge. It is working well. The only change I have done past this, was to get 1 hash to work in the script format (a utf16), but it is still lacking, AND not really a generic fix. There are simply too many 1-offs failing the script, the code emit part gets confused VERY easily.

That is why the RDP format is so important. It is slower, yes, BUT it overcomes the deficiencies of the code emission part of dyna-compiler, and allows it to be fixed/upgraded/replaced over time, without having users failing to be able to complete their work.

@jfoug jfoug deleted the dyna-compiler-work branch December 30, 2018 09:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants