Problem with dynamic format #2800
Not a solution yet, but: If I add a constant into that expression, the self test passes:
And now with debug output:
Same for the format which fails:
Maybe someone more familiar with writing dynamic formats (dynamic_nnnn) can see where the problem is, and work around it by defining a new format, dynamic_9001 or similar. BTW: …
I tried to convert this into a regular dynamic format and commented out most of the test lines, but the format still fails.
@frank-dittrich There is a small typo in your dynamic script. The … Here is the full working version:
Results:
@kholia Actually, it is not my dynamic script; it is more or less Jim's debugging output, converted into a format (see the …). Maybe you can even figure out and fix the bug in the dynamic compiler? @jeroen-80 Thanks to @kholia, you now have a workaround. Is this format used somewhere? What product uses this format? Should we include it in Jumbo? (But I was wrong about the name space: you can't use dynamic_9001 etc., because Jumbo doesn't allow format names above dynamic_5000.) Unfortunately, due to another bug, #2776, you'll get a warning for every dynamic format defined in run/john-local.conf.
For now, just ignore that warning or adjust verbosity with …
@frank-dittrich I will take a look at the dynamic compiler soon, but I doubt I will be able to understand this complex piece of code without Jim's help. Thanks for the nudge, though! :-)
@@ -1962,7 +1963,7 @@ static int parse_expression(DC_struct *p) {
 #define ELSEIF(C,T,L) else if (!strncasecmp(pCode[i], #T, L)) \
 { \
 	char type = pCode[i][strlen(pCode[i])-1]; \
-	comp_add_script_line("Func=DynamicFunc__" #C "_crypt_input%s_%s_input2\n", use_inp1?"1":"2", use_inp1?"append":"overwrite"); \
+	comp_add_script_line("Func=DynamicFunc__" #C "_crypt_input%s_%s_input2\n", use_inp1?"1":"2", "overwrite"); \
 	if (type=='r') { len_comp2 += nLenCode[i]; } \
 	else if (type=='c'||type=='6') { len_comp2 += b64_len(nLenCode[i]); } \
 	else { len_comp2 += nLenCode[i]*2; } \

This incorrect patch makes the dynamic compiler emit the right code for this particular situation. It appears that a single …
This really is a limitation of dynamic. Yes, the format above CAN be handled with smarter code. BUT that does not mean the dyna compiler (or dynamic in general) can handle all expressions. As soon as someone can write this expression for me in working 'flat' dynamic (by hand), I will look at extending the dyna compiler. BUT this is simply a limitation of dynamic in general. It was written as a simple proof of concept that ended up working amazingly well. BUT it is limited. The new flat model has only 2 buffers, and every operation either appends to or overwrites one of the buffers with its result. So, I post this as a challenge: write a valid flat dynamic format for this hash:
md5($p.md5($p.md5($p.$s).md5($p)).$p.$s.md5($s.$p.$p).md5($s.md5($s.$p).md5($s)))
Here is a Perl script to compute it (each input line is salt,password):
#!/usr/bin/perl
# md5($p.md5($p.md5($p.$s).md5($p)).$p.$s.md5($s.$p.$p).md5($s.md5($s.$p).md5($s)))
use Digest::MD5 qw(md5_hex);
foreach my $inp (<STDIN>) {
	chomp $inp;
	my @ar = split(",", $inp);
	my $s = $ar[0];
	my $p = $ar[1];
	my $h1 = md5_hex($p.md5_hex($p.md5_hex($p.$s).md5_hex($p)).$p.$s.md5_hex($s.$p.$p).md5_hex($s.md5_hex($s.$p).md5_hex($s)));
	print "pw=$p salt=$s hash=$h1\n";
}
So my point is that even if we do fix the above issue, this does not (and CAN NOT) fix the compiler for all arbitrary expressions. Simply put, the current compiler handles all expressions that depend on a single variable. It can handle a few that should be doable with 2 variables, but it fails for many instances that could be done with 2. It cannot handle anything that requires 3 or more variables. The expression I listed above would require 3 independent variables to hold the subexpressions, and dynamic currently does not have 3 variables.
If possible/feasible, it would obviously be nice if the compiler could detect "impossible tasks" like this one and bail out with a descriptive error, instead of producing something that fails self-test.
I have sort of left it. I still am not 100% sure that I want to abandon it. I 'could' write a format that uses OpenSSL CTX and recursive descent parsing. It would be slow (sort of like the crypt(3) code), BUT it would be a fallback that allows any valid simple function expression to 'work', even if it does not fit into dynamic with its variable limitations. Yes, I can easily run a simple self-test (ST), and it will fail. I can run the existing RD parser code, and it generates valid data. I could write something that is much thinner than dynamic (hell, I already have most of it, in the logic in the dyna compiler that creates the ST strings). Then simply build a format around that. This is not a trivial undertaking, but I think it is something that really needs to be done, since about every 6 months someone resurrects this zombie bug over and over again ;)
This sounds like a great idea. I would love to have this functionality.
Some good comments were made under issue #2990; I will copy the text here:
Adding to my statement above, the dynamic-opencl would obviously include mask-mode acceleration from day 1. But we should not even start this until after the next Jumbo release, because it'd take 105% of our time (at least yours truly and @jfoug) for quite some time before we'd have a nice start, ready for merging.
The OpenCL version of dynamic has scared the shit outta me, and I have even deemed it undoable. However, while going over what I was going to reply in this message, something just hit me. How about getting really out of the box, much like when I did FLAT dynamic? Simply have an offline tool where you enter the expressions wanted, and it builds a 'nearly' workable file. The valid, salt, and binary methods may be the only things needing work, and possibly not even those. Thus, someone could run something like ./generic_opencl_gen 'sha256($s.sha256($p.$s)^2048.$p)' > opencl_specialsha256_fmt_plug.c, for which there is no format, and this generator would print the needed data to this file. NOTE: I do not know OpenCL well, especially the really good 'tricks'. For this to work, we may need some additional flags, such as which tricks to use. I know this is vastly different from what I was picturing, BUT I can see getting this to work. I am not sure I can see getting a true generic OpenCL format working at all. My original goal was to get md5_gen (the original shit) and simply add CPU engines (including OpenCL) to it. But the more I dug into that code, and from experience doing SIMD, I see OpenCL as nothing more than a pipe dream. There is simply WAY too much nuance you have to handle to get things working well, depending on whether it is a fast format, flat format, simply salted format, salted multi-iteration format, or something else. Each has to be somewhat hand done.
Hmm but that's the host code - it would be the same every time 😆 and who would write the kernel?
How is it different or harder than the existing CPU format? Let's picture a rudimentary PoC. I think what you have in CPU dynamic is a bunch of functions that all have the same prototype. In CPU dynamic it's func(uchar *input_buf1, uchar *input_buf2, uint total_len1, uint total_len2, uchar *out_1, uchar *out_2 (...and so on...)). The OpenCL dynamic would never invent a single line of code. What it would do is put prefab blocks together. Since all of them have the exact same prototype, it's as easy as on CPU. So one of these prefab blocks (functions) would be DynamicFunc__crypt_md5(uchar *input_buf1, uchar *input_buf2, uint total_len1, uint total_len2, uchar *out_1, uchar *out_2 (...and so on...)). That function is trivial. Putting it all together is not hard.
Please test using PR #3568.
Here are a couple of quick tests. In this one, I did not have the input file @jeroen-80 had, but the format passes self-test, and I HAVE run it against real test vectors in other areas, and it has been finding them all.
And just to show that the format works for expressions that cannot be handled by dynamic in any manner whatsoever (from a post I made earlier in this thread):
I used this script to generate the input file for that last sample:
$ cat x.pl
#!/usr/bin/perl
# md5($p.md5($p.md5($p.$s).md5($p)).$p.$s.md5($s.$p.$p).md5($s.md5($s.$p).md5($s)))
use Digest::MD5 qw(md5_hex);
foreach my $p (<STDIN>) {
	chomp $p;
	my $s = rand_str(8);
	my $h1 = md5_hex($p.md5_hex($p.md5_hex($p.$s).md5_hex($p)).$p.$s.md5_hex($s.$p.$p).md5_hex($s.md5_hex($s.$p).md5_hex($s)));
	print ":$h1\$$s:$p\n";
}
# random string of $_[0] uppercase letters, used as the salt
sub rand_str {
	my $s = "";
	my $n = $_[0];
	for (my $i = 0; $i < $n; $i++) {
		$s .= chr(rand(26)+ord('A'));
	}
	return $s;
}
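The "cmp_one() failed. This format will FAIL and needs the Slower dyna-compiler format" text is confusing, especially the "will FAIL" part. This needs to be reworded, I think. Should we say "would have failed with the fast code path..."?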
It's crazy that RDP stuff can handle such complex expressions ❤️ |
Proposed text:
diff --git a/src/dynamic_compiler.c b/src/dynamic_compiler.c
index 4fa96b1f6..094a66eda 100644
--- a/src/dynamic_compiler.c
+++ b/src/dynamic_compiler.c
@@ -2517,13 +2517,13 @@ int dynamic_assign_script_to_format(DC_HANDLE H, struct fmt_main *pFmt) {
 			ret = pFmt->methods.cmp_exact(pFmt->params.tests[j].ciphertext, 0);
 			if (!ret && !failed) {
 				if (options.verbosity > VERB_DEFAULT)
-					fprintf(stderr, "cmp_exact() failed. This format will FAIL and needs the Slower dyna-compiler format\n");
+					fprintf(stderr, "This format would have failed with the fast code path, hence falling back to slower dyna-compiler format.\n");
 				failed = 1;
 			}
 		}
 		else if (!failed) {
 			if (options.verbosity >= VERB_DEFAULT)
-				fprintf(stderr, "cmp_one() failed. This format will FAIL and needs the Slower dyna-compiler format\n");
+				fprintf(stderr, "This format would have failed with the fast code path, hence falling back to slower dyna-compiler format.\n");
 			failed = 1;
 		}
 	}

@frank-dittrich Do you have a better sounding message?
On 12/27/2018 9:32 PM, Dhiru Kholia wrote:
The "cmp_one() failed. This format will FAIL and needs the Slower dyna-compiler format" text is confusing, especially the "will FAIL" part.
This needs to be reworded I think.
Should we say "would have failed with the fast code path..."?
Changing the verbiage (or even removing it entirely) is extremely trivial. Just reach a consensus on what we want, and it will be easy to do.
On 12/27/2018 9:40 PM, Dhiru Kholia wrote:
It's crazy that RDP stuff can handle such complex expressions ❤️
Since the runtime 'matches' the lexer/parser code, and uses a stack to handle all of the recursive data objects, there really should be no limit on complexity. For certain types of parsing (math-like expressions such as this case), a recursive descent parser is actually a VERY GOOD match for handling the 'language'. So simply matching a stack-based code engine to the RDP is a very natural fit. It is also pretty simple to 'prove' that it is correct, vs. working on the expressions within dynamic. In that format, we could match the logic to the code-running engine (the dynamic engine), but getting things correct (and PROVING them correct) was much more difficult.
* added dynamic-compiler RDP format when script builds incorrectly
* fixed compiler errors about signed/unsigned chars
* better algo label. Fixed bug when MD5_X2 set
* the failed message (i.e. using RDP) needed moved inside an if clause
* dyna-compiler. added ,rdp switch to force RDP format. common_init needed for dyna-lib. expanded buffers in the compiler.
* fixed john.pot output of things like -form=dynamic=md5($p)
* Handle extra params for compiler lib formats. Fixed WS and speeling errors, per reviews
* exe bit set
* VC at 2015 is C99 compliant for vsnprintf. Handle RDP where base format stored keys in input buffers
Closes #1746 and #2800, see also #3389, #3125
Reply from magnum,