Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved xor modifier. #1132

Merged
merged 7 commits into from
Sep 18, 2019
Merged

Improved xor modifier. #1132

merged 7 commits into from
Sep 18, 2019

Conversation

wxsBSD
Copy link
Collaborator

@wxsBSD wxsBSD commented Sep 14, 2019

Implement an improved xor modifier which allows you to do:

$a = "This program cannot" xor(1-255)

This will generate every single byte xor for the string from 1 to 255 (inclusive).

This commit also switches the default behavior of xor by itself back to the old behavior of xor'ing with 0 to 255 so it will search for the plaintext too.

@wxsBSD wxsBSD mentioned this pull request Sep 14, 2019
Copy link
Member

@plusvic plusvic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!

libyara/lexer.l Outdated Show resolved Hide resolved
libyara/grammar.y Outdated Show resolved Hide resolved
- No more changes to lexer.
- Duplicate modifiers are now errors.
- Do not rely on undocumented behavior in bison.
}
| string_modifiers string_modifier
{
set_flag_or_error($$.flags, $2.flags);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has a problem similar to the one mentioned in my previous comments. The line set_flag_or_error($$.flags, $2.flags) does this:

$$.flags |= $2.flags;

But it turns out that $$ may have any random value initially (at least in theory). In practice it doesn't have a random value because it gets initialized by Bison as $$ = $1, but Bison doesn't guarantee this, it's just an implementation detail that we shouldn't rely on.

Also, my previous comment still holds:

Yes, that's correct, $1 is the accumulation of all the other modifiers. But $1 could have the XOR flag set and some values in xor_min and xor_max, but if $2 doesn't have the flag, the values already $1 are not copied into $$. The values in $$ are the result from this rule, and could be undefined if not explicitly set.

It works because the implementation detail in Bison that forces $$ = $1 before calling our code. But this is not documented, in theory if you don't set the result value for your reduction rule in $$ it is undefined.

I mean, this if statement is not enough:

        if ($2.flags & STRING_GFLAGS_XOR)
        {
          $$.xor_min = $2.xor_min;
          $$.xor_max = $2.xor_max;
        }

If the condition is not true $$.xor_min and $$.xor_max are not initialized, and we could lose any flag or xor limits stored in $1. This is not happening for the same reason explained above.

We should do something like:

        $$ = $1;   <-- This guarantees that $$ is initialized with whatever we have in $1, even if this already happens in the current implementation of Bison, we must do it explicitly. 

        set_flag_or_error($$.flags, $2.flags);

        if ($2.flags & STRING_GFLAGS_XOR)
        {
          $$.xor_min = $2.xor_min;
          $$.xor_max = $2.xor_max;
        } 

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation. I see how it works now, and thank you for pointing it out!

@wxsBSD
Copy link
Collaborator Author

wxsBSD commented Sep 17, 2019

I don't know if there is a good answer to this but this string:

$a = "foobar" xor(1-255) wide

will generate generate the xor values from 1 to 255 and the wide string, which is probably not what people will mean when they write this. When I think about those modifiers I think "generate me a wide string and then xor it with 1 to 255", not "generate a wide string and the xor version of that wide string from 1 to 255". A poor way to deal with this is to do:

$a = "f\x00o\x00o\x00b\x00a\x00r\x00" xor(1-255)

I'm not sure how to deal with this. Thoughts?

@plusvic
Copy link
Member

plusvic commented Sep 17, 2019

I don't know if there is a good answer to this but this string:

$a = "foobar" xor(1-255) wide

will generate generate the xor values from 1 to 255 and the wide string, which is probably not what people will mean when they write this. When I think about those modifiers I think "generate me a wide string and then xor it with 1 to 255", not "generate a wide string and the xor version of that wide string from 1 to 255". A poor way to deal with this is to do:

$a = "f\x00o\x00o\x00b\x00a\x00r\x00" xor(1-255)

I'm not sure how to deal with this. Thoughts?

Right now "foobar" xor(1-255) wide extracts the atoms for the wide string (in this case it will be f\x00o\x00) and then apply the xor to those atoms, which results in the interleaved zeroes also xored. That's equivalent to "f\x00o\x00o\x00b\x00a\x00r\x00" xor(1-255), and that's probably what I would expect, right? Would you expect something different?

PS: Now I see that "foobar" xor(1-255) wide is not equivalent to "f\x00o\x00o\x00b\x00a\x00r\x00" xor(1-255) the former will search for the non-xored wide string too. Is that what you meant?

@plusvic
Copy link
Member

plusvic commented Sep 17, 2019

What I would expect is:

"foobar" xor(1-255) wide only matches the xored wide string
"foobar" xor(1-255) ascii only matches the xored ascii string
"foobar" xor(1-255) ascii wide only matches the xored ascii string and the xored wide string

So, once you use the xor modifier you are not matching the original non-xored string anymore, unless you write something like:

"foobar" xor(0-255) wide it matches the original wide string because you are XORing with 0. Does it makes sense?

@wxsBSD
Copy link
Collaborator Author

wxsBSD commented Sep 17, 2019

Yes, this makes sense. I'll get it fixed and update this PR.

This commit now makes it so that "xor(1-255) wide" will no longer search for the
plaintext wide version of the string, but "xor wide" will because it implies
xor(0-255).
@wxsBSD
Copy link
Collaborator Author

wxsBSD commented Sep 18, 2019

Updated to address all concerns (I think).

One thing I've realized we can't express is a plaintext string and an xor range that doesn't start at zero. In the current iteration of this PR can you think of a way to do that?

I think the best way to express this would be xor(0,10-20) - that is, the grammar would accept a list of ranges. In this example the 0 is just syntactic sugar for 0-0. Thoughts? Is it worth doing? I'm happy to write the code.

@plusvic
Copy link
Member

plusvic commented Sep 18, 2019

Updated to address all concerns (I think).

One thing I've realized we can't express is a plaintext string and an xor range that doesn't start at zero. In the current iteration of this PR can you think of a way to do that?

I think the best way to express this would be xor(0,10-20) - that is, the grammar would accept a list of ranges. In this example the 0 is just syntactic sugar for 0-0. Thoughts? Is it worth doing? I'm happy to write the code.

Your solution is exactly what I would do, but I would wait and see if this have enough demand and it's worth the effort. The good thing is that this would be backward compatible change. What I would certainly implement now is xor(X) as a syntactic sugar for xor(X-X) because this is easy to implement.

@plusvic plusvic merged commit ce470b7 into VirusTotal:master Sep 18, 2019
@wxsBSD wxsBSD deleted the xor_fixes_2 branch October 22, 2019 16:42
tarterp pushed a commit to mandiant/yara that referenced this pull request Mar 31, 2022
* Implement improved xor modifier.

* Fix breakage when xor(...) came before another modifier.

* Add documentation for xor improvements.

* Fixes from review by Victor.

- No more changes to lexer.
- Duplicate modifiers are now errors.
- Do not rely on undocumented behavior in bison.

* Apply fix to deal with undocumented bison feature we shouldn't rely on.

* Make xor work properly with other modifiers.

This commit now makes it so that "xor(1-255) wide" will no longer search for the
plaintext wide version of the string, but "xor wide" will because it implies
xor(0-255).

* Add support for xor(X) and tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants