Improv. load & store, impl. lea #326

Merged
merged 5 commits into from
Aug 19, 2017
Conversation

flanfly
Member

@flanfly flanfly commented Aug 19, 2017

fixes #324

@flanfly
Member Author

flanfly commented Aug 19, 2017

Gawd! where do all those fucking warnings come from?

@coveralls

Coverage Status

Coverage decreased (-0.2%) to 58.407% when pulling 1e47c34 on flanfly:master into a1dddb1 on das-labor:master.

@flanfly flanfly merged commit f56b026 into das-labor:master Aug 19, 2017
@@ -747,7 +755,7 @@ pub fn call(a: Rvalue) -> Result<(Vec<Statement>, JumpSpec)> {
call ?, new_rip:64;
});*/
let stmts = rreil!{
call ?, (a);
Collaborator

I actually had a question about this; shouldn't the lvalue for call be its return, so on x86, the rax register?

Member Author

This depends on the calling convention. Functions can have multiple return values or return nothing. My plan is to replace the current call op with something that lets you specify a whole vector of written Lvalues and read Rvalues, but the code isn't ready for prime time yet.

Collaborator

Ah, k.

So if possible, keep a binary encoding in mind when designing the call op; variable-length arguments make the encoding system significantly more complicated.

I mention this because I'm 100% convinced the quickest and most maintainable way to make panopticon really, really fast is, first, to go the binary encoding route for the IL, which will reduce a libc analysis from 2 GB to 200 MB, and second, to implement your suggestion that a function keep its basic blocks in a single vector instead of allocating them all over the heap.
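To illustrate the single-vector idea, here is a minimal sketch, not Panopticon's actual data structure. The names `Function`, `push_block` and `block` are hypothetical, and a `u32` stands in for one encoded statement. All statements live in one contiguous vector and each basic block is only a range into it.

```rust
use std::ops::Range;

/// Hypothetical layout: one allocation for all statements of a function.
pub struct Function {
    /// Every statement of the function, block after block.
    /// `u32` is a stand-in for one encoded IL statement.
    pub statements: Vec<u32>,
    /// One index range into `statements` per basic block.
    pub blocks: Vec<Range<usize>>,
}

impl Function {
    pub fn new() -> Self {
        Function { statements: Vec::new(), blocks: Vec::new() }
    }

    /// Appends a basic block and returns its index.
    pub fn push_block(&mut self, stmts: &[u32]) -> usize {
        let start = self.statements.len();
        self.statements.extend_from_slice(stmts);
        self.blocks.push(start..self.statements.len());
        self.blocks.len() - 1
    }

    /// Returns the statements of block `idx`, if it exists.
    pub fn block(&self, idx: usize) -> Option<&[u32]> {
        let range = self.blocks.get(idx)?.clone();
        self.statements.get(range)
    }
}
```

Iterating over a whole function then walks a single buffer, and a basic block costs two integers instead of its own heap allocation.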

Member Author

Yes, it's also a problem w/ SSA. It assumes that all instructions have exactly one return value. My idea is to make Statement an enum with one variant for RREIL instructions and two for calls:

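enum Statement {
  // An ordinary RREIL instruction with exactly one assignee.
  Simple {
    op: Operation,
    assignee: Lvalue,
  },
  // A call to a known function, with everything it reads and writes.
  ResolvedCall {
    reads: Vec<Rvalue>,
    writes: Vec<Lvalue>,
    function: Uuid,
  },
  // A call whose target has not been resolved to a function yet.
  UnresolvedCall {
    target: Rvalue,
  },
}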

The number of calls per function isn't that large, so I hope the impact won't eat up the performance gains of the binary encoding.
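As a rough sketch of how an analysis such as SSA construction could consume this enum, the code below adds a hypothetical `defs` helper that returns every location a statement writes. `Lvalue`, `Rvalue`, `Operation` and `Uuid` are simplified placeholders here, not Panopticon's real types.

```rust
// Placeholder types for illustration only; the real ones are richer.
#[derive(Clone, Debug, PartialEq)]
pub enum Lvalue {
    Undefined,
    Variable { name: String, bits: u16 },
}

#[derive(Clone, Debug, PartialEq)]
pub enum Rvalue {
    Undefined,
    Constant(u64),
    Variable { name: String, bits: u16 },
}

#[derive(Clone, Debug, PartialEq)]
pub enum Operation {
    Move(Rvalue),
    Add(Rvalue, Rvalue),
}

/// Stand-in for a real UUID type.
pub type Uuid = u128;

#[derive(Clone, Debug, PartialEq)]
pub enum Statement {
    Simple { op: Operation, assignee: Lvalue },
    ResolvedCall { reads: Vec<Rvalue>, writes: Vec<Lvalue>, function: Uuid },
    UnresolvedCall { target: Rvalue },
}

impl Statement {
    /// Every location this statement defines: exactly one for a simple
    /// statement, any number for a resolved call, none that we know of
    /// for an unresolved call.
    pub fn defs(&self) -> Vec<&Lvalue> {
        match self {
            Statement::Simple { assignee, .. } => vec![assignee],
            Statement::ResolvedCall { writes, .. } => writes.iter().collect(),
            Statement::UnresolvedCall { .. } => Vec::new(),
        }
    }
}
```

With a helper like this, code that used to assume one return value per instruction only has to loop over `defs()`.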

Member Author

@flanfly flanfly Aug 19, 2017

An alternative would be to insert a new mnemonic __call_summary after each call that describes the effects of the preceding call. So a function trashing ECX and RAX would look like this:

call <blah>
  call ? uuid-of-blah
__call_summary
  mov ECX:32, ?
  mov RAX:64, ?

This has the advantage that we won't have "weird" instructions and can model functions that return values on the stack.
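A minimal sketch of what emitting that pair could look like, assuming simplified stand-in types (`Reg`, `Stmt`, `Mnemonic`) and a hypothetical `lift_call` helper. None of these are Panopticon's actual API.

```rust
/// Simplified register description (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Reg {
    pub name: &'static str,
    pub bits: u16,
}

/// Only the two statement shapes this example needs.
#[derive(Clone, Debug, PartialEq)]
pub enum Stmt {
    /// `call ?, <target>`
    Call { target: u128 },
    /// `mov <reg>, ?`: the register holds an unknown value afterwards.
    MovUndefined { dst: Reg },
}

#[derive(Clone, Debug, PartialEq)]
pub struct Mnemonic {
    pub opcode: String,
    pub stmts: Vec<Stmt>,
}

/// Emits the call mnemonic followed by a `__call_summary` mnemonic that
/// overwrites every clobbered register with an undefined value.
pub fn lift_call(name: &str, target: u128, clobbered: &[Reg]) -> Vec<Mnemonic> {
    let call = Mnemonic {
        opcode: format!("call <{}>", name),
        stmts: vec![Stmt::Call { target }],
    };
    let summary = Mnemonic {
        opcode: "__call_summary".to_string(),
        stmts: clobbered.iter().map(|&dst| Stmt::MovUndefined { dst }).collect(),
    };
    vec![call, summary]
}
```

Every statement keeps a single assignee, so the variable number of effects shows up as a variable number of ordinary statements rather than as a variable-length operand list.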

Collaborator

The new enum doesn't worry me so much as the variable-length Vec<Rvalue> and Vec<Lvalue>; if they truly are variable length, I believe that will force the encoding to no longer be fixed width, which substantially increases the complexity.

It will also likely make a Vec<BinaryStatement> impossible, and instead will require BinaryStatements, which is its own vec of bytes with a size header, etc.
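The difference can be sketched as follows. The record width of 8 bytes and the one-byte length header are assumptions made up for this example, not the actual encoding. With fixed-width records the n-th statement is found by arithmetic; with length-prefixed records it takes a walk from the start of the buffer.

```rust
/// Hypothetical size of one fixed-width encoded statement.
pub const WIDTH: usize = 8;

/// Fixed width: statement `i` occupies bytes `i * WIDTH .. (i + 1) * WIDTH`,
/// so lookup is constant time.
pub fn nth_fixed(buf: &[u8], i: usize) -> Option<&[u8]> {
    let start = i.checked_mul(WIDTH)?;
    let end = start.checked_add(WIDTH)?;
    buf.get(start..end)
}

/// Variable length: each record is a one-byte length header followed by
/// that many payload bytes. Finding record `i` means skipping over all
/// records before it.
pub fn nth_variable(buf: &[u8], i: usize) -> Option<&[u8]> {
    let mut pos = 0usize;
    for _ in 0..i {
        let len = *buf.get(pos)? as usize;
        pos += 1 + len;
    }
    let len = *buf.get(pos)? as usize;
    buf.get(pos + 1..pos + 1 + len)
}
```

Random access into the variable-length form needs either this linear scan or a separate offset index, which is the extra complexity being described.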

Collaborator

Of course this is somewhat premature optimization, but I really am convinced the binary format will make previously impossible things possible, e.g., analyzing a 20-30 MB binary.

Member Author

Well, I guess in that case we will go with the second option. It makes the analysis a bit harder though.

Collaborator

I'm not against the first; also, I don't quite understand the second option :P

Just suggesting keeping the binary format in mind for the design. I actually like the first more (except perhaps for embedding the uuid; this closely couples panopticon's uuid implementation with the rreil, which is fine, but something to be aware of).

Another approach is to consider lifting the "primitive" rreil IL into an IR, which rewrites and contains all these analyses after the fact.

If we did this, we can let the rreil be dumb, and easier to encode, easier to emit, and then have a second pass take the IL as input, and output a more sophisticated IR, which contains stuff like resolved call functions with read/writes, etc., which can then be taken as input to something like a decompiler backend
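A minimal sketch of such a second pass, under heavy simplification: the `Il` and `Ir` enums, the `Summary` struct and the `lower` function are all hypothetical names, registers are plain strings, and call targets are plain addresses. The IL stays fixed-shape and the pass attaches read/write sets only where a summary for the callee is known.

```rust
use std::collections::HashMap;

/// "Dumb" IL: every statement has a fixed shape.
#[derive(Clone, Debug, PartialEq)]
pub enum Il {
    Mov { dst: &'static str, src: u64 },
    Call { target: u64 },
}

/// What is known about a callee (illustrative only).
#[derive(Clone, Debug, PartialEq)]
pub struct Summary {
    pub reads: Vec<&'static str>,
    pub writes: Vec<&'static str>,
}

/// Richer IR produced by the second pass.
#[derive(Clone, Debug, PartialEq)]
pub enum Ir {
    Assign { dst: &'static str, src: u64 },
    ResolvedCall { target: u64, reads: Vec<&'static str>, writes: Vec<&'static str> },
    UnresolvedCall { target: u64 },
}

/// Lowers IL to IR, resolving calls whose target has a known summary.
pub fn lower(il: &[Il], summaries: &HashMap<u64, Summary>) -> Vec<Ir> {
    il.iter()
        .map(|stmt| match stmt {
            Il::Mov { dst, src } => Ir::Assign { dst: *dst, src: *src },
            Il::Call { target } => match summaries.get(target) {
                Some(s) => Ir::ResolvedCall {
                    target: *target,
                    reads: s.reads.clone(),
                    writes: s.writes.clone(),
                },
                None => Ir::UnresolvedCall { target: *target },
            },
        })
        .collect()
}
```

The variable-length data lives only in the IR, so the IL encoding is unaffected by how many registers a call touches.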

Collaborator

A second pass would also decouple a future decompiler backend from the IL used: as long as there was a translation from the IL to the IR, the decompiler could have multiple frontends 😎

Development

Successfully merging this pull request may close these issues.

x86 intructions need some crucial implementations
3 participants