Improv. load & store, impl. lea #326
Gawd! where do all those fucking warnings come from?
@@ -747,7 +755,7 @@ pub fn call(a: Rvalue) -> Result<(Vec<Statement>, JumpSpec)> {
         call ?, new_rip:64;
     });*/
     let stmts = rreil!{
         call ?, (a);
I actually had a question about this; shouldn't the lvalue for call be its return, so on x86, the rax register?
This depends on the calling convention. Functions can have multiple return values or return nothing. My plan is to replace the current call op with something that allows you to specify a whole vector of written and read R/Lvalues, but the code isn't ready for prime time yet.
Ah, k.
So if possible, keep a binary encoding in mind when designing the call op; variable-length arguments make the encoding system significantly more complicated.
I mention this because I'm 100% convinced the quickest and most maintainable way to make panopticon really, really fast is, first, to go the binary encoding route for the IL, which will reduce a libc analysis from 2 GB -> 200 MB, and second, to implement your suggestion of keeping a function's basic blocks in a single vector instead of allocating them all over the heap.
Yes, it's also a problem w/ SSA. It assumes that all instructions have exactly one return value. My idea is to make Statement an enum with one variant for RREIL instructions and two for calls:
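enum Statement {
    // An ordinary RREIL instruction with exactly one assignee.
    Simple {
        op: Operation,
        assignee: Lvalue,
    },
    // A call whose target function and effects are known.
    ResolvedCall {
        reads: Vec<Rvalue>,
        writes: Vec<Lvalue>,
        function: Uuid,
    },
    // A call whose target has not been resolved yet.
    UnresolvedCall {
        target: Rvalue,
    },
}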
The number of calls per function isn't that large, so I hope the impact won't eat up the performance gains of the binary encoding.
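A minimal sketch of how the enum above could lift the one-return-value assumption for an SSA pass. The `Rvalue`, `Lvalue`, and `Operation` types here are simplified stand-ins rather than Panopticon's real definitions, `FunctionId` is a hypothetical placeholder for the `Uuid`, and `definitions` is an illustrative helper, not an existing method.

```rust
// Simplified stand-ins for illustration; not Panopticon's actual types.
#[derive(Debug, Clone, PartialEq)]
pub enum Rvalue {
    Undefined,
    Variable(String, u16),
    Constant(u64),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Lvalue {
    Undefined,
    Variable(String, u16),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Operation {
    Move(Rvalue),
    Add(Rvalue, Rvalue),
}

// Hypothetical placeholder for the function's Uuid.
pub type FunctionId = u128;

#[derive(Debug, Clone, PartialEq)]
pub enum Statement {
    Simple { op: Operation, assignee: Lvalue },
    ResolvedCall { reads: Vec<Rvalue>, writes: Vec<Lvalue>, function: FunctionId },
    UnresolvedCall { target: Rvalue },
}

impl Statement {
    /// Every lvalue this statement defines. An SSA renaming pass could
    /// iterate over this instead of assuming exactly one assignee.
    pub fn definitions(&self) -> Vec<&Lvalue> {
        match self {
            Statement::Simple { assignee, .. } => vec![assignee],
            Statement::ResolvedCall { writes, .. } => writes.iter().collect(),
            Statement::UnresolvedCall { .. } => Vec::new(),
        }
    }
}
```

With this shape, a resolved call that writes RAX and ECX reports two definitions, a simple statement reports one, and an unresolved call reports none.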
An alternative way would be to insert a new mnemonic __call_summary after each call that contains the effects of the preceding call. So a function trashing ECX and RAX would look like this:
call <blah>
call ? uuid-of-blah
__call_summary
mov ECX:32, ?
mov RAX:64, ?
This has the advantage that we won't have "weird" instructions and can model functions that return values on the stack.
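A rough sketch of the __call_summary idea, assuming a mnemonic is just a named group of statements. `Value`, `Stmt`, `Mnemonic`, and `call_with_summary` are hypothetical names for illustration only; the point is that every statement keeps exactly one assignee.

```rust
// Hypothetical, simplified model; not Panopticon's actual types.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Undefined,                                   // the `?` in the listing
    Register { name: &'static str, bits: u16 },
    Function(u128),                              // stand-in for the callee's uuid
}

#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    Call { target: Value },
    Mov { dst: Value, src: Value },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Mnemonic {
    pub opcode: String,
    pub statements: Vec<Stmt>,
}

/// Emit the call mnemonic followed by a `__call_summary` mnemonic that
/// assigns an undefined value to every register the callee trashes.
pub fn call_with_summary(callee: u128, clobbers: &[(&'static str, u16)]) -> [Mnemonic; 2] {
    let call = Mnemonic {
        opcode: "call".to_string(),
        statements: vec![Stmt::Call { target: Value::Function(callee) }],
    };
    let summary = Mnemonic {
        opcode: "__call_summary".to_string(),
        statements: clobbers
            .iter()
            .map(|&(name, bits)| Stmt::Mov {
                dst: Value::Register { name, bits },
                src: Value::Undefined,
            })
            .collect(),
    };
    [call, summary]
}
```

Calling `call_with_summary(id, &[("ECX", 32), ("RAX", 64)])` would produce the two mnemonics from the listing above, with one `mov` per trashed register.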
The new enum doesn't worry me so much as the variable-length Vec<Rvalue> and Vec<Lvalue>; if it truly is variable length, I believe it will force the encoding to not be fixed width anymore, which substantially increases the complexity.
It will also likely make a Vec<BinaryStatement> impossible, and instead will require BinaryStatements, which is its own vec of bytes with a size header, etc.
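To make the trade-off concrete, here is a sketch contrasting the two layouts. The field choices (a one-byte opcode, three `u32` operands, a little-endian `u16` size header) are assumptions for illustration; no actual encoding is specified in this thread.

```rust
/// Fixed width: every statement has the same size, so a plain
/// Vec<FixedStatement> supports O(1) indexing.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FixedStatement {
    pub opcode: u8,
    pub operands: [u32; 3],
}

/// Variable width: records live in one byte buffer, each prefixed with a
/// little-endian u16 length. Finding record n means walking the headers.
pub struct BinaryStatements {
    bytes: Vec<u8>,
}

impl BinaryStatements {
    pub fn new() -> Self {
        BinaryStatements { bytes: Vec::new() }
    }

    /// Append one encoded statement behind its size header.
    pub fn push(&mut self, payload: &[u8]) {
        let len = u16::try_from(payload.len()).expect("payload too large for u16 header");
        self.bytes.extend_from_slice(&len.to_le_bytes());
        self.bytes.extend_from_slice(payload);
    }

    /// Walk the buffer record by record.
    pub fn iter(&self) -> impl Iterator<Item = &[u8]> + '_ {
        let mut rest: &[u8] = &self.bytes;
        std::iter::from_fn(move || {
            if rest.len() < 2 {
                return None;
            }
            let len = u16::from_le_bytes([rest[0], rest[1]]) as usize;
            let (record, tail) = rest[2..].split_at(len);
            rest = tail;
            Some(record)
        })
    }
}
```

The fixed-width form can be indexed directly (`stmts[i]`), while the length-prefixed form needs sequential decoding or a separate offset table, which is the extra complexity described above.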
Of course this is somewhat premature optimization, but I really am convinced the binary format will make previously impossible things possible, e.g., analyzing a 20-30 MB binary.
Well, I guess in that case we will go with the second option. It makes the analysis a bit harder though.
I'm not against the first; also I don't quite understand the second option :P
Just suggesting keeping the binary format in mind for design. I actually like the first more (except for perhaps embedding the uuid, this closely couples panopticon's uuid implementation with the rreil, which is fine, but something to be aware of)
Another approach is to consider lifting the "primitive" rreil IL into an IR, which rewrites and contains all these analyses after the fact.
If we did this, we can let the rreil be dumb, and easier to encode, easier to emit, and then have a second pass take the IL as input, and output a more sophisticated IR, which contains stuff like resolved call functions with read/writes, etc., which can then be taken as input to something like a decompiler backend
A second pass would also decouple a future decompiler backend from the IL used; as long as there was a translation from the IL -> IR, the decompiler could have multiple frontends 😎
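One way to picture the two-pass idea, as a sketch under loud assumptions: `Il`, `Ir`, `Summary`, and `lift` are all hypothetical names, registers are plain strings, and call targets are plain addresses. The IL stays dumb and fixed-shape, and the second pass attaches call effects from a table of known function summaries.

```rust
use std::collections::HashMap;

/// "Dumb" IL: easy to emit and encode, calls carry only a target.
#[derive(Debug, Clone, PartialEq)]
pub enum Il {
    Mov { dst: String, src: String },
    Call { target: u64 },
}

/// What an earlier analysis learned about a callee (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Summary {
    pub reads: Vec<String>,
    pub writes: Vec<String>,
}

/// Richer IR that a backend such as a decompiler could consume.
#[derive(Debug, Clone, PartialEq)]
pub enum Ir {
    Assign { dst: String, src: String },
    ResolvedCall { target: u64, reads: Vec<String>, writes: Vec<String> },
    UnresolvedCall { target: u64 },
}

/// Second pass: translate IL to IR, resolving calls where a summary exists.
pub fn lift(il: &[Il], summaries: &HashMap<u64, Summary>) -> Vec<Ir> {
    il.iter()
        .map(|stmt| match stmt {
            Il::Mov { dst, src } => Ir::Assign { dst: dst.clone(), src: src.clone() },
            Il::Call { target } => match summaries.get(target) {
                Some(s) => Ir::ResolvedCall {
                    target: *target,
                    reads: s.reads.clone(),
                    writes: s.writes.clone(),
                },
                None => Ir::UnresolvedCall { target: *target },
            },
        })
        .collect()
}
```

Because the variable-length read/write lists only appear in the IR, the IL itself can keep a fixed-width encoding, and a different frontend would only need its own translation into `Ir`.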
fixes #324