-
Notifications
You must be signed in to change notification settings - Fork 279
enables functions with multiple entry #926
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enables functions with multiple entry #926
Conversation
lib/bap/bap_calls_reconstructor.ml
Outdated
@@ -0,0 +1,87 @@ | |||
open Core_kernel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate a little bit on the purposes and tasks of this module, and the overall algorithm description? It would be even nice to have this as a comment in the module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's describe, what we did here. First of all, we gently touched the disassembler to allow fall-through edges leading to roots. And this affects the cfg only and doesn't affect subroutines on the IR level since our sema lifter
doen't take into account whole program cfg at all, it deals only with subroutines cfgs, that's not sufficient for this task.
So we need to finish lifting procedure by taking into account every fall-through branch coming into subroutine entry. And by finish, I mean to edit a call site.
So algorithm, in general:
- find the fall-through branch to a subroutine, i.e. to it's entry block
- find a source IR block where it came from
- create and insert a call to the target subroutine after such block
- if the source block contains no-returns then redirect them to
a brand new call, since the only reason it was created as noreturn
is a lack of information about fall-through branches insema lifter
.
Will provide an example, like what is was before and how it looks like now.
E.g. main
subroutine in /bin/ls
. As far as I can see the only purpose is to push R15
, that makes the name of this function a little bit too serious:
000003c6: sub main(main_argc, main_argv, main_result)
00015327: main_argc :: in u32 = RDI
00015328: main_argv :: in out u64 = RSI
00015329: main_result :: out u32 = RAX
000003c2:
000003c3: v17136 := R15
000003c4: RSP := RSP - 8
000003c5: mem := mem with [RSP, el]:u64 <- v17136
after applying the algorithm in question we will see a continuation:
00015255:
00015256: call @sub_402a02 with noreturn
Considering the place where to apply this transformation, we can still
to find a better place:
- a separate pass
- module in
bap_sema
library
Does it make any sence for you?
So as far as I understand, you totally forgot about the reconstructor, and the |
well, not really. But now it works with the help of |
There are lot's of changes and better to describe them explicitly. ReconstructorIt turned out that current reconstructor violates an invariant that one block belongs only to one subroutine. Also, we update SymtabWe need information about fall-through edges that end up with roots. So far, we were storing only callees names in Sema lifterFor every block, |
25971b4
to
f88d68d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- you're definitely abusing symtab here, we need to discuss, what you really want to use it for
- the question from the previous review is still there, I don't see any code that will fixes the callgraph
lib/bap_disasm/bap_disasm_symtab.mli
Outdated
val add_call_name : t -> block -> string -> t | ||
|
||
(* remembers a fall from [addr] to the given block *) | ||
val add_fall_addr : t -> block -> addr -> t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is totally non-obvious why a symtab shall remember fall addresses. The name of this function implies that we store all falls in the callgraph, which doesn't make any sense. So this should be refined.
This interface should be definitely refined.
lib/bap_disasm/bap_disasm_symtab.mli
Outdated
val find_call_name : t -> addr -> string option | ||
|
||
(* finds if there are any falls to the given [addr] *) | ||
val find_fall_addr : t -> addr -> addr option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any falls or falls that calls?
1f783c8
to
cf75d9f
Compare
This PR enhances the CFG reconstruction algorithm and weakens the notion of call, by allowing calls to be fall-though edges, not only of real branches. This enables support for functions with multiple entries and lessens the impact of Byteweight (or any other rooter) false positives. In particular, the new algorithm will correctly reconstruct the following three subroutines:
fixes #631