Skip to content

Conversation

gitoleg
Copy link
Contributor

@gitoleg gitoleg commented Aug 29, 2018

Actually, this PR just restores the previous behavior, that was lost due to lack of tests and comments.

So this PR:

  1. extends a list of roots with addresses of such CFG nodes, that don't have any input edges. The effect is noticeable in a case of pure code input, since the list of roots is empty in such case, and therefore the reconstructor will create just an empty symtab.
  2. adds comments in the code to avoid losing such behavior in the future.
  3. updated documentation for reconstructor in bap.mli, since it looks quite outdated

Actually, this PR just restores previous behaviour,
that was lost due to lack of tests and comments.

So this PR:
1) extends a list of roots with addresses of such CFG nodes,
   that don't have any input edges. The effect is noticeable in
   case of pure code input, since the list of roots is empty
   in such case, and therefore the reconstructor will create
   just an empty symtab.
2) adds comments in code to avoid of loosing such behaviour in
   future.
3) updated documentation for reconstructor in `bap.mli`, since
   it looks like quite outdated
@gitoleg gitoleg changed the title updates roots in reconstructor fixes calls search in reconstructor Aug 29, 2018
Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please update the comment, as provided in the comments, otherwise good to go.

Block can be a function start in the next cases:
- block address is in a set of [roots]
- block doesn't have any input edges in [cfg].
The later is extremly important in case of pure code input. *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rephrase it as I'm afraid that we won't be able to decipher this later... today for example :)

A block is considered to be a function start if one of the following is true:
   - the block address belongs to the provided set of roots;
   - the block has no incoming edges

In general, all blocks should be reachable from the set of roots, thus a block without incoming edges should belong to the set of roots, so it might be tempting to say that the second clause is redundant. 

However, our implementation of the recursive descent disassembler can generate blocks that are not reachable from the initial set of roots. This can happen only if the set of roots is empty, which is treated as a special case, that instructs the disassembler to treat the first byte of the provided input as a root. 

Beyond being a questionable design decision, this behavior is actually just an example of a more general problem. The reconstructor and disassemblers could be called with different sets of roots. So, in general, we can't rely on the set of roots passed to the reconstructor for functions start classification. 

@ivg ivg merged commit eab9f1e into BinaryAnalysisPlatform:master Aug 29, 2018
@gitoleg gitoleg deleted the update-roots branch May 13, 2020 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants