implements better support for cross memory disassembling #1134
TL;DR: jumps that cross segment and section boundaries are now
treated more thoroughly. If a jump instruction leads to an invalid
execution chain in some other segment, we now cancel both the chain
in the other segment and the chain that led to that jump in the
current segment (previously, the chain was canceled only up to the
boundaries of its own segment).
Partially fixes #1133. However, since that binary is Thumb 2.0, for
BAP it is still mostly random data rather than something meaningful.
Problem
Since 2.0 we have had an incremental disassembler that supports
cross-section/cross-segment jumps. As #1133 shows, these jumps can
sometimes go wrong, because they were treated specially and had some
privileges that regular intra-section jumps didn't have. One of the
invariants of our disassembler is that no valid chain of execution
hits the end of a segment or data, in other words, no valid chain
forces the CPU into the invalid instruction state. We allow
conservative chains, so the CPU can still hit an invalid instruction
because of a conditional branch (in other words, we allow conditional
branches to hit data). To preserve this invariant we maintain a tree
of disassembling tasks, so that once we hit data, we can unroll the
chain up to the root that started it (or to the first conditional
branch) and cancel everything in between, marking it as data as well.
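The unrolling step can be illustrated with a small toy model. This is only a sketch in Python, not the actual BAP code: the `Task` record, its fields, and the `cancel` function are hypothetical names, and we assume here that the root itself is canceled while a conditional branch is kept (its other edge may still be valid).

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Task:
    """One disassembling task (hypothetical model, not BAP's type)."""
    addr: int                        # address the task decodes
    parent: Optional["Task"] = None  # task that scheduled this one
    conditional: bool = False        # True if this is a conditional branch


def cancel(task: Task, data: Set[int]) -> Set[int]:
    """Mark the chain that ends in `task` as data.

    Walks the parent links and stops at the first conditional
    branch (which is kept) or after the root.
    """
    node: Optional[Task] = task
    while node is not None and not node.conditional:
        data.add(node.addr)
        node = node.parent
    return data


# Chain: 0x10 -> 0x14 (conditional branch) -> 0x18 -> 0x1c -> 0x20 (data)
root = Task(0x10)
branch = Task(0x14, parent=root, conditional=True)
a = Task(0x18, parent=branch)
b = Task(0x1C, parent=a)
bad = Task(0x20, parent=b)

canceled = cancel(bad, set())
```

In this example only the addresses after the conditional branch are marked as data; the branch at 0x14 and the root at 0x10 survive.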
This invariant doesn't hold for jumps between sections. When we see
a jump instruction that leaves the current memory region, we just
assume that once we get that other region of memory, it will be
disassembled nicely. Later, when we actually get access to the memory
region that contains the destination (our disassembler is incremental
and is applied to each chunk of memory as it is discovered), we may
find that the chain starting from this address is invalid and cancel
it. However, since we no longer have access to the disassembler state
of the original memory region, we can't cancel the chain that led to
that jump in the original memory region. Therefore, when we later
build the whole-program CFG, we follow that chain, eventually hit
data, and end up with an exception.
Solution
The solution is that instead of discarding a task that breaches the
segment boundaries, we accumulate it in a debt list, and every time
we are handed a new memory region we first try to pay off the debts.
If a task's destination is now within the boundaries and we can prove
that it hits data, then we cancel the whole chain, which can now
cross section boundaries.
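The debt mechanism can be sketched as follows. Again, this is a toy Python model under stated assumptions and not the BAP implementation: `Task`, `State`, `scan`, and the `hits_data` predicate are all hypothetical, and "proving that the destination hits data" is reduced to a callback.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Task:
    """A jump task (hypothetical model, not BAP's type)."""
    addr: int                        # address of the instruction
    dest: int                        # where the jump goes
    parent: Optional["Task"] = None  # task that led to this one
    conditional: bool = False


@dataclass
class State:
    """Disassembler state carried from one memory region to the next."""
    debt: List[Task] = field(default_factory=list)
    data: Set[int] = field(default_factory=set)


def cancel(task: Task, data: Set[int]) -> None:
    """Mark the chain leading to `task` as data, up to a conditional branch."""
    node: Optional[Task] = task
    while node is not None and not node.conditional:
        data.add(node.addr)
        node = node.parent


def scan(state: State, region: range, jumps: List[Task],
         hits_data: Callable[[int], bool]) -> None:
    """Process one region: old debts first, then the jumps found in it."""
    pending = state.debt + jumps
    state.debt = []
    for task in pending:
        if task.dest not in region:
            state.debt.append(task)      # still out of reach: keep the debt
        elif hits_data(task.dest):
            state.data.add(task.dest)    # destination is data ...
            cancel(task, state.data)     # ... so cancel the chain behind it


# Region A contains a chain that ends with a jump into region B.
t0 = Task(0x1000, dest=0x1004)
t1 = Task(0x1004, dest=0x1008, parent=t0)
t2 = Task(0x1008, dest=0x8000, parent=t1)

state = State()
scan(state, range(0x1000, 0x2000), [t2], hits_data=lambda a: False)
debt_after_a = list(state.debt)          # t2 is recorded as debt

# Region B arrives later and 0x8000 turns out to be data.
scan(state, range(0x8000, 0x9000), [], hits_data=lambda a: a == 0x8000)
```

Because each task keeps a reference to its parent, the cancellation in the second `scan` call reaches back into region A even though that region is no longer being processed.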
Caveats
The debt is a list of tasks, and each task references its parent
tasks, so in fact it is a tree of instructions covering the whole
program. We store the debt list in the disassembler state, which is
saved on the hard drive, and if the debt list is large (and since in
the binary format we can't preserve sharing) it can be quite
expensive to store and to load. So far the assumption is that the
debt list is either empty or very small after the project is fully
disassembled. If this hypothesis does not hold, we can either cancel
all unpaid debt at the end of disassembling or just ignore it and not
store it on disk.
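To get a feeling for why the loss of sharing matters, here is a rough counting model in Python. It is not a measurement of the real on-disk format; the `Task` class and both counting functions are hypothetical, and we simply assume that a serializer without sharing writes out the full parent chain of every debt entry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # compare and hash tasks by identity
class Task:
    addr: int
    parent: Optional["Task"] = None


def shared_size(debt: List[Task]) -> int:
    """Number of distinct tasks reachable from the debt list."""
    seen = set()
    for task in debt:
        node: Optional[Task] = task
        while node is not None and node not in seen:
            seen.add(node)
            node = node.parent
    return len(seen)


def unshared_size(debt: List[Task]) -> int:
    """Number of tasks written if every chain is copied in full."""
    total = 0
    for task in debt:
        node: Optional[Task] = task
        while node is not None:
            total += 1
            node = node.parent
    return total


# A common chain of 1000 tasks with 50 unpaid jumps hanging off its tip.
tip: Optional[Task] = None
for i in range(1000):
    tip = Task(0x1000 + 4 * i, parent=tip)
debt = [Task(0x9000 + 4 * i, parent=tip) for i in range(50)]

in_memory = shared_size(debt)    # 1000 shared + 50 debt tasks
on_disk = unshared_size(debt)    # 50 chains of 1001 tasks each
```

In this model the in-memory tree has 1050 tasks, while the unshared copy has 50050, which is why an empty or very small debt list at the end of disassembling matters.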