ivg commented Jun 16, 2020

TL;DR: jumps that cross segment and section boundaries are now
treated more thoroughly, so that if a jump instruction leads to an
invalid execution chain in some other segment, we cancel both the
chain in the other segment and the chain in the current segment that
led to that jump (previously, a chain was canceled only up to the
boundaries of its own segment).

Partially fixes #1133. However, since it is a Thumb-2 binary, for BAP
it is still mostly random data rather than something meaningful.

Problem
-------

Since 2.0, we have an incremental disassembler that supports jumps
across sections and segments. As #1133 shows, such jumps can sometimes
go wrong, because they were treated specially and had some allowances
that regular intra-section jumps didn't have. One of the invariants of
our disassembler is that no valid chain of execution hits the end of a
segment or data; in other words, no valid chain forces the CPU into an
invalid-instruction state. We allow conservative chains, so the CPU can
still hit an invalid instruction because of a conditional branch (in
other words, we allow conditional branches to hit data). To preserve
this invariant, we maintain a tree of disassembly tasks, so that once
we hit data, we can unroll the chain up to the root that started it (or
up to the first conditional branch) and cancel everything in between,
marking it as data as well.
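
A minimal sketch of this bookkeeping, in Python for brevity, may help. The `Task` class, its `is_cond_branch` flag, and `cancel_chain` are hypothetical names, not BAP's API: each task records the task that led to it, and cancelling walks the parent links upward, marking addresses as data until it reaches a conditional branch (left intact, since conditional branches are allowed to hit data) or has cancelled the root.

```python
from dataclasses import dataclass
from typing import Optional, Set

@dataclass(eq=False)
class Task:
    """Hypothetical disassembly task for the instruction at `addr`."""
    addr: int
    parent: Optional["Task"] = None  # the task whose instruction led here
    is_cond_branch: bool = False     # decoded instruction is a conditional branch

def cancel_chain(task: Task, data: Set[int]) -> None:
    """Mark `task` and its ancestors as data, stopping at the first
    conditional branch (kept, because conditional branches may hit data)
    or after cancelling the root."""
    node: Optional[Task] = task
    while node is not None and not node.is_cond_branch:
        data.add(node.addr)
        node = node.parent

# root 0x10 -> 0x14 -> cond branch 0x18 -> 0x1c -> 0x20 (decodes as data)
root = Task(0x10)
cond = Task(0x18, Task(0x14, root), is_cond_branch=True)
bad = Task(0x20, Task(0x1c, cond))
data: Set[int] = set()
cancel_chain(bad, data)
print([hex(a) for a in sorted(data)])  # ['0x1c', '0x20']
```

If the chain contains no conditional branch, the walk continues through the root, so the whole chain is marked as data.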

This invariant doesn't hold for jumps between sections: when we see a
jump instruction that leaves the current memory region, we simply
assume that once we get the other region of memory, it will
disassemble nicely. Later, when we actually get access to the memory
region that contains the destination (our disassembler is incremental
and is applied to each chunk of memory as it is discovered), we may
find that the chain starting from this address is invalid and cancel
it. But since we no longer have access to the disassembler state of
the original memory region, we can't cancel the chain that led to that
jump in the original region. Therefore, later, when we build the
whole-program CFG, we start that chain, eventually hit data, and end
up with an exception.
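
The failure mode can be reproduced in a toy model. Everything here is hypothetical and only a sketch under simplifying assumptions (a two-kind instruction set where absent addresses decode as data, no conditional branches, made-up region bounds), not BAP's implementation: each region is disassembled on its own, and a task that jumps out of its region is handed to the next region as a bare address, so its parent links are lost.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy ISA: address -> (kind, target); kinds are "next"
# (fall through to addr + 1) and "jmp". Absent addresses decode as data.
Insn = Tuple[str, Optional[int]]

@dataclass(eq=False)
class Task:
    addr: int
    parent: Optional["Task"] = None  # task whose instruction led here

def cancel(task: Optional[Task], data: Set[int]) -> None:
    # No conditional branches in this toy, so the walk goes up to the root.
    while task is not None:
        data.add(task.addr)
        task = task.parent

def disassemble(program: Dict[int, Insn], lo: int, hi: int, roots: List[Task],
                data: Set[int], valid: Set[int]) -> List[Task]:
    """Explore tasks inside [lo, hi); return the tasks that leave it."""
    stack: List[Task] = list(roots)
    escaped: List[Task] = []
    while stack:
        task = stack.pop()
        if not (lo <= task.addr < hi):
            escaped.append(task)          # destination is in another region
        elif task.addr in data or task.addr not in program:
            cancel(task, data)            # hit data: cancel the chain
        elif task.addr not in valid:
            valid.add(task.addr)
            kind, target = program[task.addr]
            if kind == "next":
                stack.append(Task(task.addr + 1, task))
            elif kind == "jmp" and target is not None:
                stack.append(Task(target, task))
    return escaped

program: Dict[int, Insn] = {
    0x00: ("next", None),  # region A = [0x00, 0x10)
    0x01: ("jmp", 0x10),   # jump into region B
    0x10: ("next", None),  # region B = [0x10, 0x20); 0x11 is data
}
data: Set[int] = set()
valid: Set[int] = set()
escaped = disassemble(program, 0x00, 0x10, [Task(0x00)], data, valid)
# Naive handling: only the destination address crosses over; parent links are dropped.
disassemble(program, 0x10, 0x20, [Task(t.addr) for t in escaped], data, valid)
print(sorted(valid - data))  # [0, 1]
```

Region B correctly cancels 0x10 and 0x11, but 0x00 and 0x01 stay valid even though their only path leads into data.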

Solution
--------

The solution is, instead of discarding a task that breaches the
segment boundaries, to accumulate it in a debt list. Every time we are
handed a new memory region, we first try to pay off the debts: if a
task now falls within the boundaries and we can prove that it hits
data, we cancel the whole chain, which can now cross section
boundaries.
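
A sketch of the debt list in a toy model (all names, the instruction encoding, and the region bounds are hypothetical; this is not BAP's actual implementation): tasks that leave a region are kept together with their parent links, and each new region first takes over the debts that land inside it, so cancelling a bad chain there walks back into the region the jump came from.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy ISA: address -> (kind, target); kinds are "next"
# (fall through to addr + 1) and "jmp". Absent addresses decode as data.
Insn = Tuple[str, Optional[int]]

@dataclass(eq=False)
class Task:
    addr: int
    parent: Optional["Task"] = None  # task whose instruction led here

def cancel(task: Optional[Task], data: Set[int]) -> None:
    # No conditional branches in this toy, so the walk goes up to the root,
    # following parent links even into previously disassembled regions.
    while task is not None:
        data.add(task.addr)
        task = task.parent

def disassemble(program: Dict[int, Insn], lo: int, hi: int, roots: List[Task],
                data: Set[int], valid: Set[int]) -> List[Task]:
    """Explore tasks inside [lo, hi); return the tasks that leave it."""
    stack: List[Task] = list(roots)
    escaped: List[Task] = []
    while stack:
        task = stack.pop()
        if not (lo <= task.addr < hi):
            escaped.append(task)          # becomes a debt
        elif task.addr in data or task.addr not in program:
            cancel(task, data)            # hit data: cancel the chain
        elif task.addr not in valid:
            valid.add(task.addr)
            kind, target = program[task.addr]
            if kind == "next":
                stack.append(Task(task.addr + 1, task))
            elif kind == "jmp" and target is not None:
                stack.append(Task(target, task))
    return escaped

def disassemble_region(program: Dict[int, Insn], lo: int, hi: int,
                       roots: List[Task], debts: List[Task],
                       data: Set[int], valid: Set[int]) -> None:
    """Pay off the debts that land in [lo, hi) first, then record new ones."""
    due = [t for t in debts if lo <= t.addr < hi]
    debts[:] = [t for t in debts if not (lo <= t.addr < hi)]
    debts.extend(disassemble(program, lo, hi, due + roots, data, valid))

program: Dict[int, Insn] = {
    0x00: ("next", None),  # region A = [0x00, 0x10)
    0x01: ("jmp", 0x10),   # jump into region B
    0x10: ("next", None),  # region B = [0x10, 0x20); 0x11 is data
}
data: Set[int] = set()
valid: Set[int] = set()
debts: List[Task] = []
disassemble_region(program, 0x00, 0x10, [Task(0x00)], debts, data, valid)
disassemble_region(program, 0x10, 0x20, [], debts, data, valid)
print(sorted(valid - data), debts)  # [] []
```

Because the escaped task keeps its parent link, cancelling at 0x11 also marks 0x00 and 0x01 as data, so no valid chain is left that runs into data.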

Caveats
-------

The debt is a list of tasks, and each task references its parent
tasks, so in fact it is a tree of instructions covering the whole
program. We store the debt list in the disassembler state, which is
saved on the hard drive, and since the binary format can't preserve
sharing, a large debt list can be quite expensive to store and to
load. So far, the assumption is that the debt list is either empty or
very small once the project is fully disassembled. If this hypothesis
turns out to be false, we can either cancel all unpaid debt at the end
of disassembling or simply not store it on disk.
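
The storage concern can be illustrated by counting how many task records get written. In this hypothetical sketch, Python's `json` module stands in for a serializer that does not preserve sharing: each debt is written with a full copy of its ancestor chain, whereas a table that writes each task once and refers to parents by index stays proportional to the number of distinct tasks. The chain length and the number of debts are arbitrary.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(eq=False)  # eq=False keeps identity hashing, so tasks can be dict keys
class Task:
    addr: int
    parent: Optional["Task"] = None

def to_nested(task: Optional[Task]) -> Optional[dict]:
    """No sharing: every debt embeds its own copy of the ancestor chain."""
    if task is None:
        return None
    return {"addr": task.addr, "parent": to_nested(task.parent)}

def to_table(debts: List[Task]) -> List[Tuple[int, Optional[int]]]:
    """With sharing: each task is one row; parents are referenced by row index."""
    index: Dict[Task, int] = {}
    rows: List[Tuple[int, Optional[int]]] = []

    def visit(task: Optional[Task]) -> Optional[int]:
        if task is None:
            return None
        if task not in index:
            parent = visit(task.parent)
            index[task] = len(rows)
            rows.append((task.addr, parent))
        return index[task]

    for debt in debts:
        visit(debt)
    return rows

# A 50-instruction chain whose last task spawned 20 unresolved cross-region jumps.
chain = [Task(0)]
for addr in range(1, 50):
    chain.append(Task(addr, chain[-1]))
debts = [Task(0x1000 + i, chain[-1]) for i in range(20)]

nested = json.dumps([to_nested(d) for d in debts])
rows = to_table(debts)
print(nested.count('"addr"'), len(rows))  # 1020 70
```

Here 20 debts hanging off one 50-task chain produce 1,020 nested records instead of 70 distinct ones.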

ivg force-pushed the fix-cross-memory-jumps branch from 6b581d7 to 3921672 on June 17, 2020 12:12

ivg commented Jun 17, 2020

OK, Travis is dead so I am merging. Tested locally.

ivg merged commit 5a27c11 into BinaryAnalysisPlatform:master on Jun 17, 2020
ivg deleted the fix-cross-memory-jumps branch on March 9, 2022 17:16
Issues that merging this pull request may close: failure on elf32-littlearm