Skip to content

Conversation

ivg
Copy link
Member

@ivg ivg commented Oct 15, 2019

The old representation inherited from BAP 0.x was representing
repz<code> as

if (RCX = 0) jmp next;
<code>;
RCX := RCX - 1;
if (RCX = 0)
  jmp next;
if (ZF)
  jmp start;

where next and start were concrete addresses of the current and
the next instructions. It goes without saying that such code
has a very bloated graph representation.

A more tight representation, which could also be nicecly reified into
a graph could be produced using BIL's while statement, which is
reflected to the Core Theory repeat term. The same instruction will
now have the following representation:

$0 := true;
while (RCX <> 0 & $0) {
  <code>;
  RCX := RCX - 1;
  $0 := ZF;
}

which is reified into the following IR:

entry:
  #0 := 1
  goto head
body:
  <code>
  RCX := RCX - 1
  #0 := ZF
  goto tail
head:
  goto tail
tail:
  when RCX <> 0 & #0 goto body

Which is much smaller than the original IR.

Another nice property of the new representation is that it doesn't
contain any absolute jumps therefore is rellocatable. Additionally, it
is no longer classified as jump or barrier and is a pure data effect.

Caveats

We still have a small suboptimality in the form of a bogus head: goto tail block, which comes from the how we represent graphs in our denotational semantics. This block could be removed with a simple optimization pass after an instruction is lifted, but I don't think it is worthwhile to do it right now, given how scarce such kind of instructions could be.

The old representation inherited from BAP 0.x was representing
`repz<code>` as

```
if (RCX = 0) jmp next;
<code>;
RCX := RCX - 1;
if (RCX = 0)
  jmp next;
if (ZF)
  jmp start;
```

where `next` and `start` were concrete addresses of the current and
the next instructions. It goes without saying that such code
has a very bloated graph representation.

A more tight representation, which could also be nicecly reified into
a graph could be produced using BIL's `while` statement, which is
reflected to the Core Theory `repeat` term. The same instruction will
now have the following representation:

```
$0 := true;
while (RCX <> 0 & $0) {
  <code>;
  RCX := RCX - 1;
  $0 := ZF;
}
```

which is reified into the following IR:
```
entry:
  #0 := 1
  goto tail
body:
  <code>
  RCX := RCX - 1
  #0 := ZF
  goto tail
tail:
  when RCX <> 0 & #0 goto body
```

Which is much smaller than the original IR.

Another nice property of the new representation is that it doesn't
contain any absolute jumps therefore is rellocatable. Additionally, it
is no longer classified as jump or barrier and is a pure data effect.

Caveats
-------

So far the IR translation layer for some reason produces slightly less
optimal representation, namely
```
entry:
  #0 := 1
  goto bogus;
body:
  <code>
  RCX := RCX - 1
  #0 := ZF
  goto tail
bogus:
  goto tail
tail:
  when RCX <> 0 & #0 goto body
```

I will try to remove the `bogus: goto tail` block if possible,
otherwise we will let it for the later times, when we will have more
time.
@ivg ivg requested a review from gitoleg October 15, 2019 16:11
@gitoleg gitoleg merged commit fe44212 into BinaryAnalysisPlatform:master Oct 16, 2019
gitoleg added a commit to gitoleg/bap that referenced this pull request Oct 22, 2019
The problem
After BinaryAnalysisPlatform#998, string instructions with the rep prefix are represented
with the `Bil.while` statement and have the semantics of a pure data effect
(and therefore do not break a basic block). However, when we reify while into
IR we still have jump terms. And this wasn't anticipated by the sema lifter,
which was glueing IR constituting one basic block, assuming that there
will not be any complex control flow.

This PR fixes this issue (which is not only specific to BinaryAnalysisPlatform#998, this PR just
triggered it, in general there could be other cases, when pure data effects are
reified into non-simple control structures, cf. conditional moves).

Solution
Instead of just appending blocks, ensure that they are connected by an explicit jump.
ivg pushed a commit that referenced this pull request Oct 23, 2019
The problem
After #998, string instructions with the rep prefix are represented
with the `Bil.while` statement and have the semantics of a pure data effect
(and therefore do not break a basic block). However, when we reify while into
IR we still have jump terms. And this wasn't anticipated by the sema lifter,
which was glueing IR constituting one basic block, assuming that there
will not be any complex control flow.

This PR fixes this issue (which is not only specific to #998, this PR just
triggered it, in general there could be other cases, when pure data effects are
reified into non-simple control structures, cf. conditional moves).

Solution
Instead of just appending blocks, ensure that they are connected by an explicit jump.
@ivg ivg deleted the tweak-rep-in-x86 branch June 10, 2020 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants