Skip to content

Conversation

ivg
Copy link
Member

@ivg ivg commented Jan 28, 2021

Introduction

This PR enables analysis of programs written in Primus Lisp, essentially making Primus Lisp a concrete syntax for Core Theory. For example, the following lisp program could be easily reified into BIL (thus Primus Lisp could be seen as a concrete syntax for BIL, or BIR, or any Core Theory implementation)

(defun example1 (x)
  (set R0 1)
  (set R1 2)
  (set R3 (+ R1 R2 (* R1 R2 3)))
  (set F1:1 (< R0 R1))
  (set F2:1 (< R0 R1 R2 R3 R4))
  (set F3:1 (< R0 (+ R0 R3)))
  (set R4 (memory-read (+ R1 (* 8 (/ (word-width) 4)))))
  (memory-write R4 (+ R3 R1))
  (malloc R0)
  (if (> R0 (* R0 R0))
      (exec-addr 0xDEADBEEF)
    (set R0 (* R0 R2 R3))
    (malloc (* R3 R0))))

It could be translated to the following BIL program,

example1:
{
  R0 := 1
  R1 := 2
  R3 := R1 + R2 + R1 * R2 * 3
  F1 := R0 < R1
  F2 := R0 < R1 & R1 < R2 & R2 < R3 & R3 < R4
  F3 := R0 < R0 + R3
  R4 := pad:32[mem[R1 + 0x40]]
  mem := mem with [R4, be]:u32 <- R3 + R1
  bap:call(malloc)
  #1 := R0 * R0 < R0
  if (#1) {
    jmp 0xDEADBEEF
  }
  else {
    R0 := R0 * R2 * R3
    bap:call(malloc)
  }
}

See this gist for the full example.

What else it could be used for?

Right now it enables us to write BIL and BIR programs (basically define Insn.t values) using a concrete Primus Lisp syntax and relying on a big library of macros and primitives. In the future, we plan to use Primus Lisp to write lifters (so that a new instruction could be added without recompilation or having to learn OCaml).

Also, having the static semantics for stubs written in Primus Lisp will make these stubs available for static analysis.

We also plan to eventually switch to Primus Lisp as our main representation as it is fully extensible (and can easily support the full extent of the Core Theory).

Why Primus Lisp, not BAP Lisp?

We had plans to introduce BAP Lisp instead of Primus Lisp as our main representation. However, we decided to abolish these plans and instead take Primus Lisp as it exists. Primus Lisp features interesting properties that (together with the planned staged compilation) will enable us to use this language for writing not only efficient lifters but static and dynamic analyses. A non-deterministic staged language fits nicely both as a language for both machine semantics and security policy specifications.

Plans

This PR brings a minimum viable product that enables reification of the subset of the Core Theory that is materializable into BIL programs. In the next step, we will introduce staged compilation so that operations on statically known values (values obtained from the knowledge base) will be performed by the compiler itself instead of being reified into Core Theory terms. Later we will update the Primus Interpreter to enable the same staging (that will be the solution for issue #1124) so that meta-operations on the knowledge base, dictionaries, taint-tracker state and so on won't be reflected on the Primus Interpreter machine observations and became totally transparent. We will also implement the complete set of Core Theory operations, and implement reification to Primus Lisp, to enable translation in the other direction (BIL -> Lisp) which will finally make it possible to get rid of BIL and use Primus Lisp as the main representation.

How it is implemented?

The idea is simple and applies to any other language, so this PR could also be seen as a blueprint for adding a frontend for a programming language in BAP. We store the Primus Lisp program in the knowledge base, packed as a unit, using the existing source and language properties. The semantics of operations is provided the same way it is provided for machine instructions, or BIL itself, via a knowledge base rule. The main interesting point is that Primus Lisp has an arbitrary and extensible number of primitives, which are also represented as promises. This PR basically introduces zero abstractions (other than the abstract type[e for Primus primitive which is just a pair of a name and a list of arguments) and fully relies on the existing Core Theory abstractions.

ivg added 19 commits January 15, 2021 14:55
primitives are still not published
The new implementation is more tightly integrated with the knowledge
base and will totally rely on promises to provide semantics of lisp
definitions as well as lisp primitives. There will not be anymore the
special type for the semantics of a lisp program, instead a KB value
of Theory.Semantics.cls will be used. The class was extended with the
value slot, so no effects can have values.
right now the equality is purely textual equality.
and implements a few primitives, far from complete set though.
By removing a request to compute a name when the unit is not a lisp
unit. It helps since the name computation involves the stub resolver,
which in turn requires the semantics of the nearby instructions, and
so on.

In general, the rel-symbolizer shall be rewritten as it is so easy to
break it.
@ivg ivg merged commit 155a3d1 into BinaryAnalysisPlatform:master Jan 29, 2021
@ivg ivg deleted the primus-lisp-theory branch December 1, 2021 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant