adds register aliases to the Core Theory #1343

ivg · 2021-09-01T18:24:25Z

For the whole history of BAP it was the responsibility of a lifter to resolve all registers aliases and special registers and use some predefined and arbitrary selected set of registers as the base registers.

However, recent additions of new lifters, e.g., Ghidra, and architectures that use various names for the same registers to denote different semantics (aarch64, avr), showed us the current approach doesn't scale well and that we're missing an abstraction that is widely used in the domain that we are modeling.

This PR introduces to the Core Theory register aliases and defines their semantics. It is now possible to set and access pseudo registers and the core theory machinery will automatically resolve it to the corresponding extracts, concatenations, and updates, e.g., set w1 wzr will be translated to set r1 (logand r1 0xFFFFFFFF00000000) so that the user theories will never even see the pseudo-registers. The idea is general, and in the future we might even handle the pc registers, e.g., that set pc r0 would be translated into jmp r0. But right now it is only possible to express registers in terms of other registers (i.e., as a continuous part of another register or as a concatenation of registers). Zero registers and constant registers are also supported via the corresponding roles.

The register aliasing rules themselves are defined in a pretty arbitrary manner and the underlying solver will automatically resolve each register in terms of operations on the base registers. It is necessary, though, to tell to the solver which registers are pseudo using the corresponding register role in the target description.

Overall, this mechanism makes our life much easier when we are dealing with different lifters and disassembly backends as they may have committed to different choices in the register names, or which registers to be considered the base registers, or whether to represent the status as a single register or keep each flag as a boolean register, and so on. Now we have the single view on the register file that is set up during target definition and the backends can safely use whatever aliases they like.

So far, this new feature is only used in aarch64 but we will later use it for all other architectures, new and existing. If anyone wants to specify aliasing for the existing architectures, like x86, the help is much appreciated.

Implementation Details

The desugaring (translating operations on pseudo registers to operations on the base registers) is applied using the Theory.Pass.Desugar functor, which is applied automatically to any theory that is obtained through the Theory interface, e.g., Theory.require or Theory.current. It is also possible to apply this functor manually to your theory, if for some reason you're not getting the theory from the knowledge base.

The Theory.Pass module is currently a tip of a newly emerging iceberg that will include the transformation pipeline for theories. In particular, we plan to add some useful passes for optimization and code simplification, which will be especially useful for ghidra backend that produces extremely suboptimal encoding.

Before that it was unsigned but in general we should put more control on it, as right now we have 32 aarch64 bit operations performed in 64 bits.

This release brings This release brings Ghidra as the new disassembler and lifting backend, significantly improves our Thumb lifter (especially with respect to interworking), adds forward-chainging rules and context variables to the knowledge base, support for LLVM 12, a pass that flattens IR, and a new framework for pattern matching on bytes that leverages the available patterns and actions from the Ghidra project. It also contains many bug fixes and improvements, most notable performance improvements that make bap from 30 to 50 per cent faster. See below for the full list of changes. Package-wise, we split bap into three parts: `bap-core`, `bap`, and `bap-extra`. The `bap-core` metapackage contains the minimal set of core packages that is necessary to disassemble the binary, the `bap` package extends this set with various analysis, finally, `bap-extra` includes rarely used or hard to install packages, such as the symbolic executor, which is very heavy on installation, and `bap-ghidra`, which is right now in a very experimental stage and is only installable on Ubuntu 18.04, since it requires the libghidra-dev package available from ppa, ``` sudo add-apt-repository ppa:ivg/ghidra -y sudo apt-get install libghidra-dev -y sudo apt-get install libghidra-data -y ``` Changelog ========= Features -------- - BinaryAnalysisPlatform/bap#1325 adds armeb abi - BinaryAnalysisPlatform/bap#1326 adds experimental Ghidra disassembler and lifting backend - BinaryAnalysisPlatform/bap#1332 adds the flatten pass - BinaryAnalysisPlatform/bap#1341 adds context variables to the knowledge base - BinaryAnalysisPlatform/bap#1343 adds register aliases to the Core Theory - BinaryAnalysisPlatform/bap#1358 adds LLVM 12 support - BinaryAnalysisPlatform/bap#1360 extends the knowledge monad interface - BinaryAnalysisPlatform/bap#1363 adds forward-chaining rules and Primus Lisp methods - BinaryAnalysisPlatform/bap#1364 adds a generic byte pattern matcher based on Ghidra - BinaryAnalysisPlatform/bap#1365 adds support for the Thumb IT blocks - BinaryAnalysisPlatform/bap#1369 adds some missing `t2LDR.-i12` instructions to the Thumb lifter Improvements ------------ - BinaryAnalysisPlatform/bap#1336 improves the `main` function discovery heuristics - BinaryAnalysisPlatform/bap#1337 adds more Primus Lisp stubs and fixes some existing - BinaryAnalysisPlatform/bap#1342 uses context variables to store the current theory - BinaryAnalysisPlatform/bap#1344 uses the context variables to store the Primus Lisp state - BinaryAnalysisPlatform/bap#1355 tweaks symbolization and function start identification facilities - BinaryAnalysisPlatform/bap#1353 improves arm-family support - BinaryAnalysisPlatform/bap#1356 stops proposing aliases as potential subroutine names - BinaryAnalysisPlatform/bap#1361 rewrites knowledge and primus monads - BinaryAnalysisPlatform/bap#1370 tweaks Primus Lisp' method resolution to keep super methods - BinaryAnalysisPlatform/bap#1375 error handling and performance tweaks - BinaryAnalysisPlatform/bap#1378 improves reification of calls in the IR theory (part I) - BinaryAnalysisPlatform/bap#1379 improves semantics of some ITT instructions - BinaryAnalysisPlatform/bap#1380 Fixes handling of fallthroughs in IR theory Bug Fixes --------- - BinaryAnalysisPlatform/bap#1328 fixes C.ABI.Args `popn` and `align_even` operators - BinaryAnalysisPlatform/bap#1329 fixes frame layout calculation in the Primus loader - BinaryAnalysisPlatform/bap#1330 fixes the address size computation in the llvm backend - BinaryAnalysisPlatform/bap#1333 fixes and improves label handling in the IR theor - BinaryAnalysisPlatform/bap#1338 fixes core:eff theory - BinaryAnalysisPlatform/bap#1340 fixes the Node.update for graphs with unlabeled nodes - BinaryAnalysisPlatform/bap#1347 fixes a knowledge base race condition in the run plugin - BinaryAnalysisPlatform/bap#1348 fixes endianness in the raw loader - BinaryAnalysisPlatform/bap#1349 short-circuits evaluation of terms in Bap_main.init - BinaryAnalysisPlatform/bap#1350 fixes variable rewriter and some Primus Lisp symbolic functions - BinaryAnalysisPlatform/bap#1351 fixes and improves aarch64 lifter - BinaryAnalysisPlatform/bap#1352 fixes several Primus Lisp stubs - BinaryAnalysisPlatform/bap#1357 fixes some T32 instructions that are accessing to PC - BinaryAnalysisPlatform/bap#1359 fixes handling of let-bound variables in flatten pass - BinaryAnalysisPlatform/bap#1366 fixes a bug in the `cmp` semantics - BinaryAnalysisPlatform/bap#1374 fixes handling modified immediate constants in ARM T32 encoding - BinaryAnalysisPlatform/bap#1376 fixes fresh variable generation - BinaryAnalysisPlatform/bap#1377 fixes the IR theory implementation Tooling ------- - BinaryAnalysisPlatform/bap#1319 fixes the shared folder in deb packages - BinaryAnalysisPlatform/bap#1320 removes sudo from postinst and postrm actions in the deb packages - BinaryAnalysisPlatform/bap#1321 enables push flag in the publish-docker-image action - BinaryAnalysisPlatform/bap#1323 fixes the ppx_bap version in the dev-repo opam file - BinaryAnalysisPlatform/bap#1331 fixes the docker publisher, also enables manual triggering - BinaryAnalysisPlatform/bap#1327 fixes a typo in the ubuntu dockerfiles - BinaryAnalysisPlatform/bap#1345 fixes bapdoc - BinaryAnalysisPlatform/bap#1346 nightly tests are failing due to a bug upstream

ivg added 3 commits September 1, 2021 13:59

adds the register aliasing

242973b

fixes the msb function in the lisp library

7e1b81f

uses signed extension during operands unification in Primus Lisp

3f7bbe1

Before that it was unsigned but in general we should put more control on it, as right now we have 32 aarch64 bit operations performed in 64 bits.

ivg merged commit fdd4287 into BinaryAnalysisPlatform:master Sep 1, 2021

ivg deleted the register-aliasing branch December 1, 2021 19:16

ivg mentioned this pull request Dec 8, 2021

releases BAP 2.4.0 ocaml/opam-repository#20177

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

adds register aliases to the Core Theory #1343

adds register aliases to the Core Theory #1343

Uh oh!

ivg commented Sep 1, 2021

Uh oh!

Uh oh!

adds register aliases to the Core Theory #1343

adds register aliases to the Core Theory #1343

Uh oh!

Conversation

ivg commented Sep 1, 2021

Implementation Details

Uh oh!

Uh oh!