adds register aliases to the Core Theory #1343
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
For the whole history of BAP it was the responsibility of a lifter to resolve all registers aliases and special registers and use some predefined and arbitrary selected set of registers as the base registers.
However, recent additions of new lifters, e.g., Ghidra, and architectures that use various names for the same registers to denote different semantics (aarch64, avr), showed us the current approach doesn't scale well and that we're missing an abstraction that is widely used in the domain that we are modeling.
This PR introduces to the Core Theory register aliases and defines their semantics. It is now possible to set and access pseudo registers and the core theory machinery will automatically resolve it to the corresponding extracts, concatenations, and updates, e.g.,
set w1 wzr
will be translated toset r1 (logand r1 0xFFFFFFFF00000000)
so that the user theories will never even see the pseudo-registers. The idea is general, and in the future we might even handle the pc registers, e.g., thatset pc r0
would be translated intojmp r0
. But right now it is only possible to express registers in terms of other registers (i.e., as a continuous part of another register or as a concatenation of registers). Zero registers and constant registers are also supported via the corresponding roles.The register aliasing rules themselves are defined in a pretty arbitrary manner and the underlying solver will automatically resolve each register in terms of operations on the base registers. It is necessary, though, to tell to the solver which registers are pseudo using the corresponding register role in the target description.
Overall, this mechanism makes our life much easier when we are dealing with different lifters and disassembly backends as they may have committed to different choices in the register names, or which registers to be considered the base registers, or whether to represent the status as a single register or keep each flag as a boolean register, and so on. Now we have the single view on the register file that is set up during target definition and the backends can safely use whatever aliases they like.
So far, this new feature is only used in aarch64 but we will later use it for all other architectures, new and existing. If anyone wants to specify aliasing for the existing architectures, like x86, the help is much appreciated.
Implementation Details
The desugaring (translating operations on pseudo registers to operations on the base registers) is applied using the
Theory.Pass.Desugar
functor, which is applied automatically to any theory that is obtained through theTheory
interface, e.g.,Theory.require
orTheory.current
. It is also possible to apply this functor manually to your theory, if for some reason you're not getting the theory from the knowledge base.The
Theory.Pass
module is currently a tip of a newly emerging iceberg that will include the transformation pipeline for theories. In particular, we plan to add some useful passes for optimization and code simplification, which will be especially useful for ghidra backend that produces extremely suboptimal encoding.