Description
The following LDM instruction loads from memory and writes to a list of registers.
0x90,0xe8,0x0e,0x00 - ldm.w r0, {r1, r2, r3}
(little endian - thumb)
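To make the byte sequence above easier to follow, here is a minimal sketch that splits the four bytes into the two Thumb-2 halfwords and extracts the base register and the register mask of the LDM (encoding T2). The struct and function names are made up for illustration and are not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Fields of a 32-bit Thumb-2 LDM (encoding T2). Illustrative type, not LLVM's.
struct LdmFields {
  unsigned Rn;                // base register number
  bool Writeback;             // W bit
  std::vector<unsigned> Regs; // registers named by the 16-bit mask
};

// Decode four bytes of a little-endian Thumb-2 LDM.W instruction.
LdmFields decodeThumb2Ldm(const uint8_t Bytes[4]) {
  // Thumb-2 stores two little-endian halfwords, first halfword first.
  uint16_t Hw1 = static_cast<uint16_t>(Bytes[0] | (Bytes[1] << 8));
  uint16_t Hw2 = static_cast<uint16_t>(Bytes[2] | (Bytes[3] << 8));
  // Fixed bits are 1110 1000 10W1 Rn, so mask out W (bit 5) and Rn (bits 3:0).
  assert((Hw1 & 0xFFD0) == 0xE890 && "not a Thumb-2 LDM (T2) encoding");
  LdmFields F;
  F.Rn = Hw1 & 0xF;
  F.Writeback = ((Hw1 >> 5) & 1) != 0;
  // Each set bit in the second halfword names one register of the list.
  for (unsigned R = 0; R < 16; ++R)
    if (Hw2 & (1u << R))
      F.Regs.push_back(R);
  return F;
}
```

For 0x90,0xe8,0x0e,0x00 this yields the halfwords e890 and 000e, so the base register is r0 and the mask 0x000e selects r1, r2 and r3. The list length is only known after looking at the mask.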
In the td file, the reglist is incorrectly defined as an in operand. I tried to fix it, but the disassembler now breaks.
3fc: e890 000e ldm.w r2, llvm-objdump: /home/user/repos/llvm/llvm/include/llvm/MC/MCInst.h:70: unsigned int llvm::MCOperand::getReg() const: Assertion `isReg() && "This is not a register operand!"' failed
You can find the patched td files and a binary in this commit: Rot127/llvm-capstone@6344dae (the first instruction of _start is the ldm instruction from above).
There are two reasons for that:
1. The indices to access MI.operands are off.
The indices to access MI->Operands are off because the length of the register list is only known at runtime, and the indices are not updated after the register list has been decoded.
Because the reglist is an out operand, it is stored at the beginning of the operands vector.
The reglist is encoded as an immediate in the instruction bytes, but the disassembler decodes it to multiple register operands.
For example, when ARMInstPrinter::printPredicateOperand() tries to decode the predicate, it is given a "hardcoded" index of 2. This index 2 points to the predicate operand as defined in the td files.
But because the writeback (r0) and the reglist (r1-r3) take indices 0 and 1, 2, 3, the printing fails.
So, because the ARM disassembler and printer cannot handle dynamically changing operand indices, the disassembly fails.
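The index mismatch described above can be reproduced with a toy model. This is only a sketch: Operand is a stand-in for MCOperand, the layout [writeback, reglist, predicate] is taken from the example in the text, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an MCOperand: a register number or an immediate.
struct Operand {
  enum Kind { Reg, Imm } K;
  unsigned Val;
};

// Static index of the predicate when the register list counts as ONE
// operand, i.e. the layout [writeback, reglist, predicate] from the text.
constexpr std::size_t StaticPredIdx = 2;

// Build the operand vector the way the text describes: the writeback
// register, one register operand per list entry, then the predicate.
std::vector<Operand> buildOperands(unsigned Wb,
                                   const std::vector<unsigned> &List,
                                   unsigned CondCode) {
  std::vector<Operand> Ops;
  Ops.push_back({Operand::Reg, Wb});
  for (unsigned R : List)
    Ops.push_back({Operand::Reg, R});
  Ops.push_back({Operand::Imm, CondCode});
  return Ops;
}

// Fetch the operand a printer with a hardcoded index would look at.
const Operand &operandAtStaticPredIdx(const std::vector<Operand> &Ops) {
  assert(StaticPredIdx < Ops.size() && "operand index out of range");
  return Ops[StaticPredIdx];
}
```

With r0 as writeback, the list r1-r3 and condition code 14 (AL), the vector has five entries. The hardcoded index 2 then lands on the register operand r2, while the predicate actually sits at index 4.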
2. RegisterLists are decoded by printing operands i to MI->getNumOperands().
Because reglists were always incorrectly set as in operands, they were always at the end of the MI->Operands vector.
So the ARMInstPrinter::printRegisterList() method got the index of the first register in the list and printed every operand up to MI->getNumOperands().
This doesn't work if an out reglist is at the beginning of the MI->Operands vector, because it would just print all operands, including the in operands.
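The "print until the end" behaviour can be illustrated with a small sketch. It is not the real ARMInstPrinter::printRegisterList(); operands are simplified to plain strings, the predicate is reduced to a single "al" placeholder, and the function name is invented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Mimics the loop described above: treat every operand from First up to
// the end of the vector as a member of the register list.
std::string printListToEnd(const std::vector<std::string> &Ops,
                           std::size_t First) {
  assert(First <= Ops.size() && "first list index out of range");
  std::string Out = "{";
  for (std::size_t I = First; I < Ops.size(); ++I) {
    if (I != First)
      Out += ", ";
    Out += Ops[I];
  }
  return Out + "}";
}
```

With the old layout {"r0", "al", "r1", "r2", "r3"} and First = 2, the loop stops at the right place because the list is last. With an out reglist at the front, {"r1", "r2", "r3", "r0", "al"} and First = 0, the same loop also emits the base register and the predicate as if they were list members.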
Solution?
Can anyone think of a better solution for this than saving the length of the reglist and applying it as an offset whenever an operand is printed?
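For reference, the offset approach mentioned in the question could look roughly like this. It is a sketch under assumptions: the decoder is assumed to record the static index of the reglist and the number of registers it expanded to, and the helper names are hypothetical, not existing LLVM or Capstone functions.

```cpp
#include <cassert>
#include <cstddef>

// Translate a static operand index (register list counted as ONE operand)
// into an index in the decoded operand vector.
// ListIdx: static index of the reglist. ListLen: number of decoded registers.
std::size_t toDynamicIndex(std::size_t StaticIdx, std::size_t ListIdx,
                           std::size_t ListLen) {
  assert(ListLen > 0 && "sketch assumes a non-empty register list");
  // Operands before the list, and the start of the list itself, do not move.
  if (StaticIdx <= ListIdx)
    return StaticIdx;
  // Operands after the list are shifted by the extra ListLen - 1 slots.
  return StaticIdx + (ListLen - 1);
}

// One-past-the-end index of the register list in the decoded vector, so a
// printer can stop there instead of running to the last operand.
std::size_t listEnd(std::size_t ListIdx, std::size_t ListLen) {
  return ListIdx + ListLen;
}
```

For the example instruction (reglist at static index 1, three registers), the predicate's static index 2 maps to 4, and the list occupies the indices 1 up to but not including 4.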