auto-sync progress tracker: Refactor and implement architectures #2015

Open
@Rot127


Note on x86: x86 is not part of this list because we cannot generate all tables in C.
Refer to capstone-engine/llvm-capstone#13 for details.

Note about changes introduced with auto-sync:
For a preview of the changes coming in v6, please take a look at the WIP release guide.


This issue tracks the auto-sync refactoring and implementation effort of architecture modules.

The table below lists the responsible developers for each architecture.

In progress

| Arch | CS PR | llvm-capstone PR | Part of (planned) release | Assigned developer(s) | Based on LLVM repo |
|---|---|---|---|---|---|
| SPARC | #2704 | capstone-engine/llvm-capstone#81 | v6 | @Rot127 | LLVM-project |

.td edits upstreamed

Most LLVM .td files lack some information about instructions (memory reads/writes, operands incorrectly marked as in/out, etc.). Since we rely on this information, we need to fix it. These fixes should be upstreamed to LLVM.

Done

| Arch | PR | Part of release | Assigned developer(s) | LLVM repo |
|---|---|---|---|---|
| Alpha | #2071 | v6 | @R33v0LT | LLVM-project (release v3.0) |
| ARC | #2570 | v6 | @R33v0LT | LLVM-project |
| AArch64 | #2026 | v6 | @Rot127 | LLVM-project |
| ARM | #1949 | v6 | @Rot127 | LLVM-project |
| PPC | #2013 | v6 | @Rot127 | LLVM-project |
| TriCore | #1973 | v5 | @imbillow | TriDis |
| HPPA | #2265 | v6 | @R33v0LT | Not Auto-sync based |
| LoongArch | #2349 | v6 | @jiegec | LLVM-project |
| MIPS | #2410 | v6 | @wargio | LLVM-project |
| SystemZ | #2462 | v6 | @Rot127 | LLVM-project |
| Xtensa | #2380 | v6 | @imbillow | LLVM-project |
| BPF | #2568 | v6 | @Roeegg2 | RFC, Linux kernel docs |

Arch extensions

Adding CPU extensions that are not part of upstream LLVM is easier now.
They are tracked here.

| Arch | Extension name | Issue | Previous attempt/notes | Done |
|---|---|---|---|---|
| PPC | VLE | #2241 | https://lists.llvm.org/pipermail/llvm-dev/2014-July/074613.html | No |
| PPC | PS (Paired-Single) | None | https://reviews.llvm.org/D85137 | Yes |
| Mips | NanoMips | None | Mediatek LLVM: https://github.com/MediaTek-Labs/llvm-project/tree/mtk-pub/nanomips-llvm16, more context: rizinorg/ideas#5 | Yes |
| Mips | EE | None | Not in LLVM, see: #940 (comment) | No |

Effort level of archs not yet refactored/implemented

| Arch | Number of operand groups | Generates | Note | Implementation type | Difficulty level |
|---|---|---|---|---|---|
| AVR | ~3 | Yes | None | New | Easy |
| CSKY | ~7 | Yes | None | New | Medium |
| DirectX | ~1 | Yes | Deviates from common design. | New | Medium-Hard |
| EVM | ~2 | Not tested | Very small module, llvm repo: https://github.com/etclabscore/evm_llvm | New | Easy |
| Hexagon | ~2 | No | Deviates from common design. | New | Hard |
| Lanai | ~10 | Yes | None | New | Easy |
| M68k | ~28 | Yes | None | Refactor | Medium |
| MSP430 | ~6 | Yes | None | New | Easy |
| SPIRV | ~9 | No | td files faulty | New | Medium |
| VE | ~8 | Yes | None | New | Medium |
| XCore | ~15 | No | td files faulty | Refactor | Medium |

Note on RISC-V: RISC-V will not be generated via LLVM because the LLVM architecture definitions are not precise enough for our use case. Instead, a SAIL-based generator will be used (#2392).

Legend

  • Number of operand groups: Operand groups that have distinct print functions. Indicates the effort needed to implement the LLVM <-> CS mapping code (filling cs_detail and the like).
  • Generates: Whether the .inc files generate with the most recent backends.
  • Note: Anything noteworthy.
  • Implementation type: Refactor the current implementation or implement a new arch module.
  • Difficulty level: Estimated difficulty of this arch (based on the points above and on complexity, such as the number of instructions). Even "Easy" still means you have to familiarize yourself with how the LLVM definitions and the updater work. My guess is that it will take at least a week of work.
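
To make the "number of operand groups" metric more concrete: each operand group usually needs its own branch in the code that maps an LLVM operand onto Capstone's detail structure. The following is only a rough sketch of that idea, not code from any arch module. All names in it (`op_group`, `sketch_operand`, `sketch_detail`, `add_operand`) are hypothetical stand-ins; the real modules fill `cs_detail` and use the operand groups generated from the .td files.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operand groups. In a real module these correspond to the
 * distinct print functions referenced by the LLVM definitions. */
typedef enum { OP_GROUP_REG, OP_GROUP_IMM, OP_GROUP_MEM } op_group;

typedef enum { OP_KIND_REG, OP_KIND_IMM, OP_KIND_MEM } op_kind;

typedef struct {
	op_kind kind;
	int64_t value; /* register id, immediate, or displacement */
	int base;      /* base register id, -1 unless kind is OP_KIND_MEM */
} sketch_operand;

#define SKETCH_MAX_OPS 8

/* Simplified, hypothetical stand-in for cs_detail. */
typedef struct {
	sketch_operand ops[SKETCH_MAX_OPS];
	int op_count;
} sketch_detail;

/* One case per operand group: more groups means more mapping code. */
static void add_operand(sketch_detail *d, op_group group, int64_t raw, int base)
{
	assert(d->op_count < SKETCH_MAX_OPS);
	sketch_operand *op = &d->ops[d->op_count++];
	op->base = -1;
	switch (group) {
	case OP_GROUP_REG:
		op->kind = OP_KIND_REG;
		op->value = raw;
		break;
	case OP_GROUP_IMM:
		op->kind = OP_KIND_IMM;
		op->value = raw;
		break;
	case OP_GROUP_MEM:
		op->kind = OP_KIND_MEM;
		op->value = raw; /* displacement */
		op->base = base;
		break;
	}
}
```

With ~3 groups (as estimated for AVR) such a switch stays small; with ~28 groups (as estimated for M68k) it grows accordingly, which is what the column is meant to hint at.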

Getting started

  • If you would like to refactor an architecture module or implement a new one, please comment here and we will add you. We can also point you to important information.
  • Please open a draft PR once you have made the first commit, so the progress is visible and there is a place for discussion.
  • Please refer to the auto-sync documentation to learn how to refactor or implement an architecture with auto-sync.

TODO for refactored archs

List of missing items that should be done before v6 to get a well-rounded package.

Capstone

LLVM revisions

Auto-Sync

  • Add a refactor setting to the auto-sync updater.
  • Add auto-sync unit tests.
  • Translate template functions as functions, not as macros.

Backends

  • Generate decoding/printing macros as functions, if there is only a single version (allows proper debugging, which would be a blessing).
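
The items above about template functions and macros share one motivation. A C++ template function can be carried over to C either as a macro that stamps out one function per instantiation, or as an ordinary function that takes the former template argument as a parameter. The sketch below contrasts the two forms. The names `DEFINE_printImm` and `printImm`, and the masking logic, are invented for illustration and are not taken from the generated code.

```c
#include <stdint.h>
#include <stdio.h>

/* Macro form: every instantiation expands to its own function.
 * Debuggers typically attribute the whole body to the one line where the
 * macro is expanded, so stepping through it is awkward. */
#define DEFINE_printImm(Bits) \
	static int printImm_##Bits(char *buf, size_t len, uint64_t val) \
	{ \
		uint64_t mask = (Bits) >= 64 ? UINT64_MAX \
					     : ((UINT64_C(1) << (Bits)) - 1); \
		return snprintf(buf, len, "#0x%llx", \
				(unsigned long long)(val & mask)); \
	}

DEFINE_printImm(8)
DEFINE_printImm(16)

/* Function form: the former template argument becomes a parameter.
 * The body has real source lines, so breakpoints and stepping work. */
static int printImm(char *buf, size_t len, uint64_t val, unsigned bits)
{
	uint64_t mask = bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
	return snprintf(buf, len, "#0x%llx", (unsigned long long)(val & mask));
}
```

Both forms produce the same text for the same input, for example `printImm_8(buf, sizeof buf, 0x1234)` and `printImm(buf, sizeof buf, 0x1234, 8)` both write `#0x34`. The function form trades a compile-time constant for a runtime argument, which is presumably an acceptable cost when only a single version exists.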

ARM

PPC

  • Encoding info

AArch64

  • Encoding info
