[MC] Should MCFixup store MCValue instead of MCExpr? #135592

Open
@MaskRay

GNU Assembler uses struct fix to represent both the fixup and the relocatable expression.

(Some writeup will be copied back to https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers )

struct fix {
  ...
  /* NULL or Symbol whose value we add in.  */
  symbolS *fx_addsy;

  /* NULL or Symbol whose value we subtract.  */
  symbolS *fx_subsy;

  /* Absolute number we add in.  */
  valueT fx_offset;
};

The relocation specifier is part of the instruction instead of part of struct fix. Targets have different internal representations of instructions.

In contrast, the LLVM integrated assembler encodes fixups and relocatable expressions separately.

class MCFixup {
  /// The value to put into the fixup location. The exact interpretation of the
  /// expression is target dependent, usually it will be one of the operands to
  /// an instruction or an assembler directive.
  const MCExpr *Value = nullptr;      /// In GNU Assembler, this would be addsy/subsy/offset

  /// The byte index of start of the relocation inside the MCFragment.
  uint32_t Offset = 0;

  /// The target dependent kind of fixup item this is. The kind is used to
  /// determine how the operand value should be encoded into the instruction.
  MCFixupKind Kind = FK_NONE;

  /// The source location which gave rise to the fixup, if any.
  SMLoc Loc;
};

LLVM encodes relocatable expressions as MCValue:

class MCValue {
  const MCSymbol *SymA = nullptr, *SymB = nullptr;
  int64_t Cst = 0;
  uint32_t Specifier = 0;
};

The const MCExpr *MCFixup::getValue() method feels inconvenient and less elegant compared to GNU Assembler's approach for these reasons:

  • Relocation specifier can be encoded by every sub-expression in the MCExpr tree, rather than the fixup itself (or the instruction, as in GNU Assembler). Supporting all of a+4@got, a@got+4, (a+4)@got requires extensive hacks in LLVM MCParser.
  • evaluateAsRelocatable converts an MCExpr to an MCValue without updating the MCExpr itself. This leads to redundant evaluations, as MCAssembler::evaluateFixup is called multiple times, such as in MCAssembler::fixupNeedsRelaxation and MCAssembler::layout.
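To make the redundant-evaluation point concrete, here is a toy sketch. All types and functions in it (ToyExpr, ToyValue, ExprFixup, ValueFixup, evaluate, needsRelaxation, fixupValue, makeValueFixup) are hypothetical stand-ins, not LLVM classes, and the "expression tree" is reduced to a symbol plus two constants. It only illustrates the call-count difference between a fixup that holds an expression and one that holds the already-folded value; resolving the symbol address still happens at each use in both variants.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy types; none of these are LLVM classes.
struct ToyExpr {   // stands in for an expression tree: Sym + Lhs + Rhs
  int SymId;
  int64_t Lhs;
  int64_t Rhs;
};
struct ToyValue {  // folded form: Sym + Cst
  int SymId;
  int64_t Cst;
};

static int EvalCount = 0;  // counts how often the "tree" is folded

ToyValue evaluate(const ToyExpr &E) {
  ++EvalCount;
  return ToyValue{E.SymId, E.Lhs + E.Rhs};
}

static bool fitsInt8(int64_t V) { return V >= -128 && V <= 127; }

// Variant 1: the fixup holds the expression, so every consumer folds it again.
struct ExprFixup { const ToyExpr *Value; };
bool needsRelaxation(const ExprFixup &F, int64_t SymAddr) {
  return !fitsInt8(SymAddr + evaluate(*F.Value).Cst);
}
int64_t fixupValue(const ExprFixup &F, int64_t SymAddr) {
  return SymAddr + evaluate(*F.Value).Cst;
}

// Variant 2: the fixup holds the folded value, computed once at creation.
struct ValueFixup { ToyValue Value; };
ValueFixup makeValueFixup(const ToyExpr &E) { return ValueFixup{evaluate(E)}; }
bool needsRelaxation(const ValueFixup &F, int64_t SymAddr) {
  return !fitsInt8(SymAddr + F.Value.Cst);
}
int64_t fixupValue(const ValueFixup &F, int64_t SymAddr) {
  return SymAddr + F.Value.Cst;
}
```

With one relaxation check and one layout query, the first variant folds the expression twice and the second folds it once.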

Storing an MCValue directly in MCFixup, or adding a relocation specifier member, could eliminate the need for many target-specific MCTargetFixup classes that manage relocation specifiers.
However, target-specific evaluation hooks would still be needed for specifiers like PowerPC @l or RISC-V %lo().

Computing label differences will be simplified as we can utilize SymA and SymB.
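A rough sketch of what that could look like, using toy types only. ToySymbol, ToyValue, ToyFixup and tryResolve are hypothetical names, not existing LLVM APIs; ToyValue just mirrors the SymA/SymB/Cst/Specifier shape of MCValue quoted above. The assumption here is that a difference of two labels in the same section folds to a constant, and anything else still needs a relocation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a defined label; not an LLVM class.
struct ToySymbol {
  int SectionId;    // section the label is defined in
  uint64_t Offset;  // offset of the label within that section
};

// Mirrors the shape of MCValue: SymA - SymB + Cst, plus a specifier.
struct ToyValue {
  const ToySymbol *SymA = nullptr;
  const ToySymbol *SymB = nullptr;
  int64_t Cst = 0;
  uint32_t Specifier = 0;
};

// A fixup that stores the relocatable value directly instead of an
// expression tree.
struct ToyFixup {
  ToyValue Value;
  uint32_t Offset = 0;
};

// Returns true and sets Result if the fixup folds to a constant.
// Returns false if a relocation would still be needed.
bool tryResolve(const ToyFixup &F, int64_t &Result) {
  const ToyValue &V = F.Value;
  if (V.Specifier != 0)
    return false;  // a specifier would need a target hook; not modeled here
  if (!V.SymA && !V.SymB) {
    Result = V.Cst;  // plain constant
    return true;
  }
  if (V.SymA && V.SymB && V.SymA->SectionId == V.SymB->SectionId) {
    // Label difference within one section: no expression walk needed.
    Result = static_cast<int64_t>(V.SymA->Offset) -
             static_cast<int64_t>(V.SymB->Offset) + V.Cst;
    return true;
  }
  return false;
}
```

The label-difference case reads SymA and SymB straight from the fixup, which is the simplification meant above.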


There are ~18 uses of MCFixup::getValue and ~141 uses of MCFixup::create.
The cleanup effort would be significant. Streamlining MCFixup::getValue and adding a shim for MCFixup::create could help. However, introducing a new member to MCFixup before cleaning up the rest of the framework might increase memory usage.

There are ~111 uses of MCAsmParser::parseExpression. Some of them could be refactored to take a relocation specifier output parameter.
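As an illustration of the output-parameter idea, here is a minimal standalone sketch. parseOperand and ParsedOperand are hypothetical and unrelated to the real MCAsmParser::parseExpression signature. It accepts only sym, an optional +N or -N, and an optional @spec in either order, so a+4@got and a@got+4 yield the same symbol, addend and specifier. Parenthesized forms like (a+4)@got are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical result type; not an LLVM class.
struct ParsedOperand {
  std::string Symbol;
  int64_t Addend = 0;
};

static bool isIdentChar(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) || C == '_' || C == '.';
}

// Parses "sym", "sym+N" or "sym-N", with an optional "@spec" placed either
// after the symbol or after the addend. The specifier is returned through
// an output parameter instead of being embedded in the expression.
bool parseOperand(const std::string &Text, ParsedOperand &Out,
                  std::string &Specifier) {
  std::size_t I = 0, N = Text.size();
  Out = ParsedOperand();
  Specifier.clear();

  // Symbol name: must be non-empty and must not start with a digit.
  std::size_t Start = I;
  while (I < N && isIdentChar(Text[I]))
    ++I;
  if (I == Start || std::isdigit(static_cast<unsigned char>(Text[Start])))
    return false;
  Out.Symbol = Text.substr(Start, I - Start);

  // At most one "@spec" and one "+N"/"-N", in either order.
  bool SeenAddend = false;
  while (I < N) {
    if (Text[I] == '@' && Specifier.empty()) {
      Start = ++I;
      while (I < N && isIdentChar(Text[I]))
        ++I;
      if (I == Start)
        return false;
      Specifier = Text.substr(Start, I - Start);
    } else if ((Text[I] == '+' || Text[I] == '-') && !SeenAddend) {
      bool Negative = Text[I] == '-';
      Start = ++I;
      while (I < N && std::isdigit(static_cast<unsigned char>(Text[I])))
        ++I;
      if (I == Start)
        return false;
      int64_t V =
          std::strtoll(Text.substr(Start, I - Start).c_str(), nullptr, 10);
      Out.Addend = Negative ? -V : V;
      SeenAddend = true;
    } else {
      return false;  // duplicate specifier/addend or unexpected character
    }
  }
  return true;
}
```

A caller could then attach the returned specifier to the fixup rather than wrapping the expression in a target-specific node.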
