Skip to content

BPF: Change callx insn encoding #81546

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 13, 2024
Merged

Conversation

yonghong-song
Copy link
Contributor

Currently, the kernel verifier unsupported callx insn used the 32-bit imm field to store the target register. On the other hand, gcc used the dst_reg field to store the target register. The gcc encoding is better. This patch adjusted the coding to be the same as gcc.

Currently, the kernel verifier unsupported callx insn
used the 32-bit imm field to store the target register.
On the other hand, gcc used the dst_reg field to store the target register.
The gcc encoding is better. This patch adjusted the coding
to be the same as gcc.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
@llvmbot llvmbot added the mc Machine (object) code label Feb 12, 2024
@yonghong-song yonghong-song requested a review from 4ast February 12, 2024 22:45
@llvmbot
Copy link
Member

llvmbot commented Feb 12, 2024

@llvm/pr-subscribers-mc

Author: None (yonghong-song)

Changes

Currently, the kernel verifier unsupported callx insn used the 32-bit imm field to store the target register. On the other hand, gcc used the dst_reg field to store the target register. The gcc encoding is better. This patch adjusted the coding to be the same as gcc.


Full diff: https://github.com/llvm/llvm-project/pull/81546.diff

3 Files Affected:

  • (modified) llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp (+1)
  • (modified) llvm/lib/Target/BPF/BPFInstrInfo.td (+2-2)
  • (modified) llvm/test/MC/BPF/insn-unit.s (+3)
diff --git a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
index 90697c6645be2f..0d1eef60c3b550 100644
--- a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
+++ b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
@@ -229,6 +229,7 @@ struct BPFOperand : public MCParsedAsmOperand {
     return StringSwitch<bool>(Name.lower())
         .Case("if", true)
         .Case("call", true)
+        .Case("callx", true)
         .Case("goto", true)
         .Case("gotol", true)
         .Case("*", true)
diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td b/llvm/lib/Target/BPF/BPFInstrInfo.td
index 7d443a34490146..690d53420718ff 100644
--- a/llvm/lib/Target/BPF/BPFInstrInfo.td
+++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
@@ -622,9 +622,9 @@ class CALLX<string OpcodeStr>
                    (ins GPR:$BrDst),
                    !strconcat(OpcodeStr, " $BrDst"),
                    []> {
-  bits<32> BrDst;
+  bits<4> BrDst;
 
-  let Inst{31-0} = BrDst;
+  let Inst{51-48} = BrDst;
   let BPFClass = BPF_JMP;
 }
 
diff --git a/llvm/test/MC/BPF/insn-unit.s b/llvm/test/MC/BPF/insn-unit.s
index 58342cda7cc0ad..224eb7381aa234 100644
--- a/llvm/test/MC/BPF/insn-unit.s
+++ b/llvm/test/MC/BPF/insn-unit.s
@@ -61,6 +61,9 @@
 // CHECK-32: c3 92 10 00 00 00 00 00 	lock *(u32 *)(r2 + 16) += w9
 // CHECK: db a3 e2 ff 00 00 00 00 	lock *(u64 *)(r3 - 30) += r10
 
+  callx r2
+// CHECK: 8d 02 00 00 00 00 00 00 	callx r2
+
 // ======== BPF_JMP Class ========
   if r1 & r2 goto Llabel0    // BPF_JSET  | BPF_X
   if r1 & 0xffff goto Llabel0    // BPF_JSET  | BPF_K

@yonghong-song
Copy link
Contributor Author

cc @jemarch

@yonghong-song
Copy link
Contributor Author

Windows test failed. The following are failed tests:

********************
Failed Tests (30):
  MLIR :: python/dialects/affine.py
  MLIR :: python/dialects/amdgpu.py
  MLIR :: python/dialects/arith_dialect.py
  MLIR :: python/dialects/arith_llvm.py
  MLIR :: python/dialects/cf.py
  MLIR :: python/dialects/func.py
  MLIR :: python/dialects/linalg/opdsl/arguments.py
  MLIR :: python/dialects/linalg/opdsl/assignments.py
  MLIR :: python/dialects/linalg/opdsl/doctests.py
  MLIR :: python/dialects/linalg/opdsl/emit_convolution.py
  MLIR :: python/dialects/linalg/opdsl/emit_matmul.py
  MLIR :: python/dialects/linalg/opdsl/emit_pooling.py
  MLIR :: python/dialects/linalg/opdsl/metadata.py
  MLIR :: python/dialects/linalg/opdsl/shape_maps_iteration.py
  MLIR :: python/dialects/linalg/opdsl/test_core_named_ops.py
  MLIR :: python/dialects/linalg/ops.py
  MLIR :: python/dialects/memref.py
  MLIR :: python/dialects/ml_program.py
  MLIR :: python/dialects/nvgpu.py
  MLIR :: python/dialects/python_test.py
  MLIR :: python/dialects/scf.py
  MLIR :: python/dialects/tensor.py
  MLIR :: python/dialects/transform_bufferization_ext.py
  MLIR :: python/dialects/transform_extras.py
  MLIR :: python/ir/blocks.py
  MLIR :: python/ir/builtin_types.py
  MLIR :: python/ir/diagnostic_handler.py
  MLIR :: python/ir/dialects.py
  MLIR :: python/ir/operation.py
  MLIR :: python/pass_manager.py
Testing Time: 63.76s
Total Discovered Tests: 2561
  Skipped          :    2 (0.08%)
  Unsupported      :  324 (12.65%)
  Passed           : 2202 (85.98%)
  Expectedly Failed:    1 (0.04%)
  Unresolved       :    2 (0.08%)
  Failed           :   30 (1.17%)

They are all MLIR python tests, and these failures are not really related to this patch.

@@ -61,6 +61,9 @@
// CHECK-32: c3 92 10 00 00 00 00 00 lock *(u32 *)(r2 + 16) += w9
// CHECK: db a3 e2 ff 00 00 00 00 lock *(u64 *)(r3 - 30) += r10

callx r2
// CHECK: 8d 02 00 00 00 00 00 00 callx r2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am waiting for a compile to finish, but I wanted to say that it looks right to me. The following is a test case from binutils:

  28:   8d 06 00 00 00 00 00 00         callr %r6

Note: The mnemonic will change.

Otherwise, everything looks good! Thanks again.

@yonghong-song yonghong-song merged commit c43ad6c into llvm:main Feb 13, 2024
@yonghong-song yonghong-song deleted the callx branch February 8, 2025 06:06
niooss-ledger added a commit to niooss-ledger/ghidra that referenced this pull request Apr 1, 2025
When clang encounters indirect calls in eBPF programs, it emits a call
instruction with a register parameter (`BPF_X`) instead of an immediate
value (`BPF_K`). This encoding (`BPF_JMP | BPF_CALL | BPF_X = 0x8d`) is
decoded by llvm-objdump as `callx`.

For example, here is a simple C program with an indirect call:

    extern void (*ptr_to_some_function)(void);
    void call_ptr_to_some_function(void) {
        ptr_to_some_function();
    }

Compiling and disassembling it gives with clang 14.0 (and LLVM 14.0):

    $ clang -O2 -target bpf -c indirect_call.c -o indirect_call.ebpf
    $ llvm-objdump -rd indirect_call.ebpf

    indirect_call.ebpf:  file format elf64-bpf

    Disassembly of section .text:

    0000000000000000 <call_ptr_to_some_function>:
           0:  18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r1 = 0 ll
                    0000000000000000:  R_BPF_64_64  ptr_to_some_function
           2:  79 11 00 00 00 00 00 00  r1 = *(u64 *)(r1 + 0)
           3:  8d 00 00 00 01 00 00 00  callx r1
           4:  95 00 00 00 00 00 00 00  exit

Contrary to usual eBPF instruction, `callx`'s register operand is
encoded in the immediate field. This encoding is actually specific to
LLVM (and clang). GCC used the destination register to store the target
register.

LLVM 19.1 was modified to use GCC's encoding:
llvm/llvm-project#81546 ("BPF: Change callx insn
encoding"). For example, in an Alpine Linux 3.21 system:

    $ clang -target bpf --version
    Alpine clang version 19.1.4
    Target: bpf
    Thread model: posix
    InstalledDir: /usr/lib/llvm19/bin

    $ clang -O2 -target bpf -c indirect_call.c -o indirect_call.ebpf
    $ llvm-objdump -rd indirect_call.ebpf

    indirect_call.ebpf:  file format elf64-bpf

    Disassembly of section .text:

    0000000000000000 <call_ptr_to_some_function>:
           0:  18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r1 = 0x0 ll
                    0000000000000000:  R_BPF_64_64  ptr_to_some_function
           2:  79 11 00 00 00 00 00 00  r1 = *(u64 *)(r1 + 0x0)
           3:  8d 01 00 00 00 00 00 00  callx r1
           4:  95 00 00 00 00 00 00 00  exit

The instruction is now encoded `8d 01 00...`.

For refrence, here are similar commands using GCC showing it is using
the same encoding (here, compiler option `-mxbpf` is required to enable
several features including indirect calls, cf.
https://gcc.gnu.org/onlinedocs/gcc-12.4.0/gcc/eBPF-Options.html ).

    $ bpf-gcc --version
    bpf-gcc (12-20220319-1ubuntu1+2) 12.0.1 20220319 (experimental) [master r12-7719-g8ca61ad148f]

    $ bpf-gcc -O2 -c indirect_call.c -o indirect_call.ebpf -mxbpf
    $ bpf-objdump -mxbpf -rd indirect_call.ebpf

    indirect_call_gcc-12.ebpf:     file format elf64-bpfle

    Disassembly of section .text:

    0000000000000000 <call_ptr_to_some_function>:
       0:  18 00 00 00 00 00 00 00   lddw %r0,0
       8:  00 00 00 00 00 00 00 00
          0: R_BPF_INSN_64  ptr_to_some_function
      10:  79 01 00 00 00 00 00 00   ldxdw %r1,[%r0+0]
      18:  8d 01 00 00 00 00 00 00   call %r1
      20:  95 00 00 00 00 00 00 00   exit

Add both `callx` instruction encodings to eBPF processor.

By the way, the eBPF Verifier used by Linux kernel currently forbids
indirect calls (it fails when `BPF_SRC(insn->code) != BPF_K`, in
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/verifier.c?h=v6.14#n19141
). But other deployments of eBPF may already support this feature.
niooss-ledger added a commit to niooss-ledger/ghidra that referenced this pull request Apr 1, 2025
When clang encounters indirect calls in eBPF programs, it emits a call
instruction with a register parameter (`BPF_X`) instead of an immediate
value (`BPF_K`). This encoding (`BPF_JMP | BPF_CALL | BPF_X = 0x8d`) is
decoded by llvm-objdump as `callx`.

For example, here is a simple C program with an indirect call:

    extern void (*ptr_to_some_function)(void);
    void call_ptr_to_some_function(void) {
        ptr_to_some_function();
    }

Compiling and disassembling it gives with clang 14.0 (and LLVM 14.0):

    $ clang -O2 -target bpf -c indirect_call.c -o indirect_call.ebpf
    $ llvm-objdump -rd indirect_call.ebpf

    indirect_call.ebpf:  file format elf64-bpf

    Disassembly of section .text:

    0000000000000000 <call_ptr_to_some_function>:
           0:  18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r1 = 0 ll
                    0000000000000000:  R_BPF_64_64  ptr_to_some_function
           2:  79 11 00 00 00 00 00 00  r1 = *(u64 *)(r1 + 0)
           3:  8d 00 00 00 01 00 00 00  callx r1
           4:  95 00 00 00 00 00 00 00  exit

Contrary to usual eBPF instruction, `callx`'s register operand is
encoded in the immediate field. This encoding is actually specific to
LLVM (and clang). GCC used the destination register to store the target
register.

LLVM 19.1 was modified to use GCC's encoding:
llvm/llvm-project#81546 ("BPF: Change callx insn
encoding"). For example, in an Alpine Linux 3.21 system:

    $ clang -target bpf --version
    Alpine clang version 19.1.4
    Target: bpf
    Thread model: posix
    InstalledDir: /usr/lib/llvm19/bin

    $ clang -O2 -target bpf -c indirect_call.c -o indirect_call.ebpf
    $ llvm-objdump -rd indirect_call.ebpf

    indirect_call.ebpf:  file format elf64-bpf

    Disassembly of section .text:

    0000000000000000 <call_ptr_to_some_function>:
           0:  18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r1 = 0x0 ll
                    0000000000000000:  R_BPF_64_64  ptr_to_some_function
           2:  79 11 00 00 00 00 00 00  r1 = *(u64 *)(r1 + 0x0)
           3:  8d 01 00 00 00 00 00 00  callx r1
           4:  95 00 00 00 00 00 00 00  exit

The instruction is now encoded `8d 01 00...`.

For reference, here are similar commands using GCC showing it is using
the same encoding (here, compiler option `-mxbpf` is required to enable
several features including indirect calls, cf.
https://gcc.gnu.org/onlinedocs/gcc-12.4.0/gcc/eBPF-Options.html ).

    $ bpf-gcc --version
    bpf-gcc (12-20220319-1ubuntu1+2) 12.0.1 20220319 (experimental) [master r12-7719-g8ca61ad148f]

    $ bpf-gcc -O2 -c indirect_call.c -o indirect_call.ebpf -mxbpf
    $ bpf-objdump -mxbpf -rd indirect_call.ebpf

    indirect_call_gcc-12.ebpf:     file format elf64-bpfle

    Disassembly of section .text:

    0000000000000000 <call_ptr_to_some_function>:
       0:  18 00 00 00 00 00 00 00   lddw %r0,0
       8:  00 00 00 00 00 00 00 00
          0: R_BPF_INSN_64  ptr_to_some_function
      10:  79 01 00 00 00 00 00 00   ldxdw %r1,[%r0+0]
      18:  8d 01 00 00 00 00 00 00   call %r1
      20:  95 00 00 00 00 00 00 00   exit

Add both `callx` instruction encodings to eBPF processor.

By the way, the eBPF Verifier used by Linux kernel currently forbids
indirect calls (it fails when `BPF_SRC(insn->code) != BPF_K`, in
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/verifier.c?h=v6.14#n19141
). But other deployments of eBPF may already support this feature.
niooss-ledger added a commit to niooss-ledger/ghidra that referenced this pull request Apr 1, 2025
When clang encounters indirect calls in eBPF programs, it emits a call
instruction with a register parameter (`BPF_X`) instead of an immediate
value (`BPF_K`). This encoding (`BPF_JMP | BPF_CALL | BPF_X = 0x8d`) is
decoded by llvm-objdump as `callx`.

For example, here is a simple C program with an indirect call:

    extern void (*ptr_to_some_function)(void);
    void call_ptr_to_some_function(void) {
        ptr_to_some_function();
    }

Compiling and disassembling it gives with clang 14.0 (and LLVM 14.0):

    $ clang -O2 -target bpf -c indirect_call.c -o indirect_call.ebpf
    $ llvm-objdump -rd indirect_call.ebpf

    indirect_call.ebpf:  file format elf64-bpf

    Disassembly of section .text:

    0000000000000000 <call_ptr_to_some_function>:
           0:  18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r1 = 0 ll
                    0000000000000000:  R_BPF_64_64  ptr_to_some_function
           2:  79 11 00 00 00 00 00 00  r1 = *(u64 *)(r1 + 0)
           3:  8d 00 00 00 01 00 00 00  callx r1
           4:  95 00 00 00 00 00 00 00  exit

Contrary to usual eBPF instructions, `callx`'s register operand is
encoded in the immediate field. This encoding is actually specific to
LLVM (and clang). GCC used the destination register to store the target
register.

LLVM 19.1 was modified to use GCC's encoding:
llvm/llvm-project#81546 ("BPF: Change callx insn
encoding"). For example, in an Alpine Linux 3.21 system:

    $ clang -target bpf --version
    Alpine clang version 19.1.4
    Target: bpf
    Thread model: posix
    InstalledDir: /usr/lib/llvm19/bin

    $ clang -O2 -target bpf -c indirect_call.c -o indirect_call.ebpf
    $ llvm-objdump -rd indirect_call.ebpf

    indirect_call.ebpf:  file format elf64-bpf

    Disassembly of section .text:

    0000000000000000 <call_ptr_to_some_function>:
           0:  18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r1 = 0x0 ll
                    0000000000000000:  R_BPF_64_64  ptr_to_some_function
           2:  79 11 00 00 00 00 00 00  r1 = *(u64 *)(r1 + 0x0)
           3:  8d 01 00 00 00 00 00 00  callx r1
           4:  95 00 00 00 00 00 00 00  exit

The instruction is now encoded `8d 01 00...`.

For reference, here are similar commands using GCC showing it is using
the same encoding (here, compiler option `-mxbpf` is required to enable
several features including indirect calls, cf.
https://gcc.gnu.org/onlinedocs/gcc-12.4.0/gcc/eBPF-Options.html ).

    $ bpf-gcc --version
    bpf-gcc (12-20220319-1ubuntu1+2) 12.0.1 20220319 (experimental) [master r12-7719-g8ca61ad148f]

    $ bpf-gcc -O2 -c indirect_call.c -o indirect_call.ebpf -mxbpf
    $ bpf-objdump -mxbpf -rd indirect_call.ebpf

    indirect_call_gcc-12.ebpf:     file format elf64-bpfle

    Disassembly of section .text:

    0000000000000000 <call_ptr_to_some_function>:
       0:  18 00 00 00 00 00 00 00   lddw %r0,0
       8:  00 00 00 00 00 00 00 00
          0: R_BPF_INSN_64  ptr_to_some_function
      10:  79 01 00 00 00 00 00 00   ldxdw %r1,[%r0+0]
      18:  8d 01 00 00 00 00 00 00   call %r1
      20:  95 00 00 00 00 00 00 00   exit

Add both `callx` instruction encodings to eBPF processor.

By the way, the eBPF Verifier used by Linux kernel currently forbids
indirect calls (it fails when `BPF_SRC(insn->code) != BPF_K`, in
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/verifier.c?h=v6.14#n19141
). But other deployments of eBPF may already support this feature.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants