[MC][AArch64] Assertion Failure in MCAsmLexer::peekTok with Chained Malformed AArch64 Assembly Instruction Input #144126

Open
@venkyqz

Description


Environment

Category            Detail
Operating System    Ubuntu 20.04
LLVM Version        llvm-project-llvmorg-20.1.6
Architecture        x86_64
Target Triple       aarch64-linux-gnu
Sanitizers          ASan (AddressSanitizer), UBSan (UndefinedBehaviorSanitizer)
Tool Used           llvm-mc-assemble-fuzzer
Build               Debug (with assertions enabled)

Summary

There is an assertion failure (Assertion 'ReadCount == 1' failed) in llvm::MCAsmLexer::peekTok(bool). The crash occurs consistently when a specific malformed AArch64 instruction is parsed after another malformed line.

The key observation is that the first malformed line (el1, x12) does NOT crash the fuzzer on its own: it correctly reports syntax errors. When the same line is followed by a second malformed line (sqdmulh s15, s3, v4.s[0), however, the assertion fires and the fuzzer crashes. This suggests a state-management issue, or some side effect of the first malformed line, that leaves the lexer's internal state inconsistent and triggers the assertion when the next instruction is processed. (This report does not include a run of the second line on its own.)


Affected Component(s)

  • AArch64 Backend
  • MC Assembler Parser (MCAsmLexer, AArch64AsmParser)
  • Potentially MCAsmParser state management across multiple instruction parsing attempts.

Steps to Reproduce

  1. Prepare the input files:

    • token-bug-min (Does NOT crash when run alone):
      Create a file named token-bug-min with the following content:

      el1, x12
    • token-bug (Reproduces the crash):
      Create a file named token-bug with the following content:

      el1, x12
      sqdmulh s15, s3, v4.s[0
  2. Set up environment variables for sanitizers:

    export ASAN_OPTIONS="halt_on_error=0:abort_on_error=0:coverage=1"
    export UBSAN_OPTIONS="halt_on_error=0:abort_on_error=0:print_stacktrace=1"
  3. Demonstrate token-bug-min behavior (no crash):

    ./llvm-mc-assemble-fuzzer \
        --triple=aarch64-linux-gnu \
        --mattr=+all \
        token-bug-min

    The output is correct (no crash):

    INFO: Seed: 705475483
    INFO: Loaded 1 modules    (1910 inline 8-bit counters): 1910 [0x174a178, 0x174a8ee),
    INFO: Loaded 1 PC tables (1910 PCs): 1910 [0x174a8f0,0x1752050),
    ./llvm-mc-assemble-fuzzer: Running 1 inputs 1 time(s) each.
    Running: token-bug-min
    Unknown buffer:1:4: error: unknown token in expression
    el1, x12
       ^
    Unknown buffer:1:4: error: invalid operand
    el1, x12
       ^
    Executed token-bug-min in 1 ms
    ***
    *** NOTE: fuzzing was not performed, you have only
    *** executed the target code on a fixed set of inputs.
    ***
    
  4. Run llvm-mc-assemble-fuzzer with the two-line token-bug input (reproduces crash):

    ./llvm-mc-assemble-fuzzer \
        --triple=aarch64-linux-gnu \
        --mattr=+all \
        token-bug

    The output is incorrect (the fuzzer crashes with an assertion failure):

    INFO: Seed: 979337232
    INFO: Loaded 1 modules   (1910 inline 8-bit counters): 1910 [0x174a178, 0x174a8ee),
    INFO: Loaded 1 PC tables (1910 PCs): 1910 [0x174a8f0,0x1752050),
    ./llvm-mc-assemble-fuzzer: Running 1 inputs 1 time(s) each.
    Running: token-bug
    Unknown buffer:1:4: error: unknown token in expression
    el1, x12
       ^
    Unknown buffer:1:4: error: invalid operand
    el1, x12
       ^
    llvm-mc-assemble-fuzzer: /llvm-project-llvmorg-20.1.6/llvm/include/llvm/MC/MCParser/MCAsmLexer.h:117: const llvm::AsmToken llvm::MCAsmLexer::peekTok(bool): Assertion `ReadCount == 1' failed.
    ==2290937== ERROR: libFuzzer: deadly signal
        #0 0x4b4cb0 in __sanitizer_print_stack_trace (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x4b4cb0)
        #1 0x460fc8 in fuzzer::PrintStackTrace() (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x460fc8)
        #2 0x446113 in fuzzer::Fuzzer::CrashCallback() (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x446113)
        #3 0x7fd18552b41f  (/lib/x86_64-linux-gnu/libpthread.so.0+0x1441f)
        #4 0x7fd18512700a in __libc_signal_restore_set /build/glibc-B3wQXB/glibc-2.31/signal/../sysdeps/unix/sysv/linux/internal-signals.h:86:3
        #5 0x7fd18512700a in raise /build/glibc-B3wQXB/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:48:3
        #6 0x7fd185106858 in abort /build/glibc-B3wQXB/glibc-2.31/stdlib/abort.c:79:7
        #7 0x7fd185106728 in __assert_fail_base /build/glibc-B3wQXB/glibc-2.31/assert/assert.c:94:3
        #8 0x7fd185117fd5 in __assert_fail /build/glibc-B3wQXB/glibc-2.31/assert/assert.c:103:3
        #9 0x577808 in llvm::MCAsmLexer::peekTok(bool) /llvm-project-llvmorg-20.1.6/llvm/include/llvm/MC/MCParser/MCAsmLexer.h:117:5
        #10 0x5032da in (anonymous namespace)::AArch64AsmParser::parseOptionalMulOperand(llvm::SmallVectorImpl<std::unique_ptr<llvm::MCParsedAsmOperand, std::default_delete<llvm::MCParsedAsmOperand> > >&) /llvm-project-llvmorg-20.1.6/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp:4835:25
        #11 0x4fd38c in (anonymous namespace)::AArch64AsmParser::parseOperand(llvm::SmallVectorImpl<std::unique_ptr<llvm::MCParsedAsmOperand, std::default_delete<llvm::MCParsedAsmOperand> > >&, bool, bool) /home/zhangqi/project/issta2024_asfuzzer/llvm-project-llvmorg-20.1.6/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp:5003:10
        #12 0x4e56f3 in (anonymous namespace)::AArch64AsmParser::parseInstruction(llvm::ParseInstructionInfo&, llvm::StringRef, llvm::SMLoc, llvm::SmallVectorImpl<std::unique_ptr<llvm::MCParsedAsmOperand, std::default_delete<llvm::MCParsedAsmOperand> > >&) /home/zhangqi/project/issta2024_asfuzzer/llvm-project-llvmorg-20.1.6/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp:5321:11
        #13 0x570f54 in llvm::MCTargetAsmParser::parseInstruction(llvm::ParseInstructionInfo&, llvm::StringRef, llvm::AsmToken, llvm::SmallVectorImpl<std::unique_ptr<llvm::MCParsedAsmOperand, std::default_delete<llvm::MCParsedAsmOperand> > >&) /home/zhangqi/project/issta2024_asfuzzer/llvm-project-llvmorg-20.1.6/llvm/include/llvm/MC/MCParser/MCTargetAsmParser.h:446:12
        #14 0x952abc in (anonymous namespace)::AsmParser::parseAndMatchAndEmitTargetInstruction((anonymous namespace)::ParseStatementInfo&, llvm::StringRef, llvm::AsmToken, llvm::SMLoc) /home/zhangqi/project/issta2024_asfuzzer/llvm-project-llvmorg-20.1.6/llvm/lib/MC/MCParser/AsmParser.cpp:2327:42
        #15 0x9439de in (anonymous namespace)::AsmParser::parseStatement((anonymous namespace)::ParseStatementInfo&, llvm::MCAsmParserSemaCallback*) /home/zhangqi/project/issta2024_asfuzzer/llvm-project-llvmorg-20.1.6/llvm/lib/MC/MCParser/AsmParser.cpp:2317:10
        #16 0x937303 in (anonymous namespace)::AsmParser::Run(bool, bool) /home/zhangqi/project/issta2024_asfuzzer/llvm-project-llvmorg-20.1.6/llvm/lib/MC/MCParser/AsmParser.cpp:999:19
        #17 0x4bb403 in AssembleInput(char const*, llvm::Target const*, llvm::SourceMgr&, llvm::MCContext&, llvm::MCStreamer&, llvm::MCAsmInfo&, llvm::MCSubtargetInfo&, llvm::MCInstrInfo&, llvm::MCTargetOptions&) /home/zhangqi/project/issta2024_asfuzzer/llvm-project-llvmorg-20.1.6/llvm/tools/llvm-mc-assemble-fuzzer/llvm-mc-assemble-fuzzer.cpp:129:18
        #18 0x4b82d4 in AssembleOneInput(unsigned char const*, unsigned long) /home/zhangqi/project/issta2024_asfuzzer/llvm-project-llvmorg-20.1.6/llvm/tools/llvm-mc-assemble-fuzzer/llvm-mc-assemble-fuzzer.cpp:234:19
        #19 0x4bbc14 in LLVMFuzzerTestOneInput /llvm-project-llvmorg-20.1.6/llvm/tools/llvm-mc-assemble-fuzzer/llvm-mc-assemble-fuzzer.cpp:243:10
        #20 0x4477d1 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x4477d1)
        #21 0x432f42 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x432f42)
        #22 0x4389f6 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x4389f6)
        #23 0x4616b2 in main (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x4616b2)
        #24 0x7fd185108082 in __libc_start_main /build/glibc-B3wQXB/glibc-2.31/csu/../csu/libc-start.c:308:16
        #25 0x40d5ed in _start (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x40d5ed)
    
    NOTE: libFuzzer has rudimentary signal handlers.
          Combine libFuzzer with AddressSanitizer or similar for better crash reports.
    SUMMARY: libFuzzer: deadly signal
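The two inputs from step 1 can also be written with printf so that their exact bytes are unambiguous. This is a minimal sketch that assumes every line, including the last, ends with a newline; the report does not say whether the original files have a trailing newline, so that byte is worth checking if the crash does not reproduce.

```shell
# Write the reproduction inputs byte-for-byte.
# Assumption: every line, including the last one, ends with "\n".
printf 'el1, x12\n' > token-bug-min
printf 'el1, x12\nsqdmulh s15, s3, v4.s[0\n' > token-bug

# Dump the exact bytes (od is POSIX) to confirm the line endings.
od -c token-bug
```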

Expected Behavior

The llvm-mc-assemble-fuzzer should handle multiple erroneous lines robustly. It should report syntax errors for both malformed instructions (el1, x12 and sqdmulh s15, s3, v4.s[0) and then finish the run normally, as it does for token-bug-min, without crashing or triggering any assertion. The lexer/parser state should be properly reset or managed between instructions so that an error on one line cannot cause a crash on a later one.
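A crash-only regression check for this expectation could look like the following sketch. The helper name expect_clean_run and the FUZZER variable are hypothetical. The check only looks for the assertion message or libFuzzer's "deadly signal" line in the combined output and deliberately ignores the fuzzer's exit status.

```shell
# Path to the fuzzer binary; defaults to the one used in this report.
FUZZER=${FUZZER:-./llvm-mc-assemble-fuzzer}

# expect_clean_run FILE (hypothetical helper)
# Returns 0 if running the fuzzer on FILE shows no crash, i.e. the combined
# output contains neither an assertion failure nor libFuzzer's
# "deadly signal" line. Returns 1 on a crash and 2 if the binary is missing.
# The exit status of the fuzzer itself is deliberately not inspected.
expect_clean_run() {
  input=$1
  if [ ! -x "$FUZZER" ]; then
    echo "fuzzer not found: $FUZZER" >&2
    return 2
  fi
  out=$("$FUZZER" --triple=aarch64-linux-gnu --mattr=+all "$input" 2>&1)
  if printf '%s\n' "$out" | grep -Eq 'Assertion .* failed|deadly signal'; then
    echo "FAIL: $input: crash detected"
    return 1
  fi
  echo "OK: $input"
  return 0
}

# Example: check both inputs from the reproduction steps, if present.
for f in token-bug-min token-bug; do
  if [ -f "$f" ] && [ -x "$FUZZER" ]; then
    expect_clean_run "$f"
  fi
done
```

With the build described in this report, token-bug would be expected to print FAIL, since its output contains both strings; once the bug is fixed, both inputs should print OK.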


Actual Behavior

The tool crashes with an Assertion 'ReadCount == 1' failed error in llvm::MCAsmLexer::peekTok(bool). This assertion triggers specifically when processing the second line of the input (sqdmulh s15, s3, v4.s[0), after the first line (el1, x12) has already been processed and yielded syntax errors.

This suggests that processing the initial malformed instruction (el1, x12) leaves the MCAsmLexer, or a related parsing component, in an inconsistent state, so that parsing the next instruction (sqdmulh s15, s3, v4.s[0) trips the assertion in peekTok. The stack trace shows that peekTok is called from AArch64AsmParser::parseOptionalMulOperand (AArch64AsmParser.cpp:4835), which is reached from AArch64AsmParser::parseOperand during AArch64AsmParser::parseInstruction.


Full Crash Output / Stack Trace (from token-bug input)

➜  bin git:(master) ✗ cat token-bug
el1, x12
sqdmulh s15, s3, v4.s[0
➜  bin git:(master) ✗ export ASAN_OPTIONS="coverage=1"
export UBSAN_OPTIONS="print_stacktrace=1"
./llvm-mc-assemble-fuzzer \
    --triple=aarch64-linux-gnu \
    --mattr=+all \
     token-bug
INFO: Seed: 901112419
INFO: Loaded 1 modules   (1910 inline 8-bit counters): 1910 [0x174a178, 0x174a8ee),
INFO: Loaded 1 PC tables (1910 PCs): 1910 [0x174a8f0,0x1752050),
./llvm-mc-assemble-fuzzer: Running 1 inputs 1 time(s) each.
Running: token-bug
Unknown buffer:1:4: error: unknown token in expression
el1, x12
   ^
Unknown buffer:1:4: error: invalid operand
el1, x12
   ^
llvm-mc-assemble-fuzzer: /llvm-project-llvmorg-20.1.6/llvm/include/llvm/MC/MCParser/MCAsmLexer.h:117: const llvm::AsmToken llvm::MCAsmLexer::peekTok(bool): Assertion `ReadCount == 1' failed.
==2288831== ERROR: libFuzzer: deadly signal
    #0 0x4b4cb0 in __sanitizer_print_stack_trace (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x4b4cb0)
    #1 0x460fc8 in fuzzer::PrintStackTrace() (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x460fc8)
    #2 0x446113 in fuzzer::Fuzzer::CrashCallback() (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x446113)
    #3 0x7fd89828e41f  (/lib/x86_64-linux-gnu/libpthread.so.0+0x1441f)
    #4 0x7fd897e8a00a in __libc_signal_restore_set /build/glibc-B3wQXB/glibc-2.31/signal/../sysdeps/unix/sysv/linux/internal-signals.h:86:3
    #5 0x7fd897e8a00a in raise /build/glibc-B3wQXB/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:48:3
    #6 0x7fd897e69858 in abort /build/glibc-B3wQXB/glibc-2.31/stdlib/abort.c:79:7
    #7 0x7fd897e69728 in __assert_fail_base /build/glibc-B3wQXB/glibc-2.31/assert/assert.c:94:3
    #8 0x7fd897e7afd5 in __assert_fail /build/glibc-B3wQXB/glibc-2.31/assert/assert.c:103:3
    #9 0x577808 in llvm::MCAsmLexer::peekTok(bool) /llvm-project-llvmorg-20.1.6/llvm/include/llvm/MC/MCParser/MCAsmLexer.h:117:5
    #10 0x5032da in (anonymous namespace)::AArch64AsmParser::parseOptionalMulOperand(llvm::SmallVectorImpl<std::unique_ptr<llvm::MCParsedAsmOperand, std::default_delete<llvm::MCParsedAsmOperand> > >&) /llvm-project-llvmorg-20.1.6/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp:4835:25
    #11 0x4fd38c in (anonymous namespace)::AArch64AsmParser::parseOperand(llvm::SmallVectorImpl<std::unique_ptr<llvm::MCParsedAsmOperand, std::default_delete<llvm::MCParsedAsmOperand> > >&, bool, bool) /llvm-project-llvmorg-20.1.6/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp:5003:10
    #12 0x4e56f3 in (anonymous namespace)::AArch64AsmParser::parseInstruction(llvm::ParseInstructionInfo&, llvm::StringRef, llvm::SMLoc, llvm::SmallVectorImpl<std::unique_ptr<llvm::MCParsedAsmOperand, std::default_delete<llvm::MCParsedAsmOperand> > >&) /llvm-project-llvmorg-20.1.6/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp:5321:11
    #13 0x570f54 in llvm::MCTargetAsmParser::parseInstruction(llvm::ParseInstructionInfo&, llvm::StringRef, llvm::AsmToken, llvm::SmallVectorImpl<std::unique_ptr<llvm::MCParsedAsmOperand, std::default_delete<llvm::MCParsedAsmOperand> > >&) /llvm-project-llvmorg-20.1.6/llvm/include/llvm/MC/MCParser/MCTargetAsmParser.h:446:12
    #14 0x952abc in (anonymous namespace)::AsmParser::parseAndMatchAndEmitTargetInstruction((anonymous namespace)::ParseStatementInfo&, llvm::StringRef, llvm::AsmToken, llvm::SMLoc) /llvm-project-llvmorg-20.1.6/llvm/lib/MC/MCParser/AsmParser.cpp:2327:42
    #15 0x9439de in (anonymous namespace)::AsmParser::parseStatement((anonymous namespace)::ParseStatementInfo&, llvm::MCAsmParserSemaCallback*) /llvm-project-llvmorg-20.1.6/llvm/lib/MC/MCParser/AsmParser.cpp:2317:10
    #16 0x937303 in (anonymous namespace)::AsmParser::Run(bool, bool) /llvm-project-llvmorg-20.1.6/llvm/lib/MC/MCParser/AsmParser.cpp:999:19
    #17 0x4bb403 in AssembleInput(char const*, llvm::Target const*, llvm::SourceMgr&, llvm::MCContext&, llvm::MCStreamer&, llvm::MCAsmInfo&, llvm::MCSubtargetInfo&, llvm::MCInstrInfo&, llvm::MCTargetOptions&) /llvm-project-llvmorg-20.1.6/llvm/tools/llvm-mc-assemble-fuzzer/llvm-mc-assemble-fuzzer.cpp:129:18
    #18 0x4b82d4 in AssembleOneInput(unsigned char const*, unsigned long) /llvm-project-llvmorg-20.1.6/llvm/tools/llvm-mc-assemble-fuzzer/llvm-mc-assemble-fuzzer.cpp:234:19
    #19 0x4bbc14 in LLVMFuzzerTestOneInput /llvm-project-llvmorg-20.1.6/llvm/tools/llvm-mc-assemble-fuzzer/llvm-mc-assemble-fuzzer.cpp:243:10
    #20 0x4477d1 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x4477d1)
    #21 0x432f42 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x432f42)
    #22 0x4389f6 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x4389f6)
    #23 0x4616b2 in main (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x4616b2)
    #24 0x7fd897e6b082 in __libc_start_main /build/glibc-B3wQXB/glibc-2.31/csu/../csu/libc-start.c:308:16
    #25 0x40d5ed in _start (/llvm-project-llvmorg-20.1.6/build-fuzzer/bin/llvm-mc-assemble-fuzzer+0x40d5ed)

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal

By contrast, running only the first malformed line (token-bug-min) works as expected: the syntax errors are reported and no assertion fires.

➜  bin git:(master) ✗ cat token-bug-min
el1, x12
➜  bin git:(master) ✗ export ASAN_OPTIONS="coverage=1"
export UBSAN_OPTIONS="print_stacktrace=1"
./llvm-mc-assemble-fuzzer \
    --triple=aarch64-linux-gnu \
    --mattr=+all \
     token-bug-min
INFO: Seed: 1000704437
INFO: Loaded 1 modules   (1910 inline 8-bit counters): 1910 [0x174a178, 0x174a8ee),
INFO: Loaded 1 PC tables (1910 PCs): 1910 [0x174a8f0,0x1752050),
./llvm-mc-assemble-fuzzer: Running 1 inputs 1 time(s) each.
Running: token-bug-min
Unknown buffer:1:4: error: unknown token in expression
el1, x12
   ^
Unknown buffer:1:4: error: invalid operand
el1, x12
   ^
Executed token-bug-min in 1 ms
***
*** NOTE: fuzzing was not performed, you have only
***       executed the target code on a fixed set of inputs.
***
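Since this report shows the first line on its own but not the second, running each line of token-bug in isolation would confirm whether the first line is actually needed to trigger the assertion. A minimal sketch; split_input_lines and the .lineN file names are hypothetical:

```shell
# split_input_lines FILE (hypothetical helper)
# Writes each line of FILE to its own file named FILE.line1, FILE.line2, ...
# and prints the number of lines written. A final line without a trailing
# newline is still handled.
split_input_lines() {
  src=$1
  n=0
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    printf '%s\n' "$line" > "$src.line$n"
  done < "$src"
  echo "$n"
}

# Example: run every line of token-bug through the fuzzer on its own,
# using the same flags as in the reproduction steps.
FUZZER=${FUZZER:-./llvm-mc-assemble-fuzzer}
if [ -f token-bug ] && [ -x "$FUZZER" ]; then
  count=$(split_input_lines token-bug)
  i=1
  while [ "$i" -le "$count" ]; do
    "$FUZZER" --triple=aarch64-linux-gnu --mattr=+all "token-bug.line$i"
    i=$((i + 1))
  done
fi
```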
