Skip to content

[llvm-objdump][ELF]Fix crash when reading strings from .dynstr #125679

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 10, 2025

Conversation

cabbaken
Copy link
Contributor

@cabbaken cabbaken commented Feb 4, 2025

This change introduces a check for the strtab offset to prevent llvm-objdump from crashing when processing malformed ELF files.
It provide a minimal reproduce test for #86612 (comment).
Additionally, it modifies how llvm-objdump handles and outputs malformed ELF files with invalid string offsets.(More info: https://discourse.llvm.org/t/should-llvm-objdump-objdump-display-actual-corrupted-values-in-malformed-elf-files/84391)

Fixes: #86612

@llvmbot
Copy link
Member

llvmbot commented Feb 4, 2025

@llvm/pr-subscribers-llvm-binary-utilities

Author: Cabbaken (cabbaken)

Changes

This change introduces a check for the strtab offset to prevent llvm-objdump from crashing when processing malformed ELF files.
It provide a minimal reproduce test for #86612 (comment).
Additionally, it modifies how llvm-objdump handles and outputs malformed ELF files with invalid string offsets.(More info: https://discourse.llvm.org/t/should-llvm-objdump-objdump-display-actual-corrupted-values-in-malformed-elf-files/84391)


Full diff: https://github.com/llvm/llvm-project/pull/125679.diff

2 Files Affected:

  • (modified) llvm/test/tools/llvm-objdump/ELF/dynamic-section.test (+34)
  • (modified) llvm/tools/llvm-objdump/ELFDump.cpp (+23)
diff --git a/llvm/test/tools/llvm-objdump/ELF/dynamic-section.test b/llvm/test/tools/llvm-objdump/ELF/dynamic-section.test
index 5205c5a3876d5f..33ca7e5b40c262 100644
--- a/llvm/test/tools/llvm-objdump/ELF/dynamic-section.test
+++ b/llvm/test/tools/llvm-objdump/ELF/dynamic-section.test
@@ -438,6 +438,9 @@ ProgramHeaders:
 # RUN: yaml2obj --docnum=4 %s -o %t4
 # RUN: llvm-objdump -p %t4 | FileCheck %s --strict-whitespace --check-prefix=INDENT
 
+# RUN: yaml2obj --docnum=5 %s -o %t5
+# RUN: llvm-objdump -p %t5 | FileCheck %s --strict-whitespace --check-prefix=INDENT
+
 # INDENT: {{^}}Dynamic Section:
 # INDENT: {{^}}  NEEDED 0x
 
@@ -470,3 +473,34 @@ Sections:
        Value: 0x1
      - Tag:   DT_NULL
        Value: 0x0
+
+--- !ELF
+FileHeader:
+  Class:   ELFCLASS64
+  Data:    ELFDATA2LSB
+  Type:    ET_EXEC
+  Machine: EM_X86_64
+Sections:
+  - Name:    .dynstr
+    Type:    SHT_STRTAB
+    Address: 0x1000
+    Size:    0x10
+    Content: "004400550066007700"
+  - Name: .dynamic
+    Type: SHT_DYNAMIC
+    Entries:
+     - Tag:   DT_NEEDED
+       Value: 0x1245657656
+     - Tag:   DT_STRTAB
+       Value: 0x1000
+     - Tag:   DT_NULL
+       Value: 0x0
+ProgramHeaders:
+  - Type:     PT_LOAD
+    VAddr:    0x1000
+    FirstSec: .dynstr
+    LastSec:  .dynamic
+  - Type:     PT_DYNAMIC
+    VAddr:    0x101D
+    FirstSec: .dynamic
+    LastSec:  .dynamic
\ No newline at end of file
diff --git a/llvm/tools/llvm-objdump/ELFDump.cpp b/llvm/tools/llvm-objdump/ELFDump.cpp
index e9e5b059f1786e..949bf1c6faacb8 100644
--- a/llvm/tools/llvm-objdump/ELFDump.cpp
+++ b/llvm/tools/llvm-objdump/ELFDump.cpp
@@ -14,8 +14,11 @@
 #include "ELFDump.h"
 
 #include "llvm-objdump.h"
+#include "llvm/BinaryFormat/ELF.h"
 #include "llvm/Demangle/Demangle.h"
+#include "llvm/Object/ELF.h"
 #include "llvm/Object/ELFObjectFile.h"
+#include "llvm/Object/ELFTypes.h"
 #include "llvm/Support/Format.h"
 #include "llvm/Support/raw_ostream.h"
 
@@ -221,6 +224,20 @@ template <class ELFT> void ELFDumper<ELFT>::printDynamicSection() {
   std::string TagFmt = "  %-" + std::to_string(MaxLen) + "s ";
 
   outs() << "\nDynamic Section:\n";
+  auto StringTableSize = (typename ELFT::Xword)0;
+  for (const auto &Sec : cantFail(Elf.sections())) {
+    if (Sec.sh_type == ELF::SHT_STRTAB)
+      StringTableSize =
+          StringTableSize < Sec.sh_size ? Sec.sh_size : StringTableSize;
+  }
+  for (const typename ELFT::Dyn &Dyn : DynamicEntries) {
+    if (Dyn.d_tag == ELF::DT_STRSZ) {
+      StringTableSize =
+          StringTableSize < Dyn.getVal() ? Dyn.getVal() : StringTableSize;
+      break;
+    }
+  }
+
   for (const typename ELFT::Dyn &Dyn : DynamicEntries) {
     if (Dyn.d_tag == ELF::DT_NULL)
       continue;
@@ -235,6 +252,12 @@ template <class ELFT> void ELFDumper<ELFT>::printDynamicSection() {
       Expected<StringRef> StrTabOrErr = getDynamicStrTab(Elf);
       if (StrTabOrErr) {
         const char *Data = StrTabOrErr->data();
+        if (Dyn.getVal() > StringTableSize) {
+          reportWarning("Invalid string table offset", Obj.getFileName());
+          outs() << format(TagFmt.c_str(), Str.c_str())
+                 << format(Fmt, (uint64_t)Dyn.getVal());
+          continue;
+        }
         outs() << format(TagFmt.c_str(), Str.c_str()) << Data + Dyn.getVal()
                << "\n";
         continue;

@cabbaken
Copy link
Contributor Author

cabbaken commented Feb 5, 2025

I'm not sure why the tests failed. It doesn't seem related to this patch.
Do these tests use llvm-objdump? Could someone help troubleshoot this?😭

Copy link
Member

@MaskRay MaskRay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general llvm-readobj's error handling is better than llvm-objdump. You could look at its code for inspiration. The relevant patch might be https://reviews.llvm.org/D77216

Instead of a general title "Enhancing llvm-objdump stability and error handling", a specific one like "Fix ... when reading strings from .dynstr" or the like would be better and help future readers that find your patch.

…6612)

This change introduces a check for the strtab offset
to prevent llvm-objdump from crashing when processing
malformed ELF files.
Additionally, it modifies how llvm-objdump handles and
outputs malformed ELF files with invalid string offsets.
More info: https://discourse.llvm.org/t/should-llvm-objdump-objdump-display-actual-corrupted-values-in-malformed-elf-files/84391

Signed-off-by: cabbaken <cabbaken@outlook.com>
@jh7370
Copy link
Collaborator

jh7370 commented Feb 5, 2025

I'm not sure why the tests failed. It doesn't seem related to this patch. Do these tests use llvm-objdump? Could someone help troubleshoot this?😭

It's not uncommon for tests to fail for reasons unrelated to your PR. Typically this is because there's a bug at the point of main that you branched off of. Looking at the failures suggests that this is the case here. You can ignore them.

@cabbaken
Copy link
Contributor Author

cabbaken commented Feb 5, 2025

I'm not sure why the tests failed. It doesn't seem related to this patch. Do these tests use llvm-objdump? Could someone help troubleshoot this?😭

It's not uncommon for tests to fail for reasons unrelated to your PR. Typically this is because there's a bug at the point of main that you branched off of. Looking at the failures suggests that this is the case here. You can ignore them.

Got it, thanks for your kindly response~

@jh7370
Copy link
Collaborator

jh7370 commented Feb 5, 2025

The new version of patch modified commit title ...

The default behaviour for merging is to use the PR title and description, not the commit message (although the initial default of the PR title and description is taken from the commit message). If you need to update these, you should edit the PR, not the commit message.

Also, please avoid force pushing after updating, as it makes it harder to see what has changed between versions of the PR.

Finally, please add "Fixes: #86612" to the PR description. In that way, when the PR is merged, the issue should auto-close, I believe.

@cabbaken cabbaken changed the title [llvm-objdump][ELF] Enhancing llvm-objdump stability and error handling(#86612) [llvm-objdump][ELF]Fix crash when reading strings from .dynstr(#86612) Feb 5, 2025
@cabbaken
Copy link
Contributor Author

cabbaken commented Feb 5, 2025

The new version of patch modified commit title ...

The default behaviour for merging is to use the PR title and description, not the commit message (although the initial default of the PR title and description is taken from the commit message). If you need to update these, you should edit the PR, not the commit message.

Also, please avoid force pushing after updating, as it makes it harder to see what has changed between versions of the PR.

Finally, please add "Fixes: #86612" to the PR description. In that way, when the PR is merged, the issue should auto-close, I believe.

I've made the changes about the title and description of PR and will follow these guidelines in future work.

@@ -438,6 +438,9 @@ ProgramHeaders:
# RUN: yaml2obj --docnum=4 %s -o %t4
# RUN: llvm-objdump -p %t4 | FileCheck %s --strict-whitespace --check-prefix=INDENT

# RUN: yaml2obj --docnum=5 %s -o %t5
# RUN: llvm-objdump -p %t5 | FileCheck %s --strict-whitespace --check-prefix=INDENT
Copy link
Member

@MaskRay MaskRay Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2>&1 | FileCheck ... and use a different check prefix.

Test both the warning invalid string table offset and the output. The output describes how llvm-objdump handles input input, and is useful as a change detector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I will fix it soon. This is my first time using the llvm-test-suite, and I'm sorry for being unfamiliar with it.

StringTableSize < Sec.sh_size ? Sec.sh_size : StringTableSize;
}
for (const typename ELFT::Dyn &Dyn : DynamicEntries) {
if (Dyn.d_tag == ELF::DT_STRSZ) {
Copy link
Member

@MaskRay MaskRay Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The section header table is optional in loadable units (executables, shared objects). Dynamic loaders only read dynamic tags, not the section header table. The dynamic tags are source of truths for the dynamic symbol table size. You can keep just the DT_STRSZ code and ignore SHT_STRTAB.

(
We could emit a warning when the two values mismatch, but llvm-readobj doesn't do it. For the purpose of fixing just this crash it may not be necessary to add the warning. Emitting a warning requires quite a few tests to make for an acceptable patch)

Copy link
Contributor Author

@cabbaken cabbaken Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I check SHT_STRTAB because, in getDynamicStrTab(Elf), I noticed that both dynamic entries and sections are checked. Therefore, I believe checking the sections is necessary.
The test(tools/llvm-objdump/ELF/private-headers.test) will fail without this check.
@MaskRay

Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
@cabbaken cabbaken requested a review from MaskRay February 8, 2025 14:50
Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed comments - I had them pending, but forgot to submit!

Type: SHT_DYNAMIC
Entries:
- Tag: DT_NEEDED
Value: 0x1245657656
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps worth having a tag that has a value exactly matching the size, to test the edge case behaviour?

Copy link
Contributor Author

@cabbaken cabbaken Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the edge of the value may depend on the the OS or the hardware.
Since this value is the offset of the .dynstr(we need this value to be exceed max virtual address), it may have different edge value in different architecture and OS. So we just need a value that big enough to test if llvm-objdump could process it correctly.
Please let me know if this interpretation is off or if there are additional considerations to be aware of, and thanks for your review~

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check used is Dyn.getVal() >= StringTableSize, so the edge I'm referring to is the StringTableSize, i.e. an address at exactly StringTableSize produces a warning, whereas one that is one before this value does.

Copy link
Contributor Author

@cabbaken cabbaken Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, if the Dyn.getVal() just greater than StringTableSize but less than valid virtual memory size, the program won't crash. And this actually happens quite often(when you see some meaningless char in the output of llvm-objdump -x). It was be specifically described here.
So the value of Dyn.getVal() should be large enough to make the program crash, I think.
Please let me know if I misunderstood something. Thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't be reading memory outside the file image, or you could end up trying to read memory belonging e.g. to another a process entirely (depending on how memory safety works on your particular OS). A crash is just one of possible behaviours. For example, it's not impossible you'd print something that appears meaningful but is actually completely bogus.

As a valid virtual address will depend on the host OS, that isn't something that is worth targeting. Instead, the thing you should be targeting is where the behaviour changes - namely when the address points at or past the end of the string table. In that way, you can show you get the warning when the value reaches the bad point, but not when it is just before. We don't need to trigger a crash, showing that we don't read bogus data is good enough, since it's the same problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand now!
I will fix it in next commit.

for (const typename ELFT::Dyn &Dyn : DynamicEntries) {
if (Dyn.d_tag == ELF::DT_STRSZ) {
StringTableSize =
StringTableSize < Dyn.getVal() ? Dyn.getVal() : StringTableSize;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it doesn't exist already somewhere, I think you may want a check that shows that the DT_STRSZ value makes sense, i.e. doesn't point past the end of the data. This could be a separate PR, but if you're concerned about invalid DT_NEEDED values, it makes equal sense to be concerned about invalid DT_STRSZ. NB: this could be checked already when you get the dynamic string table later - I haven't looked.

Copy link
Contributor Author

@cabbaken cabbaken Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I will check it in the next PR.

Co-authored-by: James Henderson <James.Henderson@sony.com>
@cabbaken cabbaken changed the title [llvm-objdump][ELF]Fix crash when reading strings from .dynstr(#86612) [llvm-objdump][ELF]Fix crash when reading strings from .dynstr Feb 11, 2025
Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
Type: SHT_DYNAMIC
Entries:
- Tag: DT_NEEDED
Value: 0x1245657656
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't be reading memory outside the file image, or you could end up trying to read memory belonging e.g. to another a process entirely (depending on how memory safety works on your particular OS). A crash is just one of possible behaviours. For example, it's not impossible you'd print something that appears meaningful but is actually completely bogus.

As a valid virtual address will depend on the host OS, that isn't something that is worth targeting. Instead, the thing you should be targeting is where the behaviour changes - namely when the address points at or past the end of the string table. In that way, you can show you get the warning when the value reaches the bad point, but not when it is just before. We don't need to trigger a crash, showing that we don't read bogus data is good enough, since it's the same problem.

Expected<const typename ELFT::Shdr *> StringTableSecOrError =
getSection<ELFT>(cantFail(Elf.sections()), Sec.sh_link);
if (!StringTableSecOrError) {
consumeError(StringTableSecOrError.takeError());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably shouldn't be throwing away this error and instead should report it and bail out of the loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I will fix it in next commit.

Comment on lines 233 to 235
StringTableSize = StringTableSize < (*StringTableSecOrError)->sh_size
? (*StringTableSecOrError)->sh_size
: StringTableSize;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ternary isn't necessary. The dynamic section will only reference one string table, so you'll only have one size by this point.

In fact, once you find the dynamic section and have read the string size from it, you can bail out of the loop.

Copy link
Contributor Author

@cabbaken cabbaken Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I’ll fix it.
But if I apply the condition (if (Sec.sh_type == ELF::SHT_DYNSYM || Sec.sh_type == ELF::SHT_DYNAMIC)), this might be necessary.
I’ll implement this first.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This point is probably moot, if we switch to simply relying on the size returned by getDynamicStrTab (after fixing getDynamicStrTab as I've described, because this code should disappear.

for (const typename ELFT::Dyn &Dyn : DynamicEntries) {
if (Dyn.d_tag == ELF::DT_STRSZ) {
StringTableSize =
StringTableSize < Dyn.getVal() ? Dyn.getVal() : StringTableSize;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced we should be picking the larger of the two sizes, assuming a section one is found. It would be safer to pick the smaller one. That being said, if this matches llvm-readobj, I'm okay with it.

Copy link
Contributor Author

@cabbaken cabbaken Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm-readobj use ELF::DT_STRSZ to check string table size here: llvm/tools/llvm-readobj/ELFDumper.cpp:2007: parseDynamicTable().
And I think the ternary here is necessary, because, the StringTableSize may be 0 after the loop of the sections(the sections may be stripped), it will finally be 0 without this ternary.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I think the point is moot now, since this code should disappear once we've fixed getDynamicStrTab.

Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
@jh7370
Copy link
Collaborator

jh7370 commented Feb 14, 2025

FYI, it'll be next week before I come back to this.

@artagnon artagnon removed their request for review February 15, 2025 11:39
Copy link

github-actions bot commented Feb 20, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
@@ -63,14 +63,22 @@ static Expected<StringRef> getDynamicStrTab(const ELFFile<ELFT> &Elf) {
if (!DynamicEntriesOrError)
return DynamicEntriesOrError.takeError();

typename ELFT::Xword StringTableSize{0};
Copy link
Member

@MaskRay MaskRay Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm style prefers = 0 to {0}. and = nullptr below

Copy link
Contributor Author

@cabbaken cabbaken Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the constructor is marked as explicit. Try this:

typename ELFT::Xword StringTableSize{0};

This worked locally for me with a Visual Studio build at least.
#125679 (comment)

I followed this. Do you have any suggestions on which style is the correct one to use here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typename ELFT::Xword StringTableSize = 0;

Copy link
Contributor Author

@cabbaken cabbaken Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typename ELFT::Xword StringTableSize = 0;

I got error with this.

No viable conversion from 'int' to 'typename ELFType<llvm::endianness::little, false>::Xword' (aka 'packed_endian_specific_integral<unsigned long, (llvm::endianness)1, 1>')

I have discussed this with @jh7370 here, I initially use

typename ELFT::Xword StringTableSize = (typename ELFT::Xword)0;

which is unnecessarily redundant, then I modified to

auto StringTableSize = (typename ELFT::Xword)0;

but @jh7370 recommended

typename ELFT::Xword StringTableSize{0};

cabbaken added a commit to cabbaken/llvm-project that referenced this pull request Feb 20, 2025
More info: llvm#125679 (comment)

Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
jh7370 pushed a commit that referenced this pull request Feb 24, 2025
The dynamic string table used by the dynamic section is referenced by
the sh_link field of that section, so we should use that directly,
rather than going via the dynamic symbol table.
More info:
#125679 (comment)

Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 24, 2025
…) (#127975)

The dynamic string table used by the dynamic section is referenced by
the sh_link field of that section, so we should use that directly,
rather than going via the dynamic symbol table.
More info:
llvm/llvm-project#125679 (comment)

Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
@cabbaken cabbaken requested review from jh7370 and MaskRay February 28, 2025 01:31
Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
@cabbaken cabbaken requested a review from jh7370 February 28, 2025 11:01
Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @MaskRay, any final thoughts before this gets merged? @cabbaken, do you need me to merge this for you?

@cabbaken
Copy link
Contributor Author

cabbaken commented Mar 3, 2025

Thanks.
Yes, I want to merge this.

@MaskRay MaskRay merged commit 82f2b66 into llvm:main Mar 10, 2025
11 checks passed
@cabbaken
Copy link
Contributor Author

❤️❤️❤️❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

llvm-objdump: printDynamicSection() out-of-bounds read
4 participants