Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[llvm-objcopy][ELF] Disable huge section offset #97036

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

djtodoro
Copy link
Collaborator

@djtodoro djtodoro commented Jun 28, 2024

By default, GNU objcopy issues a warning for large offsets in ELF
files. To align with this behavior, we introduce an option to
define a maximum file offset for a section. This allows us to skip
creating excessively large files by specifying a threshold beyond
which the file offset will trigger an error.

It addresses the problem described in [0].

[0] #88878

@llvmbot
Copy link
Collaborator

llvmbot commented Jun 28, 2024

@llvm/pr-subscribers-llvm-binary-utilities

Author: Djordje Todorovic (djtodoro)

Changes

Match GNU objcopy's behaviour.
This is controled by an option, that is set to OFF by defatult.

It addresses the problem described in [0].

#88878


Full diff: https://github.com/llvm/llvm-project/pull/97036.diff

7 Files Affected:

  • (modified) llvm/include/llvm/ObjCopy/ELF/ELFConfig.h (+1)
  • (modified) llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp (+8-7)
  • (modified) llvm/lib/ObjCopy/ELF/ELFObject.cpp (+9)
  • (modified) llvm/lib/ObjCopy/ELF/ELFObject.h (+8-3)
  • (added) llvm/test/tools/llvm-objcopy/ELF/disable-huge-negative-offset.yaml (+27)
  • (modified) llvm/tools/llvm-objcopy/ObjcopyOptions.cpp (+2)
  • (modified) llvm/tools/llvm-objcopy/ObjcopyOpts.td (+4)
diff --git a/llvm/include/llvm/ObjCopy/ELF/ELFConfig.h b/llvm/include/llvm/ObjCopy/ELF/ELFConfig.h
index eafed92516c7d..29e55fcddb23a 100644
--- a/llvm/include/llvm/ObjCopy/ELF/ELFConfig.h
+++ b/llvm/include/llvm/ObjCopy/ELF/ELFConfig.h
@@ -30,6 +30,7 @@ struct ELFConfig {
   bool AllowBrokenLinks = false;
   bool KeepFileSymbols = false;
   bool LocalizeHidden = false;
+  bool DisableHugeSectionOffset = false;
 };
 
 } // namespace objcopy
diff --git a/llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp b/llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
index f343d1447e055..b1231033bb509 100644
--- a/llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
+++ b/llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
@@ -176,11 +176,12 @@ static std::unique_ptr<Writer> createELFWriter(const CommonConfig &Config,
 }
 
 static std::unique_ptr<Writer> createWriter(const CommonConfig &Config,
+                                            const ELFConfig &ELFConfig,
                                             Object &Obj, raw_ostream &Out,
                                             ElfType OutputElfType) {
   switch (Config.OutputFormat) {
   case FileFormat::Binary:
-    return std::make_unique<BinaryWriter>(Obj, Out, Config);
+    return std::make_unique<BinaryWriter>(Obj, Out, Config, ELFConfig);
   case FileFormat::IHex:
     return std::make_unique<IHexWriter>(Obj, Out, Config.OutputFilename);
   case FileFormat::SREC:
@@ -794,10 +795,10 @@ static Error handleArgs(const CommonConfig &Config, const ELFConfig &ELFConfig,
   return Error::success();
 }
 
-static Error writeOutput(const CommonConfig &Config, Object &Obj,
-                         raw_ostream &Out, ElfType OutputElfType) {
+static Error writeOutput(const CommonConfig &Config, const ELFConfig &ELFConfig,
+                         Object &Obj, raw_ostream &Out, ElfType OutputElfType) {
   std::unique_ptr<Writer> Writer =
-      createWriter(Config, Obj, Out, OutputElfType);
+      createWriter(Config, ELFConfig, Obj, Out, OutputElfType);
   if (Error E = Writer->finalize())
     return E;
   return Writer->write();
@@ -815,7 +816,7 @@ Error objcopy::elf::executeObjcopyOnIHex(const CommonConfig &Config,
       getOutputElfType(Config.OutputArch.value_or(MachineInfo()));
   if (Error E = handleArgs(Config, ELFConfig, **Obj))
     return E;
-  return writeOutput(Config, **Obj, Out, OutputElfType);
+  return writeOutput(Config, ELFConfig, **Obj, Out, OutputElfType);
 }
 
 Error objcopy::elf::executeObjcopyOnRawBinary(const CommonConfig &Config,
@@ -833,7 +834,7 @@ Error objcopy::elf::executeObjcopyOnRawBinary(const CommonConfig &Config,
       getOutputElfType(Config.OutputArch.value_or(MachineInfo()));
   if (Error E = handleArgs(Config, ELFConfig, **Obj))
     return E;
-  return writeOutput(Config, **Obj, Out, OutputElfType);
+  return writeOutput(Config, ELFConfig, **Obj, Out, OutputElfType);
 }
 
 Error objcopy::elf::executeObjcopyOnBinary(const CommonConfig &Config,
@@ -853,7 +854,7 @@ Error objcopy::elf::executeObjcopyOnBinary(const CommonConfig &Config,
   if (Error E = handleArgs(Config, ELFConfig, **Obj))
     return createFileError(Config.InputFilename, std::move(E));
 
-  if (Error E = writeOutput(Config, **Obj, Out, OutputElfType))
+  if (Error E = writeOutput(Config, ELFConfig, **Obj, Out, OutputElfType))
     return createFileError(Config.InputFilename, std::move(E));
 
   return Error::success();
diff --git a/llvm/lib/ObjCopy/ELF/ELFObject.cpp b/llvm/lib/ObjCopy/ELF/ELFObject.cpp
index 02591e6f987c2..9e090f4ff1cb7 100644
--- a/llvm/lib/ObjCopy/ELF/ELFObject.cpp
+++ b/llvm/lib/ObjCopy/ELF/ELFObject.cpp
@@ -2706,6 +2706,15 @@ Error BinaryWriter::finalize() {
     if (Sec.Type != SHT_NOBITS && Sec.Size > 0) {
       Sec.Offset = Sec.Addr - MinAddr;
       TotalSize = std::max(TotalSize, Sec.Offset + Sec.Size);
+
+      // More or less, this makes sense to do for 32-bit targets.
+      if (DisableHugeSectionOffset and not Obj.Is64Bits) {
+        int FilePosition = Sec.Offset;
+        if (FilePosition < 0)
+          return createStringError(errc::file_too_large,
+                                   "writing section " + Sec.Name +
+                                       " at huge (ie negative) file offset");
+      }
     }
 
   Buf = WritableMemoryBuffer::getNewMemBuffer(TotalSize);
diff --git a/llvm/lib/ObjCopy/ELF/ELFObject.h b/llvm/lib/ObjCopy/ELF/ELFObject.h
index 2b1895a30b41e..02abcfa7f10a0 100644
--- a/llvm/lib/ObjCopy/ELF/ELFObject.h
+++ b/llvm/lib/ObjCopy/ELF/ELFObject.h
@@ -15,6 +15,7 @@
 #include "llvm/BinaryFormat/ELF.h"
 #include "llvm/MC/StringTableBuilder.h"
 #include "llvm/ObjCopy/CommonConfig.h"
+#include "llvm/ObjCopy/ELF/ELFConfig.h"
 #include "llvm/Object/ELFObjectFile.h"
 #include "llvm/Support/Errc.h"
 #include "llvm/Support/FileOutputBuffer.h"
@@ -366,12 +367,16 @@ class BinaryWriter : public Writer {
 
   uint64_t TotalSize = 0;
 
+  bool DisableHugeSectionOffset = false;
+
 public:
   ~BinaryWriter() {}
   Error finalize() override;
   Error write() override;
-  BinaryWriter(Object &Obj, raw_ostream &Out, const CommonConfig &Config)
-      : Writer(Obj, Out), GapFill(Config.GapFill), PadTo(Config.PadTo) {}
+  BinaryWriter(Object &Obj, raw_ostream &Out, const CommonConfig &Config,
+               const ELFConfig &ELFConfig)
+      : Writer(Obj, Out), GapFill(Config.GapFill), PadTo(Config.PadTo),
+        DisableHugeSectionOffset(ELFConfig.DisableHugeSectionOffset) {}
 };
 
 // A base class for writing ascii hex formats such as srec and ihex.
@@ -667,7 +672,7 @@ class CompressedSection : public SectionBase {
 
 public:
   CompressedSection(const SectionBase &Sec,
-    DebugCompressionType CompressionType, bool Is64Bits);
+                    DebugCompressionType CompressionType, bool Is64Bits);
   CompressedSection(ArrayRef<uint8_t> CompressedData, uint32_t ChType,
                     uint64_t DecompressedSize, uint64_t DecompressedAlign);
 
diff --git a/llvm/test/tools/llvm-objcopy/ELF/disable-huge-negative-offset.yaml b/llvm/test/tools/llvm-objcopy/ELF/disable-huge-negative-offset.yaml
new file mode 100644
index 0000000000000..5769211e2dd4f
--- /dev/null
+++ b/llvm/test/tools/llvm-objcopy/ELF/disable-huge-negative-offset.yaml
@@ -0,0 +1,27 @@
+# RUN: yaml2obj %s --docnum=1 -o %t
+# RUN: not llvm-objcopy -O binary %t %t2 --disable-huge-section-offset 2>&1 | FileCheck %s
+
+# CHECK: writing section .high_addr at huge (ie negative) file offset
+
+--- !ELF
+FileHeader:
+  Class:           ELFCLASS32
+  Data:            ELFDATA2LSB
+  Type:            ET_EXEC
+  Machine:         EM_MIPS
+Sections:
+  - Name:            .text
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    Address:         0x1000
+    Content:         "00112233445566778899AABBCCDDEEFF"
+  - Name:            .data
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_WRITE ]
+    Address:         0x2000
+    Content:         "112233445566778899AABBCCDDEEFF00"
+  - Name:            .high_addr
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_WRITE ]
+    Address:         0x80001000
+    Content:         "2233445566778899AABBCCDDEEFF0011"
diff --git a/llvm/tools/llvm-objcopy/ObjcopyOptions.cpp b/llvm/tools/llvm-objcopy/ObjcopyOptions.cpp
index 4ab3b7265f2f6..92728cb2ea865 100644
--- a/llvm/tools/llvm-objcopy/ObjcopyOptions.cpp
+++ b/llvm/tools/llvm-objcopy/ObjcopyOptions.cpp
@@ -942,6 +942,8 @@ objcopy::parseObjcopyOptions(ArrayRef<const char *> RawArgsArr,
   Config.ExtractMainPartition =
       InputArgs.hasArg(OBJCOPY_extract_main_partition);
   ELFConfig.LocalizeHidden = InputArgs.hasArg(OBJCOPY_localize_hidden);
+  ELFConfig.DisableHugeSectionOffset =
+      InputArgs.hasArg(OBJCOPY_disable_huge_section_offsets);
   Config.Weaken = InputArgs.hasArg(OBJCOPY_weaken);
   if (auto *Arg =
           InputArgs.getLastArg(OBJCOPY_discard_all, OBJCOPY_discard_locals)) {
diff --git a/llvm/tools/llvm-objcopy/ObjcopyOpts.td b/llvm/tools/llvm-objcopy/ObjcopyOpts.td
index 4bc80eba05f8e..723f73ace7928 100644
--- a/llvm/tools/llvm-objcopy/ObjcopyOpts.td
+++ b/llvm/tools/llvm-objcopy/ObjcopyOpts.td
@@ -145,6 +145,10 @@ def localize_hidden
     : Flag<["--"], "localize-hidden">,
       HelpText<
           "Mark all symbols that have hidden or internal visibility as local">;
+def disable_huge_section_offsets
+    : Flag<["--"], "disable-huge-section-offset">,
+      HelpText<
+          "Emit an error if input section has a huge file offset">;
 defm localize_symbol
     : Eq<"localize-symbol", "Mark any defined non-common symbol named <symbol> as local">,
       MetaVarName<"symbol">;

@jh7370
Copy link
Collaborator

jh7370 commented Jul 1, 2024

@MaskRay, what do you think about this patch? I've not really got a good understanding of binary output, so can't be 100% confident about any reviewing I do.

@MaskRay
Copy link
Member

MaskRay commented Jul 2, 2024

@MaskRay, what do you think about this patch? I've not really got a good understanding of binary output, so can't be 100% confident about any reviewing I do.

Frankly, I hope that we leave llvm-objcopy unchanged for #88878.
Users can request object files with sections exceeding 0x80000000, and llvm-objcopy should handle those requests as intended.
Overly defensive measures often create more problems than they solve. If someone adds a new writer, do we port this output size limiting code to that?
An optional flag for this behavior seems unlikely to be used. In addition, -O binary is niche.

Note: we did introduce a default size limit for yaml2obj (10MiB) https://reviews.llvm.org/D81258 .
The primary reason is that yaml2obj is an internal testing tool, instead of a user-facing one.

@djtodoro
Copy link
Collaborator Author

djtodoro commented Sep 2, 2024

(copying comment from the issue itself)

Hi @MaskRay,

The idea is to assume that everything is placed at a certain address (let's say 0x40000000), and then to declare a variable in a new named section using __attribute__, so if this section is not added to the linker script layout, it will be placed at VMA 0x0 along with debug and other non-allocated sections, even though it is allocated, resulting in a sparse binary.

I've managed to (somewhat) reproduce the problem with a simple case for the MIPS architecture:

$ cat ./linker.ld 
SECTIONS
{
  . = 0x40000000;
  .text : { *(.text) }
  .data : { *(.data) }
  .bss : { *(.bss) }

  /DISCARD/ : { *(.debug*) }
}
$ cat test.c
// Place this variable in a custom section
__attribute__((section(".custom_section")))
int my_variable = 0x12345678;

int main() {
    return 0;
}
$ mips-img-linux-gnu/2017.10-05/bin/mips-img-linux-gnu-gcc test.c -g -c -o main.o && mips-img-linux-gnu/2017.10-05/bin/mips-img-linux-gnu-ld -T linker.ld main.o -o main.elf

By default, we won't see the file truncated error report, since the section at a negative offset (".pdr") won't be loaded, so it is being ignored by objcopy.

$ mips-img-linux-gnu-objcopy -O binary main.elf test.bin

(even though the file position/offset is negative for the section: -1073741824).

But, if I add a load flag for the section it will be reported:

$ mips-img-linux-gnu-objcopy --set-section-flags .pdr=load -O binary main.elf test.bin
mips-img-linux-gnu-objcopy: test.bin[.pdr]: file truncated

This issue isn't just theoretical; it reflects a real use-case with one of our embedded targets. The yaml2obj tool was only used to create a problematic scenario for testing, but our actual use-case involves auto-generated linker scripts that result in such sections.

Given these practical implications, having an optional flag in llvm-objcopy to handle such cases would be beneficial. It would provide a straightforward solution for users facing similar issues without impacting the default behavior.

In embedded systems, size really matters, and the file truncated error from BFD (bfd_error_file_truncated) helps us detect potential file size increases that could be problematic.

Thanks again for your insights.

@djtodoro
Copy link
Collaborator Author

We don't need to precisely replicate GNU objcopy's behavior, so I've introduced a new option, --set-max-huge-section-offset, which should suffice for our requirements.

Any comments on this?

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the docs to include a description of the option too.

I have my doubts about whether this will really be used in practice beyond your team, if I'm honest. It is however isolated and doesn't really impact anybody else, so I'm personally not opposed. I'd still like @MaskRay to comment on this though.

llvm/lib/ObjCopy/ELF/ELFObject.cpp Outdated Show resolved Hide resolved
llvm/include/llvm/ObjCopy/ELF/ELFConfig.h Outdated Show resolved Hide resolved
llvm/lib/ObjCopy/ELF/ELFObject.h Outdated Show resolved Hide resolved
llvm/tools/llvm-objcopy/ObjcopyOpts.td Outdated Show resolved Hide resolved
@djtodoro djtodoro force-pushed the pr/objcopy-disable-huge-sec-offset branch from ef2c3ba to df0c7f2 Compare September 11, 2024 11:21
@djtodoro
Copy link
Collaborator Author

@jh7370 Thanks a lot for your feedback!

In my opinion, the more teams that use LLVM tools instead of GNU tools for embedded systems, the higher the likelihood of this issue being triggered.

Copy link

github-actions bot commented Sep 11, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

By default, GNU objcopy issues a warning for large offsets in ELF
files. To align with this behavior, we introduce an option to
define a maximum file offset for a section. This allows us to skip
creating excessively large files by specifying a threshold beyond
which the file offset will trigger an error.
@djtodoro djtodoro force-pushed the pr/objcopy-disable-huge-sec-offset branch from df0c7f2 to 548f073 Compare September 11, 2024 11:26
Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the LLVM PR guidelines, please don't force push to PR branches once they are being reviewed - there's zero reason to, and it makes it harder to see what's changed since the previous version of the PR. Simply add fixup commits and, if really needed, merge commits can be used to merge in changes in main into the PR branch.

Also, please don't mark comments as "resolved" where I've started the discussion - I use these to track whether I'm happy with work done to address the points and I can't do that if they've already been marked as resolved (see https://discourse.llvm.org/t/rfc-github-pr-resolve-conversation-button/73178 for more context).

@jh7370 Thanks a lot for your feedback!

In my opinion, the more teams that use LLVM tools instead of GNU tools for embedded systems, the higher the likelihood of this issue being triggered.

I don't disagree with this point, but as this option is off by default, it's unlikely most of these people will use this option.

llvm/tools/llvm-objcopy/ObjcopyOptions.cpp Outdated Show resolved Hide resolved
llvm/docs/CommandGuide/llvm-objcopy.rst Outdated Show resolved Hide resolved
llvm/tools/llvm-objcopy/ObjcopyOpts.td Outdated Show resolved Hide resolved
@arsenm
Copy link
Contributor

arsenm commented Sep 13, 2024

Also, please don't mark comments as "resolved" where I've started the discussion - I use these to track whether I'm happy with work done to address the points and I can't do that if they've already been marked as resolved (see https://discourse.llvm.org/t/rfc-github-pr-resolve-conversation-button/73178 for more context).

This RFC was rejected and I do not want conversations left unresolved. Leaving them unresolved just adds additional work for reviewers

@jh7370
Copy link
Collaborator

jh7370 commented Sep 13, 2024

Also, please don't mark comments as "resolved" where I've started the discussion - I use these to track whether I'm happy with work done to address the points and I can't do that if they've already been marked as resolved (see https://discourse.llvm.org/t/rfc-github-pr-resolve-conversation-button/73178 for more context).

This RFC was rejected

The RFC wasn't rejected. There was simply no specific conclusion to the discussion. In fact, I believe the closest thing was that the process should be based on the reviewer's preferences, because they're doing the leg work to review and confirm they're happy that the patch meets LLVM standards and can be accepted into the codebase. Taking actions that make it harder for a reviewer to confirm the quality of the PR, or slowing down that process is detrimental to everyone's experiences. I've asked for conversations to be left unresolved so that I can review them and ensure all actions have been taken, and I've linked to the PR to make it clear my motivations for requesting this - I typically then resolve them myself when I'm happy that they have been. Other reviewers may have different preferences, but these are mine, and I've been the one most involved with this review, plus these are my conversations, so surely my preferences are what matter most?

and I do not want conversations left unresolved. Leaving them unresolved just adds additional work for reviewers

For reviewers in general, or you specifically? Marking them as resolved adds to the additional work I have to take as a reviewer. I'm already partially inclined to decline this patch, because I'm not sure of the usefulness of it in the general case (and @MaskRay has said similar things before), so anything done that makes it harder for me to review things will just make me less inclined to review this favourably, or to ignore it completely, resulting in the PR not being landable due to the lack of consensus.

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I've reviewed the PR and if we accept the new option, I'm happy with the implementation. I'd like further input from @MaskRay before this gets landed, but I gather he's away at the moment, so there may be a bit of a delay hearing back from him.

@djtodoro
Copy link
Collaborator Author

@jh7370 Thanks a lot for your feedback and review!

Cleanup error message for --max-section-offset.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants