
llvm-objcopy: Emit warnings when handling sparse ELF files #88878

Open

djtodoro opened this issue Apr 16, 2024 · 20 comments

@djtodoro
Collaborator

The objcopy from GNU binutils (bfd, actually [0]), when dealing with a sparse ELF input file, reports a warning like:

Warning: Writing section `.sectionA' to huge (ie negative) file offset 0xa200c999.

Can we implement something like this in llvm-objcopy? Since the copying logic is implemented within ELFObjcopy, which is used by modules other than llvm-objcopy, is it OK to emit the warning in the library itself?

[0] https://github.com/bminor/binutils-gdb/blob/master/bfd/binary.c#L277
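
For reference, here is a minimal standalone sketch (not the actual BFD source) of the kind of check binutils performs in bfd/binary.c; the Section type, the file_ptr alias, and the writeSectionToBinary helper below are simplified stand-ins:

#include <cstdint>
#include <cstdio>

// Stand-in for BFD's file_ptr, which is a 32-bit integer in a 32-bit BFD build.
using file_ptr = int32_t;

struct Section {
  const char *Name;
  uint64_t VMA;
  uint64_t Size;
};

void writeSectionToBinary(const Section &S, uint64_t LowestVMA) {
  // -O binary places each section at (VMA - lowest allocated VMA); in a
  // 32-bit build this can wrap negative for high addresses.
  file_ptr FilePos = static_cast<file_ptr>(S.VMA - LowestVMA);
  if (FilePos < 0) {
    std::fprintf(stderr,
                 "Warning: Writing section `%s' to huge (ie negative) file "
                 "offset 0x%llx.\n",
                 S.Name, (unsigned long long)(uint32_t)FilePos);
    return;
  }
  // ... seek to FilePos and write S.Size bytes ...
}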

@djtodoro
Collaborator Author

cc @MaskRay

@llvmbot
Collaborator

llvmbot commented Apr 16, 2024

@llvm/issue-subscribers-tools-llvm-objcopy-strip


@jh7370
Collaborator

jh7370 commented Apr 16, 2024

> Can we implement something like this in llvm-objcopy?

Could you clarify what you mean by "sparse" ELF files? I'm not aware of a feature like that in the regular ELF standard (that's not to say we don't care about it, I'm just not familiar with the concept). Also, could you outline your use-case, please? In other words, why would a warning be helpful?

> Since the copying logic is implemented within ELFObjcopy, which is used by modules other than llvm-objcopy, is it OK to emit the warning in the library itself?

Not directly. Rather than explain the full thing, I refer you to my lightning talk on error handling in libraries: https://www.youtube.com/watch?v=YSEY4pg1YB0. Whilst this specifically talks about the DWARF debug code, the principles are generic. You'll want a way of bubbling up a warning to the tool layer, so that the llvm-objcopy tool itself can properly handle the warning.

That being said, I suspect a warning won't be what we actually want. Either the tool should support a kind of file, in which case it should be able to handle it properly without warnings, or it shouldn't, in which case it should generate an error when it encounters such a file. I'm not convinced without more information that a warning like this would make sense.

@djtodoro
Collaborator Author

Hi @jh7370, thanks for your answer.

I am really sorry for putting this aside for so long.

In general, "sparse" ELF is not a standardized term; it refers to cases where there is unused content between sections in the binary. objcopy from binutils is capable of recognizing such cases. For instance, here is a very simple YAML/ELF example that illustrates the concept (representing a MIPS 32-bit target, as LLVM will report an out-of-memory error for a similar scenario with a 64-bit target):

$ cat test.yaml 
--- !ELF
FileHeader:
  Class:           ELFCLASS32
  Data:            ELFDATA2LSB
  Type:            ET_EXEC
  Machine:         EM_MIPS
Sections:
  - Name:            .text
    Type:            SHT_PROGBITS
    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
    Address:         0x1000
    Content:         "00112233445566778899AABBCCDDEEFF"
  - Name:            .data
    Type:            SHT_PROGBITS
    Flags:           [ SHF_ALLOC, SHF_WRITE ]
    Address:         0x2000
    Content:         "112233445566778899AABBCCDDEEFF00"
  - Name:            .high_addr
    Type:            SHT_PROGBITS
    Flags:           [ SHF_ALLOC, SHF_WRITE ]
    Address:         0x80001000
    Content:         "2233445566778899AABBCCDDEEFF0011"

$ yaml2obj --docnum=1 test.yaml -o a.out.elf
$ llvm-objcopy -O binary a.out.elf test.bin

llvm-objcopy will create a really huge file (the binary output spans from the lowest allocated address, 0x1000, to the end of .high_addr at 0x80001010, i.e. 0x80000010 = 2147483664 bytes), while GNU objcopy will report an error indicating that the file is truncated.

$ stat test.bin
  File: test.bin
  Size: 2147483664	Blocks: 4194320    IO Block: 4096   regular file

It would be useful to prevent llvm-objcopy from creating files like this. The warning I mentioned could be implemented as an llvm::Error; I suggest adding the code for it within the following file: https://github.com/llvm/llvm-project/blob/main/llvm/lib/ObjCopy/ELF/ELFObject.cpp#L2686
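
For illustration, here is a rough sketch of the kind of llvm::Error-based check I have in mind; the helper name and its exact placement are hypothetical, and the real change would live near the BinaryWriter code linked above:

#include "llvm/Support/Errc.h"
#include "llvm/Support/Error.h"
#include <cstdint>

// Hypothetical helper, not existing llvm-objcopy code.
static llvm::Error checkBinaryOutputSize(uint64_t TotalSize,
                                         uint64_t MaxAllowedSize) {
  if (TotalSize > MaxAllowedSize)
    return llvm::createStringError(
        llvm::errc::invalid_argument,
        "-O binary output would be 0x%llx bytes (limit 0x%llx); the input "
        "has large gaps between allocated sections",
        (unsigned long long)TotalSize, (unsigned long long)MaxAllowedSize);
  return llvm::Error::success();
}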

@jh7370
Collaborator

jh7370 commented Jun 25, 2024

Thank you for the explanation. Just to make sure expectations are clear: I don't routinely make code changes to LLVM, but I do try to review some areas of code (llvm-objcopy being one such area). As such, for this issue to be fixed, somebody else will need to do the actual coding.

I don't think we can generate an llvm::Error in the way you are suggesting, because doing so will ultimately lead to the tool completely failing. However, it may be perfectly legitimate to create such a large binary, based on what the user has requested. We need a way to warn without failing, and ideally a way to suppress that warning. As a slight alternative, we could perhaps introduce a "--max-output-file-size" switch (exact name to be determined), which would default to some sensible value and can be overridden with a different value. If this value were exceeded, llvm-objcopy would generate an error. I think this would need a bit of discussion to ensure we don't break too many people's existing builds though.
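
To sketch the "bubble the warning up" idea (the callback type and how it would be threaded through the ObjCopy library are assumptions, not existing API):

#include <cstdint>
#include <functional>
#include <string>

using WarningCallback = std::function<void(const std::string &Msg)>;

// Library code only reports the condition; it never prints or exits.
void maybeWarnAboutHugeBinaryOutput(uint64_t OutputSize, uint64_t Threshold,
                                    const WarningCallback &Warn) {
  if (OutputSize > Threshold)
    Warn("-O binary output is " + std::to_string(OutputSize) +
         " bytes; the input may contain huge gaps between allocated sections");
}

The llvm-objcopy tool layer would then decide whether to print the warning, suppress it, or turn it into an error.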

@djtodoro
Collaborator Author

> Thank you for the explanation. Just to make sure expectations are clear: I don't routinely make code changes to LLVM, but I do try to review some areas of code (llvm-objcopy being one such area). As such, for this issue to be fixed, somebody else will need to do the actual coding.

Sure. Thanks a lot. I will do the coding.

> I don't think we can generate an llvm::Error in the way you are suggesting, because doing so will ultimately lead to the tool completely failing. However, it may be perfectly legitimate to create such a large binary, based on what the user has requested. We need a way to warn without failing, and ideally a way to suppress that warning. As a slight alternative, we could perhaps introduce a "--max-output-file-size" switch (exact name to be determined), which would default to some sensible value and can be overridden with a different value. If this value were exceeded, llvm-objcopy would generate an error. I think this would need a bit of discussion to ensure we don't break too many people's existing builds though.

Hmm, that makes sense. Let me double-check what the best way to achieve that is, and I will get back to you.

@djtodoro
Collaborator Author

Hello @jh7370. I think I have a solution that matches GNU objcopy's behavior, controlled by an option that is off by default. Please take a look at #97036.

@MaskRay
Member

MaskRay commented Jul 3, 2024

Copying my comment at #97036

Frankly, I hope that we leave llvm-objcopy unchanged.

- Users can request object files with sections exceeding 0x80000000, and llvm-objcopy should handle those requests as intended.
- Overly defensive measures often create more problems than they solve. If someone adds a new writer, do we port this output-size-limiting code to that?
- An optional flag for this behavior seems unlikely to be used. In addition, -O binary is niche.

Note: we did introduce a default size limit for yaml2obj (10MiB), reviews.llvm.org/D81258.
The primary reason is that yaml2obj is an internal testing tool rather than a user-facing one.

@djtodoro
Collaborator Author

djtodoro commented Jul 4, 2024

Hi @MaskRay, thanks a lot for your comment!

The problem itself is not in the YAML file (or yaml2obj); I only used it to create a problematic scenario. We are facing a real use case with one of our embedded targets. I am working on a small reproducer in C and will get back to you very soon.

@djtodoro
Collaborator Author

djtodoro commented Jul 8, 2024

Hi @MaskRay,

The idea is that everything is placed at a certain address (say 0x40000000), and then a variable is declared in a new named section using __attribute__. If this section is not added to the linker script layout, it will be placed at VMA 0x0 along with debug and other non-allocated sections, even though it is allocated, resulting in a sparse binary.

I've managed to (somewhat) reproduce the problem with a simple case for the MIPS architecture:

$ cat ./linker.ld 
SECTIONS
{
  . = 0x40000000;
  .text : { *(.text) }
  .data : { *(.data) }
  .bss : { *(.bss) }

  /DISCARD/ : { *(.debug*) }
}
$ cat test.c
// Place this variable in a custom section
__attribute__((section(".custom_section")))
int my_variable = 0x12345678;

int main() {
    return 0;
}
$ mips-img-linux-gnu/2017.10-05/bin/mips-img-linux-gnu-gcc test.c -g -c -o main.o && mips-img-linux-gnu/2017.10-05/bin/mips-img-linux-gnu-ld -T linker.ld main.o -o main.elf

By default, we won't see the file-truncated error report, since the section at a negative offset (".pdr") won't be loaded, so it is ignored by objcopy.

$ mips-img-linux-gnu-objcopy -O binary main.elf test.bin

(even though the file position/offset for the section is negative: -1073741824, i.e. .pdr ends up at VMA 0x0 while the output base is 0x40000000).

But if I add the load flag to the section, it will be reported:

$ mips-img-linux-gnu-objcopy --set-section-flags .pdr=load -O binary main.elf test.bin
mips-img-linux-gnu-objcopy: test.bin[.pdr]: file truncated

This issue isn't just theoretical; it reflects a real use-case with one of our embedded targets. The yaml2obj tool was only used to create a problematic scenario for testing, but our actual use-case involves auto-generated linker scripts that result in such sections.

Given these practical implications, having an optional flag in llvm-objcopy to handle such cases would be beneficial. It would provide a straightforward solution for users facing similar issues without impacting the default behavior.

In embedded systems, size really matters, and the file truncated error from BFD (bfd_error_file_truncated) helps us detect potential file size increases that could be problematic.

Thanks again for your insights.

@djtodoro
Collaborator Author

@MaskRay any comment on this?

@djtodoro
Collaborator Author

cc @jh7370 :)

@jh7370
Collaborator

jh7370 commented Jul 25, 2024

> cc @jh7370 :)

Not sure what you're hoping to gain from this CC. It's @MaskRay who needs to be persuaded. I still don't really follow what "binary" output is supposed to even do, and it's certainly not something I or our downstream users ever use, so I can't advocate for any particular behaviour.

@djtodoro
Collaborator Author

OK. I wanted to double-check whether you had any additional input after the additional comments I made. Sorry about that. I will wait for @MaskRay.

@MaskRay
Member

MaskRay commented Sep 17, 2024

Do you know why the error message mips-img-linux-gnu-objcopy: test.bin[.pdr]: file truncated (presumably MIPS32) is emitted?

I don't see the error with i386.

% gcc -m32 -c test.c -g -c -o main.o
% ld.bfd -m elf_i386 -T linker.ld main.o -o main.elf
ld.bfd: warning: main.elf has a LOAD segment with RWX permissions
% objcopy -O binary main.elf test.bin --set-section-flags .pdr=load

The binutils condition is if (s->filepos < 0). A negative file offset check makes sense. #97036 introduces an opt-in option. Something like --max-section-offset=0x80000000 is different from the < 0 check, and its opt-in nature would prevent adoption; I would be conservative about adding such an option.

@djtodoro
Collaborator Author

@MaskRay Thanks for the comment!

I would say that the C code I shared isn't the most effective reproducer; it requires more intricate adjustments in the linker script to achieve the desired behavior.
We should focus on this example instead, since it represents the problematic ELF:

$ cat test.yaml 
--- !ELF
FileHeader:
  Class:           ELFCLASS32
  Data:            ELFDATA2LSB
  Type:            ET_EXEC
  Machine:         EM_MIPS
Sections:
  - Name:            .text
    Type:            SHT_PROGBITS
    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
    Address:         0x1000
    Content:         "00112233445566778899AABBCCDDEEFF"
  - Name:            .data
    Type:            SHT_PROGBITS
    Flags:           [ SHF_ALLOC, SHF_WRITE ]
    Address:         0x2000
    Content:         "112233445566778899AABBCCDDEEFF00"
  - Name:            .high_addr
    Type:            SHT_PROGBITS
    Flags:           [ SHF_ALLOC, SHF_WRITE ]
    Address:         0x80001000
    Content:         "2233445566778899AABBCCDDEEFF0011"

$ yaml2obj --docnum=1 test.yaml -o a.out.elf

When using the latest binutils compiled for MIPS32, s->filepos [0] is -2147483648, since BFD is configured for 32-bit and the file_ptr [1] type is a 32-bit integer. I guess the reason you do not see it in the i386 case is that BFD is still configured for 64-bit, so s->filepos is not a negative number there (if you are using an x86_64 machine, you can change Machine to EM_386 in the YAML test and filepos will be 2147483648).

So, in [2], since Sec.Offset is already 64-bit [3], we could cast it to a 32-bit int to detect the overflow and then check FilePos < 0 (see the sketch after the links below), but I am not sure we want to do that. I am also not sure binutils is right in this case, since what we are really trying to solve is a physical-memory problem: the binary produced becomes very large.

[0] https://github.com/bminor/binutils-gdb/blob/master/bfd/binary.c#L258
[1] https://github.com/bminor/binutils-gdb/blob/master/bfd/bfd-in2.h#L768
[2] #97036
[3] https://github.com/llvm/llvm-project/blob/main/llvm/lib/ObjCopy/ELF/ELFObject.h#L528
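
A rough sketch of that narrowing check (a hypothetical helper, not existing llvm-objcopy code; it just mimics binutils' if (s->filepos < 0) by reinterpreting the 64-bit offset the way a 32-bit file_ptr would):

#include <cstdint>

inline bool offsetWrapsNegativeIn32Bit(uint64_t SecOffset) {
  auto FilePos = static_cast<int32_t>(SecOffset); // emulate 32-bit file_ptr
  return FilePos < 0;                             // e.g. 0x80000000 -> -2147483648
}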

@MaskRay
Member

MaskRay commented Sep 21, 2024

Thanks for the explanation. So this condition is enabled when binutils is built for 32-bit targets. llvm-project generally tries to avoid 32-bit/64-bit differences and I'd certainly avoid such checks.

Since we prefer generality and want to avoid magic values, --max-section-offset, even if added, would not be the default.
I wonder whether it's really useful... Say, you specify --max-section-offset 0x80000000, do you want to report an error when there is a section at 0x7fffffff with size > 1?

If you want to limit output file sizes, use ulimit -f. llvm-objcopy prints a stack trace and LLVM ERROR: IO failure on output stream: File too large.

@djtodoro
Collaborator Author

> Thanks for the explanation. So this condition is enabled when binutils is built for 32-bit targets. llvm-project generally tries to avoid 32-bit/64-bit differences and I'd certainly avoid such checks.

In this particular case, yes, but I think we can create a scenario where the offset becomes negative even in BFD 64-bit configuration.

> Since we prefer generality and want to avoid magic values, --max-section-offset, even if added, would not be the default.
> I wonder whether it's really useful... Say, you specify --max-section-offset 0x80000000, do you want to report an error when there is a section at 0x7fffffff with size > 1?

Eh, I see your point. I would like the error/warning to be triggered in any case where a "sparse ELF" is generated - that is, where the output contains a gap without useful data after objcopy runs. The objcopy from binutils won't catch the case you mentioned, but I think it should, since a truncated ELF is indeed generated, so the heuristic should be improved there as well.

> If you want to limit output file sizes, use ulimit -f. llvm-objcopy prints a stack trace and LLVM ERROR: IO failure on output stream: File too large.

Thanks for the recommendation, but I think I see two problems with this:

  1. ulimit will limit file sizes on the entire system, but one could cross-compile for various targets from the same machine (e.g. for a non-embedded system/target).
  2. More importantly, I think we want to prevent the production of large sparse/truncated ELF files specifically, not just large ELF files. This feature in binutils is very useful for us, since we know that we are not wasting physical memory on our embedded device.

@MaskRay
Member

MaskRay commented Sep 24, 2024

> Thanks for the explanation. So this condition is enabled when binutils is built for 32-bit targets. llvm-project generally tries to avoid 32-bit/64-bit differences and I'd certainly avoid such checks.

> In this particular case, yes, but I think we can create a scenario where the offset becomes negative even in BFD 64-bit configuration.

> Since we prefer generality and want to avoid magic values, --max-section-offset, even if added, would not be the default.
> I wonder whether it's really useful... Say, you specify --max-section-offset 0x80000000, do you want to report an error when there is a section at 0x7fffffff with size > 1?

> Eh, I see your point. I would like the error/warning to be triggered in any case where a "sparse ELF" is generated - that is, where the output contains a gap without useful data after objcopy runs. The objcopy from binutils won't catch the case you mentioned, but I think it should, since a truncated ELF is indeed generated, so the heuristic should be improved there as well.

There is no such concept as "sparse ELF". It is probably mentioned in one binutils comment, but it has never been widely adopted.
It doesn't have a clear definition. Code detecting "sparse ELF" could be very brittle and reject completely valid input object files.

For example, we could argue that output files > 4GiB are weird, but it could be very valid to generate such files (it's also correct even if we use regular .rodata, .data, .text - valid in the x86-64 large code model and other architectures' large code models).

> If you want to limit output file sizes, use ulimit -f. llvm-objcopy prints a stack trace and LLVM ERROR: IO failure on output stream: File too large.

> Thanks for the recommendation, but I think I see two problems with this:

>   1. ulimit will limit file sizes on the entire system, but one could cross-compile for various targets from the same machine (e.g. for a non-embedded system/target).

rlimit is per-process. You can do this: (ulimit -f 8; llvm-objcopy main.elf out). You can use this for any binary manipulation program, not just objcopy.

>   2. More importantly, I think we want to prevent the production of large sparse/truncated ELF files specifically, not just large ELF files. This feature in binutils is very useful for us, since we know that we are not wasting physical memory on our embedded device.

@djtodoro
Collaborator Author

djtodoro commented Sep 25, 2024

> Thanks for the explanation. So this condition is enabled when binutils is built for 32-bit targets. llvm-project generally tries to avoid 32-bit/64-bit differences and I'd certainly avoid such checks.

> In this particular case, yes, but I think we can create a scenario where the offset becomes negative even in BFD 64-bit configuration.

> Since we prefer generality and want to avoid magic values, --max-section-offset, even if added, would not be the default.
> I wonder whether it's really useful... Say, you specify --max-section-offset 0x80000000, do you want to report an error when there is a section at 0x7fffffff with size > 1?

> Eh, I see your point. I would like the error/warning to be triggered in any case where a "sparse ELF" is generated - that is, where the output contains a gap without useful data after objcopy runs. The objcopy from binutils won't catch the case you mentioned, but I think it should, since a truncated ELF is indeed generated, so the heuristic should be improved there as well.

> There is no such concept as "sparse ELF". It is probably mentioned in one binutils comment, but it has never been widely adopted. It doesn't have a clear definition. Code detecting "sparse ELF" could be very brittle and reject completely valid input object files.

> For example, we could argue that output files > 4GiB are weird, but it could be very valid to generate such files (it's also correct even if we use regular .rodata, .data, .text - valid in the x86-64 large code model and other architectures' large code models).

Sure, we are on the same page about that - that is why I use quotation marks when mentioning "sparse ELF". Just to clarify, I am not saying that the ELF produced is wrong, but it would be very beneficial for some teams to have an option to prevent this type of ELF. The option we are proposing is off by default and, I would guess, easy to maintain.

> If you want to limit output file sizes, use ulimit -f. llvm-objcopy prints a stack trace and LLVM ERROR: IO failure on output stream: File too large.

> Thanks for the recommendation, but I think I see two problems with this:

>   1. ulimit will limit file sizes on the entire system, but one could cross-compile for various targets from the same machine (e.g. for a non-embedded system/target).

> rlimit is per-process. You can do this: (ulimit -f 8; llvm-objcopy main.elf out). You can use this for any binary manipulation program, not just objcopy.

Yeah, I agree. However, the problem we are trying to solve isn't the size of the file itself, but rather the large gaps between sections in the ELF file (which is connected to item 2, quoted below); see the sketch after the quote.

>   2. More importantly, I think we want to prevent the production of large sparse/truncated ELF files specifically, not just large ELF files. This feature in binutils is very useful for us, since we know that we are not wasting physical memory on our embedded device.
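
To make the "gap" concrete, here is a hypothetical helper that measures the padding -O binary would have to emit between the lowest and highest allocated sections. Nothing like this exists in llvm-objcopy today; it only illustrates what we would like to catch:

#include <algorithm>
#include <cstdint>
#include <vector>

struct Sec {
  uint64_t Addr;
  uint64_t Size;
};

// Padding bytes the binary writer must emit, assuming non-overlapping sections.
uint64_t paddingBytes(const std::vector<Sec> &Secs) {
  if (Secs.empty())
    return 0;
  uint64_t Lo = UINT64_MAX, Hi = 0, Content = 0;
  for (const Sec &S : Secs) {
    Lo = std::min(Lo, S.Addr);
    Hi = std::max(Hi, S.Addr + S.Size);
    Content += S.Size;
  }
  return (Hi - Lo) - Content; // roughly 2 GiB for the YAML example above
}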
