Skip to content

[lldb] Set default object format to MachO in ObjectFileMachO #142704

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 5, 2025

Conversation

royitaqi
Copy link
Contributor

@royitaqi royitaqi commented Jun 4, 2025

The Change

This patch sets the default object format of ObjectFileMachO to be MachO (instead of what currently ends up to be ELF, see below). This should be the correct thing to do, because the code before the line of change has already verified the Mach-O header.

The existing logic:

  • In ObjectFileMachO, the object format is unassigned by default. So it's UnknownObjectFormat (see code).
  • The code then looks at load commands like LC_VERSION_MIN_* (code) and LC_BUILD_VERSION (code) and assign the Triple's OS and Environment if they exist.
  • If the above sets the Triple's OS to macOS, then the object format defaults to MachO; otherwise it is ELF (code)

Impact

For production usage where Mach-O files have the said load commands (which is expected), this patch won't change anything.

  • Important note: It's not clear if there are legitimate production use cases where the Mach-O files don't have said load commands. If there is, the exiting code think they are ELF. This patch changes it to MachO. This is considered a fix for such files.

For unit tests, this patch will simplify the yaml data by not requiring the said load commands.

Test

A unit test is added to verify the patch, i.e. it fails without the patch and succeeds with.

$ ninja lldb-unit-test-deps
$ tools/lldb/unittests/ObjectFile/MachO/ObjectFileMachOTests --gtest_filter="*ObjectFormatWithoutVersionLoadCommand*"
...
[ RUN      ] ObjectFileMachOTest.ObjectFormatWithoutVersionLoadCommand
[       OK ] ObjectFileMachOTest.ObjectFormatWithoutVersionLoadCommand (3 ms)
...

Double-checked that this patch doesn't break any existing tests on both macOS (result) and Linux (result).

@royitaqi royitaqi requested a review from JDevlieghere as a code owner June 4, 2025 00:31
@llvmbot llvmbot added the lldb label Jun 4, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 4, 2025

@llvm/pr-subscribers-lldb

Author: None (royitaqi)

Changes

The fact that ObjectFileMachO can parse the object file means that it's an Mach-O format. So it should just be set set the default format, before going through the rest of the code, which looks at load commands like LC_VERSION_MIN_* and LC_BUILD_VERSION.

For production usage where Mach-O files have the said load commands, this patch won't change anything.

  • Important note: It's not clear to me if there are legitimate production use cases where the Mach-O files don't have said load commands. If there is, then what should be their object format? Without this patch, they are currently ELF. With this patch, they will be MachO.

For unit tests, this patch will simplify the yaml data by not requiring the said load commands.

A unit test is added to verify the patch, i.e. it fails without the patch and succeeds with.

$ ninja lldb-unit-test-deps
$ tools/lldb/unittests/ObjectFile/MachO/ObjectFileMachOTests --gtest_filter="*ObjectFileFormatWithoutLoadCommand*"
[==========] Running 1 test from 1 test suite.
...
[ RUN      ] ObjectFileMachOTest.ObjectFileFormatWithoutLoadCommand
[       OK ] ObjectFileMachOTest.ObjectFileFormatWithoutLoadCommand (3 ms)
...

Full diff: https://github.com/llvm/llvm-project/pull/142704.diff

2 Files Affected:

  • (modified) lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp (+1)
  • (modified) lldb/unittests/ObjectFile/MachO/TestObjectFileMachO.cpp (+55)
diff --git a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
index 3950454b7c90e..0079672c5cbd0 100644
--- a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
+++ b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
@@ -5148,6 +5148,7 @@ void ObjectFileMachO::GetAllArchSpecs(const llvm::MachO::mach_header &header,
   llvm::Triple base_triple = base_arch.GetTriple();
   base_triple.setOS(llvm::Triple::UnknownOS);
   base_triple.setOSName(llvm::StringRef());
+  base_triple.setObjectFormat(llvm::Triple::MachO);
 
   if (header.filetype == MH_PRELOAD) {
     if (header.cputype == CPU_TYPE_ARM) {
diff --git a/lldb/unittests/ObjectFile/MachO/TestObjectFileMachO.cpp b/lldb/unittests/ObjectFile/MachO/TestObjectFileMachO.cpp
index 0ef2d0b85fd36..aab3e996343b3 100644
--- a/lldb/unittests/ObjectFile/MachO/TestObjectFileMachO.cpp
+++ b/lldb/unittests/ObjectFile/MachO/TestObjectFileMachO.cpp
@@ -94,4 +94,59 @@ TEST_F(ObjectFileMachOTest, IndirectSymbolsInTheSharedCache) {
   for (size_t i = 0; i < 10; i++)
     OF->ParseSymtab(symtab);
 }
+
+TEST_F(ObjectFileMachOTest, ObjectFileFormatWithoutLoadCommand) {
+  // A Mach-O file without the load command LC_BUILD_VERSION.
+  const char *yamldata = R"(
+--- !mach-o
+FileHeader:
+  magic:           0xFEEDFACF
+  cputype:         0x0100000C
+  cpusubtype:      0x00000000
+  filetype:        0x00000001
+  ncmds:           1
+  sizeofcmds:      152
+  flags:           0x00002000
+  reserved:        0x00000000
+LoadCommands:
+  - cmd:             LC_SEGMENT_64
+    cmdsize:         152
+    segname:         __TEXT
+    vmaddr:          0
+    vmsize:          4
+    fileoff:         184
+    filesize:        4
+    maxprot:         7
+    initprot:        7
+    nsects:          1
+    flags:           0
+    Sections:
+      - sectname:        __text
+        segname:         __TEXT
+        addr:            0x0000000000000000
+        content:         'AABBCCDD'
+        size:            4
+        offset:          184
+        align:           0
+        reloff:          0x00000000
+        nreloc:          0
+        flags:           0x80000400
+        reserved1:       0x00000000
+        reserved2:       0x00000000
+        reserved3:       0x00000000
+...
+)";
+
+  // Perform setup.
+  llvm::Expected<TestFile> file = TestFile::fromYaml(yamldata);
+  EXPECT_THAT_EXPECTED(file, llvm::Succeeded());
+  auto module_sp = std::make_shared<Module>(file->moduleSpec());
+  ASSERT_NE(module_sp, nullptr);
+  auto object_file = module_sp->GetObjectFile();
+  ASSERT_NE(object_file, nullptr);
+
+  // Verify that the object file is recognized as Mach-O.
+  ASSERT_EQ(object_file->GetArchitecture().GetTriple().getObjectFormat(),
+            llvm::Triple::MachO);
+}
 #endif

Copy link
Contributor

@dmpots dmpots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one nit.

@dmpots
Copy link
Contributor

dmpots commented Jun 4, 2025

It's not clear if there are legitimate production use cases where the Mach-O files don't have said load commands. If there is, then what is their expected object format? Without this patch, they are currently ELF. With this patch, they will become MachO

At the point in the code where you made your change we already assume that we have a Mach-O file because we are passing in and examining the header. I think it makes sense to set the triple to be Mach-O here.

Even if we have Mach-O files in the wild that don't have the version load commands, it seems those still must be treated as Mach-O files to make any sense of them.

@royitaqi
Copy link
Contributor Author

royitaqi commented Jun 4, 2025

It's not clear if there are legitimate production use cases where the Mach-O files don't have said load commands. If there is, then what is their expected object format? Without this patch, they are currently ELF. With this patch, they will become MachO

At the point in the code where you made your change we already assume that we have a Mach-O file because we are passing in and examining the header. I think it makes sense to set the triple to be Mach-O here.

Even if we have Mach-O files in the wild that don't have the version load commands, it seems those still must be treated as Mach-O files to make any sense of them.

Thanks for pointing that out.

I agree. This patch is fixing this bug for those Mach-O files (if they exist).

Updated the PR's description accordingly.

--

FWIW, I will run check lldb tests to verify if there is any such cases there.

Copy link
Collaborator

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is ok. Mach-o files have LC_SEGMENT_64 load commands, so no need to change the test really. I mean, we can probably remove it and still have a mach-o file, but not sure we need to.

@royitaqi royitaqi merged commit d4d2f06 into llvm:main Jun 5, 2025
8 of 9 checks passed
@jasonmolenda
Copy link
Collaborator

nb this PR is causing a failure on the x86_64 macOS CI bot https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/lldb-cmake/ in TestUniversal.py where image list -t -b returns a triple name of x86_64-apple-macosx10.9.0-macho instead of x86_64-apple-macosx10.9.0 and the API test isn't expecting that. I don't have an opinion on where this should be fixed yet -- if there's an issue with this PR, the way the test is written, or the way image list is printing the ArchSpec for the modules. I'm looking into it now.

@jasonmolenda
Copy link
Collaborator

In ObjectFileMachO::GetAllArchSpecs(), we call ArchSpec::SetArchitecture(),

base_arch.SetArchitecture(eArchTypeMachO, header.cputype, header.cpusubtype);

given a mach_header. This does

   890 	        m_triple.setArchName(llvm::StringRef(core_def->name));

Then in llvm's Triple::Triple(),

   1068	  ObjectFormat = getDefaultFormat(*this);

And that's where the object file format is being set to ELF,

   923 	static Triple::ObjectFormatType getDefaultFormat(const Triple &T) {
-> 924 	  switch (T.getArch()) {
   925 	  case Triple::UnknownArch:
   926 	  case Triple::aarch64:

   936 	    default:
   937 	      return T.isOSDarwin() ? Triple::MachO : Triple::ELF;

The triple being passed in here is just "arm64" at this point, so the T.isOSDarwin() fails, ObjectFile is set to ELF and it is never reset.

But I don't see why the ModuleArchSpec being set to ObjectFileELF (which I'd never noticed before, ever) was not showing up in the triples, but -macho is, when it's set in ObjectFileMachO. Maybe it's an ordering thing, where now that the object file is being set in ObjectFileMachO::GetAllArchSpecs it is constructing a triple string based on that, and it enters the picture.

It seems like it would be better to set the ObjectFile format in ArchSpec::SetArchitecture when we know we're building a triple for a mach-o file, to fix the erroneous ObjectFileELF setting from llvm when we only gave it a cpu name. but let me look a little more.

@jasonmolenda
Copy link
Collaborator

Aha, I see. Any call to llvm's setObjectFormat is going to add this to the triple string if there is no environ set:

   1608	void Triple::setObjectFormat(ObjectFormatType Kind) {
   1609	  if (Environment == UnknownEnvironment)
-> 1610	    return setEnvironmentName(getObjectFormatTypeName(Kind));
   1611	
   1612	  setEnvironmentName((getEnvironmentTypeName(Environment) + Twine("-") +
   1613	                      getObjectFormatTypeName(Kind)).str());
   1614	}

Well that's not great. I really don't want -macho in our triples, it looks terrible.

@jasonmolenda
Copy link
Collaborator

jasonmolenda commented Jun 5, 2025

In ArchSpec::SetArchitecture if we passed in the vendor in addition to the architecture, that would set the ObjectFile to MachO correctly without calling Triple::setObjectFormat which adds the object file name to the triple as an environment, if there is no environment. something like

diff --git a/lldb/source/Utility/ArchSpec.cpp b/lldb/source/Utility/ArchSpec.cpp
index 70b9800f4da..6daaf448fd5 100644
--- a/lldb/source/Utility/ArchSpec.cpp
+++ b/lldb/source/Utility/ArchSpec.cpp
@@ -887,7 +887,10 @@ bool ArchSpec::SetArchitecture(ArchitectureType arch_type, uint32_t cpu,
         // Always use the architecture name because it might be more
         // descriptive than the architecture enum ("armv7" ->
         // llvm::Triple::arm).
-        m_triple.setArchName(llvm::StringRef(core_def->name));
+        std::string initial_triple = core_def->name;
+        if (arch_type == eArchTypeMachO)
+          initial_triple += "-apple-";
+        m_triple.setArchName(initial_triple.c_str());
         if (arch_type == eArchTypeMachO) {
           m_triple.setVendor(llvm::Triple::Apple);
 

although that's not very prettily done.

UPDATE: ah my bad, this isn't sufficient for llvm Triple::setArchName to set the ObjectFile to MachO; Triple.cpp's getDefaultFormat(), is looking for an OS not the vendor to decide if it's MachO.

jasonmolenda added a commit that referenced this pull request Jun 5, 2025
…hO` (#142704)"

This reverts commit d4d2f06.

Temporarily reverting until we can find a way to get the correct
ObjectFile set in Module's Triples without adding "-macho" to the
triple string for each Module.  This is breaking TestUniversal.py
on the x86_64 macOS CI bots.
@jasonmolenda
Copy link
Collaborator

I've reverted this commit on main for now,

commit f961d6a89abe5a6fb70afc043f33b2efcec77536
Author: Jason Molenda <jmolenda@apple.com>
Date:   Thu Jun 5 16:23:27 2025 -0700

    Revert "[lldb] Set default object format to `MachO` in `ObjectFileMachO` (#142704)"

Until we can find a way to set the ObjectFile correctly without adding "-macho" to each Module's triple string. TestUniversal.py failing is the specific failure caused by this change, but it's also an undesirable change to have "-macho" added to triple strings from a UI point of view.

jasonmolenda added a commit that referenced this pull request Jun 6, 2025
… non-Mach-O files (#139170)"

This reverts commit 3096f87.

Reverting this commit because it depends on another PR
that was reverted, #142704

Both can be reapplied once we find a correct fix for that.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 6, 2025
…bugMap` for non-Mach-O files (#139170)"

This reverts commit 3096f87.

Reverting this commit because it depends on another PR
that was reverted, llvm/llvm-project#142704

Both can be reapplied once we find a correct fix for that.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants