Skip to content

[lldb] Don't create instance of SymbolFileDWARFDebugMap for non-Mach-O files #139170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 5, 2025

Conversation

royitaqi
Copy link
Contributor

@royitaqi royitaqi commented May 8, 2025

Change

SymbolFileDWARFDebugMap::CreateInstance() will return nullptr if the file is not a Mach-O.

Benefit

This may improve Linux debugger launch time by skipping the creation of SymbolFileDWARFDebugMap during the SymbolFile::FindPlugin() call, which loops through a list of SymbolFile plugins and tries to find the one that provides the best abilities. If the SymbolFileDWARFDebugMap is created during this loop, it will load the symbol table of the file in question and loop through all the compile units in the debug map (the OSO entries) to calculate the abilities.

Tests

Two unittests are added to verify the creation and the absence of SymbolFileDWARFDebugMap for Mach-O and non-Mach-O files, respectively.

Ran ninja check-lldb on both Darwin and Linux. No regression spotted. See results compared for mac and linux.

$ ninja lldb-unit-test-deps
$ tools/lldb/unittests/SymbolFile/DWARF/SymbolFileDWARFTests --gtest_filter="*SymbolFileDWARFDebugMapTests*"

@royitaqi royitaqi requested a review from JDevlieghere as a code owner May 8, 2025 22:59
@llvmbot llvmbot added the lldb label May 8, 2025
@royitaqi royitaqi changed the title Don't create instance of SymbolFileDWARFDebugMap for non-Mach-O files [lldb] Don't create instance of SymbolFileDWARFDebugMap for non-Mach-O files May 8, 2025
@llvmbot
Copy link
Member

llvmbot commented May 8, 2025

@llvm/pr-subscribers-lldb

Author: None (royitaqi)

Changes

Change

SymbolFileDWARFDebugMap::CreateInstance() will return nullptr if the file is not a Mach-O.

Benefit

This may improve Linux debugger launch time by skipping the creation of SymbolFileDWARFDebugMap during the SymbolFile::FindPlugin() call, which loops through a list of SymbolFile plugins and tries to find the one that provides the best abilities. If the SymbolFileDWARFDebugMap is created during this loop, it will load the symbol table of the file in question and loop through all the compile units in the debug map (the OSO entries) to calculate the abilities.

Tests

Two tests are added to verify the creation and the absence for Mach-O and non-Mach-O files.

Ran ninja check-lldb on both Darwin and Linux. No regression spotted. See results compared for mac and linux.


Full diff: https://github.com/llvm/llvm-project/pull/139170.diff

3 Files Affected:

  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp (+3)
  • (modified) lldb/unittests/SymbolFile/DWARF/CMakeLists.txt (+1)
  • (added) lldb/unittests/SymbolFile/DWARF/SymbolFileDWARFDebugMapTests.cpp (+142)
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp
index 961c212e2e6dc..1793c23950d55 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp
@@ -38,6 +38,7 @@
 #include "lldb/Target/StackFrame.h"
 
 #include "LogChannelDWARF.h"
+#include "Plugins/ObjectFile/Mach-O/ObjectFileMachO.h"
 #include "SymbolFileDWARF.h"
 #include "lldb/lldb-private-enumerations.h"
 
@@ -246,6 +247,8 @@ llvm::StringRef SymbolFileDWARFDebugMap::GetPluginDescriptionStatic() {
 }
 
 SymbolFile *SymbolFileDWARFDebugMap::CreateInstance(ObjectFileSP objfile_sp) {
+  if (objfile_sp->GetPluginName() != ObjectFileMachO::GetPluginNameStatic())
+    return nullptr;
   return new SymbolFileDWARFDebugMap(std::move(objfile_sp));
 }
 
diff --git a/lldb/unittests/SymbolFile/DWARF/CMakeLists.txt b/lldb/unittests/SymbolFile/DWARF/CMakeLists.txt
index d5b0be7ea2a28..5aacb24fc5206 100644
--- a/lldb/unittests/SymbolFile/DWARF/CMakeLists.txt
+++ b/lldb/unittests/SymbolFile/DWARF/CMakeLists.txt
@@ -4,6 +4,7 @@ add_lldb_unittest(SymbolFileDWARFTests
   DWARFDIETest.cpp
   DWARFIndexCachingTest.cpp
   DWARFUnitTest.cpp
+  SymbolFileDWARFDebugMapTests.cpp
   SymbolFileDWARFTests.cpp
   XcodeSDKModuleTests.cpp
 
diff --git a/lldb/unittests/SymbolFile/DWARF/SymbolFileDWARFDebugMapTests.cpp b/lldb/unittests/SymbolFile/DWARF/SymbolFileDWARFDebugMapTests.cpp
new file mode 100644
index 0000000000000..8c65fca889a40
--- /dev/null
+++ b/lldb/unittests/SymbolFile/DWARF/SymbolFileDWARFDebugMapTests.cpp
@@ -0,0 +1,142 @@
+//===-- SymbolFileDWARFDebugMapTests.cpp ----------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "Plugins/ObjectFile/ELF/ObjectFileELF.h"
+#include "Plugins/ObjectFile/Mach-O/ObjectFileMachO.h"
+#include "Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.h"
+#include "TestingSupport/SubsystemRAII.h"
+#include "TestingSupport/TestUtilities.h"
+
+#include "lldb/Core/Module.h"
+#include "llvm/Testing/Support/Error.h"
+
+#include "gtest/gtest.h"
+
+using namespace lldb;
+using namespace lldb_private;
+using namespace lldb_private::plugin::dwarf;
+
+class SymbolFileDWARFDebugMapTests : public testing::Test {
+  SubsystemRAII<ObjectFileELF, ObjectFileMachO> subsystems;
+};
+
+TEST_F(SymbolFileDWARFDebugMapTests, CreateInstanceReturnNonNullForMachOFile) {
+  // Make sure we don't crash parsing a null unit DIE.
+  const char *yamldata = R"(
+--- !mach-o
+FileHeader:
+  magic:           0xFEEDFACF
+  cputype:         0x01000007
+  cpusubtype:      0x80000003
+  filetype:        0x00000001
+  ncmds:           1
+  sizeofcmds:      152
+  flags:           0x00002000
+  reserved:        0x00000000
+LoadCommands:
+  - cmd:             LC_SEGMENT_64
+    cmdsize:         152
+    segname:         __TEXT
+    vmaddr:          0
+    vmsize:          4
+    fileoff:         184
+    filesize:        4
+    maxprot:         7
+    initprot:        7
+    nsects:          1
+    flags:           0
+    Sections:
+      - sectname:        __text
+        segname:         __TEXT
+        addr:            0x0000000000000000
+        content:         'AABBCCDD'
+        size:            4
+        offset:          184
+        align:           0
+        reloff:          0x00000000
+        nreloc:          0
+        flags:           0x80000400
+        reserved1:       0x00000000
+        reserved2:       0x00000000
+        reserved3:       0x00000000
+...
+)";
+
+  llvm::Expected<TestFile> file = TestFile::fromYaml(yamldata);
+  EXPECT_THAT_EXPECTED(file, llvm::Succeeded());
+  auto module_sp = std::make_shared<Module>(file->moduleSpec());
+  ASSERT_NE(module_sp, nullptr);
+  auto object_file = module_sp->GetObjectFile();
+  ASSERT_NE(object_file, nullptr);
+  auto debug_map =
+      SymbolFileDWARFDebugMap::CreateInstance(object_file->shared_from_this());
+  ASSERT_NE(debug_map, nullptr);
+}
+
+TEST_F(SymbolFileDWARFDebugMapTests, CreateInstanceReturnNullForNonMachOFile) {
+  // Make sure we don't crash parsing a null unit DIE.
+  const char *yamldata = R"(
+--- !ELF
+FileHeader:
+  Class:   ELFCLASS64
+  Data:    ELFDATA2LSB
+  Type:    ET_EXEC
+  Machine: EM_386
+DWARF:
+  debug_abbrev:
+    - Table:
+        - Code:            0x00000001
+          Tag:             DW_TAG_compile_unit
+          Children:        DW_CHILDREN_no
+          Attributes:
+            - Attribute:       DW_AT_addr_base
+              Form:            DW_FORM_sec_offset
+  debug_info:
+    - Version:         5
+      AddrSize:        4
+      UnitType:        DW_UT_compile
+      Entries:
+        - AbbrCode:        0x00000001
+          Values:
+            - Value:           0x8 # Offset of the first Address past the header
+        - AbbrCode:        0x0
+  debug_addr:
+    - Version: 5
+      AddressSize: 4
+      Entries:
+        - Address: 0x1234
+        - Address: 0x5678
+  debug_line:
+    - Length:          42
+      Version:         2
+      PrologueLength:  36
+      MinInstLength:   1
+      DefaultIsStmt:   1
+      LineBase:        251
+      LineRange:       14
+      OpcodeBase:      13
+      StandardOpcodeLengths: [ 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1 ]
+      IncludeDirs:
+        - '/tmp'
+      Files:
+        - Name:            main.cpp
+          DirIdx:          1
+          ModTime:         0
+          Length:          0
+)";
+
+  llvm::Expected<TestFile> file = TestFile::fromYaml(yamldata);
+  EXPECT_THAT_EXPECTED(file, llvm::Succeeded());
+  auto module_sp = std::make_shared<Module>(file->moduleSpec());
+  ASSERT_NE(module_sp, nullptr);
+  auto object_file = module_sp->GetObjectFile();
+  ASSERT_NE(object_file, nullptr);
+  auto debug_map =
+      SymbolFileDWARFDebugMap::CreateInstance(object_file->shared_from_this());
+  ASSERT_EQ(debug_map, nullptr);
+}

@royitaqi royitaqi requested review from clayborg and jeffreytan81 May 8, 2025 23:00
Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plugins shouldn't depend on each other. The correct way to do this would be to have a method in ObjectFile, something like ObjectFile::SupportsDebugMap() which returns false for all implementations except the ObjectFileMachO one.

@bulbazord
Copy link
Member

+1 to Jonas's comment. Plugins ideally don't depend on each other and they should strive to fail gracefully if they depend on a specific plugin instance that's not available. I recognize that many plugins depend on each other today, but I don't think we should be adding more direct dependencies if we can help it.

@royitaqi
Copy link
Contributor Author

royitaqi commented May 9, 2025

@JDevlieghere I see your point (some learning questions below about the principle).

However, I feel ObjectFile::SupportsDebugMap() is still closely tied to a specific plugin (not through compilation, but semantics). Imagine each plugin needs to add such a method into the ObjectFile class, then the class will be filled with plugin-specific methods like this, which I feel breaks the API scalability the plugin system wanted.

Instead, maybe it's better if we frame "DebugMap" as an ObjectFileCapability enum. Other capabilities can be added, too, e.g. does the object file imply split dwarf, etc. Then the API will be ObjectFile::GetSupportedCapabilities().

WDYT?

(Note: I am trying to think about how does this relate to the CalculateAbilities() function in those SymbolFile subclasses. I feel they are different, so it should be okay.)

-- (my learning question about the principle of "Plugins shouldn't depend on each other") --

I guess I'm missing a lot of context here. From my limited and naive view, there are natural connections between different kinds of plugins because how they usually work together in existing systems. E.g. macOS usually use Mach-O files and can use either SymbolFileDWARF and SymbolFileDWARFDebugMap, but never split-DWARF (dwp/dwo). I don't understand the design principle of "Plugins shouldn't depend on each other". It feels like it's trying to ignore the natural connection between these plugins. I'm guessing we have that principle so as to avoid having to model a complex (and dynamic) dependency problem? (E.g. "A and B" and "B and C" should always work together, but "A and C" should never work together.)

@labath
Copy link
Collaborator

labath commented May 12, 2025

FWIW, I completely agree @royitaqi. I'm not saying inter-plugin dependencies should be a free-for-all (*), but I think that some kinds of dependencies make sense and I definitely think this is one of them. The debug map format is intimately tied to the MachO and darwin ecosystems and I think there's approximately zero chance that it will be implemented elsewhere. Using the debug map requires MachO files, so why not just admit that?

This isn't the only example of that. For example, ProcessElfCore requires elf core files. ProcessMachCore requires MachO core files. And I think that's a perfectly natural state of the world and doing anything else would make things a lot more convoluted. I think something like SupportsDebugMap is a step in the wrong direction. It's true that removes the link time dependency but the moral dependency remains, and in fact this makes it worse, because now every object file plugin needs to know about the debug map. Why should ObjectFilePECOFF need to think about whether it supports a random symbol file plugin?

(*) I think that plugin dependencies should form a DAG. The fact that SymbolFileDWARFDebugMap depends on ObjectFileMachO means that ObjectFileMachO should not depend on SymbolFileDWARFDebugMap -- which I think is also perfectly natural restriction. And even if there is a reason why you might want to depend the other way, you can try to break it using the usual dependency reversal techniques, but you'd only need to touch these two plugins -- not every plugin of the two kinds.

I might also say that, for the sake of clarity, the dependency direction should be uniform across all plugins of the same type. So that SymbolFileDWARFDebugMap depends on ObjectFileMachO means that no other object file plugin can depend on any other symbol file plugin, which is, again, I think a reasonable requirement, and I would say that if we ever find a case where we have two plugin pairs which want to depend in the opposite directions, that this means that those plugin types are not designed properly (maybe they ought to be a single plugin?).

@JDevlieghere
Copy link
Member

FWIW, I completely agree @royitaqi. I'm not saying inter-plugin dependencies should be a free-for-all (*), but I think that some kinds of dependencies make sense and I definitely think this is one of them. The debug map format is intimately tied to the MachO and darwin ecosystems and I think there's approximately zero chance that it will be implemented elsewhere. Using the debug map requires MachO files, so why not just admit that?

To answer your last question: because the plugins (which is really a misnomer), as I understand it, are there to provide abstraction: the caller shouldn't have to know what concrete instance an ObjectFile is. The no-plugins-dependency enforces that at compile/link time.

I'll be the first to admit that the current approach has shortcomings (like it does here, I totally agree with that). However we should also remember that it exists for a reason and that a lot of time and effort has been spent on detangling the plugins to fix layering issues and reduce binary sizes. The current rule is easy to understand and easy to enforce which makes it appealing and ensures we don't make things worse as we're still working on this.

IMHO, the downsides being listed here don't outweigh the benefits, especially since we have a bunch of examples of those "moral dependencies". That said, if there's a way to enforce the DAG described in the asterisk, I'm totally fine with changing the rule to allow inter-plugin dependencies "where it makes sense" (which is still subjective). I made a similar argument in another PR a while ago, but I don't think we can expect reviewers to mentally verify this DAG every time someone tries to introduce a new inter-plugin dependencies.

TL;DR: My stance is that if there's a way to enforce the DAG, I'm totally fine with allowing inter-plugin dependencies where they make sense. Barring that I think the current rule, albeit overly strict, is the easiest way to enforce we don't regress inter-plugin dependencies in general.

PS: I'm actually glad we're having this discussion. Whatever the outcome, I'd like to document it on the website so contributors can know what to expect going forward. This comes up periodically and there's no reason this should be a surprise for existing or new contributors.

@labath
Copy link
Collaborator

labath commented May 13, 2025

To answer your last question: because the plugins (which is really a misnomer), as I understand it, are there to provide abstraction: the caller shouldn't have to know what concrete instance an ObjectFile is. The no-plugins-dependency enforces that at compile/link time.

This is my problem with that. You're saying the "caller shouldn't have to know" -- which is a statement about his state of mind. We can't enforce the state of mind through syntactic restrictions. We can prevent the caller from saying he expects a specific subclass, but there's not way to prevent code from expecting that implicitly (perhaps even without it knowing). Functions like SupportsDebugMap make it worse because they encourage writing code like that and that makes the plugin system (the abstractions created by it) less useful as a whole. Like, if I now wanted to create a new symbol file plugin that works only with a specific object file, do I add SupportsPsym to every object file plugin? Or we don't have to look at new plugins, we have two good examples already: SymbolFileBreakpad and SymbolFileJSON. Both of these currently depend on the their respective object file plugin. Should that be replaced by something else (and would the result be better if we did)?

Nonetheless, I agree that plugins should create abstractions, but I think it's important to think about who are those abstractions directed at. The core code definitely shouldn't need to "look behind the curtain" (ideally not even through calls like SupportsDebugMap), but I don't think that means that noone else isn't allowed to do that. This situation is really simple, so it doesn't demonstrate the full scope of the problem, which is why I want to go back to the ProcessCore->ObjectFile example.

ProcessElfCore needs to extract a lot of detailed information from ObjectFileELF. Same goes for ProcessMachCore. At some level, the information these two pairs are exchanging is the same (threads, registers, ...), so one could imagine a generic interface to pass this -- but would that help anything? I say "no", because you still want to have the two plugins to work in pair -- if you didn't, then you wouldn't need separate process core plugins. This setup would make sense if we had a generic ProcessCore class (not a plugin) which works off of information provided by the object plugins -- but that's not the situation we're in now. You can make the same case for dynamic loaders -- should MacOSX-DYLD work with ObjectFilePECOFF?

TL;DR: My stance is that if there's a way to enforce the DAG, I'm totally fine with allowing inter-plugin dependencies where they make sense. Barring that I think the current rule, albeit overly strict, is the easiest way to enforce we don't regress inter-plugin dependencies in general.

Enforcement is tricky, because "where it makes sense" is obviously a judgement call. It's also complicated by the fact that current graph is not a DAG. Linux linkers actually kind of enforce the graph implicitly, but since it's not a DAG now, we have a workaround and that prevents us catching any new bad deps. I guess it would be possible to do something similar to NO_PLUGIN_DEPENDENCIES in cmake, which only allows dependencies that we consider acceptable, though it's not foolproof: for example SymbolFileJSON doesn't list ObjectFileJSON in its cmake dependencies even though it definitely needs it.

I think the best we can do is some cmake check plus some documentation about which dependencies we consider "okay".
I don't think this can be foolproof though. Going by the examples above, I would say that we should permit dependencies on object file plugins from SymbolFiles, DynamicLoaders and Processes -- but there are still exceptions to that. For example, ProcessGdbRemote and DynamicLoaderStatic (and maybe SymbolFileDWARF) probably shouldn't depend on any specific ObjectFile plugin since they're meant to work with more than one (I guess that could be a general rule -- but it's not enforcable one).

A rule like "there shall be no inter-plugin dependencies" is easy to enforce and that makes it appealing, but I think it will be very hard for us to come into compliance with that (and I don't think we'd like the result even if we did). If we say some deps are okay, then a lot of things become automatically compliant, and we can focus on the high-value high-complexity deps like SymbolFile<->TypeSystem.

@JDevlieghere
Copy link
Member

This is my problem with that. You're saying the "caller shouldn't have to know" -- which is a statement about his state of mind. We can't enforce the state of mind through syntactic restrictions. We can prevent the caller from saying he expects a specific subclass, but there's not way to prevent code from expecting that implicitly (perhaps even without it knowing). Functions like SupportsDebugMap make it worse because they encourage writing code like that and that makes the plugin system (the abstractions created by it) less useful as a whole. Like, if I now wanted to create a new symbol file plugin that works only with a specific object file, do I add SupportsPsym to every object file plugin? Or we don't have to look at new plugins, we have two good examples already: SymbolFileBreakpad and SymbolFileJSON. Both of these currently depend on the their respective object file plugin. Should that be replaced by something else (and would the result be better if we did)?

Nonetheless, I agree that plugins should create abstractions, but I think it's important to think about who are those abstractions directed at. The core code definitely shouldn't need to "look behind the curtain" (ideally not even through calls like SupportsDebugMap), but I don't think that means that noone else isn't allowed to do that. This situation is really simple, so it doesn't demonstrate the full scope of the problem, which is why I want to go back to the ProcessCore->ObjectFile example.

ProcessElfCore needs to extract a lot of detailed information from ObjectFileELF. Same goes for ProcessMachCore. At some level, the information these two pairs are exchanging is the same (threads, registers, ...), so one could imagine a generic interface to pass this -- but would that help anything? I say "no", because you still want to have the two plugins to work in pair -- if you didn't, then you wouldn't need separate process core plugins. This setup would make sense if we had a generic ProcessCore class (not a plugin) which works off of information provided by the object plugins -- but that's not the situation we're in now. You can make the same case for dynamic loaders -- should MacOSX-DYLD work with ObjectFilePECOFF?

To be clear, I pretty much agree with everything you've said. I'm not arguing that the current approach is better, at least not on merit. My point is that barring a better alternative, those shortcoming are the price to pay to avoid the free-for-all nobody wants. I'm glad that you're offering an alternative below.

TL;DR: My stance is that if there's a way to enforce the DAG, I'm totally fine with allowing inter-plugin dependencies where they make sense. Barring that I think the current rule, albeit overly strict, is the easiest way to enforce we don't regress inter-plugin dependencies in general.

Enforcement is tricky, because "where it makes sense" is obviously a judgement call. It's also complicated by the fact that current graph is not a DAG. Linux linkers actually kind of enforce the graph implicitly, but since it's not a DAG now, we have a workaround and that prevents us catching any new bad deps. I guess it would be possible to do something similar to NO_PLUGIN_DEPENDENCIES in cmake, which only allows dependencies that we consider acceptable, though it's not foolproof: for example SymbolFileJSON doesn't list ObjectFileJSON in its cmake dependencies even though it definitely needs it.

I think the best we can do is some cmake check plus some documentation about which dependencies we consider "okay". I don't think this can be foolproof though. Going by the examples above, I would say that we should permit dependencies on object file plugins from SymbolFiles, DynamicLoaders and Processes -- but there are still exceptions to that. For example, ProcessGdbRemote and DynamicLoaderStatic (and maybe SymbolFileDWARF) probably shouldn't depend on any specific ObjectFile plugin since they're meant to work with more than one (I guess that could be a general rule -- but it's not enforcable one).

I'm fine with that. What I'm looking for is something that's (relatively) easy to understand and (relatively) easy to spot in code reviews. I'd love to have something foolproof (or work towards that) but I acknowledge that given the current state of things that's not achievable. What you describe here checks both boxes.

A rule like "there shall be no inter-plugin dependencies" is easy to enforce and that makes it appealing, but I think it will be very hard for us to come into compliance with that (and I don't think we'd like the result even if we did). If we say some deps are okay, then a lot of things become automatically compliant, and we can focus on the high-value high-complexity deps like SymbolFile<->TypeSystem.

Ack, agreed.

To make this actionable, @labath or @royitaqi, is either of you willing to sign up to add the CMakery and document the new policy?

@labath
Copy link
Collaborator

labath commented May 20, 2025

I'm glad we're in (abstract) agreement. As much as I would like to, I can't promise to do this by any particular date. I'd can try, but I don't know when will that happen.

In a way, I feel that this patch isn't a good reason to motivate this (although I think that legalising the existing core file dependencies is a good reason) for this as I think there are acceptable workarounds (or even not workarounds) for the problem. In particular, I think that the solution from #139554 which looks at the Triple is okay. The Triple object even has a isBinFormatMachO method...

@royitaqi royitaqi force-pushed the dont-create-debug-map-on-linux branch from 96b17f8 to 09bd622 Compare May 30, 2025 05:38
@royitaqi royitaqi force-pushed the dont-create-debug-map-on-linux branch from 09bd622 to 57145ec Compare May 30, 2025 16:21
@royitaqi
Copy link
Contributor Author

royitaqi commented May 30, 2025

Thanks for the great discussion, @JDevlieghere and @labath. I appreciate it and am learning a lot by reading through it. As far as I can understand, I think what you have aligned make sense.

Re. CMakery and document, sorry that I cannot promise to do this by a timeline, because as a CMake noob I will probably need to learn about it first before I can understand what exactly needs to be done. With that said, will keep this in my mind and learn towards it when I have a chance.

--

Also thanks to @labath for the suggestion about using Triple of the ObjectFile. I have updated the code accordingly.

(already fixed) However, I am having difficulty making a test that I'm adding to pass . Would love your help if you have the time. See my inline comment in the test CreateInstanceReturnNonNullForMachOFile.

@royitaqi royitaqi requested review from labath and JDevlieghere June 3, 2025 22:48
@royitaqi
Copy link
Contributor Author

royitaqi commented Jun 3, 2025

Updated the code to using Triple of the ObjectFile (thanks to @labath for the suggestion). Updated the test (thanks to @dmpots for debugging with me).

@labath / @JDevlieghere: Please kindly take a look when you have the time.

@dmpots dmpots self-requested a review June 3, 2025 22:58
Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@royitaqi royitaqi merged commit 3096f87 into llvm:main Jun 5, 2025
10 checks passed
jasonmolenda added a commit that referenced this pull request Jun 6, 2025
… non-Mach-O files (#139170)"

This reverts commit 3096f87.

Reverting this commit because it depends on another PR
that was reverted, #142704

Both can be reapplied once we find a correct fix for that.
@jasonmolenda
Copy link
Collaborator

NB I reverted this on main because it depends on
#142704
which I also reverted as it adds "-macho" to every Module's Triple string, and we need to find a way to mark the object file as MachO without doing that.

@royitaqi
Copy link
Contributor Author

royitaqi commented Jun 6, 2025

NB I reverted this on main because it depends on #142704 which I also reverted as it adds "-macho" to every Module's Triple string, and we need to find a way to mark the object file as MachO without doing that.

Hi @jasonmolenda,

I appreciate you actively trying to make the code base better, and I can see why the other revert may make sense, BUT:

it depends on #142704

I respectfully disagree. This patch doesn't depend on #142704.

  • In a normal production environment (e.g. on a regular macOS machine) and in tests, the object format of mach-o files are already MachO (because they have the load command LC_BUILD_VERSION or LC_VERSION_MIN_*).
  • The only caveat of this patch may be if there are mach-o files out there which don't have such load commands. However, their triple is already wrong (object format being ELF), so their behavior is already undefined at best. So I don't think we need to worry about them.

Please kindly LMK if I missed something in the above.

Thanks,
Roy

@jasonmolenda
Copy link
Collaborator

I'll admit I didn't debug it very far, but when I reverted the ObjectFileMachO.cpp change, lldb/test/Shell/Breakpoint/case-sensitive.test started failing on the bots. When I built main myself and ran it, it failed on my desktop. When I used the built lldb on the case-sensitive.test.tmp binary this test built, lldb could not set a source & line breakpoint -- it had no debug info. When I also reverted this PR, the failure went away.

@jasonmolenda
Copy link
Collaborator

e.g. https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/as-lldb-cmake/27150/ but it reproduced on my arm64 macOS desktop no problem when I did ninja check-lldb-shell

@jasonmolenda
Copy link
Collaborator

jasonmolenda commented Jun 6, 2025

Ultimately I expect

  // Don't create a debug map if the object file isn't a Mach-O.
  if (!objfile_sp->GetArchitecture().GetTriple().isAppleMachO())

was not succeeding for this binary.

The compile command in the shell test is doing %build %p/Inputs/case-sensitive.c --nodefaultlib -o %t and looking at the built binary in my checkout, it links against no libraries and has no LC_BUILD load command.

Pretty unusual for a userland binary, but you could definitely see something like this with a standalone firmware/kernel Mach-O binary. I don't think depending on the existence of the LC_BUILD load command to set the ObjectFile's architecture is a good choice, it will be fragile.

@jasonmolenda
Copy link
Collaborator

jasonmolenda commented Jun 6, 2025

FTR I suspect the correct fix for #142704 is

  1. This Triple object file setting should be done in ArchSpec::SetArchitecture, so it doesn't depending on hitting an LC_BUILD load command, if anything is going to depend on it. There are many MachO binaries that don't have LC_BUILDs in them, for non-userland process debugging - kernels and firmware debugging.
  2. There needs to be a way of setting the Triple's object file to MachO (because it is defaulting to ELF when we start with just "arm64") without adding "-macho" to the triple string. I don't see a way to do that with the llvm Triple methods today.

@royitaqi
Copy link
Contributor Author

royitaqi commented Jun 6, 2025

@jasonmolenda I appreciate the details. IIRC I ran ninja check-lldb both before and after rebasing the patch onto #142704, but from what you said I must have not done that. Let me double check on my side.

@royitaqi
Copy link
Contributor Author

royitaqi commented Jun 6, 2025

You are right, lldb/test/Shell/Breakpoint/case-sensitive.test failed on my arm64 macOS, too. So what I did must be that I tested before switching to using isAppleMachO(), then never tested it again until rebased on top of #142704.

Thank you for catching this and the revert. My bad (mistakenly thought the patch is independent).

@royitaqi
Copy link
Contributor Author

royitaqi commented Jun 6, 2025

the correct fix for #142704

How about:

  1. In getDefaultFormat(), after getOS(), if it's unknown then don't check isOSDarwin() and instead check if the vendor is Apple (which is when ObjectFileMachO::GetAllArchSpecs calls base_arch.SetArchitecture, which calls m_triple.setVendor(llvm::Triple::Apple)).

I think this is related to what you said in #142704. I guess my question is "why not look at vendor" (esp when the OS is not set).

UPDATE: ah my bad, this isn't sufficient for llvm Triple::setArchName to set the ObjectFile to MachO; Triple.cpp's getDefaultFormat(), is looking for an OS not the vendor to decide if it's MachO.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants