docs/index.md
Lines changed: 29 additions & 23 deletions
Original file line number
Diff line number
Diff line change
@@ -2,8 +2,6 @@
2
2
3
3
# Building a class-dump in <del>2019</del> 2020
4
4
5
-
## This is being actively worked on, this message will disappear when I am happy with the writeup
6
-
7
5
Building out a "class-dump"-like introspection tool for Apple platforms has changed considerably since the original [class-dump](http://stevenygard.com/projects/class-dump/) came out. Learning these new (and old) technologies can be quite intimidating due to the steep learning curve and somewhat hard-to-find documentation.
8
6
9
7
This article *attempts* to explain the complete process of programmatically inspecting a [Mach-O](https://en.wikipedia.org/wiki/Mach-O) (Apple) binary to display the compiled Swift types and Objective-C classes by discussing the following:
@@ -40,7 +38,8 @@ This writeup takes its sweet time explaining things, but there's a lot of concep
40
38
41
39
If you know most of this stuff, I'd recommend just jumping to the appropriate section that you need to learn.
42
40
43
-
At the time of writing this [Ghidra](https://www.nsa.gov/resources/everyone/ghidra/), [Frida](https://frida.re), [Hopper](https://www.hopperapp.com), [IDA](https://www.hex-rays.com/products/ida/), [jtool](http://www.newosxbook.com/tools/jtool.html), & friends all could provide better Swift introspection support. If you work on any of the mentioned tools, I'd recommend you follow the Swift parts.
41
+
For all you "heavyweights" out there ([Ghidra](https://www.nsa.gov/resources/everyone/ghidra/), [Hopper](https://www.hopperapp.com), [IDA](https://www.hex-rays.com/products/ida/), [jtool](http://www.newosxbook.com/tools/jtool.html), & friends), I recommend you check out the Swift part as I have some suggestions on how to provide better Swift support for your tool.
42
+
44
43
45
44
<p align="center">
46
45
<img src="https://media.giphy.com/media/8YmZ14DOpivXMuckSI/giphy.gif" alt="And here we go">
@@ -76,7 +75,7 @@ In the file `<mach-o/loader.h>`, there exists a C struct called **`mach_header_6
76
75
77
76
> **Note:** When referring to C System headers on your OS X machine, you can usually resolve the header location with the following Terminal command: `echo $(xcrun --show-sdk-path)/usr/include`. This resolves to the base directory search path for C system headers. The resolved filepath of loader.h can be viewed via: `cat $(xcrun --show-sdk-path)/usr/include/mach-o/loader.h | less -R`
78
77
79
-
Looking at the `mach_header_64` struct, it contains the following:
78
+
The `mach_header_64` struct contains the following:
80
79
81
80
```c
82
81
struct mach_header_64 {
@@ -91,7 +90,7 @@ struct mach_header_64 {
91
90
};
92
91
```
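If you'd rather do this cross reference in code than by eye, here's a minimal C sketch. It declares its own mirror of the header (eight 32-bit fields, 32 bytes) instead of including `<mach-o/loader.h>`, so it also builds on non-Apple machines. The `macho_header64_t` and `parse_macho_header64` names are mine, and the sketch assumes a little-endian host and a thin (non-fat) 64-bit image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same value as MH_MAGIC_64 in <mach-o/loader.h>. */
#define MACHO_MAGIC_64 0xfeedfacfu

/* Hand-written mirror of mach_header_64: 8 x 32-bit fields = 32 bytes. */
typedef struct {
    uint32_t magic;      /* 0xfeedfacf for a 64-bit image       */
    int32_t  cputype;    /* e.g. 0x01000007 for x86_64           */
    int32_t  cpusubtype;
    uint32_t filetype;   /* e.g. 2 for MH_EXECUTE                */
    uint32_t ncmds;      /* number of load commands that follow  */
    uint32_t sizeofcmds; /* total size of those load commands    */
    uint32_t flags;
    uint32_t reserved;
} macho_header64_t;

/*
 * Copies the header out of the first bytes of a file image.
 * Returns 1 on success, 0 if the buffer is too short or the magic
 * is not the 64-bit Mach-O magic (fat and 32-bit images are rejected).
 */
static int parse_macho_header64(const uint8_t *buf, size_t len,
                                macho_header64_t *out)
{
    assert(buf != NULL && out != NULL);
    if (len < sizeof(*out)) {
        return 0;
    }
    memcpy(out, buf, sizeof(*out)); /* memcpy avoids unaligned reads */
    return out->magic == MACHO_MAGIC_64;
}
```

Feed it the first 32 bytes of any 64-bit executable and compare the fields with what `xxd` prints.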
93
92
94
-
Cross reference this with any *compiled* executable. I'll pick **grep**, feel free to pick anything else:
93
+
Cross reference the above `mach_header_64` with any *compiled* executable. I'll pick **grep**, feel free to pick anything else:
95
94
96
95
```bash
97
96
lolgrep:~$ xxd -g 4 -e $(which grep)| head -2
@@ -285,9 +284,9 @@ Section
285
284
...
286
285
```
287
286
288
-
From the above output, the first section inside the `__TEXT` segment is a section (confusingly) called `__text`. It's this section where compiled code resides (unless someone is doing something sneaky).
287
+
From the above output, the first section inside the `__TEXT` segment is a section (confusingly) called **`__text`**. It's this section where compiled code resides (unless someone is doing something sneaky).
289
288
290
-
>**Note:** You'll often see both the segment and section grouped together via a period to specify the exact Mach-O location. For example, using the above paragraph, I could also say all compiled code resides in the `__TEXT.__text` section. Most tools out there use this methodology.
289
+
>**Note:** You'll often see both the Mach-O segment and section grouped together via a period to specify the exact Mach-O location. For example, using the above paragraph, I could also say all compiled code resides in the `__TEXT.__text` section. Most tools out there use this methodology.
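If your own tool prints locations in this `SEGMENT.section` style, there's one gotcha worth a sketch: the segment and section names are stored as fixed 16-byte fields, and a name that uses all 16 bytes has no terminating NUL. The helper below is a hypothetical name of mine (`qualified_section_name`) and leans on `printf`'s precision specifier to stop at 16 characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Segment and section names are fixed 16-byte fields in a Mach-O file. */
enum { MACHO_NAME_LEN = 16, QUALIFIED_NAME_LEN = 16 + 1 + 16 + 1 };

/*
 * Writes "SEGMENT.section" into `out` and returns `out`.
 * "%.16s" reads at most 16 bytes and stops early at a NUL, so names that
 * fill the whole field (and therefore lack a terminator) are handled too.
 */
static char *qualified_section_name(const char segname[MACHO_NAME_LEN],
                                    const char sectname[MACHO_NAME_LEN],
                                    char out[QUALIFIED_NAME_LEN])
{
    assert(segname != NULL && sectname != NULL && out != NULL);
    snprintf(out, QUALIFIED_NAME_LEN, "%.16s.%.16s", segname, sectname);
    return out;
}
```

Treating either field as an ordinary C string would read past the end of the field for a 16-character name.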
291
290
292
291
There are many, many interesting Mach-O sections. One could write a novel on just this topic. Again, check out the Mach-O links above to learn more about the different types of Mach-O sections.
293
292
@@ -323,7 +322,7 @@ Compile ex.c:
323
322
lolgrep:/tmp$ clang ex.c -o ex
324
323
```
325
324
326
-
...then query the `*GlobalInt` integer symbols using the `nm` tool (which displays symbol table information, more on that later)
325
+
Query the `*GlobalInt` integer symbols using the **`nm`** tool (which displays symbol table information; more on that later):
327
326
328
327
```bash
329
328
lolgrep:/tmp$ nm -m ex | grep GlobalInt
@@ -688,7 +687,7 @@ Looking at the `nlist_64` value for the `someData` and `someFunction` symbols gi
* The `n_value` will give the virtual address *if the symbol is implemented locally*.
@@ -816,7 +815,7 @@ In a `MH_EXECUTE` type image, any C/Objective-C/Swift function don't need to be
816
815
817
816
> **Note:** Just because there's no symbol in the symbol table for some code doesn't mean that you can't infer that a function is there. The **`LC_FUNCTION_STARTS`** load command will export a list of all the function/method locations (*only code, NOT data*) that are implemented by an image. This information is encoded as **[ULEB128](https://en.wikipedia.org/wiki/ULEB)** values. This is useful for debuggers and crash analytics.
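To make the function-starts note concrete, here's a hedged C sketch of a decoder. As I understand the format, the payload is a run of ULEB128 values where each value is the distance from the previous function start (the first one is measured from the start of the `__TEXT` segment), and a zero ends the list. The function names and the `text_base` parameter are my own; getting the payload bytes out of `__LINKEDIT` is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decodes one ULEB128 value starting at buf[*pos] and advances *pos.
 * Each byte carries 7 payload bits; the high bit means "more bytes follow".
 * Returns 1 on success, 0 if the data is truncated or too long for 64 bits.
 */
static int read_uleb128(const uint8_t *buf, size_t len, size_t *pos,
                        uint64_t *value)
{
    uint64_t result = 0;
    unsigned shift = 0;

    assert(buf != NULL && pos != NULL && value != NULL);
    while (*pos < len && shift < 64) {
        uint8_t byte = buf[(*pos)++];
        result |= (uint64_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            *value = result;
            return 1;
        }
        shift += 7;
    }
    return 0;
}

/*
 * Turns a function-starts payload into absolute addresses.
 * Assumption: deltas accumulate from `text_base`, and a zero delta
 * (or the end of the buffer) terminates the list.
 * Returns the number of addresses written to `out`.
 */
static size_t decode_function_starts(const uint8_t *buf, size_t len,
                                     uint64_t text_base,
                                     uint64_t *out, size_t out_cap)
{
    size_t pos = 0, count = 0;
    uint64_t addr = text_base, delta = 0;

    assert(buf != NULL && out != NULL);
    while (count < out_cap && read_uleb128(buf, len, &pos, &delta) &&
           delta != 0) {
        addr += delta;
        out[count++] = addr;
    }
    return count;
}
```

Because only deltas are stored, a stripped image still gives you every function boundary for the cost of a few bytes per function.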
818
817
819
-
What if the above code was compiled as a shared library? What would happen to the symbol table? Compile ex4.c, but now add the `-shared` option:
818
+
What if the above code was compiled as a shared library? What would happen to the symbol table? Compile ex4.c, but now add the **`-shared`** option:
820
819
821
820
```bash
822
821
lolgrep:/tmp$ clang -shared ex4.c -o ex4.shared
@@ -927,14 +926,16 @@ It's the dereferenced values, `0x0000000100002148` and the `0x0000000100002198`
927
926
(Class) $3 = AnotherClass
928
927
```
929
928
930
-
> **Note:** As of around clang version `clang-1100.0.33.8` (in Xcode 11), the default configuration for compiling the Objective-C `__objc_classlist` Mach-O section was moved from the `__DATA` Mach-O segment to the `__DATA_CONST` Mach-O segment. This change is discussed in the DYLD opcodes part of the writeup, but just be aware that if you have an older version of clang, you'll see `__objc_classlist` in the `__DATA` Mach-O segment.
929
+
> **Note:** As of around clang version `clang-1100.0.33.8` (in Xcode 11), the default configuration for compiling the Objective-C `__objc_classlist` Mach-O section was moved from the `__DATA` Mach-O segment to the `__DATA_CONST` Mach-O segment. This "new" Mach-O segment disables write access to areas that only need to be written upon image loading (via dyld opcodes) and nothing more. Be aware that if you have an older version of clang, you'll see `__objc_classlist` in the `__DATA` Mach-O segment.
931
930
932
931
---
933
932
<aname="objc4"></a>
934
933
## 3.2 Objc4
935
934
---
936
935
937
-
It's quite insightful to look at the source code to build Objective-C.
936
+
You've learned where the Objective-C classes are located in memory and on disk; now it's time to look at the layout of an Objective-C class. There's *much* more info than the `<objc/runtime.h>` header most developers know about.
937
+
938
+
This Objective-C class layout can be found on Apple's [opensource site](https://opensource.apple.com).
938
939
939
940
The most recent opensource Objective-C class layout (at the time of writing this) can be found in a header named **[objc4/objc4-756.2/objc-runtime-new.h](https://opensource.apple.com/source/objc4/objc4-756.2/runtime/objc-runtime-new.h.auto.html)**.
940
941
@@ -1265,8 +1266,9 @@ Build out the following Objective-C file called **ex7.m**:
1265
1266
1266
1267
int main () { return 0; }
1267
1268
```
1269
+
In the above code, `SubArray` inherits from `NSArray`, which isn't implemented in your code, but referenced via the `Foundation` module. You'll see dyld bind `NSArray` to the `superclass` field of the `SubArray` class.
1268
1270
1269
-
Compile ex7.m, make sure to include the `-fno-pie` option:
1271
+
Compile **ex7.m**, making sure to include the `-fno-pie` option:
@@ -1360,7 +1362,7 @@ These relative pointers will point to something called a **nominal type descript
1360
1362
1361
1363
If you clicked on the above link, that's a little hard on the eyes, right? Figuring out the offsets for C++ classes can be a pain in the ass due to inheritance. Fortunately, [Scott Knight](https://twitter.com/sdotknight) provides an *excellent* [article](https://knight.sc/reverse%20engineering/2019/07/17/swift-metadata.html) with simplified C struct offsets. If you're interested in the Swift layouts, I'd strongly suggest you read Scott's work, since Scott does a much better job explaining all the Swift struct layouts. So instead of focusing on all the different structs like Scott, I'll do a deep dive into one struct layout: the layout for Swift classes.
1362
1364
1363
-
Here's the simplified layout for a Swift class in Swift 5
1365
+
Here's the simplified layout for a Swift class in Swift 5:
int32_t SuperclassType // The type of the superclass, expressed as a mangled type name
1378
-
uint32_t MetadataNegativeSizeInWords
1379
-
uint32_t MetadataPositiveSizeInWords
1380
+
uint32_t MetadataNegativeSizeInWords // Ignore for this writeup
1381
+
uint32_t MetadataPositiveSizeInWords // Ignore for this writeup
1380
1382
uint32_t NumImmediateMembers // Number of additional members stored after this class (aka NumImmediateMembers * sizeof(void*) payload)
1381
1383
uint32_t NumFields // Number of properties stored in this class
1382
1384
uint32_t FieldOffsetVectorOffset; // The offset of the field offset vector for this struct's stored properties in its metadata
@@ -1451,7 +1453,7 @@ BOOM! And that's Swift reflection in a nutshell!
1451
1453
## 5.2 Swift Methods in a Class
1452
1454
---
1453
1455
1454
-
The `NominalClassDescriptor` has 11 `int32_t` members, totalling 44 bytes. Immediately following the `NominalClassDescriptor`, there exists a varying amount of data. I won't get into the nitty gritty of this (check out the **TrailingObjects.h** header if you want to learn more), but the prologue of the `NominalClassDescriptor` will look like the following:
1456
+
The `NominalClassDescriptor` has 11 `int32_t` members, totalling 44 bytes. Immediately following the `NominalClassDescriptor`, there exists a varying amount of data. I won't get into the nitty gritty of this (check out the [TrailingObjects.h](https://github.com/apple/swift/blob/master/include/swift/ABI/TrailingObjects.h) header if you want to learn more), but the prologue of the `NominalClassDescriptor` will look like the following (provided the class has implemented some methods):
1455
1457
1456
1458
```c
1457
1459
// End of NominalClassDescriptor here...
@@ -1492,7 +1494,7 @@ private:
1492
1494
};
1493
1495
```
1494
1496
1495
-
>**Note:** If you're building a Swift introspection tool, the `MethodDescriptorFlags` are absolute gold. The `Impl` will give you a virtual address, which you can cross reference to the symbol table to (hopefully) get the name of symbol. Unfortunately, if the symbol table is stripped, you can't resolve the name. Fortunately, you can still get a decent idea of the stripped symbol's function by consulting the `Flags` field. For example, if the `Flag` tells you the method is a **Getter**, then you can look at the assembly of the function to find the **direct field offset** value. Once you know that value, you can cross reference that to the property offset to realize that method is the getter of the Swift property!
1497
+
>**Note:** If you're building a Swift introspection tool, the `MethodDescriptorFlags` are absolute gold. The `Impl` will give you a virtual address, which you can cross reference to the symbol table to (hopefully) get the name of the symbol. As you learned earlier, if the symbol table is stripped, you can't resolve the name. Fortunately, you can still get a decent idea of the stripped symbol's function by consulting the `Flags` field. For example, if the `Flags` field tells you the method is a **Getter**, then you can look at the assembly of the function to find the **direct field offset** value. Once you know that value, you can cross reference the corresponding property (and its offset) to realize that the method is the getter of that Swift property!
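Here's a small C sketch of what "consulting the `Flags` field" could look like. The mask values and the ordering of the kinds are my reading of `MethodDescriptorFlags` in Swift's `MetadataValues.h` for Swift 5, so treat them as assumptions and check them against the Swift version you're targeting. The function names are mine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed bit layout of the 32-bit Flags field (verify for your Swift). */
enum {
    METHOD_KIND_MASK   = 0x0f, /* low 4 bits select the kind       */
    METHOD_IS_INSTANCE = 0x10, /* set for instance methods         */
    METHOD_IS_DYNAMIC  = 0x20  /* set for `dynamic` declarations   */
};

/* Maps the kind bits to a readable name; unknown kinds stay "Unknown". */
static const char *method_kind_name(uint32_t flags)
{
    static const char *const names[] = {
        "Method", "Init", "Getter", "Setter",
        "ModifyCoroutine", "ReadCoroutine"
    };
    uint32_t kind = flags & METHOD_KIND_MASK;

    return kind < sizeof names / sizeof names[0] ? names[kind] : "Unknown";
}

/* Writes a short description such as "instance Getter" into `out`. */
static char *describe_method_flags(uint32_t flags, char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);
    snprintf(out, out_len, "%s%s%s",
             (flags & METHOD_IS_INSTANCE) ? "instance " : "type ",
             (flags & METHOD_IS_DYNAMIC) ? "dynamic " : "",
             method_kind_name(flags));
    return out;
}
```

Even with every symbol stripped, a label like "instance Getter" next to an address is a decent starting point for naming the function.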
1496
1498
1497
1499
You will programmatically explore the Swift methods implemented in a Swift class. Build out **ex9.swift** with the following code:
1498
1500
@@ -1520,7 +1522,7 @@ Query the location of the `NominalClassDescriptor` via LLDB:
1520
1522
1521
1523
The `image lookup -rs` command will do a regex search for the symbol "type descriptor" that's constrained to anything in the ex9 image. This is equivalent to you manually resolving the location of the nominal type descriptor via the relative pointers from the `__TEXT.__swift5_types` array in the earlier example.
1522
1524
1523
-
For me, the `NominalClassDescriptor` for `AClass` is at **0x0000000100000f18**. Remember, this class has a size of 0x2c (44) bytes. Resolve this offset via LLDB to grab the `VTableOffset` and `VTableSize` which immediately follow it.
1525
+
For me, the `NominalClassDescriptor` for `AClass` is at **0x0000000100000f18**. Remember, the `NominalClassDescriptor` has a size of 0x2c (44) bytes. Resolve this offset via LLDB to grab the `VTableOffset` and `VTableSize` immediately following it.
1524
1526
1525
1527
```bash
1526
1528
(lldb) x/2wx `0x0000000100000f18 + 44`
@@ -1557,15 +1559,17 @@ Excellent! You were able to resolve this method via Swift metadata to get the ad
1557
1559
0000000100000f4c s method descriptor for ex9.AClass.aFunc() -> ()
1558
1560
```
1559
1561
1560
-
Again, both `nm` and the Swift metadata tells us the `aFunc()` will be found at address 0x00000100000d80
1562
+
Again, both `nm` and the Swift metadata tell us that `ex9.AClass.aFunc()` will be found at address 0x0000000100000d80.
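The manual LLDB steps in this section can be sketched in C roughly as follows. The sketch assumes what the text describes: a 44-byte descriptor, immediately followed by `VTableOffset` and `VTableSize`, followed by one 8-byte method descriptor (`Flags`, then a relative `Impl`) per vtable entry. It also assumes a plain, non-generic class that implements at least one method, and a little-endian host. All type and function names are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CLASS_DESCRIPTOR_SIZE = 44 }; /* 11 x 4-byte members */

typedef struct {
    uint32_t vtable_offset;
    uint32_t vtable_size;  /* number of method descriptors that follow */
} vtable_header_t;

typedef struct {
    uint32_t flags;
    int32_t  impl;         /* relative pointer to the implementation */
} method_descriptor_t;

/* A relative pointer is a signed distance from the field's own address. */
static uint64_t resolve_relative(uint64_t field_addr, int32_t rel)
{
    return field_addr + (uint64_t)(int64_t)rel;
}

/*
 * `desc` points at a copy of the descriptor bytes and `desc_addr` is the
 * virtual address those bytes were loaded at. Writes the resolved
 * implementation address of each method into `out` and returns the count.
 */
static size_t vtable_impl_addresses(const uint8_t *desc, size_t len,
                                    uint64_t desc_addr,
                                    uint64_t *out, size_t out_cap)
{
    vtable_header_t hdr;
    size_t i, count = 0;

    assert(desc != NULL && out != NULL);
    if (len < CLASS_DESCRIPTOR_SIZE + sizeof hdr) {
        return 0;
    }
    memcpy(&hdr, desc + CLASS_DESCRIPTOR_SIZE, sizeof hdr);

    for (i = 0; i < hdr.vtable_size && count < out_cap; i++) {
        method_descriptor_t m;
        size_t off = CLASS_DESCRIPTOR_SIZE + sizeof hdr + i * sizeof m;

        if (off + sizeof m > len) {
            break; /* descriptor claims more entries than we were given */
        }
        memcpy(&m, desc + off, sizeof m);
        out[count++] = resolve_relative(
            desc_addr + off + offsetof(method_descriptor_t, impl), m.impl);
    }
    return count;
}
```

The arithmetic lines up with the addresses above: 0x0000000100000f18 plus 44 bytes of descriptor plus 8 bytes of vtable header lands on 0x0000000100000f4c, which is where `nm` reports the method descriptor for `aFunc()`.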
1561
1563
1562
1564
1563
1565
---
1564
1566
<a name="swift_calling_convention"></a>
1565
1567
## 5.3 Swift Calling Convention
1566
1568
---
1567
1569
1568
-
The calling convention differs a bit in Swift in both ARM and x86 families on Apple platforms. If you're totally new to this stuff, I'd recommend reading [Mike Ash](https://twitter.com/mikeash?lang=en)'s [writeup](https://www.mikeash.com/pyblog/objc_msgsends-new-prototype.html) or [this article]https://www.raywenderlich.com/615-assembly-register-calling-convention-tutorial), which explains the C and Objective-C x86_64 calling conventions first.
1570
+
The calling convention differs a bit in Swift in both ARM and x86 families on Apple platforms. If you're totally new to this stuff, I'd recommend reading [Mike Ash](https://twitter.com/mikeash?lang=en)'s [writeup](https://www.mikeash.com/pyblog/objc_msgsends-new-prototype.html) or [this article](https://www.raywenderlich.com/615-assembly-register-calling-convention-tutorial), which explains the C and Objective-C x86_64 calling conventions first.
1571
+
1572
+
Before we can talk about Swift, let's briefly recap the calling convention of Objective-C for x86_64 and ARM64:
1569
1573
1570
1574
Using the `-[NSString writeToFile:atomically:]` method as an example:
1571
1575
@@ -1583,6 +1587,8 @@ X86_64 RDI RSI RDX RCX
1583
1587
1584
1588
If you're a deer in the headlights reading this, please read the above link(s) first.
1585
1589
1590
+
Now onto Swift:
1591
+
1586
1592
Swift moves `self` to `R13` on x86_64 and `X20` on ARM64. Since there's no need for an Objective-C `Selector`, the `RSI`/`X1` registers can be used for arguments.
1587
1593
1588
1594
*This means that all arguments for Swift can start at the "first" register (`RDI`/`X0`) and the `self` argument will be at `R13`/`X20`. This has the additional benefit that these registers can survive across calling frames, i.e. they won't get lost after returning from a frame*
@@ -1713,11 +1719,11 @@ This will dump the ARM64 assembly for the "Objective-C viewDidLoad" thunk method
1713
1719
00000001000075b0 ret
1714
1720
```
1715
1721
1716
-
I've added asterisks to the interesting ARM64 assembly instructions. X0 (`self`) will get `retain`'d, X0 will transfer `self` to X20 and then call the Swift side of the `viewDidLoad` at address **0x100007368**. Again, **this method is not visible to the Swift metadata**.
1722
+
I've added asterisks to the interesting ARM64 assembly instructions. `X0` (`self`) will get `retain`'d, `X0` will transfer `self` to `X20` and then call the Swift side of the `viewDidLoad` at address **0x100007368**. Again, **this method is not visible to the Swift metadata**.
1717
1723
1718
1724
For those of you who are introspection tool builders, hopefully you'll see a window to improve your toolset:
1719
1725
* Even though Swift method names can be stripped out, you can infer the names of a lot of these methods using each method's `MethodDescriptorFlags`.
1720
-
* You can use the Objective-C runtime's bridging thunk methods to find "hidden" bridged Swift methods
1726
+
* You can use the Objective-C runtime's bridging thunk methods to find the "hidden" bridged Swift methods.
1721
1727
* If you know a stripped symbol is Swift code using the above methods, you can infer there will be a different calling convention in play and can better use this knowledge for your disassembly engine.
1722
1728
1723
1729
I can't wait to see what y'all can do with this in the future 🍻