Commit f2eec4c

committed
Writeup done?
1 parent 2bd2d81 commit f2eec4c

1 file changed

+29
-23
lines changed

docs/index.md

Lines changed: 29 additions & 23 deletions
@@ -2,8 +2,6 @@
22

33
# Building a class-dump in <del>2019</del> 2020
44

5-
## This is being actively worked on, this message will disappear when I am happy with the writeup
6-
75
Building out a "class-dump"-like introspection tool for Apple platforms has changed considerably since the original [class-dump](http://stevenygard.com/projects/class-dump/) came out. Learning these new (and old) technologies can be quite intimidating due to the steep learning curve and somewhat hard to find documentation.
86

97
This article *attempts* to explain the complete process of programmatically inspecting a [Mach-O](https://en.wikipedia.org/wiki/Mach-O) (Apple) binary to display the compiled Swift types and Objective-C classes by discussing the following:
@@ -40,7 +38,8 @@ This writeup takes its sweet time explaining things, but there's a lot of concep
4038

4139
If you know most of this stuff, I'd recommend just jumping to the appropriate section that you need to learn.
4240

43-
At the time of writing this [Ghidra](https://www.nsa.gov/resources/everyone/ghidra/), [Frida](https://frida.re), [Hopper](https://www.hopperapp.com), [IDA](https://www.hex-rays.com/products/ida/), [jtool](http://www.newosxbook.com/tools/jtool.html), & friends all could provide better Swift introspection support. If you work on any of the mentioned tools, I'd recommend you follow the Swift parts.
41+
For all you "heavyweights" out there ([Ghidra](https://www.nsa.gov/resources/everyone/ghidra/), [Hopper](https://www.hopperapp.com), [IDA](https://www.hex-rays.com/products/ida/), [jtool](http://www.newosxbook.com/tools/jtool.html), & friends), I recommend you check out the Swift part as I have some suggestions on how to provide better Swift support for your tool.
42+
4443

4544
<p align="center">
4645
<img src="https://media.giphy.com/media/8YmZ14DOpivXMuckSI/giphy.gif" alt="And here we go">
@@ -76,7 +75,7 @@ In the file `<mach-o/loader.h>`, there exists a C struct called **`mach_header_6
7675

7776
> **Note:** When referring to C system headers on your OS X machine, you can usually resolve the header location with the following Terminal command: `echo $(xcrun --show-sdk-path)/usr/include`. This resolves to the base directory search path for C system headers. The resolved filepath of loader.h can be viewed via: `cat $(xcrun --show-sdk-path)/usr/include/mach-o/loader.h | less -R`
7877
79-
Looking at the `mach_header_64` struct, it contains the following:
78+
The `mach_header_64` struct contains the following:
8079

8180
```c
8281
struct mach_header_64 {
@@ -91,7 +90,7 @@ struct mach_header_64 {
9190
};
9291
```
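
As a rough sketch of how a tool might read this header programmatically, the C below copies the first 32 bytes of a file buffer into a local copy of the struct and checks the magic value. The `sketch_` names are mine (hypothetical), the struct is redeclared locally so it builds without the Apple SDK, and it assumes a little-endian host reading a thin (non-fat) 64-bit image:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same value as MH_MAGIC_64 in <mach-o/loader.h>. */
#define SKETCH_MH_MAGIC_64 0xfeedfacfu

/* Local copy of the mach_header_64 layout (hypothetical name, real layout). */
struct sketch_mach_header_64 {
    uint32_t magic;      /* 0xfeedfacf for a 64-bit image */
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;   /* e.g. 2 == MH_EXECUTE */
    uint32_t ncmds;      /* number of load commands following the header */
    uint32_t sizeofcmds; /* total byte size of those load commands */
    uint32_t flags;
    uint32_t reserved;
};

/* Copies the first sizeof(header) bytes of buf into out.
   Returns 1 if the magic matches a 64-bit Mach-O header, otherwise 0. */
int sketch_read_header(const uint8_t *buf, size_t len,
                       struct sketch_mach_header_64 *out) {
    assert(buf != NULL && out != NULL);
    if (len < sizeof(*out)) {
        return 0; /* not enough bytes to hold a header */
    }
    memcpy(out, buf, sizeof(*out)); /* memcpy avoids unaligned reads */
    return out->magic == SKETCH_MH_MAGIC_64;
}
```

Feeding it the first 32 bytes of a compiled executable should give you the same fields you would otherwise pick out of a hex dump by hand.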
9392

94-
Cross reference this with any *compiled* executable. I'll pick **grep**, feel free to pick anything else:
93+
Cross reference the above `mach_header_64` with any *compiled* executable. I'll pick **grep**, feel free to pick anything else:
9594

9695
```bash
9796
lolgrep:~$ xxd -g 4 -e $(which grep) | head -2
@@ -285,9 +284,9 @@ Section
285284
...
286285
```
287286

288-
From the above output, the first section inside the `__TEXT` segment is a section (confusingly) called `__text`. It's this section where compiled code resides (unless someone is doing something sneaky).
287+
From the above output, the first section inside the `__TEXT` segment is a section (confusingly) called **`__text`**. It's this section where compiled code resides (unless someone is doing something sneaky).
289288

290-
>**Note:** You'll often see both the segment and section grouped together via a period to specify the exact Mach-O location. For example, using the above paragraph, I could also say all compiled code resides in the `__TEXT.__text` section. Most tools out there use this methodology.
289+
>**Note:** You'll often see both the Mach-O segment and section grouped together via a period to specify the exact Mach-O location. For example, using the above paragraph, I could also say all compiled code resides in the `__TEXT.__text` section. Most tools out there use this methodology.
291290
292291
There are many, many interesting Mach-O sections. One could write a novel on just this topic. Again, check out the Mach-O links above to learn more about the different types of Mach-O sections.
293292

@@ -323,7 +322,7 @@ Compile ex.c:
323322
lolgrep:/tmp$ clang ex.c -o ex
324323
```
325324

326-
...then query the `*GlobalInt` integer symbols using the `nm` tool (which displays symbol table information, more on that later)
325+
Query the `*GlobalInt` integer symbols using the **`nm`** tool (which displays symbol table information, more on that later):
327326

328327
```bash
329328
lolgrep:/tmp$ nm -m ex | grep GlobalInt
@@ -688,7 +687,7 @@ Looking at the `nlist_64` value for the `someData` and `someFunction` symbols gi
688687
<pre>
689688
nlist_64 fields: n_value n_type n_sect n_desc n_strx
690689
raw data: 0000000100002018 0f 0a 0000 0000001c _someData
691-
0000000100000f20 0f 01 0000 00000026 _someFunctin
690+
0000000100000f20 0f 01 0000 00000026 _someFunction
692691
</pre>
693692

694693
* The `n_value` will give the virtual address *if the symbol is implemented locally*.
@@ -816,7 +815,7 @@ In a `MH_EXECUTE` type image, any C/Objective-C/Swift function don't need to be
816815

817816
> **Note:** Just because there's no symbol in the symbol table for some code doesn't mean that you can't infer that a function is there. The **`LC_FUNCTION_STARTS`** load command will export a list of all the function/method locations (*only code, NOT data*) that are implemented by an image. This information is formatted in **[ULEB](https://en.wikipedia.org/wiki/ULEB)**. This is useful for debuggers and crash analytics.
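
If you want to walk that payload yourself, a minimal C sketch could look like the following. It assumes (my understanding, not something spelled out above) that the data is a run of ULEB128 deltas, the first relative to a base address and each later one relative to the previous function, ending at a zero byte. The `sketch_` functions are hypothetical helpers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one ULEB128 value at *cursor and advances *cursor past it.
   Returns 1 on success, 0 on truncated or oversized input. */
int sketch_read_uleb128(const uint8_t **cursor, const uint8_t *end,
                        uint64_t *out) {
    assert(cursor != NULL && *cursor != NULL && out != NULL);
    uint64_t value = 0;
    unsigned shift = 0;
    const uint8_t *p = *cursor;
    while (p < end) {
        uint8_t byte = *p++;
        if (shift >= 64) {
            return 0; /* would not fit in 64 bits */
        }
        value |= (uint64_t)(byte & 0x7f) << shift; /* low 7 bits are payload */
        if ((byte & 0x80) == 0) {                  /* high bit clear: last byte */
            *cursor = p;
            *out = value;
            return 1;
        }
        shift += 7;
    }
    return 0; /* ran out of bytes mid-value */
}

/* Accumulates deltas into absolute addresses, starting from base.
   Stops at a zero delta, bad input, or when addrs is full.
   Returns the number of addresses written. */
size_t sketch_function_starts(const uint8_t *data, size_t len, uint64_t base,
                              uint64_t *addrs, size_t max_addrs) {
    const uint8_t *cursor = data;
    const uint8_t *end = data + len;
    uint64_t address = base;
    uint64_t delta = 0;
    size_t count = 0;
    while (count < max_addrs &&
           sketch_read_uleb128(&cursor, end, &delta) && delta != 0) {
        address += delta;
        addrs[count++] = address;
    }
    return count;
}
```

Each resulting address is a function start you can report even when the symbol table has nothing to say about it.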
818817
819-
What if the above code was compiled as a shared library? What would happen to the symbol table? Compile ex4.c, but now add the `-shared` option:
818+
What if the above code was compiled as a shared library? What would happen to the symbol table? Compile ex4.c, but now add the **`-shared`** option:
820819

821820
```bash
822821
lolgrep:/tmp$ clang -shared ex4.c -o ex4.shared
@@ -927,14 +926,16 @@ It's the dereferenced values, `0x0000000100002148` and the `0x0000000100002198`
927926
(Class) $3 = AnotherClass
928927
```
929928

930-
> **Note:** As of around clang version `clang-1100.0.33.8` (in Xcode 11), the default configuration for compiling the Objective-C `__objc_class_list` Mach-O section was moved from the `__DATA` Mach-O segment to the `__DATA_CONST` Mach-O segment. This change is discussed in the DYLD opcodes part of the writeup, but just be aware that if you have an older version of clang, you'll see `__objc_class_list` in the `__DATA` Mach-O segment.
929+
> **Note:** As of around clang version `clang-1100.0.33.8` (in Xcode 11), the default configuration for compiling the Objective-C `__objc_class_list` Mach-O section was moved from the `__DATA` Mach-O segment to the `__DATA_CONST` Mach-O segment. This "new" Mach-O segment disables write access to areas that only need to be written upon image loading (via dyld opcodes) and nothing more. Be aware that if you have an older version of clang, you'll see `__objc_class_list` in the `__DATA` Mach-O segment.
931930
932931
---
933932
<a name="objc4"></a>
934933
## 3.2 Objc4
935934
---
936935

937-
It's quite insightful to look at the source code to build Objective-C.
936+
You learned where the Objective-C classes are located in memory and on disk. Now it's time to look at the layout of an Objective-C class. There's *much* more info than the `<objc/runtime.h>` header most developers know about.
937+
938+
This Objective-C class layout can be found on Apple's [opensource site](https://opensource.apple.com).
938939

939940
The most recent opensource Objective-C class layout (at the time of writing this) can be found in a header named **[objc4/objc4-756.2/objc-runtime-new.h](https://opensource.apple.com/source/objc4/objc4-756.2/runtime/objc-runtime-new.h.auto.html)**
940941

@@ -1265,8 +1266,9 @@ Build out the following Objective-C file called **ex7.m**:
12651266

12661267
int main () { return 0; }
12671268
```
1269+
In the above code, `SubArray` inherits from `NSArray`, which isn't implemented in your code, but referenced via the `Foundation` module. You'll see dyld bind `NSArray` to the `superclass` field of the `SubArray` class.
12681270
1269-
Compile ex7.m, make sure to include the `-fno-pie` option:
1271+
Compile **ex7.m**, make sure to include the `-fno-pie` option:
12701272
12711273
```bash
12721274
lolgrep:/tmp$ clang -fmodules ex7.m -o ex7 -fno-pie
@@ -1360,7 +1362,7 @@ These relative pointers will point to something called a **nominal type descript
13601362

13611363
If you clicked on the above link, that's a little hard on the eyes, right? Figuring out the offsets for C++ classes can be a pain in the ass due to inheritance. Fortunately, [Scott Knight](https://twitter.com/sdotknight) provides an *excellent* [article](https://knight.sc/reverse%20engineering/2019/07/17/swift-metadata.html) with simplified C struct offsets. If you're interested in the Swift layouts, I'd strongly suggest you read Scott's work, since Scott does a much better job explaining all the Swift struct layouts. So instead of focusing on all the different structs like Scott, I'll do a deep dive into one struct layout: the layout for Swift classes.
13621364

1363-
Here's the simplified layout for a Swift class in Swift 5
1365+
Here's the simplified layout for a Swift class in Swift 5:
13641366

13651367
```c
13661368
struct NominalClassDescriptor {
@@ -1375,8 +1377,8 @@ struct NominalClassDescriptor {
13751377

13761378
// Implemented in NominalClassDescriptor
13771379
int32_t SuperclassType // The type of the superclass, expressed as a mangled type name
1378-
uint32_t MetadataNegativeSizeInWords
1379-
uint32_t MetadataPositiveSizeInWords
1380+
uint32_t MetadataNegativeSizeInWords // Ignore for this writeup
1381+
uint32_t MetadataPositiveSizeInWords // Ignore for this writeup
13801382
uint32_t NumImmediateMembers // Number of additional members stored after this class (aka NumImmediateMembers * sizeof(void*) payload)
13811383
uint32_t NumFields // Number of properties stored in this class
13821384
uint32_t FieldOffsetVectorOffset; // The offset of the field offset vector for this struct's stored properties in its metadata
@@ -1451,7 +1453,7 @@ BOOM! And that's Swift reflection in a nutshell!
14511453
## 5.2 Swift Methods in a Class
14521454
---
14531455
1454-
The `NominalClassDescriptor` has 11 `int32_t` members, totalling 44 bytes. Immediately following the `NominalClassDescriptor`, there exists a varying amount of data. I won't get into the nitty gritty of this (check out the **TrailingObjects.h** header if you want to learn more), but the prologue of the `NominalClassDescriptor` will look like the following:
1456+
The `NominalClassDescriptor` has 11 `int32_t` members, totalling 44 bytes. Immediately following the `NominalClassDescriptor`, there exists a varying amount of data. I won't get into the nitty gritty of this (check out the [TrailingObjects.h](https://github.com/apple/swift/blob/master/include/swift/ABI/TrailingObjects.h) header if you want to learn more), but the prologue of the `NominalClassDescriptor` will look like the following (provided the class has implemented some methods):
14551457

14561458
```c
14571459
// End of NominalClassDescriptor here...
@@ -1492,7 +1494,7 @@ private:
14921494
};
14931495
```
14941496
1495-
> **Note:** If you're building a Swift introspection tool, the `MethodDescriptorFlags` are absolute gold. The `Impl` will give you a virtual address, which you can cross reference to the symbol table to (hopefully) get the name of symbol. Unfortunately, if the symbol table is stripped, you can't resolve the name. Fortunately, you can still get a decent idea of the stripped symbol's function by consulting the `Flags` field. For example, if the `Flag` tells you the method is a **Getter**, then you can look at the assembly of the function to find the **direct field offset** value. Once you know that value, you can cross reference that to the property offset to realize that method is the getter of the Swift property!
1497+
> **Note:** If you're building a Swift introspection tool, the `MethodDescriptorFlags` are absolute gold. The `Impl` will give you a virtual address, which you can cross reference to the symbol table to (hopefully) get the name of the symbol. As you learned earlier, if the symbol table is stripped, you can't resolve the name. Fortunately, you can still get a decent idea of the stripped symbol's function by consulting the `Flags` field. For example, if the `Flags` field tells you the method is a **Getter**, then you can look at the assembly of the function to find the **direct field offset** value. Once you know that value, you can cross reference the corresponding property (and its offset) to realize that method is the getter of that Swift property!
14961498
14971499
You will programmatically explore the Swift methods implemented in a Swift class. Build out **ex9.swift** with the following code:
14981500
@@ -1520,7 +1522,7 @@ Query the location of the `NominalClassDescriptor` via LLDB:
15201522
15211523
The `image lookup -rs` command will do a regex search for the symbol "type descriptor" that's constrained to anything in the ex9 image. This is equivalent to you manually resolving the location of the nominal type descriptor via the relative pointers from `__TEXT.__swift5_types` array in the earlier example.
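
For reference, resolving one of those relative pointers by hand boils down to "address of the field plus the signed 32-bit value stored there". A minimal C sketch, using hypothetical `sketch_` helper names and ignoring any tag bits that some Swift relative pointers keep in their low bits:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Resolves a relative pointer inside a mapped buffer:
   target = address of the field itself + the int32 stored in the field. */
const uint8_t *sketch_resolve_relative(const uint8_t *field) {
    assert(field != NULL);
    int32_t offset;
    memcpy(&offset, field, sizeof(offset)); /* avoid unaligned access */
    return field + offset;
}

/* Same arithmetic in virtual-address terms, for when you only have the
   field's load address and the raw offset value. */
uint64_t sketch_resolve_relative_vmaddr(uint64_t field_vmaddr, int32_t offset) {
    return field_vmaddr + (uint64_t)(int64_t)offset; /* sign-extend first */
}
```

As a made-up example, a field at `0x100000f90` holding `-0x78` would resolve to `0x100000f18`. The sign extension matters, because these offsets are frequently negative.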
15221524
1523-
For me, the `NominalClassDescriptor` for `AClass` is at **0x0000000100000f18**. Remember, this class has a size of 0x2c (44) bytes. Resolve this offset via LLDB to grab the `VTableOffset` and `VTableSize` which immediately follow it.
1525+
For me, the `NominalClassDescriptor` for `AClass` is at **0x0000000100000f18**. Remember, the `NominalClassDescriptor` has a size of 0x2c (44) bytes. Resolve this offset via LLDB to grab the `VTableOffset` and `VTableSize` immediately following it.
15241526
15251527
```bash
15261528
(lldb) x/2wx `0x0000000100000f18 + 44`
@@ -1557,15 +1559,17 @@ Excellent! You were able to resolve this method via Swift metadata to get the ad
15571559
0000000100000f4c s method descriptor for ex9.AClass.aFunc() -> ()
15581560
```
15591561
1560-
Again, both `nm` and the Swift metadata tells us the `aFunc()` will be found at address 0x00000100000d80
1562+
Again, both `nm` and the Swift metadata tell us that `ex9.AClass.aFunc()` will be found at address 0x00000100000d80
15611563
15621564
15631565
---
15641566
<a name="swift_calling_convention"></a>
15651567
## 5.3 Swift Calling Convention
15661568
---
15671569
1568-
The calling convention differs a bit in Swift in both ARM and x86 families on Apple platforms. If you're totally new to this stuff, I'd recommend reading [Mike Ash](https://twitter.com/mikeash?lang=en)'s [writeup](https://www.mikeash.com/pyblog/objc_msgsends-new-prototype.html) or [this article]https://www.raywenderlich.com/615-assembly-register-calling-convention-tutorial), which explains the C and Objective-C x86_64 calling conventions first.
1570+
The calling convention differs a bit in Swift in both ARM and x86 families on Apple platforms. If you're totally new to this stuff, I'd recommend reading [Mike Ash](https://twitter.com/mikeash?lang=en)'s [writeup](https://www.mikeash.com/pyblog/objc_msgsends-new-prototype.html) or [this article](https://www.raywenderlich.com/615-assembly-register-calling-convention-tutorial), which explains the C and Objective-C x86_64 calling conventions first.
1571+
1572+
Before we can talk about Swift, let's briefly recap the calling convention of Objective-C for x86_64 and ARM64:
15691573
15701574
Using the `-[NSString writeToFile:atomically:]` method as an example:
15711575
@@ -1583,6 +1587,8 @@ X86_64 RDI RSI RDX RCX
15831587
15841588
If you're a deer in the headlights reading this, please read the above link(s) first.
15851589
1590+
Now onto Swift:
1591+
15861592
Swift changes the `self` around to `R13` on x86_64 and `X20` on ARM64. Since there's no need for an Objective-C `Selector`, the `RSI`/`X1` registers can be used for arguments.
15871593
15881594
*This means that all arguments for Swift can start at the "first" register (`RDI`/`X0`) and the `self` argument will be at `R13`/`X20`. This has the additional benefit that these registers can survive across calling frames, i.e. they won't get lost after returning from a frame*
@@ -1713,11 +1719,11 @@ This will dump the ARM64 assembly for the "Objective-C viewDidLoad" thunk method
17131719
00000001000075b0 ret
17141720
```
17151721
1716-
I've added asterisks to the interesting ARM64 assembly instructions. X0 (`self`) will get `retain`'d, X0 will transfer `self` to X20 and then call the Swift side of the `viewDidLoad` at address **0x100007368**. Again, **this method is not visible to the Swift metadata**.
1722+
I've added asterisks to the interesting ARM64 assembly instructions. `X0` (`self`) gets `retain`'d, `self` is then copied from `X0` to `X20`, and finally the thunk calls the Swift side of `viewDidLoad` at address **0x100007368**. Again, **this method is not visible to the Swift metadata**.
17171723
17181724
For those of you who are introspection tool builders, hopefully you'll see a window to improve your toolset:
17191725
* Even though Swift method names can be stripped out, you can infer the names of a lot of these methods using the `MethodDescriptorFlags` flags for methods.
1720-
* You can use the Objective-C runtime's bridging thunk methods to find "hidden" bridged Swift methods
1726+
* You can use the Objective-C runtime's bridging thunk methods to find the "hidden" bridged Swift methods
17211727
* If you know a stripped symbol is Swift code using the above methods, you can infer there will be a different calling convention in play and can better use this knowledge for your disassembly engine.
17221728
17231729
I can't wait to see what y'all can do with this in the future 🍻
