machofile is a module to parse Mach-O binary files
Inspired by Ero Carrera's pefile, this module aims to provide a similar capability but for Mach-O binaries instead. Reference material and documentation used to gain the file format knowledge, the basic structures and constant are taken from the resources listed below.
machofile is self-contained. The module has no dependencies; it is endianness independent; and it works on macOS, Windows, and Linux.
While there are other mach-o parsing modules out there, the motivations behind developing this one are:
- first and foremost, for me this was a great way to deep dive and learn more about the Mach-O format and structures
- to provide a simple way to parse Mach-O files for analysis
- to not depend on external modules (e.g. lief, macholib, macho, etc.), since everything is directly extracted from the file and is all in pure python.
This is the very first/alpha version still (2023.11.04), so please let me know if you try or find bugs but also be gentle ;) code will be optimized and more features will be added in the near future.
Current Features:
- Parse Mach-O Header
- Parse Load Commands
- Parse File Segments
- Parse Dylib Commands
- Parse Dylib List
Note: as of now, this has initially be tested against x86 and x86_64 Mach-O samples.
Next features to be implemented:
- extract Entry Point
- Parse Code Signature information
- Embedded strings
- File Attributes
- data entropy calculation
- flag for suspicious libraries
- Packer detection
- Hashes: dylib hash, import hash, export hash, ...
- prettify output to console
- add output option to yaml and json
- add options to parse only specific structures
Those are the people that I would like to thank for being the inspiration that led me to write this module:
- Ero Carrera (@erocarrera) for writing and maintaining the pefile module
- Patrick Wardle (@patrickwardle) for the great work in sharing his macOS malware analysis and research, and brigning to life OBTS :)
- Greg Lesnewich (@greglesnewich) for his work on macho-similarity
You can either use it from command line or import it as a module in your python code, and call each function individually to parse only the structures you are interested in.
It expect to be supplied with either a file path or a data buffer to parse.
import machofile
macho = MachO(file_path='/path/to/machobinary')
macho = MachO('/path/to/machobinary')
The above two lines are equivalent and would load the Mach-O file and parse it. If the data buffer is already available, it can be supplied directly with:
import machofile
macho = MachO(data=bytes_variable)
You will then need to invoke the parse()
method to start the parsing process,
and can then call each function individually to parse only the structures you are interested in.
macho.parse()
dylib_cmd_list, dylib_lst = macho.get_dylib_commands()
...
From CLI, at the moment it just retrieves all the structures parsed, in the future there will be flags to just get one specific structure or a list of them.
% python3 machofile-cli.py -h
usage: machofile-cli.py [-h] -f FILE [-a] [-i] [-hd] [-l] [-sg] [-d]
Parse Mach-O file structures.
options:
-h, --help show this help message and exit
-f FILE, --file FILE Path to the file to be parsed
-a, --all Print all info about the file
-i, --info Print general info about the file
-hd, --header Print Mach-O header info
-l, --load_cmd_t Print Load Command Table and Command list
-sg, --segments Print File Segments info
-d, --dylib Print Dylib Command Table and Dylib list
Example output:
% python3 machofile-cli.py -a -f b4f68a58658ceceb368520dafc35b270272ac27b8890d5b3ff0b968170471e2b
[General File Info]
Filename: b4f68a58658ceceb368520dafc35b270272ac27b8890d5b3ff0b968170471e2b
Filesize: 54240
Filetype: Mach-O i386 executable
Flags: <NOUNDEFS|DYLDLINK|TWOLEVEL>
MD5: 20ffe440e4f557b9e03855b5da2b3c9c
SHA1: 1bf61ecad8568a774f9fba726a254a9603d09f33
SHA256: b4f68a58658ceceb368520dafc35b270272ac27b8890d5b3ff0b968170471e2b
[Mac-O Header]
magic: MH_MAGIC (32-bit)
cputype: Intel i386
cpusubtype: x86_ALL, x86_64_H, x86_64_LIB64
filetype: MH_EXECUTE
ncmds: 13
sizeofcmds: 1180
flags: MH_NOUNDEFS, MH_DYLDLINK, MH_TWOLEVEL
[Load Cmd table]
{'cmd': 'LC_SEGMENT', 'cmdsize': 56}
{'cmd': 'LC_SEGMENT', 'cmdsize': 192}
{'cmd': 'LC_SEGMENT', 'cmdsize': 328}
{'cmd': 'LC_SEGMENT', 'cmdsize': 192}
{'cmd': 'LC_SEGMENT', 'cmdsize': 56}
{'cmd': 'LC_SYMTAB', 'cmdsize': 24}
{'cmd': 'LC_DYSYMTAB', 'cmdsize': 80}
{'cmd': 'LC_LOAD_DYLINKER', 'cmdsize': 28}
{'cmd': 'LC_UUID', 'cmdsize': 24}
{'cmd': 'LC_UNIXTHREAD', 'cmdsize': 80}
{'cmd': 'LC_LOAD_DYLIB', 'cmdsize': 52}
{'cmd': 'LC_LOAD_DYLIB', 'cmdsize': 52}
{'cmd': 'LC_CODE_SIGNATURE', 'cmdsize': 16}
[Load Commands]
LC_CODE_SIGNATURE
LC_DYSYMTAB
LC_LOAD_DYLIB
LC_LOAD_DYLINKER
LC_SYMTAB
LC_UNIXTHREAD
LC_UUID
[File Segments]
SEGNAME VADDR VSIZE OFFSET SIZE MAX_VM_PROTECTION INITIAL_VM_PROTECTION NSECTS FLAGS
----------------------------------------------------------------------------------------
__PAGEZERO 0 4096 0 0 0 0 0 0
__TEXT 4096 28672 0 28672 7 5 2 0
__DATA 32768 4096 28672 4096 7 3 4 0
__IMPORT 36864 4096 32768 4096 7 7 2 0
__LINKEDIT 40960 20480 36864 17376 7 1 0 0
[Dylib Commands]
DYLIB_NAME_OFFSET DYLIB_TIMESTAMP DYLIB_CURRENT_VERSION DYLIB_COMPAT_VERSION DYLIB_NAME
----------------------------------------------------------------------------------------------------------
24 2 65536 65536 b'/usr/lib/libgcc_s.1.dylib'
24 2 7274759 65536 b'/usr/lib/libSystem.B.dylib'
[Dylib Names]
b'/usr/lib/libgcc_s.1.dylib'
b'/usr/lib/libSystem.B.dylib'
- https://opensource.apple.com/source/xnu/xnu-2050.18.24/EXTERNAL_HEADERS/mach-o/loader.h
- https://github.com/apple-oss-distributions/lldb/blob/10de1840defe0dff10b42b9c56971dbc17c1f18c/llvm/include/llvm/Support/MachO.h
- https://iphonedev.wiki/Mach-O_File_Format
- https://lowlevelbits.org/parsing-mach-o-files/
- https://github.com/aidansteele/osx-abi-macho-file-format-reference
- https://lief-project.github.io/doc/latest/tutorials/11_macho_modification.html
- https://github.com/VirusTotal/yara/blob/master/libyara/include/yara/macho.h
- https://github.com/corkami/pics/blob/master/binary/README.md
- https://github.com/qyang-nj/llios/tree/main