tl;dr
The next release of LIEF (v0.13.0) fixes several Mach-O layout issues that occurred when adding new sections/segments. I also added support for two new load commands:
LC_DYLD_CHAINED_FIXUPS
LC_DYLD_EXPORTS_TRIE
Until now, LIEF's support for modifying Mach-O binaries was mostly limited to adding new load commands and, thus, to extending the load commands table.
Tutorial #11 explains the technical details of this extension, which consists in shifting the content located right after the load commands table and patching the relocations accordingly.
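To make the bookkeeping behind "extending the load commands table" more concrete, here is a minimal sketch. It is not LIEF's implementation: the helper `shift_needed` and its parameters are hypothetical, and it assumes a 64-bit Mach-O (32-byte header) and a 16 KiB alignment for the shift.

```python
MACH_HEADER_64_SIZE = 32  # sizeof(struct mach_header_64)


def shift_needed(sizeofcmds, first_content_offset, new_cmd_size, alignment=0x4000):
    """Return by how many bytes the content located after the load commands
    table has to be shifted to make room for a new command.

    Hypothetical helper: returns 0 when the padding that already exists
    after the table is large enough; otherwise the shift is rounded up to
    `alignment` (assumed to be a 16 KiB page) to keep segments aligned.
    """
    table_end = MACH_HEADER_64_SIZE + sizeofcmds
    free = first_content_offset - table_end
    if new_cmd_size <= free:
        return 0
    missing = new_cmd_size - free
    return (missing + alignment - 1) // alignment * alignment
```

For instance, an `LC_SEGMENT_64` command without sections is 72 bytes long, so `shift_needed(0x300, 0x400, 72)` needs no shift while `shift_needed(0x3c0, 0x400, 72)` requires one page. Once the content is shifted, every offset and relocation that refers to it has to be patched, which is the part described in the tutorial.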
Nevertheless, the Mach-O binaries generated by LIEF after such modifications were inconsistent from the point of view of codesign. As a consequence, these binaries could not be signed, and therefore could not be executed on iOS or, more recently, on an Apple M1.
To better understand what was wrong, let’s consider the following script in which we add two new segments:
import lief

target = lief.parse("mbedtls_selftest_arm64.bin")

segment = lief.MachO.SegmentCommand("__NEW", [0] * 0x123)
target.add(segment)

segment = lief.MachO.SegmentCommand("__NEW", [0] * 0x456)
target.add(segment)

target.write("test.out")
Under the hood, LIEF was relocating the binary to add two new LC_SEGMENT commands and was allocating space at the end of the file to store the content of the new segments.
In particular, the data of the new segments was located after the content of the __LINKEDIT segment, which breaks the layout required by codesign.
The following figure depicts the layout of a Mach-O file from the original layout to the layout generated by LIEF v0.13.0.
In LIEF v0.13.0, we fixed this inconsistency to make sure that the content of the new segments is located before the content of the __LINKEDIT segment.
We can perform this change without breaking the binary because __LINKEDIT is a kind of self-contained blob of data [1].
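The new placement can be modeled with a few lines of Python. This is only a sketch of the idea, not what LIEF does internally: the function `insert_before_linkedit`, the tuple representation of a segment and the 16 KiB page size are assumptions made for the illustration.

```python
def insert_before_linkedit(segments, name, size, page_size=0x4000):
    """Insert a new segment right before __LINKEDIT.

    `segments` is a list of (name, file_offset, file_size) tuples sorted by
    file offset, with "__LINKEDIT" as the last entry (hypothetical model).
    The new segment takes the file offset previously used by __LINKEDIT,
    and __LINKEDIT is pushed further by the page-aligned size.
    """
    others, (last_name, linkedit_offset, linkedit_size) = segments[:-1], segments[-1]
    if last_name != "__LINKEDIT":
        raise ValueError("expected __LINKEDIT to be the last segment")
    padded = (size + page_size - 1) // page_size * page_size
    new_segment = (name, linkedit_offset, padded)
    shifted_linkedit = ("__LINKEDIT", linkedit_offset + padded, linkedit_size)
    return others + [new_segment, shifted_linkedit]
```

In a real binary, moving __LINKEDIT also means updating every load command that stores a file offset pointing into it (LC_SYMTAB, LC_DYLD_INFO, and so on), which this model leaves out.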
codesign requires the __LINKEDIT segment to be at the end of the file because the signature is appended at the end of the file.
Otherwise, codesign would have to perform a relocation process similar to the one done by LIEF.
__LINKEDIT
The __LINKEDIT segment plays an important role in the layout of the Mach-O format and its execution.
This segment is used to store information about the exports, the symbols, the relocations, the signature, and more broadly, information used by the dyld loader to load the binary.
This segment has a known layout which is described in the following figure:
This layout is very strict and its content must follow the order shown in the previous figure.
In addition, there are sanity checks that ensure that all the chunks of __LINKEDIT are contiguous within its content.
If the layout is wrong, the executable might still run but it is unlikely to pass the codesign checks.
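The kind of sanity check mentioned above can be sketched as follows. This is a deliberately strict model written for illustration and not the actual code of dyld or codesign: the function `check_linkedit` and the tuple representation of a chunk are hypothetical, any gap is reported as an error, and the required order of the chunks is not verified.

```python
def check_linkedit(chunks, linkedit_offset, linkedit_size, file_size):
    """Return a list of layout problems (empty if the layout looks sane).

    `chunks` is a list of (label, file_offset, size) tuples describing the
    blobs stored in __LINKEDIT (hypothetical model of the segment).
    """
    problems = []
    linkedit_end = linkedit_offset + linkedit_size
    if linkedit_end != file_size:
        problems.append("__LINKEDIT does not end at the end of the file")

    # Walk the chunks by increasing offset: each one must start exactly
    # where the previous one ended.
    cursor = linkedit_offset
    for label, offset, size in sorted(chunks, key=lambda chunk: chunk[1]):
        if offset != cursor:
            problems.append(f"gap or overlap before {label}")
        cursor = offset + size

    if cursor != linkedit_end:
        problems.append("chunks do not cover the whole __LINKEDIT segment")
    return problems
```

Regenerating the whole segment, rather than patching it in place, is a simple way to make sure that such checks keep passing after a modification.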
At first sight, this strict layout can be seen as a major hurdle for modifying Mach-O files, but since the __LINKEDIT segment is located at the end of the file, we can extend it or shrink it quite easily.
Completely regenerating the __LINKEDIT segment makes it possible to perform advanced modifications like creating exports and adding or removing symbols, as discussed in the next sections.
LC_DYLD_CHAINED_FIXUPS & LC_DYLD_EXPORTS_TRIE
Compared to the ELF and PE formats, the relocations and the exported functions of Mach-O binaries are not wrapped by a table of entries.
In the Mach-O format, the relocations are encoded either in the LC_DYLD_INFO command or in the LC_DYLD_CHAINED_FIXUPS command.
On the other hand, the exports are encoded in a trie located either in the LC_DYLD_INFO command or in the LC_DYLD_EXPORTS_TRIE command.
LC_DYLD_CHAINED_FIXUPS appeared more recently than the LC_DYLD_INFO command, and the differences between the two are described in the blog post: How iOS 15 makes your app launch faster.
The LC_DYLD_EXPORTS_TRIE command has the same structure as LC_DYLD_INFO[Export Trie], but the export information has been moved into this dedicated load command.
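To give an idea of what "encoded in a trie" means, here is a minimal sketch of a reader for the export trie. It is not LIEF's parser: the function names are mine, and it only handles regular exports (re-exports and stub-and-resolver entries store a different payload and are not supported here). The sample trie is hand-built for the illustration.

```python
def read_uleb128(data, offset):
    """Decode a ULEB128 integer; return (value, offset after the integer)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, offset
        shift += 7


def walk_export_trie(trie, node_offset=0, prefix=""):
    """Return {symbol name: address relative to the image base}."""
    exports = {}
    terminal_size, offset = read_uleb128(trie, node_offset)
    children_offset = offset + terminal_size
    if terminal_size:
        # Terminal node: flags, then the address (regular exports only)
        _flags, offset = read_uleb128(trie, offset)
        address, _ = read_uleb128(trie, offset)
        exports[prefix] = address

    child_count = trie[children_offset]
    offset = children_offset + 1
    for _ in range(child_count):
        # Each edge is a NUL-terminated string followed by the offset of
        # the child node, relative to the beginning of the trie.
        end = trie.index(0, offset)
        edge = trie[offset:end].decode()
        child_offset, offset = read_uleb128(trie, end + 1)
        exports.update(walk_export_trie(trie, child_offset, prefix + edge))
    return exports


# Hand-built trie with a single export: "_compute" at image offset 0x3f18
SAMPLE_TRIE = (
    bytes([0x00, 0x01]) + b"_compute\x00" + bytes([0x0C])  # root node
    + bytes([0x03, 0x00, 0x98, 0x7E, 0x00])                 # terminal node
)
print({name: hex(addr) for name, addr in walk_export_trie(SAMPLE_TRIE).items()})
# {'_compute': '0x3f18'}
```

Since the symbol names are split across the edges of the trie and every child is referenced by an offset, adding or renaming an export changes the size of the nodes and therefore the offsets stored in their parents. This is why it is easier to rebuild the trie entirely than to patch it in place.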
Converting a binary into a library can be useful to harness a binary for fuzzing or to instrument/debug a specific function in a controlled environment (like an unknown cryptography function or a whiteboxed function).
In tutorial #8, we described the process to perform this transformation on an ELF binary; the transformation for a Mach-O binary is a bit more straightforward.
Let’s consider the following code:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int X = 1;

int compute() {
  return X++;
}

int main(int argc, const char** argv) {
  for (size_t i = 0; i < argc; ++i) {
    printf("compute(): %d\n", compute());
  }
  return 0;
}
It can be compiled with:
romain@Mac-M1 % clang -O3 -fvisibility=hidden -Wl,-x -o bin2lib.bin bin2lib.c
Which produces this executable: bin2lib.bin
To convert this binary into a library, we first need to change its type in the Mach-O’s header:
import lief
bin2lib = lief.parse("bin2lib.bin")

bin2lib.header.file_type = lief.MachO.FILE_TYPES.DYLIB

bin2lib.write("bin2lib.dyld")
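At the byte level, changing the file type boils down to patching one 32-bit field of the Mach-O header. The following sketch shows this with the standard library only. The helper `set_filetype` is hypothetical and it assumes a thin (non-fat), little-endian, 64-bit Mach-O; the constants come from the Mach-O format itself.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
MH_EXECUTE = 0x2
MH_DYLIB = 0x6
# mach_header_64 starts with: magic, cputype, cpusubtype, filetype (4 bytes each)
FILETYPE_OFFSET = 12


def set_filetype(macho, filetype):
    """Return a copy of the Mach-O image with a new `filetype` field."""
    (magic,) = struct.unpack_from("<I", macho, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O")
    patched = bytearray(macho)
    struct.pack_into("<I", patched, FILETYPE_OFFSET, filetype)
    return bytes(patched)
```

Calling `set_filetype(raw, MH_DYLIB)` on the raw bytes of an executable only rewrites these four bytes. The rest of the binary is left as is, which is why additional load commands may still be missing for the result to be a valid library.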
Technically, it should be enough, but dyld_info raises some concerns:
romain@Mac-M1 % dyld_info ./bin2lib.dylib
dyld_info: './bin2lib.dylib' in './bin2lib.dylib' MH_DYLIB is missing LC_ID_DYLIB
This can be confirmed by looking at the source code of dyld.
To fix this error, we just have to create a new LC_ID_DYLIB command:
import lief
bin2lib = lief.parse("bin2lib.bin")

bin2lib.header.file_type = lief.MachO.FILE_TYPES.DYLIB
# New: add the missing LC_ID_DYLIB command
bin2lib.add(lief.MachO.DylibCommand.id_dylib("bin2lib.dylib", 0, 1, 2))

bin2lib.write("bin2lib.dyld")
This makes it possible to dlopen bin2lib.dyld:
import ctypes
handler = ctypes.cdll.LoadLibrary("bin2lib.dyld")
# <CDLL './bin2lib.dyld', handle 208270460 at 0x107d277f0>
Thanks to the improvements on the __LINKEDIT segment, we can now create new exports.
If we consider the stripped function int compute() from the binary in the previous section, we can create a new export as follows:
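import lief
original = lief.parse("bin2lib.bin")  # the binary from the previous section
address = 0x100003f18                 # address of the stripped compute() function
original.add_exported_function(address, "_compute")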
Another use case of these improvements is the ability to inject code into a Mach-O file and to re-sign the modified binary. Code signing is not required for x86-64 binaries but it becomes mandatory when targeting the arm64 architecture.
Let’s consider the library _heapq.cpython-39-darwin.so which is one of the first libraries dynamically loaded by the Python interpreter. The injection consists in:
adding new segments to _heapq.cpython-39-darwin.so that will embed our shellcode
redirecting one of its exported functions to the shellcode’s entrypoint
By running the Python interpreter with the environment variable DYLD_PRINT_APIS=1, we can observe the following output:
romain@Mac-M1 ~ % DYLD_PRINT_APIS=1 python3 -c "import io"
dyld[76439]: _dyld_is_memory_immutable(0x1b3f8cea0, 26) => 1
dyld[76439]: dlopen("/opt/homebrew/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/lib-dynload/_heapq.cpython-39-darwin.so", 0x00000002)
dyld[76439]: dlopen(_heapq.cpython-39-darwin.so) => 0x208f35800
dyld[76439]: dlsym(0x208f35800, "PyInit__heapq")
dyld[76439]: dlsym("PyInit__heapq") => 0x104bcb824
It suggests that PyInit__heapq is a suitable function for redirecting the execution to the shellcode’s entrypoint.
To create the shellcode, we can use gdelugre/shell-factory, a project developed by a former colleague which provides nothing less than an STL-like C++ API to write shellcode.
Thanks to this project, we can create the following shellcode:
volatile uintptr_t ORIGINAL_EP = 0xdeadc0de;
volatile uintptr_t IMAGEBASE = 0x00c0de;
using PyInit__heapq_t = void(*)();

inline uintptr_t imagebase() {
  /*
   * The value of IMAGEBASE is set by the injector.
   * After the patch, it contains the relative virtual address of &IMAGEBASE
   * in the final binary.
   */
  return reinterpret_cast<uintptr_t>(&IMAGEBASE) - IMAGEBASE;
}

SHELLCODE_ENTRY
{
  uintptr_t base = imagebase();
  Pico::printf("LIEF says hello!\n");
  Pico::printf("Time to jump on the real function: %p\n", ORIGINAL_EP);
  auto PyInit__heapq = reinterpret_cast<PyInit__heapq_t>(base + ORIGINAL_EP);
  return PyInit__heapq();
}
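The arithmetic performed by imagebase() can be replayed in Python. The load address and the relative virtual address of IMAGEBASE used below are made-up values chosen for the example; only 0x15f8, the original address of PyInit__heapq, comes from the output shown later in this post.

```python
def imagebase(runtime_addr_of_var, stored_rva):
    """Base address of the image: the runtime address of a variable minus
    the relative virtual address (RVA) the injector stored in it."""
    return runtime_addr_of_var - stored_rva


def original_entrypoint(runtime_addr_of_var, stored_rva, original_ep_rva):
    """Runtime address of the original function to jump to."""
    return imagebase(runtime_addr_of_var, stored_rva) + original_ep_rva


# Hypothetical values: library loaded at 0x104bc8000, IMAGEBASE at RVA 0x10010
LOAD_ADDRESS = 0x104BC8000
IMAGEBASE_RVA = 0x10010
runtime_addr = LOAD_ADDRESS + IMAGEBASE_RVA

print(hex(imagebase(runtime_addr, IMAGEBASE_RVA)))
# 0x104bc8000
print(hex(original_entrypoint(runtime_addr, IMAGEBASE_RVA, 0x15F8)))
# 0x104bc95f8
```

This trick lets the shellcode find where the library was loaded without calling any API, which matters because the load address changes from one run to another.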
Pico::printf
The attentive reader may have noticed the Pico::printf("[...] %p") which is correctly supported by shell-factory (see: include/pico/format.h).
The compiled shellcode can be downloaded here: lief_demo_darwin_arm64.bin.
To inject the shellcode in _heapq.cpython-39-darwin.so, we first need to copy the shellcode’s segments into the library:
import lief

shellcode = lief.parse("lief_demo_darwin_arm64.bin")
heapq = lief.parse("_heapq.cpython-39-darwin.so")

for segment in shellcode.segments:
    # Rename e.g. "__TEXT" into "__LTEXT" to avoid clashing with the library's segments
    new_seg_name = segment.name.replace("__", "")
    new_seg = lief.MachO.SegmentCommand(f"__L{new_seg_name}", list(segment.content))

    heapq.add(new_seg)
Then, we have to patch the Mach-O exports trie to change the address of PyInit__heapq to the shellcode’s entrypoint:
shellcode_rva_entry = ...
original = None
for exp in heapq.dyld_info.exports:
    if exp.symbol.name != "_PyInit__heapq":
        continue

    original = exp.address             # keep the original address for the shellcode
    exp.address = shellcode_rva_entry  # redirect the export to the shellcode
    break
Finally, we can rewrite the library:
heapq.write("_heapq.cpython-39-darwin.so.patched")
and sign it:
romain@Mac-M1 ~ % codesign -f --verbose -s - _heapq.cpython-39-darwin.so.patched
Now when running the Python interpreter, we can observe the execution of the shellcode:
romain@Mac-M1 ~ % python3
LIEF says hello!
Time to jump on the real function: 0x15f8
Python 3.9.5 (default, May 3 2021, 19:12:05)
[Clang 12.0.5 (clang-1205.0.22.9)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
Injection
The script that contains the complete logic of the transformation is available here, and _heapq.cpython-39-darwin.so.patched can be downloaded here.
Surprisingly, if we open the patched version of the library (_heapq.cpython-39-darwin.so.patched) in IDA and jump to the symbol _PyInit__heapq, it actually displays this function:
IDA Version 7.7.211224, January 18, 2022
This is the original function and not the function associated with the shellcode, even though the patched library prints LIEF says hello [...]
On the other hand, if we get the address of _PyInit__heapq
with LIEF:
import lief
patched = lief.parse("./_heapq.cpython-39-darwin.so.patched")
symbol = patched.get_symbol("_PyInit__heapq")
print(hex(symbol.export_info.address))
The result is:
_PyInit__heapq: 0xf824
Jumping to this address gives a better output (once manually disassembled):
We recognize the shellcode’s entrypoint function.
So what happened in IDA, given that the function located at 0xf824 is the one that is actually executed, and thus the one resolved by dyld?
IDA is confused because Mach-O’s symbols can be stored in two different commands:
LC_DYLD_INFO.export_trie / LC_DYLD_EXPORTS_TRIE
LC_SYMTAB
LC_DYLD_INFO.export_trie / LC_DYLD_EXPORTS_TRIE are used to store the exported symbols while LC_SYMTAB stores symbols for other purposes.
The important point is that the same symbol can be duplicated in these two commands with different addresses.
IDA gives priority to LC_SYMTAB over the exports trie while the Mach-O loader uses the exports trie.
The following figure illustrates why it can be confusing:
Actually, I intentionally took a shortcut in the LIEF script that resolves the address of _PyInit__heapq, and we can programmatically access these two addresses as follows:
import lief
patched = lief.parse("./_heapq.cpython-39-darwin.so.patched")
symbol = patched.get_symbol("_PyInit__heapq")
print(hex(symbol.value))                # 0x15f8: address from LC_SYMTAB
print(hex(symbol.export_info.address))  # 0xf824: address from the export trie
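The same comparison can be generalized to spot every symbol whose two addresses disagree. The sketch below works on plain dictionaries so that it does not depend on a specific parser; the function `conflicting_symbols` and the way the two tables are represented are assumptions made for the example.

```python
def conflicting_symbols(symtab, export_trie):
    """Return {name: (LC_SYMTAB address, export trie address)} for the
    symbols that are present in both tables with different addresses.

    Both arguments are {symbol name: address} dictionaries (hypothetical
    representation of the two sources of symbols).
    """
    conflicts = {}
    for name, symtab_addr in symtab.items():
        export_addr = export_trie.get(name)
        if export_addr is not None and export_addr != symtab_addr:
            conflicts[name] = (symtab_addr, export_addr)
    return conflicts


# Addresses taken from the patched library discussed above
symtab = {"_PyInit__heapq": 0x15F8}
export_trie = {"_PyInit__heapq": 0xF824}
for name, (sym_addr, exp_addr) in conflicting_symbols(symtab, export_trie).items():
    print(f"{name}: LC_SYMTAB={sym_addr:#x} export trie={exp_addr:#x}")
# _PyInit__heapq: LC_SYMTAB=0x15f8 export trie=0xf824
```

A non-empty result does not prove that a binary is malicious, but it tells the analyst that the address displayed by a tool may not be the one used by the loader.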
Version 3.0
Version 10.1.2 - Jan 26, 2022
radare2, version 5.6.6 - Mar 22, 2022:
$ r2 _heapq.cpython-39-darwin.so.patched
[0x00000000]> aaa
...
[0x00000000]> ia

[Imports]
nth vaddr bind type lib name
―――――――――――――――――――――――――――――――――
0 0x000021ec NONE FUNC PyErr_SetString
1 0x00000000 NONE FUNC PyExc_IndexError
2 0x00000000 NONE FUNC PyExc_RuntimeError
3 0x00000000 NONE FUNC PyExc_TypeError
4 0x000021f8 NONE FUNC PyList_Append
5 0x00002204 NONE FUNC PyList_SetSlice
6 0x00002210 NONE FUNC PyModuleDef_Init
7 0x0000221c NONE FUNC PyModule_AddObject
8 0x00002228 NONE FUNC PyObject_RichCompareBool
9 0x00002234 NONE FUNC PyUnicode_FromString
10 0x00002240 NONE FUNC _PyArg_CheckPositional
11 0x0000224c NONE FUNC _Py_Dealloc
12 0x00000000 NONE FUNC _Py_NoneStruct
13 0x00000000 NONE FUNC dyld_stub_binder

[Exports]

nth paddr vaddr bind type size lib name
―――――――――――――――――――――――――――――――――――――――――――――――――――
0 0x000015f8 0x000015f8 GLOBAL FUNC 0 _PyInit__heapq
On the other hand, the afl command outputs a better result:
[0x00000000]> afl
0x000015f8 1 12 sym._PyInit__heapq
0x00001604 6 108 sym._heapq_exec
0x00002238 1 8 fcn.00002238
0x00002220 1 8 fcn.00002220
0x00002250 1 8 fcn.00002250
...
0x0000f824 1 88 sym.imp._PyInit__heapq
These changes strengthen LIEF's ability to read and modify Mach-O binaries. They should make it possible to develop new reverse engineering and binary analysis techniques.
For those who are interested in Mach-O (and ELF) tricks that could prevent static analysis tools from working correctly, I’ll present The Poor Man’s Obfuscator at Pass The Salt in July 2022 :)
[1] In the general case, we can’t insert content between two arbitrary segments as it could break the binary.
For instance, if the __TEXT segment references variables in the __DATA segment with relative addressing, inserting some data between these two segments will likely break the relative addressing.