11 - Mach-O Modification

This tutorial covers Mach-O format modification and introduces some internal aspects of the format.

Files and scripts used in this tutorial are available in the tutorials repository.


Introduction

A basic Mach-O binary (i.e., not FAT) can be represented in four parts, as described in this diagram:

[Figure: layout of a Mach-O binary: header, load command table, padding, raw data]

The first part is the header, which can be accessed through the lief.MachO.Binary.header attribute. The second part is the load command table, which can be iterated over using lief.MachO.Binary.load_commands. It is optionally followed by padding (free space). Finally, the fourth part contains the raw data (machine code, rebase bytecode, signatures, etc.).
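To make the layout concrete, here is a minimal sketch in plain Python (no LIEF) that locates the first two parts in a thin 64-bit little-endian file. It relies only on the mach_header_64 layout from <mach-o/loader.h>. The function name layout() is hypothetical, and FAT files and 32-bit binaries are deliberately rejected.

```python
import struct

# Magic value of a 64-bit little-endian Mach-O (MH_MAGIC_64 in <mach-o/loader.h>).
MH_MAGIC_64 = 0xFEEDFACF
# mach_header_64: magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved
HEADER_64 = struct.Struct("<IiiIIIII")  # 32 bytes


def layout(data: bytes) -> dict:
    """Return the boundaries of the header and of the load command table."""
    (magic, _cputype, _cpusubtype, filetype,
     ncmds, sizeofcmds, flags, _reserved) = HEADER_64.unpack_from(data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin 64-bit little-endian Mach-O")
    return {
        "filetype": filetype,
        "flags": flags,
        "ncmds": ncmds,
        # The load command table starts right after the header...
        "commands_start": HEADER_64.size,
        # ...and its total size in bytes is stored in the header itself.
        "commands_end": HEADER_64.size + sizeofcmds,
    }
```

Everything between commands_end and the file offset of the first section's content is the padding area.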

Load commands such as SegmentCommand or DyldInfo can be associated with raw data located after the load command table and the padding. The padding leaves room for signing the binary after compilation: the codesign utility extends the raw data area with the signature and adds an LC_CODE_SIGNATURE or LC_DYLIB_CODE_SIGN_DRS command in the padding area.

Since load commands are the base unit of the Mach-O format (segments, shared libraries, entry points, etc. are all described by commands), the ability to add arbitrary commands to a binary enables interesting possibilities such as code injection or anti-analysis techniques.

Different techniques exist for adding new commands to a Mach-O binary:

  • Replacing an existing load command that is not mandatory for execution, such as UUIDCommand or CodeSignature.

  • Using the padding area to extend the load command table.

The main limitation of these techniques is that the size and number of commands that can be added are bounded by the size of the padding area or of the replaced command.

If the padding is small, we cannot add a LOAD_DYLIB command with a long library path. Moreover, codesign may complain that there is not enough space for LC_CODE_SIGNATURE, because we have consumed the space reserved for it.
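Whether a command fits can be checked with simple arithmetic. The sketch below assumes a 64-bit binary, where a dylib_command is 24 bytes followed by the NUL-terminated path, with the total rounded up to a multiple of 8. The value first_data_offset (the file offset of the first section's content) has to be obtained separately, all function names are hypothetical, and the extra room codesign needs later is ignored.

```python
HEADER_64_SIZE = 32        # sizeof(struct mach_header_64)
DYLIB_COMMAND_SIZE = 24    # sizeof(struct dylib_command), before the path string


def align_up(value: int, alignment: int) -> int:
    return (value + alignment - 1) // alignment * alignment


def dylib_cmdsize(path: str) -> int:
    """Size of a LOAD_DYLIB command carrying `path` in a 64-bit Mach-O.
    The path is NUL-terminated and the command is padded to a multiple of 8."""
    return align_up(DYLIB_COMMAND_SIZE + len(path.encode()) + 1, 8)


def padding_size(sizeofcmds: int, first_data_offset: int) -> int:
    """Free bytes between the end of the load command table and the raw data."""
    return first_data_offset - (HEADER_64_SIZE + sizeofcmds)


def fits_in_padding(path: str, sizeofcmds: int, first_data_offset: int) -> bool:
    return dylib_cmdsize(path) <= padding_size(sizeofcmds, first_data_offset)
```

With the path used later in this tutorial, dylib_cmdsize("/Users/romain/libexample.dylib") gives 56, which matches the cmdsize that otool reports for the injected command.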

The following sections discuss format modifications and how LIEF addresses these limitations.

When PIE Makes Things Easier

macOS and iOS executables are typically compiled as position-independent executables (PIE). The generated code uses relative addressing, and the remaining absolute pointers are described by rebase information that the loader applies at load time.
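Whether a binary was built this way is recorded in the MH_PIE flag (0x200000) of the Mach-O header. A minimal sketch of the check on raw bytes, assuming a thin 64-bit little-endian file (the helper name is_pie is hypothetical):

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
MH_PIE = 0x00200000  # header flag: the image can be loaded at a random address


def is_pie(data: bytes) -> bool:
    magic, = struct.unpack_from("<I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("expected a thin 64-bit little-endian Mach-O")
    # `flags` is the 7th 32-bit field of mach_header_64 (byte offset 24).
    flags, = struct.unpack_from("<I", data, 24)
    return bool(flags & MH_PIE)
```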

In short, a PIE binary can be mapped at a random base address because every absolute pointer it contains is described by rebase information. LIEF leverages this property to shift the raw data within the file.

[Figure: shifting the raw data within a PIE binary]

Such a transformation also requires maintaining consistent format metadata. Specifically, when we shift the raw data, we must update relocations, segment offsets, virtual addresses, etc. Once the raw data is shifted and the metadata updated, we have arbitrary space between the load command table and the raw data section. Thus, we can extend the load command table as shown in the figure below:

[Figure: extending the load command table into the space freed by the shift]

Warning

The size of the shift must be a multiple of the page size to preserve section and segment alignment.
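A sketch of the rounding, assuming the usual page sizes of 4 KiB for x86_64 and 16 KiB for arm64 (aligned_shift is a hypothetical helper, not part of LIEF):

```python
# Assumed typical page sizes: 4 KiB on x86_64, 16 KiB on arm64.
PAGE_SIZE = {"x86_64": 0x1000, "arm64": 0x4000}


def aligned_shift(required: int, arch: str = "x86_64") -> int:
    """Round the number of bytes we need up to a whole number of pages."""
    page = PAGE_SIZE[arch]
    return (required + page - 1) & ~(page - 1)
```

Even a 56-byte LOAD_DYLIB command therefore costs a whole page of shift.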

Maintaining format consistency after a shift transformation is complex. The next section presents parts of the Mach-O format that must be updated to maintain consistency.

When Mach-O Makes Things Harder

After the shift operation, we must update several load commands:

We also need to update:

  • Relocations

  • Binding information

  • Export information

While the ELF and PE formats store relocations and exports in tables of structures, the Mach-O format uses bytecode to describe rebasing and binding, and stores export information in a trie. Tries and bytecode reduce the binary size but make updates harder, since the bytecode must be interpreted and then regenerated.
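All three structures encode integers as ULEB128: 7 bits per byte, with the high bit set on every byte except the last, so small offsets take a single byte instead of eight. A minimal encoder and decoder in plain Python (the names are illustrative):

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data: bytes, pos: int = 0) -> tuple:
    """Return (value, position just after the encoded value)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos
```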

Rebase Bytecode

As mentioned previously, recent Mach-O loaders (dyld) use bytecode to relocate (rebase) the binary. The offset and size of this bytecode are given by the lief.MachO.DyldInfo.rebase attribute. The bytecode is a sequence of REBASE_OPCODES that define the addresses to relocate.

Warning

Note that the Section object has a relocation_offset attribute. This appears to be used only for Mach-O object files (lief.MachO.FILE_TYPES.OBJECT) or executables using an old version of the Mach-O loader.

This offset points to a list of relocation structures (not bytecode), the number of which is defined by numberof_relocations.

To determine which addresses must be relocated, we must interpret the bytecode.

The lief.MachO.DyldInfo.show_rebases_opcodes attribute returns the bytecode as pseudo-code:

import lief
app = lief.parse("MachO64_x86-64_binary_id.bin")
print(app.dyld_info.show_rebases_opcodes)
[SET_TYPE_IMM] Type: POINTER
[SET_SEGMENT_AND_OFFSET_ULEB] Segment Index := 2 (__DATA) Segment Offset := 0x20
[DO_REBASE_ULEB_TIMES]
  for i in range(26):
      rebase(POINTER, __DATA, 0x20)
      Segment Offset += 0x8 (0x28)

      rebase(POINTER, __DATA, 0x28)
      Segment Offset += 0x8 (0x30)

      rebase(POINTER, __DATA, 0x30)
      Segment Offset += 0x8 (0x38)

      rebase(POINTER, __DATA, 0x38)
      Segment Offset += 0x8 (0x40)

      rebase(POINTER, __DATA, 0x40)
      Segment Offset += 0x8 (0x48)
      ...
[DONE]

From the output above, we can see that the loader will rebase pointers in the __DATA segment at offsets 0x20, 0x28, 0x30, 0x38, ...
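To show how such a listing is derived, here is a minimal rebase interpreter in plain Python. It follows the opcode values of <mach-o/loader.h> and assumes 8-byte pointers. It is a sketch rather than LIEF's parser, interpret_rebase is a hypothetical name, and offsets that wrap around through ADD_ADDR_ULEB are not handled.

```python
# Rebase opcodes from <mach-o/loader.h>: high nibble = opcode, low nibble = immediate.
REBASE_OPCODE_DONE = 0x00
REBASE_OPCODE_SET_TYPE_IMM = 0x10
REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB = 0x20
REBASE_OPCODE_ADD_ADDR_ULEB = 0x30
REBASE_OPCODE_ADD_ADDR_IMM_SCALED = 0x40
REBASE_OPCODE_DO_REBASE_IMM_TIMES = 0x50
REBASE_OPCODE_DO_REBASE_ULEB_TIMES = 0x60
REBASE_OPCODE_DO_REBASE_ADD_ADDR_ULEB = 0x70
REBASE_OPCODE_DO_REBASE_ULEB_TIMES_SKIPPING_ULEB = 0x80


def _uleb(data, pos):
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def interpret_rebase(data: bytes, ptr_size: int = 8) -> list:
    """Replay rebase bytecode; return (segment_index, segment_offset, type) tuples."""
    rebases = []
    segment = offset = rtype = 0
    pos = 0
    while pos < len(data):
        byte = data[pos]
        pos += 1
        opcode, imm = byte & 0xF0, byte & 0x0F
        if opcode == REBASE_OPCODE_DONE:
            break
        elif opcode == REBASE_OPCODE_SET_TYPE_IMM:
            rtype = imm
        elif opcode == REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB:
            segment = imm
            offset, pos = _uleb(data, pos)
        elif opcode == REBASE_OPCODE_ADD_ADDR_ULEB:
            delta, pos = _uleb(data, pos)
            offset += delta
        elif opcode == REBASE_OPCODE_ADD_ADDR_IMM_SCALED:
            offset += imm * ptr_size
        elif opcode in (REBASE_OPCODE_DO_REBASE_IMM_TIMES,
                        REBASE_OPCODE_DO_REBASE_ULEB_TIMES):
            if opcode == REBASE_OPCODE_DO_REBASE_IMM_TIMES:
                count = imm
            else:
                count, pos = _uleb(data, pos)
            for _ in range(count):
                rebases.append((segment, offset, rtype))
                offset += ptr_size
        elif opcode == REBASE_OPCODE_DO_REBASE_ADD_ADDR_ULEB:
            delta, pos = _uleb(data, pos)
            rebases.append((segment, offset, rtype))
            offset += delta + ptr_size
        elif opcode == REBASE_OPCODE_DO_REBASE_ULEB_TIMES_SKIPPING_ULEB:
            count, pos = _uleb(data, pos)
            skip, pos = _uleb(data, pos)
            for _ in range(count):
                rebases.append((segment, offset, rtype))
                offset += skip + ptr_size
        else:
            raise ValueError(f"unknown rebase opcode {byte:#04x}")
    return rebases
```

The hand-assembled stream bytes([0x11, 0x22, 0x20, 0x60, 0x1a, 0x00]) mirrors the listing above (pointer type, segment 2 at offset 0x20, 26 rebases, DONE) and yields offsets 0x20, 0x28, 0x30, ... in steps of 8.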

For those who only care about the exact addresses being relocated, this output is not very user-friendly. LIEF also provides a representation of this bytecode by creating lief.MachO.Relocation objects, which are the result of interpreting the bytecode.

The lief.MachO.Binary.relocations attribute returns an iterator over lief.MachO.Relocation objects that model a relocation, similar to lief.ELF.Relocation and lief.PE.Relocation.

for relocation in app.relocations:
  print(relocation)
100002020 POINTER 64 DYLDINFO  __DATA.__la_symbol_ptr _err
100002028 POINTER 64 DYLDINFO  __DATA.__la_symbol_ptr _errx
100002030 POINTER 64 DYLDINFO  __DATA.__la_symbol_ptr _exit
100002038 POINTER 64 DYLDINFO  __DATA.__la_symbol_ptr _fprintf
100002040 POINTER 64 DYLDINFO  __DATA.__la_symbol_ptr _free
100002048 POINTER 64 DYLDINFO  __DATA.__la_symbol_ptr _fwrite
...

Using this representation, we can update relocations by adding the shift size to the lief.MachO.Relocation.address attribute.

When the Mach-O builder reconstructs the final binary, it regenerates and optimizes the rebase bytecode according to the current state of the relocations. The process can be summarized by the following diagram:

[Figure: rebase bytecode workflow in LIEF: bytecode interpreted into relocations, relocations updated, bytecode regenerated]
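The regeneration step can be illustrated with a naive encoder. It sorts the rebased locations, merges consecutive pointers into runs, and emits one SET_SEGMENT_AND_OFFSET_ULEB plus one DO_REBASE_*_TIMES opcode per run. This sketch rests on several assumptions: pointer rebases only, 8-byte pointers, and segment indices below 16. It is not the optimizer LIEF actually uses, which may pick other opcodes such as the skipping variants. encode_rebases is a hypothetical name.

```python
REBASE_TYPE_POINTER = 1
REBASE_OPCODE_DONE = 0x00
REBASE_OPCODE_SET_TYPE_IMM = 0x10
REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB = 0x20
REBASE_OPCODE_DO_REBASE_IMM_TIMES = 0x50
REBASE_OPCODE_DO_REBASE_ULEB_TIMES = 0x60


def _uleb(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        out.append(byte | (0x80 if value else 0))
        if not value:
            return bytes(out)


def encode_rebases(locations, ptr_size: int = 8) -> bytes:
    """Encode (segment_index, segment_offset) pairs as pointer rebases.

    Consecutive pointers in the same segment are merged into one run, which is
    where most of the compactness of rebase bytecode comes from."""
    out = bytearray([REBASE_OPCODE_SET_TYPE_IMM | REBASE_TYPE_POINTER])
    runs = []  # each run: [segment, start_offset, count]
    for segment, offset in sorted(set(locations)):
        last = runs[-1] if runs else None
        if last and last[0] == segment and last[1] + last[2] * ptr_size == offset:
            last[2] += 1
        else:
            runs.append([segment, offset, 1])
    for segment, start, count in runs:
        if segment > 0x0F:
            raise ValueError("segment index does not fit in the 4-bit immediate")
        out.append(REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | segment)
        out += _uleb(start)
        if count < 16:  # small counts fit in the immediate
            out.append(REBASE_OPCODE_DO_REBASE_IMM_TIMES | count)
        else:
            out.append(REBASE_OPCODE_DO_REBASE_ULEB_TIMES)
            out += _uleb(count)
    out.append(REBASE_OPCODE_DONE)
    return bytes(out)
```

Encoding the 26 consecutive pointers at offsets 0x20 to 0xE8 of segment 2 gives back the six bytes 11 22 20 60 1a 00.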

Binding Bytecode

The Mach-O loader also uses bytecode to bind imported functions or symbols. This bytecode is used in three different binding methods:

  • Normal binding

  • Weak binding (used when the same symbol is defined multiple times)

  • Lazy binding (bound only when the symbol is accessed)

The bytecode can be pretty-printed with show_bind_opcodes, show_weak_bind_opcodes, and show_lazy_bind_opcodes:

print(app.dyld_info.show_bind_opcodes)
[SET_DYLIB_ORDINAL_IMM]
    Library Ordinal := 1
[SET_SYMBOL_TRAILING_FLAGS_IMM]
    Symbol name := ___stderrp
    Is Weak ? false
[SET_TYPE_IMM]
    Type := POINTER
[SET_SEGMENT_AND_OFFSET_ULEB]
    Segment := __DATA
    Segment Offset := 0x10
[DO_BIND]
    bind(POINTER, __DATA, 0x10, ___stderrp, library_ordinal=/usr/lib/libSystem.B.dylib, addend=0, is_weak_import=false)
    Segment Offset += 0x8 (0x18)

The representation and update process are identical to those described in the Rebase Bytecode section.

Export Trie

For exported functions and symbols, the Mach-O format uses a trie structure to store export information. The trie offset and size are specified in the export_trie attribute.
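Each trie node starts with the ULEB128 size of its terminal information, which is zero for a non-terminal node. If that size is non-zero, the symbol's flags and address follow. After that come a one-byte child count and, for each child, a NUL-terminated edge label and the ULEB128 offset of the child node. The sketch below walks such a trie in plain Python. walk_export_trie and the sample bytes in the usage note are hypothetical, and re-exports are skipped.

```python
EXPORT_SYMBOL_FLAGS_REEXPORT = 0x08


def _uleb(data, pos):
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def walk_export_trie(data: bytes) -> dict:
    """Return {symbol_name: address} for every regular export in the trie."""
    exports = {}
    stack = [(0, "")]  # (node offset, prefix accumulated along the edges)
    while stack:
        node, prefix = stack.pop()
        terminal_size, pos = _uleb(data, node)
        children_pos = pos + terminal_size  # children follow the terminal info
        if terminal_size:
            flags, pos = _uleb(data, pos)
            # Re-exports store a dylib ordinal and a name instead of an address.
            if not flags & EXPORT_SYMBOL_FLAGS_REEXPORT:
                address, pos = _uleb(data, pos)
                exports[prefix] = address
        count = data[children_pos]
        pos = children_pos + 1
        for _ in range(count):
            end = data.index(0, pos)
            edge = data[pos:end].decode()
            child, pos = _uleb(data, end + 1)
            stack.append((child, prefix + edge))
    return exports
```

A hand-built trie whose root edge "_" leads to the edges "foo" and "bar" decodes to {"_foo": 0x126b, "_bar": 0x10}. The addresses are offsets from the start of the image.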

Once parsed, trie entries are represented by ExportInfo objects, which can be retrieved from a symbol through its export_info attribute:

app = lief.parse("FAT_MachO_x86_x86-64_library_libdyld.dylib")
print(app.dyld_info.show_export_trie)
...
_@off.0x17
    _N@off.0x21
        _NS@off.0x50
            _NSI@off.0x5d
                _NSInstallLinkEditErrorHandlers@off.0x11d
                _NSInstallLinkEditErrorHandlers{addr: 0x126b, flags: 0}
...
for s in app.symbols:
  if s.has_export_info:
    print(s.export_info)
Node Offset: 128
Flags:       0
Address:     126b
Symbol:      _NSInstallLinkEditErrorHandlers

Node Offset: 5f6
Flags:       0
Address:     2168
Symbol:      _NSIsSymbolDefinedInObjectFileImage

Node Offset: 1a0
Flags:       0
Address:     1391
Symbol:      _NSIsSymbolNameDefined
...

After the shift operation, export information is patched by updating the address attribute, and a new export trie is generated from the updated data.

Removing the Signature

Removing the LC_CODE_SIGNATURE command is a basic but very useful operation when modifying Mach-O files. Since the signature covers the binary's contents, any modification invalidates it, so the command typically needs to be removed. The binary can be re-signed once all modifications are finished.

LIEF provides the lief.MachO.Binary.remove_signature() function to remove this command:

ssh = lief.parse("/usr/bin/ssh")

ssh.remove_signature()

ssh.write("ssh.nosigned")
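To confirm that the command is gone, the load command table can be walked directly. The sketch below is plain Python and assumes a thin 64-bit little-endian file; a universal binary would first need to be split per architecture. It uses the LC_CODE_SIGNATURE value 0x1D from <mach-o/loader.h>, and the helper names are hypothetical.

```python
import struct

LC_CODE_SIGNATURE = 0x1D
HEADER_64_SIZE = 32  # sizeof(struct mach_header_64)


def load_commands(data: bytes):
    """Yield (cmd, cmdsize, file_offset) for each load command."""
    ncmds, = struct.unpack_from("<I", data, 16)  # ncmds is at offset 16
    offset = HEADER_64_SIZE
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        yield cmd, cmdsize, offset
        offset += cmdsize  # every command records its own size


def is_signed(data: bytes) -> bool:
    return any(cmd == LC_CODE_SIGNATURE for cmd, _, _ in load_commands(data))
```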

Code Injection with Shared Libraries

As explained in the talk on format modification [1], one way to inject code into a program's memory space is to force the loader to load a library the program was not linked against, containing a constructor function.

For a Mach-O binary, this can be achieved by adding one of these load commands:

Consider an example using clang. First, we create a small library that defines a constructor:

#include <stdio.h>
#include <stdlib.h>

__attribute__((constructor))
void my_constructor(void) {
  printf("Hello World\n");
}

This is compiled with:

$ clang -fPIC -shared libexample.c -o libexample.dylib

Then, we add a new LOAD_DYLIB using the lief.MachO.Binary.add_library() function:

import lief
clang = lief.parse("/usr/bin/clang")

clang.add_library("/Users/romain/libexample.dylib")

clang.write("/tmp/clang.new")

Finally, we run clang.new and observe that Hello World is printed before clang's own code runs:

$ chmod u+x /tmp/clang.new

$ /tmp/clang.new
Hello World
clang: error: no input files

We can also see the new LOAD_DYLIB command using otool:

$ otool -l /tmp/clang.new|grep -C4 LOAD_DYLIB

...
cmdsize 16
dataoff 73864
datasize 0
Load command 16
        cmd LC_LOAD_DYLIB
    cmdsize 56
       name /Users/romain/libexample.dylib (offset 24)
 time stamp 2 Thu Jan  1 01:00:02 1970
    current version 0.0.0
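The fields otool prints map directly onto struct dylib_command: cmd, cmdsize, the offset of the name string (24), a timestamp, and the current and compatibility versions, followed by the padded path. Below is a plain-Python sketch that serializes such a command for a 64-bit binary. build_load_dylib is hypothetical, and its default timestamp and version values merely mirror the otool output above rather than documenting what LIEF writes.

```python
import struct

LC_LOAD_DYLIB = 0xC


def build_load_dylib(path: str, timestamp: int = 2,
                     current_version: int = 0, compat_version: int = 0) -> bytes:
    """Serialize an LC_LOAD_DYLIB command for a 64-bit Mach-O.

    Layout (struct dylib_command): cmd, cmdsize, name offset, timestamp,
    current_version, compatibility_version, then the NUL-terminated path."""
    name = path.encode() + b"\x00"
    fixed_size = 24                                   # six 32-bit fields
    cmdsize = (fixed_size + len(name) + 7) // 8 * 8   # pad to 8 bytes
    fields = struct.pack("<6I", LC_LOAD_DYLIB, cmdsize, fixed_size,
                         timestamp, current_version, compat_version)
    return (fields + name).ljust(cmdsize, b"\x00")
```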

Adding a Section/Segment

Since we can allocate arbitrary space between the load command table and the raw data, we can also extend an existing LoadCommand. In particular, Mach-O segments are commands associated with the LIEF object lief.MachO.SegmentCommand.

To add a new section to the __TEXT segment, we must extend the segment's load command to hold a new section structure and reserve space for the section's content. Since the __TEXT segment starts at file offset 0 and ends somewhere in the raw data, the natural place to insert the new content is between the end of the load command table and the beginning of the raw data:

[Figure: extending the __TEXT segment with a new section placed between the load command table and the raw data]

The process described above is implemented via the lief.MachO.Binary.add_section() method.
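The bookkeeping involved can be estimated as follows. The segment command grows by one section_64 structure (80 bytes), and the content must fit, suitably aligned, between the end of the table and the first raw data. This hypothetical planner places the content right after the load command table; LIEF's actual placement may differ. The section's align field is a power of two, so alignment = 2 in the example that follows means 4-byte alignment. The two flags added there correspond to S_ATTR_SOME_INSTRUCTIONS (0x400) and S_ATTR_PURE_INSTRUCTIONS (0x80000000).

```python
SECTION_64_SIZE = 80  # sizeof(struct section_64)
S_ATTR_PURE_INSTRUCTIONS = 0x80000000
S_ATTR_SOME_INSTRUCTIONS = 0x00000400
TEXT_SECTION_FLAGS = S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS


def align_up(value: int, alignment: int) -> int:
    return (value + alignment - 1) // alignment * alignment


def plan_new_section(sizeofcmds, segment_cmdsize, content_size, align_pow2,
                     first_data_offset, header_size=32):
    """Hypothetical planner: where could a new __TEXT section go?"""
    new_sizeofcmds = sizeofcmds + SECTION_64_SIZE
    commands_end = header_size + new_sizeofcmds
    # section_64.align is a power of two: align_pow2=2 means 4-byte alignment.
    content_offset = align_up(commands_end, 1 << align_pow2)
    return {
        "segment_cmdsize": segment_cmdsize + SECTION_64_SIZE,
        "sizeofcmds": new_sizeofcmds,
        "content_offset": content_offset,
        "fits": content_offset + content_size <= first_data_offset,
    }
```

When fits is False, the free space is too small, which is exactly the situation the shift described earlier resolves.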

In this example, we inject machine code that executes /bin/sh:

app = lief.parse("MachO64_x86-64_binary_id.bin")

raw_shell = [...]  # machine code bytes of the payload (elided)
section = lief.MachO.Section("__shell", raw_shell)

section.alignment = 2
section += lief.MachO.SECTION_FLAGS.SOME_INSTRUCTIONS
section += lief.MachO.SECTION_FLAGS.PURE_INSTRUCTIONS

section = app.add_section(section)
print(section)

We can then change the entry point by setting the lief.MachO.MainCommand.entrypoint attribute:

__TEXT = app.get_segment("__TEXT")
app.main_command.entrypoint = section.virtual_address - __TEXT.virtual_address
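LC_MAIN stores the entry point as a file offset (entryoff) rather than a virtual address. The snippet above can subtract __TEXT's virtual address directly because __TEXT normally starts at file offset 0. A sketch of the general conversion for an address inside any segment (entryoff is a hypothetical helper):

```python
def entryoff(target_va: int, segment_vmaddr: int, segment_fileoff: int = 0) -> int:
    """File offset of `target_va`, as stored in LC_MAIN's entryoff field.

    With __TEXT's fileoff of 0, this reduces to target_va - __TEXT.vmaddr."""
    if target_va < segment_vmaddr:
        raise ValueError("address is below the segment start")
    return target_va - segment_vmaddr + segment_fileoff
```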

Finally, we remove the signature and reconstruct the binary:

app.remove_signature()
app.write("./id.modified")

Running id.modified should produce output similar to the following:

Mac-mini:tmp romain$ ./id.modified
tmp @ [romain] $

Other tools that modify Mach-O load commands include optool [2] and insert_dylib [3].

References

API