
LIEF v0.16.0

Romain Thomas, December 10, 2024

Documentation

Documentation is an important aspect of LIEF, and since the beginning of the project I have spent a decent amount of time keeping it comprehensive and intuitive.

Usually, I don’t include documentation updates in the changelog, but in this case I thought the experience was worth sharing.

LIEF is written in C++ with bindings for Python and Rust. Originally, the documentation was organized by language API (each isolated from the others) and generated by Sphinx, with the Breathe plugin used to reference the C++ Doxygen domain.

Recently, Rust landed in the arena. Unlike Python and C++, Rust ships with a built-in documentation engine (rustdoc) that turns in-code documentation into HTML pages.

Given the new Rust bindings and the Rust built-in documentation engine, two questions emerged:

  1. Do we want to add (yet) another API page for Rust?
  2. How do we reference the Rust API in Sphinx?

For the first point, I moved from a language-driven documentation structure to a functionality-driven structure. This changes the way the documentation is consumed: instead of looking for the format-specific API of a given language (e.g. Python), you first look for what you want to do (ELF processing, Dyld shared cache parsing) and then access the API of the language you are interested in.

So instead of adding another Rust API reference page, the Rust API has been transparently integrated with the new layout.

[Figure: Documentation layout changes]

The second point was trickier to approach. Coming from a reverse engineering background, I really value the cross-reference feature provided by Sphinx:

blah blah blah :py:class:`lief.ELF.Binary` another blah: :cpp:class:`LIEF::ELF::Binary`

Python is a built-in domain in Sphinx, and the Breathe extension bridges the C++ Doxygen XML files and Sphinx. For Rust, there have been some attempts to create a similar bridge, but I decided to take another path: I created a Rust Sphinx domain whose cross-references point to the official or nightly Rust documentation. With this custom domain, the following cross-references redirect to the official or nightly documentation:

:rust:module:`lief::assembly`
:rust:enum:`lief::assembly::Instructions`

These are translated into:

https://lief-rs.s3.fr-par.scw.cloud/doc/latest/lief/assembly/index.html
https://lief-rs.s3.fr-par.scw.cloud/doc/latest/lief/assembly/enum.Instructions.html
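
The translation from a Rust path to a documentation URL can be sketched as follows. This is only an illustration and not the actual Sphinx domain: the function name `rust_ref_to_url` is hypothetical, the base URL is taken from the two links above, and the sketch assumes the usual rustdoc page layout (`index.html` for a module, `<kind>.<Name>.html` for an enum, struct, or trait).

```python
# Hypothetical sketch: names are illustrative, this is not LIEF's Sphinx extension.
BASE_URL = "https://lief-rs.s3.fr-par.scw.cloud/doc/latest"  # taken from the links above


def rust_ref_to_url(kind: str, path: str, base: str = BASE_URL) -> str:
    """Map a Rust item kind and its `a::b::C` path to a rustdoc page URL."""
    parts = path.split("::")
    if kind == "module":
        # rustdoc renders a module as <module path>/index.html
        return f"{base}/{'/'.join(parts)}/index.html"
    if kind in {"enum", "struct", "trait"}:
        # rustdoc renders these items as <parent path>/<kind>.<Name>.html
        *parents, name = parts
        return f"{base}/{'/'.join(parents)}/{kind}.{name}.html"
    raise ValueError(f"unsupported kind in this sketch: {kind}")


print(rust_ref_to_url("module", "lief::assembly"))
print(rust_ref_to_url("enum", "lief::assembly::Instructions"))
```

Methods and functions are deliberately left out of this sketch, since they resolve to anchors inside a page rather than to a page of their own.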

By doing so, we can leverage Sphinx’s cross-reference functionalities while still keeping the built-in Rust documentation. In addition to this Rust-specific domain, I created a .. lief-api:: directive that can pack similar cross-language API into a single block.

For instance, this directive:

.. lief-api:: lief.Binary.disassemble()

    :rust:method:`lief::generic::Binary::disassemble [trait]`
    :rust:method:`lief::generic::Binary::disassemble_symbol [trait]`
    :rust:method:`lief::generic::Binary::disassemble_address [trait]`
    :rust:method:`lief::generic::Binary::disassemble_slice [trait]`
    :cpp:func:`LIEF::Binary::disassemble`
    :py:meth:`lief.Binary.disassemble`
    :py:meth:`lief.Binary.disassemble_from_bytes`

Is rendered as:

[Figure: Documentation layout changes]

This allows us to refer to the API for different languages without being too verbose or hurting readability. Combined with Sphinx substitutions, we can write:

This is an example that cross-references |lief-disassemble|

.. |lief-disassemble| lief-api:: lief.Binary.disassemble()

    :rust:method:`lief::generic::Binary::disassemble [trait]`
    :rust:method:`lief::generic::Binary::disassemble_symbol [trait]`
    :rust:method:`lief::generic::Binary::disassemble_address [trait]`
    :rust:method:`lief::generic::Binary::disassemble_slice [trait]`
    :cpp:func:`LIEF::Binary::disassemble`
    :py:meth:`lief.Binary.disassemble`
    :py:meth:`lief.Binary.disassemble_from_bytes`

You can check out this page https://lief.re/doc/latest/formats/pe/index.html to see a concrete rendering of these changes.

Extended Features

Public Release

The extended version is now publicly available at this address: https://extended.lief.re

Assembler & Disassembler

Adding (or not adding) a disassembler in LIEF has been a long-standing question and with the extended version, I found a fair trade-off:

LIEF core focuses on executable formats, free from any extra features that might have a significant impact on the build complexity or library size.

On the other hand, LIEF extended provides additional functionalities that require a more complex build pipeline and increase the binary size. Among these extended functionalities, there are a disassembler and an assembler based on LLVM’s MC layer.

The disassembling API is provided at different levels:

LIEF::Binary

import lief

pe = lief.PE.parse("cmd.exe")
for inst in pe.disassemble(0x400000):
    print(inst)

    # Instruction semantics
    print(inst.is_syscall)
    print(inst.is_memory_access)
    print(inst.is_call)

    # Instruction operands (for AArch64 and x86-64)
    if isinstance(inst, lief.assembly.aarch64.Instruction):
        for idx, operand in enumerate(inst.operands):
            match operand:
                case lief.assembly.aarch64.operands.Register():
                    print(f"OP[{idx}] -- REG: {operand.value}")
                case lief.assembly.aarch64.operands.Memory():
                    print(f"OP[{idx}] -- MEM: {operand.base} {operand.offset}")
                case lief.assembly.aarch64.operands.PCRelative():
                    print(f"OP[{idx}] -- PCR: {operand.value}")
                case lief.assembly.aarch64.operands.Immediate():
                    print(f"OP[{idx}] -- IMM: {operand.value}")

LIEF::dwarf::Function

import lief

elf = lief.ELF.parse("my-dbg.elf")
dwarf: lief.dwarf.DebugInfo = elf.debug_info
func: lief.dwarf.Function = dwarf.find_function("my_debug_function")

for inst in func.instructions:
    print(inst)

LIEF::dsc::DyldSharedCache

import lief

cache = lief.dsc.load("ios-18/")
for inst in cache.disassemble(0x1886f4a44):
    print(inst)

In terms of implementation, the disassembler wraps a lazy iterator that evaluates/disassembles an instruction only when the iterator is processed. It means that you don’t pay any overhead until you access the iterator’s value:

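# O(0): creating the iterator disassembles nothing
inst = macho.disassemble(0x400000)

inst = macho.disassemble(0x400000)
# O(10): only the ten instructions pulled from the iterator are disassembled
for _ in range(10):
    next(inst)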

The .end() sentinel of the iterator is reached under one of two conditions:

  1. A range is specified (e.g. macho.disassemble(0x400000, /*size*/0x1000)) and the iterator moves past the end of the range.
  2. The instruction can’t be disassembled.

This kind of sentinel allows calls such as macho.disassemble(0x400000), which lazily disassembles instructions starting at address 0x400000 until disassembly fails.
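
The lazy evaluation and the two stop conditions can be illustrated with a plain Python generator. This is a toy model and not LIEF's implementation: `ToyDisassembler`, its one-byte opcode table, and the `decoded` counter are all made up for the illustration.

```python
from typing import Iterator, Optional, Tuple

# Toy one-byte "instruction set", used only for this illustration.
TOY_OPCODES = {0x90: "nop", 0xC3: "ret"}


class ToyDisassembler:
    """Toy model of a lazy disassembler over a raw code buffer."""

    def __init__(self, code: bytes, base: int) -> None:
        self.code = code
        self.base = base
        self.decoded = 0  # number of instructions actually decoded so far

    def disassemble(self, address: int, size: Optional[int] = None) -> Iterator[Tuple[int, str]]:
        # Assumes address >= base. Nothing below runs until the iterator is consumed.
        start = address - self.base
        end = len(self.code) if size is None else min(len(self.code), start + size)
        offset = start
        while offset < end:  # stop condition 1: past the end of the range
            mnemonic = TOY_OPCODES.get(self.code[offset])
            if mnemonic is None:  # stop condition 2: cannot be disassembled
                return
            self.decoded += 1
            yield self.base + offset, mnemonic
            offset += 1


dis = ToyDisassembler(bytes([0x90, 0x90, 0xC3, 0xFF, 0x90]), base=0x400000)
insts = dis.disassemble(0x400000)
print("decoded after creating the iterator:", dis.decoded)
next(insts)
print("decoded after one next():", dis.decoded)
print("remaining until failure:", list(insts))
print("with an explicit size of 2:", list(dis.disassemble(0x400000, size=2)))
```

The unbounded call stops at the `0xFF` byte, which the toy table cannot decode, while the bounded call stops as soon as it leaves the requested range.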

C++ & Rust & Python

The disassembler/assembler API is uniformly available in Rust, C++, and Python.

Capstone? Nyxstone?

As stated in the documentation, the major design difference with Capstone is that LIEF uses a mainstream version of LLVM with limited patches [1] on the MC layer (the current version is based on LLVM 19.1.2).

The design difference with Nyxstone is that LLVM is hidden from the public API, which means that LIEF does not require a pre-installed version of LLVM on the system. Moreover, LIEF exposes opcodes and control-flow/semantic information about the instructions.

On the other hand, LIEF does not provide a standalone API to disassemble arbitrary instructions. The disassembler engine is bound to the object from which the API is exposed.

Assembler

Alongside the disassembler, LIEF exposes a (basic) assembly API that allows generating and patching instructions:

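import lief

elf = lief.ELF.parse("my-android-obfuscated.so")
text = elf.get_section(".text")
# Disassembler: decode the raw bytes of .text, starting at the section's address
syscalls = [
    inst
    for inst in elf.disassemble_from_bytes(bytes(text.content), text.virtual_address)
    if inst.is_syscall
]

# Assembler
for syscall_inst in syscalls:
    new_bytes = elf.assemble(syscall_inst.address, "nop;") # Assemble AND patch
    # bytes.hex() only accepts a one-character separator, so join manually
    print(", ".join(f"{byte:02x}" for byte in new_bytes))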
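
To make the effect of such a patch concrete at the byte level, here is a standalone sketch that does not use LIEF at all. It swaps the disassembler for a plain bit-pattern match and assumes an AArch64 little-endian code buffer; the helper name `nop_out_svc` is hypothetical, while the constants are the fixed bits of the AArch64 `SVC` and `NOP` encodings.

```python
import struct
from typing import List

SVC_MASK = 0xFFE0001F  # every bit of the SVC encoding except the 16-bit immediate
SVC_BITS = 0xD4000001  # SVC #imm16 with the immediate cleared
NOP_WORD = 0xD503201F  # AArch64 NOP


def nop_out_svc(code: bytearray) -> List[int]:
    """Replace each 4-byte SVC instruction in `code` with NOP, in place.

    Returns the offsets that were patched.
    """
    patched = []
    usable = len(code) - len(code) % 4  # ignore a trailing partial word
    for offset in range(0, usable, 4):
        (word,) = struct.unpack_from("<I", code, offset)
        if word & SVC_MASK == SVC_BITS:
            struct.pack_into("<I", code, offset, NOP_WORD)
            patched.append(offset)
    return patched


# svc #0, nop, svc #0x80
buffer = bytearray(struct.pack("<III", 0xD4000001, 0xD503201F, 0xD4001001))
print("patched offsets:", nop_out_svc(buffer))
print("patched bytes:  ", buffer.hex(" "))
```

Unlike the LIEF API above, this pattern match has no notion of sections or data embedded in code, so it only serves to show what the resulting bytes look like.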

Warning

In the current version, the assembler works well for x86/x86_64 and AArch64 but might break on other architectures. In addition, llvm::MCFixup entries are not supported.

This can be used to patch a LIEF binary object directly at the assembly level. I plan to provide the LIEF Binary context to the assembly engine, so that if the binary defines a function call_me() that is either exported or present in the debug info, users will be able to reference this function at the assembly level:

fn patch_with_context(macho: &mut lief::macho::Binary) {
  macho.assemble(0x140000090, r#"
    adrp x0, call_me;
    add x0, x0, :lo12:call_me;
    mov x1, 0x90;
    str x1, [x0];
  "#);
}

LIEF would then handle the relocation/resolution process to tell LLVM where call_me is located and how it is defined.

C++ & Rust & Python

The disassembler/assembler API is seamlessly available in Rust, C++, and Python :)

Dyld Shared Cache

Initial support for processing Apple’s Dyld shared cache with LIEF has been released, along with an API to deoptimize in-cache dylibs. The API looks like this:

import lief

cache = lief.dsc.load("ios-18.1/")
for dylib in cache.libraries:
    print(f"0x{dylib.address:016x} {dylib.path}")
    # Extract the dylib as a regular lief.MachO.Binary
    macho: lief.MachO.Binary = dylib.get()

Warning

Please note that the deoptimization feature does not yet work well for all the shared cache libraries. This support is going to be improved over time.

One could also use this API to diff two shared caches:

use lief;
use std::collections::HashSet;

let ios_17 = lief::dsc::load_from_path("ios-17.7.1");
let ios_18 = lief::dsc::load_from_path("ios-18.1.1");

let libraries_17: HashSet<String> = ios_17.libraries()
                                          .map(|lib| lib.path())
                                          .collect();

let libraries_18: HashSet<String> = ios_18.libraries()
                                          .map(|lib| lib.path())
                                          .collect();

println!("{:?}", libraries_17.symmetric_difference(&libraries_18));
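
The same comparison can be sketched in Python with plain sets. The helper `diff_library_paths` is hypothetical and the paths below are made up; the sketch splits the symmetric difference into "removed" and "added" so that the direction of each change is visible.

```python
from typing import Iterable, Set, Tuple


def diff_library_paths(old: Iterable[str], new: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (removed, added) library paths between two caches."""
    old_set, new_set = set(old), set(new)
    return old_set - new_set, new_set - old_set


# Made-up paths, for illustration only.
ios_17 = ["/usr/lib/libA.dylib", "/usr/lib/libB.dylib"]
ios_18 = ["/usr/lib/libB.dylib", "/usr/lib/libC.dylib"]

removed, added = diff_library_paths(ios_17, ios_18)
print("removed:", sorted(removed))
print("added:  ", sorted(added))
```

With LIEF extended, the two inputs could presumably be built from `dylib.path for dylib in cache.libraries`, as in the Python snippet earlier in this section.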

Rust

Rust bindings got their first mutable functions which are listed in the changelog. These mutable functions are limited but they allow us to make basic modifications like adding a library or patching assembly code:

fn add_library(elf: &mut lief::elf::Binary) {
  elf.add_library("libtest.so");
  elf.write("patched.elf");
}

fn patch_asm(macho: &mut lief::macho::Binary) {
  macho.assemble(0x100004090, r#"
    mov x0, x16;
    br x0;
  "#);
  macho.write("patched.macho");
}

In addition, the x86_64-unknown-linux-musl target triple is now supported, and the minimal GLIBC version for x86_64-unknown-linux-gnu has been lowered to 2.28. This means that the Linux Rust bindings can now run on Debian 10, Ubuntu 19.10, …, whereas they previously required Debian 11 or Ubuntu 20.04.
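
As a quick way to check a host against this requirement, the following sketch compares the local glibc version with the 2.28 minimum stated above. The helper names are hypothetical; `platform.libc_ver()` is the standard-library call used to detect the C library, and it may return empty strings on musl or non-Linux hosts.

```python
import platform
from typing import Tuple

MIN_GLIBC = (2, 28)  # minimum stated above for x86_64-unknown-linux-gnu


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.31' into a comparable tuple."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_is_supported(version: str, minimum: Tuple[int, ...] = MIN_GLIBC) -> bool:
    """Return True if `version` is at least `minimum`."""
    return parse_version(version) >= minimum


libc_name, libc_version = platform.libc_ver()
if libc_name == "glibc" and libc_version:
    print(f"glibc {libc_version}: supported = {glibc_is_supported(libc_version)}")
else:
    print("glibc not detected (for example a musl or non-Linux host)")
```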

The new x86_64-unknown-linux-musl triple can be used to generate fully static executables without any dependency on libstdc++, libc, etc.

For instance, given this code:

use lief;
use lief::generic::Section;

fn main() {
    let path = std::env::args().last().unwrap();
    let mut file = std::fs::File::open(path).expect("Can't open the file");

    if let Some(lief::Binary::PE(pe)) = lief::Binary::from(&mut file) {
        for section in pe.sections() {
            println!(
                "{:20}: [0x{:016x}-0x{:016x}]",
                section.name(),
                section.virtual_address(),
                section.virtual_address() + section.virtual_size() as u64
            );
        }
    }
}

We can generate a dependency-free executable by running:

$ cargo build [--release] --target x86_64-unknown-linux-musl

$ ldd target/x86_64-unknown-linux-musl/release/reader
      statically linked

$ target/x86_64-unknown-linux-musl/release/reader steam.exe
.text               : [0x0000000000001000-0x00000000002cbe53]
.rdata              : [0x00000000002cc000-0x00000000003a7fa2]
.data               : [0x00000000003a8000-0x000000000043ada0]
.rsrc               : [0x000000000043b000-0x0000000000471b8c]
.reloc              : [0x0000000000472000-0x0000000000490c74]

Python Bindings

LIEF is now using nanobind v2.4.0 which improves the support for typing.

Among these typing improvements, C++ enum flags now properly inherit from enum.Flag, which results in a better interface with Python code.
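
To show what inheriting from enum.Flag provides on the Python side, here is a small standard-library sketch. `SectionFlags` is a hypothetical flag set defined only for this illustration; it is not a LIEF enum.

```python
import enum


class SectionFlags(enum.Flag):
    """Hypothetical flag set, for illustration only (not a LIEF enum)."""

    READ = 1
    WRITE = 2
    EXECUTE = 4


# Members combine with bitwise operators and the result keeps the enum type.
flags = SectionFlags.READ | SectionFlags.EXECUTE

print(isinstance(flags, SectionFlags))
print(SectionFlags.EXECUTE in flags)
print(SectionFlags.WRITE in flags)
print(flags & ~SectionFlags.READ)
```

Because the combined value is still a `SectionFlags` instance, type checkers and membership tests work on it without falling back to raw integers.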

Additionally, typing stub files (*.pyi) are now generated with nanobind’s built-in stubgen.py instead of mypy.

Final Words

Additional changes are listed in the detailed changelog.

Many thanks to dornstetter and kohnakagawa for their feedback about the dyld shared cache feature.

Thank you also to Konstantin Vinogradov and dctoralves for their sponsorship.


[1] All the patches have been submitted as PRs to LLVM. You can check LIEF & LLVM for the details.
