Documentation is an important aspect of LIEF and since the beginning of the project, I have spent a decent amount of time keeping comprehensive and intuitive documentation.
Usually, I don’t include documentation updates in the changelog but in this case, I thought it could be worth sharing this experience.
LIEF is written in C++ with bindings for Python and Rust. Originally, the documentation was organized per language API (each isolated from the others) and generated by Sphinx, with the Breathe plugin used to reference the C++ Doxygen domain.
Recently, Rust landed in the arena. Unlike Python and C++, the Rust language ships with a built-in documentation engine that turns in-code documentation into HTML pages.
Given the new Rust bindings and the Rust built-in documentation engine, two questions emerged:
For the first point, I moved from a language-driven documentation structure to a functionality-driven structure. This changes the way the documentation is consumed: instead of looking for the format-specific API of a given language (e.g. Python), you first look for what you want to do (ELF processing, Dyld shared cache parsing) and then access the API of the language you are interested in.
So instead of adding another Rust API reference page, the Rust API has been transparently integrated with the new layout.
The second point has been a bit more tricky to approach. With a reverse engineering background, I really value the cross-reference feature provided by Sphinx:
```rst
blah blah blah :py:class:`lief.ELF.Binary` another blah: :cpp:class:`LIEF::ELF::Binary`
```
Python is a built-in domain supported by Sphinx, and the Breathe extension bridges the C++ Doxygen XML files and Sphinx. For Rust, there are some attempts to create a similar bridge, but I decided to take another path: I created a Rust Sphinx domain that cross-references the official or nightly Rust documentation. With this custom domain, the following cross-references redirect to the official or nightly documentation:
```rst
:rust:module:`lief::assembly`
:rust:enum:`lief::assembly::Instructions`
```
Is translated into:
```text
https://lief-rs.s3.fr-par.scw.cloud/doc/latest/lief/assembly/index.html
https://lief-rs.s3.fr-par.scw.cloud/doc/latest/lief/assembly/enum.Instructions.html
```
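The translation above follows rustdoc's page naming scheme: a module gets its own directory with an `index.html`, while other items live in their parent's directory as `<kind>.<Name>.html`. Here is a minimal sketch of that mapping. It is not the actual LIEF Sphinx extension, and `rustdoc_url` and `ITEM_PREFIX` are hypothetical names introduced only for illustration:

```python
# Hypothetical helper, not the actual LIEF Sphinx extension.
BASE_URL = "https://lief-rs.s3.fr-par.scw.cloud/doc/latest"

# rustdoc names item pages "<prefix>.<Name>.html" inside the parent module directory.
ITEM_PREFIX = {"enum": "enum", "struct": "struct", "trait": "trait", "function": "fn"}


def rustdoc_url(kind: str, target: str, base_url: str = BASE_URL) -> str:
    """Translate a role kind and an `a::b::C` path into a rustdoc URL."""
    parts = target.split("::")
    if kind == "module":
        # A module is a directory that contains an index.html page.
        return f"{base_url}/{'/'.join(parts)}/index.html"
    *parents, name = parts
    return f"{base_url}/{'/'.join(parents)}/{ITEM_PREFIX[kind]}.{name}.html"


print(rustdoc_url("module", "lief::assembly"))
print(rustdoc_url("enum", "lief::assembly::Instructions"))
```

The two calls print the two URLs listed above. Methods would need an extra `#method.<name>` anchor on the page of their parent type, which this sketch leaves out.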
By doing so, we can leverage Sphinx’s cross-reference functionalities while still keeping the built-in Rust documentation. In addition to this Rust-specific domain, I created a .. lief-api:: directive that can pack similar cross-language APIs into a single block.
For instance, this directive:
```rst
.. lief-api:: lief.Binary.disassemble()

    :rust:method:`lief::generic::Binary::disassemble [trait]`
    :rust:method:`lief::generic::Binary::disassemble_symbol [trait]`
    :rust:method:`lief::generic::Binary::disassemble_address [trait]`
    :rust:method:`lief::generic::Binary::disassemble_slice [trait]`
    :cpp:func:`LIEF::Binary::disassemble`
    :py:meth:`lief.Binary.disassemble`
    :py:meth:`lief.Binary.disassemble_from_bytes`
```
Is rendered as:
This allows us to refer to the API for different languages without being too verbose or hurting readability. Combined with Sphinx substitutions, we can write:
```rst
This is an example that cross-references |lief-disassemble|

.. |lief-disassemble| lief-api:: lief.Binary.disassemble()

    :rust:method:`lief::generic::Binary::disassemble [trait]`
    :rust:method:`lief::generic::Binary::disassemble_symbol [trait]`
    :rust:method:`lief::generic::Binary::disassemble_address [trait]`
    :rust:method:`lief::generic::Binary::disassemble_slice [trait]`
    :cpp:func:`LIEF::Binary::disassemble`
    :py:meth:`lief.Binary.disassemble`
    :py:meth:`lief.Binary.disassemble_from_bytes`
```
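I have not shown how the directive is implemented, but one plausible first step is to group the roles found in its body by language domain, so that each language can be rendered as a single entry. The sketch below only illustrates that grouping with the standard library. `ROLE_RE` and `group_by_language` are hypothetical names, and the real directive would be built on Sphinx's directive API:

```python
import re
from collections import defaultdict

# Matches roles such as :py:meth:`lief.Binary.disassemble`
ROLE_RE = re.compile(r":(?P<domain>\w+):(?P<kind>\w+):`(?P<target>[^`]+)`")


def group_by_language(body: str) -> dict[str, list[str]]:
    """Group the role targets found in a directive body by domain (rust, cpp, py)."""
    grouped = defaultdict(list)
    for match in ROLE_RE.finditer(body):
        grouped[match["domain"]].append(match["target"])
    return dict(grouped)


BODY = """
:rust:method:`lief::generic::Binary::disassemble [trait]`
:cpp:func:`LIEF::Binary::disassemble`
:py:meth:`lief.Binary.disassemble`
:py:meth:`lief.Binary.disassemble_from_bytes`
"""

for language, targets in group_by_language(BODY).items():
    print(language, targets)
```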
You can check out this page https://lief.re/doc/latest/formats/pe/index.html to see a concrete rendering of these changes.
Public Release
The extended version is now publicly available at this address: https://extended.lief.re
Adding (or not adding) a disassembler in LIEF has been a long-standing question and with the extended version, I found a fair trade-off:
On the other hand, LIEF extended provides additional functionalities that require a more complex build pipeline and increase the binary size. Among these extended functionalities, there are a disassembler and an assembler based on LLVM’s MC layer.
The disassembling API is provided at different levels:
```python
import lief

pe = lief.PE.parse("cmd.exe")
for inst in pe.disassemble(0x400000):
    print(inst)

    # Instruction semantic
    print(inst.is_syscall)
    print(inst.is_memory_access)
    print(inst.is_call)

    # Instruction operands (for AArch64 and x86-64)
    if isinstance(inst, lief.assembly.aarch64.Instruction):
        for idx, operand in enumerate(inst.operands):
            match operand:
                case lief.assembly.aarch64.operands.Register():
                    print(f"OP[{idx}] -- REG: {operand.value}")
                case lief.assembly.aarch64.operands.Memory():
                    print(f"OP[{idx}] -- MEM: {operand.base} {operand.offset}")
                case lief.assembly.aarch64.operands.PCRelative():
                    print(f"OP[{idx}] -- PCR: {operand.value}")
                case lief.assembly.aarch64.operands.Immediate():
                    print(f"OP[{idx}] -- IMM: {operand.value}")
```

```python
import lief

elf = lief.ELF.parse("my-dbg.elf")
dwarf: lief.dwarf.DebugInfo = elf.debug_info
func: lief.dwarf.Function = dwarf.find_function("my_debug_function")

for inst in func.instructions:
    print(inst)
```

```python
import lief

cache = lief.dsc.load("ios-18/")
for inst in cache.disassemble(0x1886f4a44):
    print(inst)
```
In terms of implementation, the disassembler wraps a lazy iterator that evaluates/disassembles an instruction only when the iterator is processed. It means that you don’t pay any overhead until you access the iterator’s value:
```python
# O(0)
inst = macho.disassemble(0x400000)

inst = macho.disassemble(0x400000)
# O(10)
for _ in range(10):
    next(inst)
```
The .end() sentinel of the iterator is based on two properties: whether the current instruction could be disassembled and, when a size is provided (macho.disassemble(0x400000, /*size*/0x1000)), whether the iterator is past the end of the range. This kind of sentinel allows us to use this API: macho.disassemble(0x400000), which will (lazily) disassemble instructions starting at the address 0x400000 until it fails.
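To illustrate the lazy behavior and the two stop conditions without requiring LIEF, here is a toy model written as a Python generator. Everything in it is an assumption made for the example: the "decoder" only knows two one-byte opcodes, and `decode_one`, `disassemble` and `decoded_offsets` are hypothetical names unrelated to LIEF's actual implementation:

```python
from typing import Iterator, Optional

decoded_offsets: list[int] = []  # records every offset that was actually decoded


def decode_one(code: bytes, offset: int) -> Optional[str]:
    """Toy decoder: 0x90 is 'nop', 0xC3 is 'ret', anything else fails."""
    decoded_offsets.append(offset)
    return {0x90: "nop", 0xC3: "ret"}.get(code[offset])


def disassemble(code: bytes, size: Optional[int] = None) -> Iterator[str]:
    """Lazily yield instructions until decoding fails or the range is exhausted."""
    end = len(code) if size is None else min(size, len(code))
    offset = 0
    while offset < end:          # stop condition: past the end of the range
        name = decode_one(code, offset)
        if name is None:         # stop condition: the decoder failed
            return
        yield name
        offset += 1              # every toy instruction is one byte long


it = disassemble(bytes([0x90, 0x90, 0xC3, 0xFF, 0x90]))
print(decoded_offsets)            # creating the iterator decodes nothing
print(next(it), decoded_offsets)  # one instruction decoded on demand
print(list(it), decoded_offsets)  # stops at the invalid byte 0xFF
```

Creating the generator costs nothing. Each `next()` decodes exactly one instruction, and the iteration ends on the first undecodable byte or at the end of the requested range, whichever comes first.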
C++ & Rust & Python
The disassembler/assembler API is uniformly available in Rust, C++, and Python.
As stated in the documentation, the major design difference with Capstone is that LIEF uses a mainstream version of LLVM with limited patches [1] on the MC layer (the current version is based on LLVM 19.1.2).
The design difference with Nyxstone is that LLVM is hidden from the public API, which means that it does not require an LLVM version to be pre-installed on the system. Moreover, LIEF exposes opcodes and control-flow/semantic information about the instructions.
On the other hand, LIEF does not provide a standalone API to disassemble arbitrary instructions. The disassembler engine is bound to the object from which the API is exposed.
In association with a disassembler, LIEF exposes a (basic) assembly API that allows generating and patching instructions:
```python
import lief

elf = lief.ELF.parse("my-android-obfuscated.so")
text = elf.get_section(".text")
# Disassembler
syscall = [inst for inst in elf.disassemble(bytes(text)) if inst.is_syscall]

# Assembler
for syscall_inst in syscall:
    new_bytes = elf.assemble(syscall_inst.address, "nop;")  # Assemble AND patch
    # bytes.hex() only accepts a single-character separator
    print(new_bytes.hex(" "))
```
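To make the "assemble AND patch" step more concrete, here is a standalone sketch of what patching amounts to at the byte level: overwriting the bytes at a given virtual address with the newly assembled ones. It does not use LIEF. `patch` is a hypothetical helper, and the section content and its base address are made up for the example. The only facts relied upon are the AArch64 encodings of `svc #0` (0xd4000001) and `nop` (0xd503201f):

```python
SVC_0 = bytes.fromhex("010000d4")  # AArch64 `svc #0`, little-endian
NOP = bytes.fromhex("1f2003d5")    # AArch64 `nop`, little-endian


def patch(section: bytearray, base_address: int, address: int, new_bytes: bytes) -> None:
    """Overwrite the bytes located at `address` in a section mapped at `base_address`."""
    start = address - base_address
    if start < 0 or start + len(new_bytes) > len(section):
        raise ValueError("patch falls outside of the section")
    section[start:start + len(new_bytes)] = new_bytes


# Made-up .text content mapped at 0x1000: nop; svc #0; nop
text = bytearray(NOP + SVC_0 + NOP)
patch(text, 0x1000, 0x1004, NOP)
print(text.hex(" ", 4))
```

On AArch64 every instruction is four bytes long, so replacing a `svc` with a `nop` keeps the surrounding code aligned. On x86-64, a `syscall` is two bytes while a `nop` is one, so the replacement would need to be padded accordingly.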
Warning
In this current version, the assembler is working pretty well for x86/x86_64 and AArch64 but might break on other architectures. In addition, llvm::MCFixup are not supported.
This can be used to patch LIEF’s binary object directly at the assembly level. I have some plans to provide the LIEF Binary context to the assembly engine so that if the binary defines a function call_me() that is either exported or present in the debug info, users would be able to leverage this function at the assembly level:
```rust
fn patch_with_context(macho: &mut lief::macho::Binary) {
    macho.assemble(0x140000090, r#"
        adrp x0, call_me;
        add x0, x0, :lo12:call_me;
        mov x1, 0x90;
        str x1, [x0];
    "#);
}
```
And LIEF would handle the relocation/resolution process to instruct LLVM about the location and the definition of call_me.
C++ & Rust & Python
The disassembler/assembler API is seamlessly available in Rust, C++, and Python :)
Initial support for processing Apple’s Dyld shared cache with LIEF has been released along with an API to deoptimize in-cache Dylib. The API looks like this:
```python
import lief

cache = lief.dsc.load("ios-18.1/")
for dylib in cache.libraries:
    print(f"0x{dylib.address:016x} {dylib.path}")
    # Extract the dylib as a regular lief.MachO.Binary
    macho: lief.MachO.Binary = dylib.get()
```
Warning
Please note that the deoptimization feature is not working well on all the shared cache libraries. This support is going to be improved over time.
One could also use this API to diff two shared caches:
```rust
use lief;
use std::collections::HashSet;

let ios_17 = lief::dsc::load_from_path("ios-17.7.1");
let ios_18 = lief::dsc::load_from_path("ios-18.1.1");

let libraries_17: HashSet<String> = ios_17.libraries()
    .map(|lib| lib.path())
    .collect();

let libraries_18: HashSet<String> = ios_18.libraries()
    .map(|lib| lib.path())
    .collect();

println!("{:?}", libraries_17.symmetric_difference(&libraries_18));
```
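The symmetric difference lists the libraries that are present in only one of the two caches, without telling which one. As a small sketch of the same idea with plain Python sets, the diff can be split into removed and added libraries. The library paths below are made up for illustration, and `diff_libraries` is a hypothetical helper, not a LIEF function:

```python
def diff_libraries(old: set[str], new: set[str]) -> tuple[set[str], set[str]]:
    """Return (removed, added); their union is the symmetric difference."""
    return old - new, new - old


# Made-up library paths, for illustration only
ios_17 = {"/usr/lib/libfoo.dylib", "/usr/lib/libbar.dylib"}
ios_18 = {"/usr/lib/libbar.dylib", "/usr/lib/libbaz.dylib"}

removed, added = diff_libraries(ios_17, ios_18)
print("removed:", sorted(removed))
print("added:  ", sorted(added))

# Same content as the symmetric difference
assert removed | added == ios_17 ^ ios_18
```

With LIEF, the two sets would be built from the `path` of each entry in `cache.libraries`, as in the Python example above.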
Rust bindings got their first mutable functions which are listed in the changelog. These mutable functions are limited but they allow us to make basic modifications like adding a library or patching assembly code:
```rust
fn add_library(elf: &mut lief::elf::Binary) {
    elf.add_library("libtest.so");
    elf.write("patched.elf");
}
```

```rust
fn patch_asm(macho: &mut lief::macho::Binary) {
    macho.assemble(0x100004090, r#"
        mov x0, x16;
        br x0;
    "#);
    macho.write("patched.macho");
}
```
In addition, the x86_64-unknown-linux-musl target triple is now supported, and the minimal GLIBC version for x86_64-unknown-linux-gnu has been lowered to 2.28. It means that Linux Rust bindings can now run on Debian 10, Ubuntu 19.10, … whereas they previously required Debian 11 or Ubuntu 20.04.
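If you are unsure whether a given Linux system meets the 2.28 requirement, the check can be sketched with Python's standard library. This is only an illustration: `parse_version` and `meets_minimum` are hypothetical helpers, and `platform.libc_ver()` returns empty strings when glibc cannot be detected (for instance on a musl-based or non-Linux system):

```python
import platform


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.28' into (2, 28) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(version: str, minimum: str = "2.28") -> bool:
    """Check a glibc version string against the minimal supported version."""
    return parse_version(version) >= parse_version(minimum)


libc, version = platform.libc_ver()
if libc == "glibc" and version:
    print(f"glibc {version}: supported = {meets_minimum(version)}")
else:
    print("glibc not detected (e.g. musl or a non-Linux system)")
```

Comparing tuples of integers rather than strings or floats matters here: 2.3 is older than 2.28, even though it sorts after it both as text and as a decimal number.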
The new x86_64-unknown-linux-musl triple can be used to generate fully static executables without any dependency on libstdc++, libc, ...
For instance, given this code:
```rust
use lief;
use lief::generic::Section;

fn main() {
    let path = std::env::args().last().unwrap();
    let mut file = std::fs::File::open(path).expect("Can't open the file");

    if let Some(lief::Binary::PE(pe)) = lief::Binary::from(&mut file) {
        for section in pe.sections() {
            println!(
                "{:20}: [0x{:016x}-0x{:016x}]",
                section.name(),
                section.virtual_address(),
                section.virtual_address() + section.virtual_size() as u64
            );
        }
    }
}
```
We can generate a dependencies-free executable by running:
```console
$ cargo build [--release] --target x86_64-unknown-linux-musl
```

```console
$ ldd target/x86_64-unknown-linux-musl/release/reader
  statically linked
```

```console
$ target/x86_64-unknown-linux-musl/release/reader steam.exe
.text               : [0x0000000000001000-0x00000000002cbe53]
.rdata              : [0x00000000002cc000-0x00000000003a7fa2]
.data               : [0x00000000003a8000-0x000000000043ada0]
.rsrc               : [0x000000000043b000-0x0000000000471b8c]
.reloc              : [0x0000000000472000-0x0000000000490c74]
```
LIEF is now using nanobind v2.4.0 which improves the support for typing. Among these typing improvements, C++ enum flags now properly inherit from enum.Flag, which results in a better interface with Python code. Additionally, typing stub files (*.pyi) are now generated with nanobind’s built-in stubgen.py instead of mypy.
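As a reminder of what inheriting from enum.Flag brings on the Python side, here is a small standalone sketch. The `Characteristics` enum is hypothetical and its names and values are illustrative only (they are not taken from LIEF). The point is that flag values can be combined with `|` and tested with `in` while remaining typed enum members:

```python
import enum


class Characteristics(enum.Flag):
    """Hypothetical flag enum; names and values are illustrative, not LIEF's."""
    READ = 0x1
    WRITE = 0x2
    EXECUTE = 0x4


perms = Characteristics.READ | Characteristics.EXECUTE

print(Characteristics.EXECUTE in perms)  # True
print(Characteristics.WRITE in perms)    # False
print(perms.value)                       # 5
```

A type checker that reads the generated stubs can then tell that `perms` is still a `Characteristics` value rather than a bare integer.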
Additional changes are listed in the detailed changelog.
Many thanks to dornstetter and kohnakagawa for their feedback about the dyld shared cache feature.
Thank you also to Konstantin Vinogradov and dctoralves for their sponsorship.
[1] All the patches have been submitted as PRs to LLVM. You can check LIEF & LLVM for the details.