LIEF v0.15.0

Romain Thomas, July 21, 2024

While this new release adds new functionalities and fixes several bugs, it is worth mentioning that it is the first release to officially expose Rust bindings! In addition, an extended version was also released to provide additional functionalities not strictly related to the executable formats.

Rust bindings

As discussed in these blog posts:

  1. LIEF Rust bindings updates
  2. Rust bindings for LIEF

LIEF is now available in Rust for the following targets:

  • aarch64-unknown-linux-gnu
  • x86_64-apple-darwin
  • x86_64-pc-windows-msvc (MT/MD runtimes)
  • x86_64-unknown-linux-gnu
  • aarch64-apple-ios
  • aarch64-apple-darwin

I published the release on crates.io so you should be able to start using LIEF in Rust with:

[package]
name    = "lief-demo"
version = "0.0.1"
edition = "2021"

[dependencies]
lief = "0.15.0"

LIEF Extended

LIEF is now providing additional features thanks to an extended version. Among those features, it provides support for DWARF and PDB debug formats as well as Objective-C metadata.

Objective-C

This support is a kind of spin-off of iCDump, which is now completely integrated into LIEF. Compared to the original iCDump project, it fixes the issue with the new chained relocations format (cf. issue #4) and can be used on all the platforms supported by LIEF (including Windows) in C++/Rust/Python:

Rust:

let macho: lief::macho::Binary;

if let Some(metadata) = macho.objc_metadata() {
    println!("Objective-C metadata found");
    for class in metadata.classes() {
        println!("name={}", class.name());
        for method in class.methods() {
            println!("  method.name={}", method.name());
        }
    }
}

Python:

import lief
macho: lief.MachO.Binary = ...
metadata: lief.objc.Metadata = macho.objc_metadata

if metadata is not None:
    print("Objective-C metadata found")

    for clazz in metadata.classes:
        print(f"name={clazz.name}")
        for meth in clazz.methods:
            print(f"  method.name={meth.name}")

    # Generate a header like "class-dump"
    print(metadata.to_decl())

DWARF & PDB

[Figure: DWARF & PDB Hierarchy]

Supporting debug formats like DWARF or PDB has been a long-standing discussion (cf. issue #17). The main reasons to avoid supporting these formats from scratch were:

  1. The maintenance effort
  2. Libraries that process these debug formats already exist.

On the other hand, I do understand the need to be able to process debug info (if present) from a LIEF binary object. While looking at the API of the different existing projects, I noticed that they are very good at exposing a low-level API that matches the debug format specifications, but they don’t provide [1] any kind of abstraction over the complexity of these specifications.

Developers and reverse engineers think in terms of compilation units, functions, global variables, stack variables, etc., but before being able to access this information from a DWARF or a PDB file, you need to learn what a PDB DBI stream is or understand that the address of a function in DWARF can be determined by either DW_AT_entry_pc or DW_AT_low_pc.
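To make the DW_AT_entry_pc / DW_AT_low_pc point concrete, here is a minimal sketch in plain Python. It models a subprogram DIE as a dictionary, which is purely an assumption for illustration: `function_address` is a hypothetical helper, not part of LIEF or LLVM. The precedence (explicit entry point first, start of the code range otherwise) is my reading of the DWARF specification, and both attributes are assumed to be already resolved to absolute addresses.

```python
from typing import Mapping, Optional


def function_address(die: Mapping[str, int]) -> Optional[int]:
    """Pick the entry address of a subprogram DIE modeled as a dict (hypothetical model)."""
    # Assumption: both attributes hold absolute addresses.
    if "DW_AT_entry_pc" in die:
        return die["DW_AT_entry_pc"]
    # Without an explicit entry point, fall back to the start of the code range.
    return die.get("DW_AT_low_pc")


# Usage: the second DIE declares an entry point that differs from its low_pc.
plain = {"DW_AT_low_pc": 0x1000}
with_entry = {"DW_AT_low_pc": 0x1000, "DW_AT_entry_pc": 0x1010}
print(hex(function_address(plain)), hex(function_address(with_entry)))
```

This is exactly the kind of detail a higher-level "function address" accessor is meant to hide from the user.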

The idea behind the support of the DWARF and PDB formats in LIEF is to:

  1. Bridge concepts that make sense to developers/reverse engineers with their concrete specifications in DWARF/PDB.
  2. Have a (documented) C++ API and bindings for Python/Rust.

This LIEF bridge is based on LLVM which did the heavy job of supporting DWARF & PDB within a single framework.

The DWARF & PDB support in LIEF leverages the LLVM API to abstract concepts as listed above.

For instance, you can iterate over all the public symbols of ntoskrnl.pdb with:

import lief

ntoskrnl: lief.pdb.DebugInfo = lief.pdb.load("./ntoskrnl.pdb")

for sym in ntoskrnl.public_symbols:
    print(f"{sym.demangled_name}: 0x{sym.RVA:06x}")

If the PDB embeds extended information about the compilation units we can do (in Rust):

let pdb = lief::pdb::load("peacecannary.pdb");
for cu in pdb.compilation_units() {
    for func in cu.functions() {
        if func.name().starts_with("peacecannary::CObfuscator") {
            println!("{}: {} (0x{:04x})", cu.module_name(), func.name(), func.rva());
        }
    }
}

The API for the DWARF format is pretty similar:

import lief

elf: lief.ELF.Binary = ...
# If the binary embeds DWARF debug info in the ELF:
dwarf: lief.dwarf.DebugInfo = elf.debug_info
# Otherwise:
dwarf: lief.dwarf.DebugInfo = lief.dwarf.load("my_dwarf.dwarf")

for cu in dwarf.compilation_units:
    print(f"Produced by: {cu.producer} in {cu.compilation_dir}")

    for func in cu.functions:
        print(f"0x{func.address:04x}: {func.name} ({func.size} bytes)")

    for var in cu.variables:
        if var.is_constexpr:
            continue
        # Look for global variables only
        if var.address is not None and var.address > 0:
            print(f"0x{var.address:04x}: {var.linkage_name} ({var.size} bytes)")

For more details about the API, you can take a look at these dedicated sections:

Other Updates

Mach-O AI

LIEF is now powered by AI supporting Apple *.hwx files which are some kind of Mach-O file for the Apple Neural Engine (ANE).

These *.hwx files start with a new magic identifier, 0xbeefface, and embed custom LC_ commands such as the command 0x40.
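As a quick illustration of what the magic check amounts to, here is a small standalone sketch (plain Python, no LIEF involved). `classify_magic` is a hypothetical helper, and reading the magic as a little-endian 32-bit value is an assumption on my side; 0xfeedfacf is the magic of a regular 64-bit Mach-O.

```python
import struct

HWX_MAGIC = 0xBEEFFACE    # magic mentioned above for *.hwx files
MH_MAGIC_64 = 0xFEEDFACF  # regular 64-bit Mach-O magic


def classify_magic(header: bytes) -> str:
    """Classify a file from its first four bytes (hypothetical helper)."""
    if len(header) < 4:
        return "too short"
    # Assumption: the magic is stored as a little-endian 32-bit integer.
    (magic,) = struct.unpack_from("<I", header)
    if magic == HWX_MAGIC:
        return "hwx"
    if magic == MH_MAGIC_64:
        return "mach-o 64"
    return "unknown"


# Usage on in-memory bytes; for a real file, pass open(path, "rb").read(4).
print(classify_magic(bytes.fromhex("cefaefbe")))
```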

[Figure: LC Command 0x40]

I would be interested in adding support for this private command in LIEF, so if anyone has already reversed it or has some info about its layout, feel free to reach out.

To support unknown or non-public LC commands in LIEF, I created an artificial LIEF::MachO::UnknownCommand which is a placeholder for any Mach-O commands that are not recognized by LIEF.

For instance, we can inspect the private 0x40 command as follows:

import lief
macho = lief.MachO.parse("personsemantics-u8-v4.H16.espresso.hwx").at(0)
lc_0x40: lief.MachO.UnknownCommand = macho.commands[18].command

print(lc_0x40.original_command) # Outputs 64 (0x40)
print(bytes(lc_0x40.data)) # Print the raw content of the command
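For readers who want to see what "a placeholder for any command" means at the byte level, here is a rough standalone sketch that walks load commands and returns each command identifier with its raw payload. It assumes the image follows the standard 64-bit Mach-O header layout (32 bytes, little-endian), which may or may not hold for every *.hwx file; `iter_load_commands` is a hypothetical helper and does not reflect how LIEF implements UnknownCommand.

```python
import struct

# magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved
HEADER_64_SIZE = 32


def iter_load_commands(data: bytes):
    """Yield (cmd, payload) for each load command of a 64-bit Mach-O-like image."""
    ncmds = struct.unpack_from("<I", data, 16)[0]
    offset = HEADER_64_SIZE
    for _ in range(ncmds):
        # Every load command starts with its identifier and its total size.
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8 or offset + cmdsize > len(data):
            raise ValueError(f"corrupted load command at offset {offset:#x}")
        # The payload is everything after the 8-byte (cmd, cmdsize) prefix.
        yield cmd, data[offset + 8 : offset + cmdsize]
        offset += cmdsize


# Usage on a synthetic image with two commands, the second one being 0x40.
image = (
    struct.pack("<8I", 0xBEEFFACE, 0, 0, 0, 2, 20, 0, 0)
    + struct.pack("<II", 0x19, 8)
    + struct.pack("<II", 0x40, 12) + b"\x01\x02\x03\x04"
)
for cmd, payload in iter_load_commands(image):
    print(hex(cmd), payload.hex())
```

Since only the identifier and the size are needed to skip over a command, an unrecognized command can be carried around as opaque bytes without breaking the rest of the parsing.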

These .hwx files have been involved in the Dopamine jailbreak and you can also find a BlackHat presentation about the Apple Neural Engine: Apple Neural Engine Internal.

PE Authenticode

LIEF can inspect and verify the PE Authenticode and with this release, we can even do that in Rust!

use lief::pe;
use std::process::ExitCode;

let mut file = std::fs::File::open(path).expect("Can't open the file");
if let Some(lief::Binary::PE(pe)) = lief::Binary::from(&mut file) {
    let result = pe.verify_signature(pe::signature::VerificationChecks::DEFAULT);
    if result.is_ok() {
        println!("Valid signature!");
    } else {
        println!("Signature not valid: {}", result);
    }
    return ExitCode::SUCCESS;
}

This new release also adds support for the Ms-CounterSignature attribute (OID: 1.3.6.1.4.1.311.3.3.1) and some other attributes like Ms-ManifestBinaryID (OID: 1.3.6.1.4.1.311.10.3.28).

ELF

No breaking updates for the ELF format.

LIEF is now able to parse and modify binaries compiled with the new DT_RELR and DT_ANDROID_REL_ relocations.
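For context, DT_RELR stores relative relocations in a compact form that mixes plain addresses with bitmaps. The sketch below expands such a table into the list of addresses to relocate. It follows the encoding as I understand it from the RELR proposal and the dynamic loaders that implement it; `decode_relr` is a hypothetical helper, not LIEF's code.

```python
from typing import Iterable, List


def decode_relr(entries: Iterable[int], word_size: int = 8) -> List[int]:
    """Expand DT_RELR entries into the addresses that need a relative relocation."""
    bits = word_size * 8
    addresses: List[int] = []
    where = 0
    for entry in entries:
        if (entry & 1) == 0:
            # Even entry: a plain address. The next bitmap starts right after it.
            addresses.append(entry)
            where = entry + word_size
        else:
            # Odd entry: a bitmap. Bit i (i >= 1) covers the word at
            # where + (i - 1) * word_size.
            for i in range(1, bits):
                if (entry >> i) & 1:
                    addresses.append(where + (i - 1) * word_size)
            # A bitmap always covers (bits - 1) consecutive words.
            where += (bits - 1) * word_size
    return addresses


# Usage: one address entry followed by one bitmap entry.
print([hex(a) for a in decode_relr([0x1000, 0b1011])])
```

With 64-bit words, a single bitmap entry can describe up to 63 relocations, which is where the size savings come from.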

[Figure: ELF Dynamic Array Relocated]

I also added the helper: LIEF::ELF::Binary::get_relocated_dynamic_array which allows us to get a relocated view of the DT_INIT_ARRAY/DT_FINI_ARRAY.

This can be useful when – for instance – the init array values are null because of relocations:

import lief

elf: lief.ELF.Binary = ...

# Return: [0, 0, 0, 0, ...]
elf.get(lief.ELF.DynamicEntry.TAG.INIT_ARRAY).array

# Return relocated values: [0x96db10, 0x9b9c14, 0xe7f660, 0xe7f70c, ...]
elf.get_relocated_dynamic_array(lief.ELF.DynamicEntry.TAG.INIT_ARRAY)
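To give an intuition of what a "relocated view" means, here is a deliberately simplified model in plain Python. It assumes RELA-style relative relocations whose addend is the final pointer (image base of 0), keyed by the address of the slot they patch. `relocated_view` and its parameters are hypothetical and do not describe how LIEF implements get_relocated_dynamic_array.

```python
from typing import Dict, List


def relocated_view(array_address: int, raw_values: List[int],
                   relative_relocs: Dict[int, int],
                   pointer_size: int = 8) -> List[int]:
    """Return the array values as they would look once relocations are applied."""
    view = []
    for index, raw in enumerate(raw_values):
        slot = array_address + index * pointer_size
        # A relative relocation on this slot supplies the real pointer;
        # otherwise keep the value stored in the file.
        view.append(relative_relocs.get(slot, raw))
    return view


# Usage: an init array at a made-up address 0x2000 whose two slots are
# zero in the file but covered by relocations.
relocs = {0x2000: 0x96DB10, 0x2008: 0x9B9C14}
print([hex(v) for v in relocated_view(0x2000, [0, 0], relocs)])
```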

Enums

Since the beginning of LIEF, all the enums used by the different formats were located in a single header file (e.g. LIEF/PE/enums.hpp or lief.PE.{enums, ...} in Python). Some of them were clashing with system headers that also #define some of these names.

To work around this issue, we had a dirty hack based on LIEF/{ELF,PE,MachO}/undef.h that undefines these values before the enums are included.

In LIEF 0.15.0 the scope of the enums has been redefined so that we should no longer need the undef.h.

For instance the standalone enum LIEF::ELF::ELF_SECTION_TYPES (or lief.ELF.SECTION_TYPES) has been re-scoped in the LIEF::ELF::Section class:

// <LIEF/ELF/Section.hpp>
class LIEF_API Section : public LIEF::Section {
  enum class TYPE : uint64_t {
    SHT_NULL            = 0,  /**< No associated section (inactive entry). */
    PROGBITS            = 1,  /**< Program-defined contents. */
    ...
  };
};

This means that instead of using LIEF::ELF::ELF_SECTION_TYPES::SHT_PROGBITS or lief.ELF.SECTION_TYPES.SHT_PROGBITS you should now use:

- LIEF::ELF::ELF_SECTION_TYPES::SHT_PROGBITS
+ LIEF::ELF::Section::TYPE::PROGBITS

- lief.ELF.SECTION_TYPES.SHT_PROGBITS
+ lief.ELF.Section.TYPE.PROGBITS

The enums affected by this change are listed in the changelog.
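The re-scoping pattern itself can be illustrated with a toy model in plain Python, where the enum lives inside the class it describes. The `Section` class below is a stand-in that only mirrors the two values of the header excerpt; it is not the real lief.ELF.Section.

```python
import enum


class Section:
    """Toy stand-in for a class that owns its enum (not the real LIEF class)."""

    class TYPE(enum.IntEnum):
        SHT_NULL = 0
        PROGBITS = 1

    def __init__(self, type_: "Section.TYPE"):
        self.type = type_


def is_progbits(section: Section) -> bool:
    # The enum is reached through its owner: Section.TYPE.PROGBITS
    return section.type == Section.TYPE.PROGBITS


# Usage
print(is_progbits(Section(Section.TYPE.PROGBITS)))
```

Because the values are only reachable through their owning class, short names like PROGBITS no longer need a format-specific prefix to stay unambiguous.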

Performance

PE Parser

I received some feedback about performance issues in the latest release (0.14.x) compared to former releases. This regression affects Mach-O and PE binaries and I’m happy to say that this v0.15.0 release should be faster on ELF, PE, and Mach-O compared to previous releases.

The PE regression comes from the LIEF::PE::OptionalHeader::computed_checksum introduced in LIEF 0.12.0 and discussed in this issue: #660.

As of LIEF 0.12.0, this computed_checksum was computed during the parsing phase, and on large binaries this computation can have a significant impact on performance. In LIEF 0.15.0, the OptionalHeader’s checksum can be re-computed from the LIEF::PE::Binary object:

import lief

pe: lief.PE.Binary = ...
computed_checksum = pe.compute_checksum()

This avoids the computation during the parsing phase and moves to an “on-demand” API.
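The design change can be summarized with a toy model: parsing does no checksum work, and the cost is only paid when the caller asks for the value. `ToyBinary` and `parse` are hypothetical names, and the arithmetic is a placeholder rather than the real PE checksum algorithm.

```python
class ToyBinary:
    """Toy stand-in for a parsed binary; NOT LIEF's implementation."""

    def __init__(self, raw: bytes):
        self.raw = raw
        self.checksum_runs = 0  # instrumentation for this example only

    def compute_checksum(self) -> int:
        """Compute the checksum only when the caller asks for it."""
        self.checksum_runs += 1
        # Placeholder arithmetic, not the real PE checksum algorithm.
        return sum(self.raw) & 0xFFFFFFFF


def parse(raw: bytes) -> ToyBinary:
    # Parsing touches no checksum code, so large inputs do not pay for it.
    return ToyBinary(raw)


# Usage: no checksum pass happens until compute_checksum() is called.
binary = parse(b"\x01\x02\x03")
print(binary.checksum_runs, binary.compute_checksum(), binary.checksum_runs)
```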

Mach-O Parser

On the other hand, the Mach-O regression was pretty tricky to identify (cf. issue #1069).

The root cause of the regression was these lines:

// https://github.com/lief-project/LIEF/blob/0.14.1/src/MachO/BinaryParser.cpp#L285-L290
for (LARGE_LOOP) {
  if (!is_printable(name)) {
    ...
  }
}

with is_printable implemented as follows:

bool is_printable(const std::string& str) {
  return std::all_of(std::begin(str), std::end(str),
                     [] (char c) { return std::isprint<char>(c, std::locale("C")); });
}

Then, while processing large Mach-O binaries with LIEF we can observe:

  • On Linux: No regression
  • On macOS: REGRESSION
  • On Windows: REGRESSION

It turned out that std::locale("C") is cached by the STL on Linux but not on macOS & Windows. This means that we were invoking std::locale("C") for each character of each string (which has a cost).

One solution is to store std::locale("C") in a static variable as it is done – under the hood – in the Linux STL.

bool is_printable(const std::string& str) {
  return std::all_of(std::begin(str), std::end(str),
-                     [] (char c) { return std::isprint<char>(c, std::locale("C")); });
+                     [] (char c) {
+                        static std::locale LC("C");
+                        return std::isprint<char>(c, LC);
+                     });
}

The actual fix is slightly different though: 7c3f63194.

Python Wheels

LIEF Python wheels are now available for musl-based systems. This support is motivated by the fact that Python Docker images tagged with the -alpine suffix use Alpine Linux, which is based on musl libc.

Thus, we can now use Docker’s python-alpine as image base to install LIEF:

FROM python:3.13.0b3-alpine

RUN pip install --no-cache-dir lief==0.15.0

Note that the LIEF Python wheel for Alpine weighs ~2.5MB compressed and ~7MB decompressed.

Final Words

This new Rust-oriented release is a major milestone for LIEF. While the library is widely used in the Python community, with ~16,000 daily downloads on PyPI, I’m eager to see new use cases or issues brought by the Rust community.

As a reminder, there is a Discord channel where you can drop your questions and remarks (that are not issues 😉).

Thank you also to arttson and lexika979 for their sponsorship.


  1. Which makes sense since this is not the purpose of these projects.
