Rust bindings for LIEF

Romain Thomas, April 28, 2024

LIEF Rust bindings are now available. This blog post introduces these bindings and the technical challenges behind this journey.

tl;dr

[package]
name    = "lief-demo"
version = "0.0.1"
edition = "2021"

[dependencies]
lief = { git = "https://github.com/lief-project/LIEF", branch = "main" }

use std::fs::File;

use lief::Binary;
use lief::generic::Section; // for the "abstract" traits

fn main() {
    // Path of the binary to parse, taken from the last command-line argument
    let path = std::env::args().last().unwrap();
    let mut file = File::open(path).expect("Can't open the file");

    match Binary::from(&mut file) {
        Some(Binary::ELF(elf)) => {
            for section in elf.sections() {
                println!("{}: 0x{:x}", section.name(), section.virtual_address());
            }
        }
        Some(Binary::PE(pe)) => {
            // ...
        }
        Some(Binary::MachO(macho)) => {
            // ...
        }
        None => {
            // Parsing error
        }
    }
}

Nightly documentation is available here: https://lief-rs.s3.fr-par.scw.cloud/doc/latest/lief/index.html and the package will be published on https://crates.io/crates/lief for the 0.15.0 release.

Introduction

It has been a long journey to have Rust bindings for LIEF, and I’m happy to announce that these bindings are starting to be ready for public release.

I’ll take this blog post as an opportunity to share the different challenges that led me to the current design of the bindings. I’m not a Rust guru, so feel free to share your feedback or suggestions!

Idiomacy

First off, I care about having bindings that are idiomatic in the language they target. Reaching the current state of the Rust API took most of the development time. The Rust language introduces concepts that do not exactly match what we can find in object-oriented languages. You can get an idea of the Rust API with these examples.

Iterate over ELF sections

use lief::Binary;
use lief::generic::Section; // for the "abstract" traits

fn main() {
    let path = std::env::args().last().unwrap();
    if let Some(Binary::ELF(elf)) = Binary::parse(path.as_str()) {
        for section in elf.sections() {
            println!("{}", section.name());
        }
    }
}

Get PE PDB path

use lief::Binary;
use lief::pe::debug::Entries::CodeViewPDB;

fn main() {
    let path = std::env::args().last().unwrap();
    if let Some(Binary::PE(pe)) = Binary::parse(path.as_str()) {
        for entry in pe.debug() {
            if let CodeViewPDB(pdb_view) = entry {
                println!("{}", pdb_view.filename());
            }
        }
    }
}

Access Mach-O Dyld Info

use lief::Binary;
use lief::macho::commands::Commands;
use lief::macho::binding_info::BindingInfo;

fn main() {
    let path = std::env::args().last().unwrap();
    if let Some(Binary::MachO(fat)) = Binary::parse(path.as_str()) {
        for macho in fat.iter() {
            // First version, iterate over the commands
            for cmd in macho.commands() {
                // Alternative to `if let` pattern
                match cmd {
                    Commands::DyldInfo(dyld_info) => {
                        for binding in dyld_info.bindings() {
                            if let BindingInfo::Chained(chained) = binding {
                                println!("Library: 0x{:x}", chained.address());
                            }
                        }
                    }
                    _ => {}
                }
            }

            // Second version, using the helper
            if let Some(dyld_info) = macho.dyld_info() {
                for binding in dyld_info.bindings() {
                    if let BindingInfo::Chained(chained) = binding {
                        println!("Library: 0x{:x}", chained.address());
                    }
                }
            }
        }
    }
}

Given this idiomatic goal, there were some challenges in exposing C++ code to Rust.

Polymorphism & Inheritance

How can we idiomatically bind this C++ code in Rust?

#include <string>

class Base {
public:
  virtual ~Base() = default;
  virtual std::string get_name() {
    return "Base";
  }
};

class Derived : public Base {
public:
  std::string get_name() override {
    return "Derived";
  }
};

class OtherDerived : public Base {
public:
  std::string get_name() override {
    return "OtherDerived";
  }
};

For the inheritance relationship, the idea is to leverage Rust’s enum, in which every leaf of the inheritance tree becomes a variant:

pub enum Inheritance {
    Derived(Derived),
    OtherDerived(OtherDerived),
}

Secondly, all these objects inherit and share the get_name() virtual function. To provide this shared property in Rust, we can leverage a Rust trait that would make get_name available for the structures that implement this trait:

pub trait AsBase {
    fn get_name(&self) -> String;
}

impl AsBase for Derived {
    fn get_name(&self) -> String {
        ...
    }
}

impl AsBase for OtherDerived {
    fn get_name(&self) -> String {
        ...
    }
}

One can also simplify the definition of the trait so that the derived objects only have to provide the FFI reference to the base class:

pub trait AsBase {
-    fn get_name(&self) -> String;
+    fn as_base(&self) -> ffi::BaseImpl;
+
+    fn get_name(&self) -> String {
+        self.as_base().get_name().to_string()
+    }
}

impl AsBase for Derived {
-    fn get_name(&self) -> String {
+    fn as_base(&self) -> ffi::BaseImpl {
        ...
    }
}

impl AsBase for OtherDerived {
-    fn get_name(&self) -> String {
+    fn as_base(&self) -> ffi::BaseImpl {
        ...
    }
}

LIEF’s Rust bindings rely heavily on these patterns to expose classes with polymorphism and inheritance properties.
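
To make the two patterns concrete, here is a minimal, self-contained sketch that combines the enum and the trait with a default method. The ffi module below is a hypothetical stand-in for the generated FFI layer (it only stores a name), and name_of is an illustrative helper that is not part of LIEF:

```rust
// Hypothetical stand-in for the generated FFI layer. In real bindings this
// handle would come from the C++ side, not from a Rust struct literal.
mod ffi {
    pub struct BaseImpl {
        pub name: &'static str,
    }

    impl BaseImpl {
        // Mimics the virtual call `Base::get_name()` on the C++ side.
        pub fn get_name(&self) -> &str {
            self.name
        }
    }
}

pub struct Derived;
pub struct OtherDerived;

pub trait AsBase {
    // The only method each leaf type has to provide.
    fn as_base(&self) -> ffi::BaseImpl;

    // Shared behaviour, written once in terms of `as_base()`.
    fn get_name(&self) -> String {
        self.as_base().get_name().to_string()
    }
}

impl AsBase for Derived {
    fn as_base(&self) -> ffi::BaseImpl {
        ffi::BaseImpl { name: "Derived" }
    }
}

impl AsBase for OtherDerived {
    fn as_base(&self) -> ffi::BaseImpl {
        ffi::BaseImpl { name: "OtherDerived" }
    }
}

// One variant per leaf of the C++ inheritance tree.
pub enum Inheritance {
    Derived(Derived),
    OtherDerived(OtherDerived),
}

// Illustrative helper: dispatch on the variant, then use the shared method.
pub fn name_of(obj: &Inheritance) -> String {
    match obj {
        Inheritance::Derived(d) => d.get_name(),
        Inheritance::OtherDerived(o) => o.get_name(),
    }
}

fn main() {
    let objects = [
        Inheritance::Derived(Derived),
        Inheritance::OtherDerived(OtherDerived),
    ];
    for obj in &objects {
        println!("{}", name_of(obj));
    }
}
```

The enum gives the caller exhaustive matching over the concrete types, while the trait keeps the shared accessors in a single place.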

Lifetime

In C++, the compiler does not track or enforce the lifetime of an object. For instance, this code compiles without complaint:

#include <cstdio>
#include <memory>

#include <LIEF/PE.hpp>

int main() {
  LIEF::PE::Binary* pe = nullptr;
  {
    std::unique_ptr<LIEF::PE::Binary> pe_unique = LIEF::PE::Parser::parse("...");
    pe = pe_unique.get();
  } // pe_unique is destroyed here, and the Binary with it
  printf("%s\n", pe->get_section(".text")->name().c_str()); // Use-after-free
  return 0;
}

Nevertheless, the pe pointer used in printf is no longer valid: the std::unique_ptr went out of scope at the end of the inner block and destroyed the object it owned.

In Python, Nanobind and Pybind11 provide helpers to define the lifetime of an object according to its parent or its scope:

nb::class_<LIEF::PE::Binary>(m, "Binary")
    .def_prop_ro("sections",
        nb::overload_cast<>(&Binary::sections),
        nb::keep_alive<0, 1>());

With nb::keep_alive<0, 1>, we indicate that the PE Binary instance (argument 1, i.e. self) must be kept alive at least as long as the section iterator returned from it (index 0, the return value).

In Rust, we could express this lifetime with something like:

pub struct Iterator<'a> { // does not compile: `'a` is never used (E0392)
    pub it: ffi::Impl,
}

impl<'a> Iterator<'a> {
    pub fn new(it: ffi::Impl) -> Self {
        Self { it }
    }
}

impl Binary {
    pub fn get_iterator<'a>(&'a self) -> Iterator<'a> {
        Iterator::new(self.get_ffi_impl())
    }
}

But this code is not correct, since the lifetime <'a> of the Iterator structure is not bound to any field of the structure. For FFI-related technical reasons, we can’t bind this lifetime to ffi::Impl.

One solution consists of using PhantomData to provide the lifetime semantic:

use std::marker::PhantomData;

pub struct Iterator<'a> {
    pub it: ffi::Impl,
    _owner: PhantomData<&'a ffi::PE_Binary>,
}
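
As a runnable illustration of this idea, the sketch below uses a raw pointer as a hypothetical stand-in for the opaque FFI handle. The names Binary, SectionIterator, and section_names are made up for the example and do not mirror LIEF's actual types; the struct is called SectionIterator rather than Iterator so that it can implement the standard Iterator trait:

```rust
use std::marker::PhantomData;

pub struct Binary {
    pub section_names: Vec<String>,
}

// Hypothetical stand-in for the opaque FFI iterator: a raw pointer carries no
// lifetime, so the compiler cannot tie it to the `Binary` on its own.
mod ffi {
    pub struct Impl {
        pub names: *const Vec<String>,
        pub index: usize,
    }
}

pub struct SectionIterator<'a> {
    it: ffi::Impl,
    // Zero-sized marker: the struct now behaves as if it held a `&'a Binary`.
    _owner: PhantomData<&'a Binary>,
}

impl Binary {
    // The returned iterator borrows `self` for as long as it exists.
    pub fn sections(&self) -> SectionIterator<'_> {
        SectionIterator {
            it: ffi::Impl { names: &self.section_names, index: 0 },
            _owner: PhantomData,
        }
    }
}

impl<'a> Iterator for SectionIterator<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        // SAFETY: `_owner` makes the borrow checker keep the `Binary` (and
        // the vector behind the pointer) alive for the whole lifetime `'a`.
        let names: &'a Vec<String> = unsafe { &*self.it.names };
        let item = names.get(self.it.index)?;
        self.it.index += 1;
        Some(item.as_str())
    }
}

fn main() {
    let binary = Binary {
        section_names: vec![".text".to_string(), ".data".to_string()],
    };
    for name in binary.sections() {
        println!("{name}");
    }

    // Rejected by the borrow checker thanks to the marker:
    // let it = { let b = Binary { section_names: vec![] }; b.sections() };
    // error[E0597]: `b` does not live long enough
}
```

PhantomData has no runtime cost: it only exists at the type level, which is why it suits handles whose real contents live on the C++ side.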

Safety First!

LIEF is developed in what we could call an “unsafe” language (i.e. C++). On the other hand, Rust provides strong guarantees about memory, concurrency, …

Even though LIEF’s core can’t provide the safety guarantees that Rust is giving, I tried to provide some guarantees about the bindings.

Coverage

65% of the functions exposed by the Rust binding are covered by the test suite and you can access the coverage report here: https://lief-rs.s3.fr-par.scw.cloud/coverage/index.html (nightly generated).

By covered I mean: "the function that bridges from C++ to Rust is executed in the test suite".

ASAN

Regarding memory safety, Rust can compile packages with ASAN through compiler options (the -Z flag requires a nightly toolchain):

export RUSTFLAGS="-Z sanitizer=address -Clink-args=-fsanitize=address"
export TARGET_CXXFLAGS="-fsanitize=address -fno-omit-frame-pointer -O1"
...

Thus, we also leverage this option to compile both LIEF core and the Rust bindings with ASAN.

With 65% of the functions and 70% of the lines covered by tests, running the test suite under ASAN gives us some confidence that the bindings do not introduce leaks or memory issues.

I don’t pretend that the code is free of bugs but at least these mechanisms are in place in the development cycle of the project.

Compilation

The bindings rely on autocxx to automatically generate Rust FFI code from existing C++ header files. Autocxx is powerful, but it can fail to process complex headers like LIEF/ELF/Binary.hpp. Thus, I had to create a kind of wrapper over the existing LIEF/*.hpp header files so that autocxx can process them. These wrappers are available in the directory api/rust/include/

Generating the Rust FFI code for the different C++ headers takes a significant amount of time: about 50s with the current bindings. This generation time can be problematic for the end user, especially if LIEF is pulled in indirectly through other dependencies. On the other hand, for fixed versions of LIEF, cxxgen, and autocxx, the code generated by cxxgen and autocxx is always the same. Thus, we can pregenerate and precompile these files to save time during the pure-Rust compilation step.

Docker

All the steps mentioned in the previous parts (pre-compilation, ASAN, and code coverage) run in the CI and are fully Dockerized.

It might also be worth mentioning that the pre-compiled FFI artifacts are also compiled and cross-compiled with Docker. Yes, cross-compiled.

Cross-Compilation & CI

Digression

Feel free to skip this part which is not strictly related to LIEF & Rust.

LIEF uses GitHub Actions for its CI and, in my experience, macOS and Windows runners are less available than Linux runners (i.e. you wait longer for them). In addition, if you use these runners for a private repository (which is not the case for LIEF), you have a pool of 2000 minutes for the CI of the private repo. Depending on the runner you are using, these minutes are counted with a multiplier [1]:

Operating system    Minute multiplier
Linux               1
Windows             2
OSX                 10

1 minute spent on a macOS runner is equivalent to 10 minutes spent on a Linux runner. Hence, if your private project is exclusively using the macOS runner, you don’t have 2000 minutes (~33h) but 200 minutes (~3h).
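
The arithmetic can be written down as a small sketch. The Runner enum and the effective_minutes function are names invented for this illustration; the multipliers are the ones listed above:

```rust
// Runner kinds with the billing multipliers quoted in this post.
#[derive(Clone, Copy)]
pub enum Runner {
    Linux,
    Windows,
    MacOs,
}

impl Runner {
    pub fn multiplier(self) -> u32 {
        match self {
            Runner::Linux => 1,
            Runner::Windows => 2,
            Runner::MacOs => 10,
        }
    }
}

// Wall-clock minutes you can actually spend on `runner` when the whole
// pool of billed minutes goes to that kind of runner.
pub fn effective_minutes(pool: u32, runner: Runner) -> u32 {
    pool / runner.multiplier()
}

fn main() {
    let pool = 2000;
    let runners = [
        ("Linux", Runner::Linux),
        ("Windows", Runner::Windows),
        ("macOS", Runner::MacOs),
    ];
    for (name, runner) in runners {
        println!("{name}: {} minutes", effective_minutes(pool, runner));
    }
}
```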

Once this pool of 2000 minutes is exhausted, 1 minute on a macOS 6 vCPU runner is priced at $0.16, while the same minute on a Linux 8 vCPU runner is priced at $0.032.

Given those facts, cross-compiling for macOS and Windows can be interesting. LLVM provides all the facilities to perform this cross-compilation [2] and, since we are only generating static libraries, we don’t even need the libraries for these platforms.

So yes, LIEF core and the Rust FFI library are cross-compiled for Windows (MT/MD CRT) and OSX (aarch64, x86_64) with a Docker container running on Linux :)

The Windows and OSX runners are only used to check that the cross-compilation worked well (i.e. ld64 and link.exe can link the cross-compiled libraries) and that the test suite passes.

Long story short, we save resources and CI minutes by cross-compiling for Windows and OSX in a Docker container running on a Linux runner. As a side effect, we also get fully reproducible builds. The whole pipeline (LIEF core compilation, ASAN, coverage, S3 upload) takes less than 15 minutes (with cache optimizations).

Other Projects

LIEF Rust bindings might not be suitable for every project. In particular, if you are looking for a pure, safe Rust library or need a #![no_std] context, please consider these alternatives, which are the standard libraries for this in Rust:

Acknowledgement

Thank you to Erynian for the initial introduction of autocxx, back in the days when I was working at Quarkslab 😉
