LIEF Rust bindings are now available. This blog post introduces these bindings and the technical challenges behind this journey.
    [package]
    name = "lief-demo"
    version = "0.0.1"
    edition = "2021"

    [dependencies]
    lief = { git = "https://github.com/lief-project/LIEF", branch = "main" }
    use std::fs::File;

    use lief::Binary;
    use lief::generic::Section; // for the "abstract" traits

    fn main() {
        // Take the path of the binary to parse from the last CLI argument
        let path = std::env::args().last().unwrap();
        let mut file = File::open(path).expect("Can't open the file");

        match Binary::from(&mut file) {
            Some(Binary::ELF(elf)) => {
                for section in elf.sections() {
                    println!("{}: 0x{:x}", section.name(), section.virtual_address());
                }
            },
            Some(Binary::PE(pe)) => {
                // ...
            },
            Some(Binary::MachO(macho)) => {
                // ...
            },
            None => {
                // Parsing error
            }
        }
    }
Nightly documentation is available here: https://lief-rs.s3.fr-par.scw.cloud/doc/latest/lief/index.html and the package will be published on https://crates.io/crates/lief for the 0.15.0 release.
It has been a long journey to get Rust bindings for LIEF, and I’m happy to announce that these bindings are getting ready for a public release.
I’ll take this blog post as an opportunity to share the different challenges that led me to the current design of the bindings. I’m not a Rust guru, so feel free to share your feedback or suggestions!
First off, I care about having bindings that are idiomatic in the language they target. Reaching the current state of the Rust API took most of the development time. The Rust language introduces concepts that do not exactly match what we can find in object-oriented languages. You can get an idea of the Rust API with these examples.
    use lief::Binary;
    use lief::generic::Section; // for the "abstract" traits

    let path = std::env::args().last().unwrap();
    if let Some(Binary::ELF(elf)) = Binary::parse(path.as_str()) {
        for section in elf.sections() {
            println!("{}", section.name());
        }
    }
    use lief::Binary;
    use lief::pe::debug::Entries::CodeViewPDB;

    if let Some(Binary::PE(pe)) = Binary::parse(path.as_str()) {
        for entry in pe.debug() {
            if let CodeViewPDB(pdb_view) = entry {
                println!("{}", pdb_view.filename());
            }
        }
    }
    use lief::Binary;
    use lief::macho::commands::Commands;
    use lief::macho::binding_info::BindingInfo;

    if let Some(Binary::MachO(fat)) = Binary::parse(path.as_str()) {
        for macho in fat.iter() {

            // First version, iterate over the commands
            for cmd in macho.commands() {
                // Alternative to `if let` pattern
                match cmd {
                    Commands::DyldInfo(dyld_info) => {
                        for binding in dyld_info.bindings() {
                            if let BindingInfo::Chained(chained) = binding {
                                println!("Library: 0x{:x}", chained.address());
                            }
                        }
                    }
                    _ => {}
                }
            }

            // Second version, using the helper
            if let Some(dyld_info) = macho.dyld_info() {
                for binding in dyld_info.bindings() {
                    if let BindingInfo::Chained(chained) = binding {
                        println!("Library: 0x{:x}", chained.address());
                    }
                }
            }
        }
    }
Given this idiomatic goal, there were some challenges in exposing C++ code to Rust.
How can we idiomatically bind the following C++ code in Rust?
    #include <string>

    class Base {
    public:
        virtual ~Base() = default;
        virtual std::string get_name() {
            return "Base";
        }
    };

    class Derived : public Base {
    public:
        std::string get_name() override {
            return "Derived";
        }
    };

    class OtherDerived : public Base {
    public:
        std::string get_name() override {
            return "OtherDerived";
        }
    };
For the inheritance relationship, the idea is to leverage Rust’s enum structure, in which every leaf of the inheritance tree is a variant of the enum:
    pub enum Inheritance {
        Derived(Derived),
        OtherDerived(OtherDerived),
    }
Secondly, all these objects inherit and share the get_name() virtual function. To provide this shared property in Rust, we can leverage a Rust trait that makes get_name available on the structures that implement it:
    pub trait AsBase {
        fn get_name(&self) -> String;
    }

    impl AsBase for Derived {
        fn get_name(&self) -> String {
            ...
        }
    }

    impl AsBase for OtherDerived {
        fn get_name(&self) -> String {
            ...
        }
    }
One can also simplify the trait so that the derived objects only have to provide the FFI reference to the base class:
     pub trait AsBase {
    -    fn get_name(&self) -> String;
    +    fn as_base(&self) -> ffi::BaseImpl;
    +
    +    fn get_name(&self) -> String {
    +        self.as_base().get_name().to_string()
    +    }
     }

     impl AsBase for Derived {
    -    fn get_name(&self) -> String {
    +    fn as_base(&self) -> ffi::BaseImpl {
             ...
         }
     }

     impl AsBase for OtherDerived {
    -    fn get_name(&self) -> String {
    +    fn as_base(&self) -> ffi::BaseImpl {
             ...
         }
     }
LIEF’s Rust bindings rely heavily on these patterns to expose classes with polymorphism and inheritance properties.
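To make the enum-plus-trait pattern concrete, here is a minimal, self-contained sketch. It is not LIEF's actual code: BaseImpl is a hypothetical stand-in for the FFI handle (it only stores a name instead of calling into C++), and implementing AsBase on the enum itself is an assumption made for this illustration.

```rust
// Hypothetical stand-in for the FFI handle to the C++ `Base` object.
// In real bindings, `get_name` would call into C++ instead of reading a field.
pub struct BaseImpl {
    name: &'static str,
}

impl BaseImpl {
    pub fn get_name(&self) -> &str {
        self.name
    }
}

pub struct Derived;
pub struct OtherDerived;

pub trait AsBase {
    // The only function the derived types have to provide
    fn as_base(&self) -> BaseImpl;

    // Shared behavior, implemented once on top of `as_base`
    fn get_name(&self) -> String {
        self.as_base().get_name().to_string()
    }
}

impl AsBase for Derived {
    fn as_base(&self) -> BaseImpl {
        BaseImpl { name: "Derived" }
    }
}

impl AsBase for OtherDerived {
    fn as_base(&self) -> BaseImpl {
        BaseImpl { name: "OtherDerived" }
    }
}

// One variant per leaf of the inheritance tree
pub enum Inheritance {
    Derived(Derived),
    OtherDerived(OtherDerived),
}

// Assumption for this sketch: the enum forwards to the wrapped variant,
// so callers can use the shared API without matching themselves.
impl AsBase for Inheritance {
    fn as_base(&self) -> BaseImpl {
        match self {
            Inheritance::Derived(d) => d.as_base(),
            Inheritance::OtherDerived(o) => o.as_base(),
        }
    }
}

fn main() {
    let objects = vec![
        Inheritance::Derived(Derived),
        Inheritance::OtherDerived(OtherDerived),
    ];
    for object in &objects {
        println!("{}", object.get_name());
    }
}
```

With this layout, adding a new derived class means adding one enum variant and one as_base implementation; the shared functions come for free through the default trait methods.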
In C++, the lifetime of an object is not part of the type system, so the compiler does not check it. For instance, this code compiles without any complaint:
    #include <LIEF/PE.hpp>

    #include <cstdio>
    #include <memory>

    int main() {
        LIEF::PE::Binary* pe = nullptr;
        {
            std::unique_ptr<LIEF::PE::Binary> pe_unique = LIEF::PE::Parser::parse("...");
            pe = pe_unique.get();
        } // pe_unique is destroyed here, and the Binary with it
        printf("%s\n", pe->get_section(".text")->name().c_str()); // Use-after-free
        return 0;
    }
Nevertheless, the pe pointer used in printf is no longer valid, because the std::unique_ptr that owned the object has already gone out of scope.
In Python, Nanobind and Pybind11 provide helpers to define the lifetime of an object according to its parent or its scope:
    nb::class_<LIEF::PE::Binary>(m, "Binary")
      .def_prop_ro("sections",
          nb::overload_cast<>(&Binary::sections),
          nb::keep_alive<0, 1>())
With nb::keep_alive<0, 1>, we indicate that the PE Binary instance (index 1, the implicit self argument) must be kept alive at least as long as the returned PE section iterator (index 0).
In Rust, we could express this lifetime with something like:
    pub struct Iterator<'a> {
        pub it: ffi::Impl,
    }

    impl<'a> Iterator<'a> {
        pub fn new(it: ffi::Impl) -> Self {
            Self {
                it,
            }
        }
    }

    impl Binary {
        pub fn get_iterator<'a>(&'a self) -> Iterator<'a> {
            Iterator::new(self.get_ffi_impl())
        }
    }
But this code does not compile: the lifetime 'a declared on the Iterator structure is not used by any of its fields (error E0392). For technical reasons related to the FFI layer, we can’t bind this lifetime to ffi::Impl.
One solution consists of using PhantomData to provide the lifetime semantic:
    use std::marker::PhantomData;

    pub struct Iterator<'a> {
        pub it: ffi::Impl,
        _owner: PhantomData<&'a ffi::PE_Binary>,
    }
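The following sketch shows how such a PhantomData field ties an iterator to its owner. Everything in it is hypothetical and only meant as an illustration: FfiImpl stands in for the opaque FFI handle (here it simply owns a copy of the section names), and Binary, SectionIter, and sections() are simplified stand-ins, not LIEF's real types.

```rust
use std::marker::PhantomData;

// Hypothetical stand-in for the opaque handle returned by the FFI layer.
// It carries no Rust lifetime of its own.
pub struct FfiImpl {
    names: Vec<String>,
    pos: usize,
}

pub struct Binary {
    sections: Vec<String>,
}

pub struct SectionIter<'a> {
    it: FfiImpl,
    // Zero-sized marker: makes the iterator behave as if it borrowed `Binary`
    _owner: PhantomData<&'a Binary>,
}

impl<'a> Iterator for SectionIter<'a> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let item = self.it.names.get(self.it.pos).cloned();
        self.it.pos += 1;
        item
    }
}

impl Binary {
    pub fn new(sections: &[&str]) -> Self {
        Self {
            sections: sections.iter().map(|s| s.to_string()).collect(),
        }
    }

    // The elided lifetime links the returned iterator to `&self`
    pub fn sections(&self) -> SectionIter<'_> {
        SectionIter {
            it: FfiImpl { names: self.sections.clone(), pos: 0 },
            _owner: PhantomData,
        }
    }
}

fn main() {
    let binary = Binary::new(&[".text", ".data"]);
    for name in binary.sections() {
        println!("{}", name);
    }

    // Rejected by the borrow checker, because `tmp` does not live long enough:
    // let it = {
    //     let tmp = Binary::new(&[".text"]);
    //     tmp.sections()
    // };
}
```

The marker costs nothing at runtime, yet the commented-out block at the end of main fails to compile, which is the Rust counterpart of the C++ use-after-free shown earlier.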
LIEF is developed in what we could call an “unsafe” language (i.e. C++). Rust, on the other hand, provides strong guarantees about memory, concurrency, …
Even though LIEF’s core can’t provide the safety guarantees that Rust gives, I tried to provide some guarantees about the bindings.
65% of the functions exposed by the Rust bindings are covered by the test suite, and you can access the coverage report here: https://lief-rs.s3.fr-par.scw.cloud/coverage/index.html (generated nightly).
[Figure: coverage report]
By “covered”, I mean that the function that bridges from C++ to Rust is executed in the test suite.
Regarding memory safety, Rust packages can be compiled with AddressSanitizer (ASAN) thanks to compiler options:
    export RUSTFLAGS="-Z sanitizer=address -Clink-args=-fsanitize=address"
    export TARGET_CXXFLAGS="-fsanitize=address -fno-omit-frame-pointer -O1"
    ...
Thus, we leverage these options to compile both LIEF’s core and the Rust bindings with ASAN.
Since 65% of the functions and 70% of the lines are covered by tests, running these tests with ASAN gives us some confidence that the bindings do not introduce leaks or memory issues.
I don’t claim that the code is free of bugs, but at least these mechanisms are in place in the development cycle of the project.
The bindings rely on autocxx to automatically generate Rust FFI code from existing C++ header files. Autocxx is powerful, but it can fail to process complex headers like LIEF/ELF/Binary.hpp. Thus, I had to create wrappers over the existing LIEF/*.hpp header files so that autocxx can process them. These wrappers are available in the directory api/rust/include/.
The time needed to generate the Rust FFI code for the different C++ headers is significant: about 50s with the current bindings. This generation time can be problematic for end users, especially if LIEF is pulled in indirectly through other dependencies. On the other hand, for fixed versions of LIEF, cxxgen, and autocxx, the generated code is always the same. Thus, we can pregenerate and precompile these files to save time during the pure-Rust compilation step.
All the steps mentioned in the previous parts (pre-compilation, ASAN, and code coverage) run in the CI and are fully Dockerized.
It might also be worth mentioning that the pre-compiled FFI artifacts are compiled and cross-compiled with Docker. Yes, cross-compiled.
Digression
Feel free to skip this part, which is not strictly related to LIEF & Rust.
LIEF uses GitHub Actions for the CI and, from my experience, macOS and Windows runners are less available than Linux runners (i.e. you wait longer for these runners). In addition, if you use these runners for a private repository (which is not the case for LIEF), you have a pool of 2000 minutes for the CI of the private repo. Depending on the runner you are using, these minutes are counted with a multiplier [1]:
| Operating system | Minute multiplier |
|---|---|
| Linux | 1 |
| Windows | 2 |
| macOS | 10 |
1 minute spent on a macOS runner is equivalent to 10 minutes spent on a Linux runner. Hence, if your private project is exclusively using the macOS runner, you don’t have 2000 minutes (~33h) but 200 minutes (~3h).
Then, once this pool of 2000 minutes is exhausted, one minute on a 6-vCPU macOS runner is priced at $0.16, while the same minute on an 8-vCPU Linux runner is priced at $0.032.
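The arithmetic behind these numbers can be sketched in a few lines. The function names are made up for this illustration, and prices are expressed in thousandths of a dollar (160 and 32) to keep the computation in integers.

```rust
/// Wall-clock minutes actually available from a pool of billed minutes.
pub fn effective_minutes(pool: u32, multiplier: u32) -> u32 {
    pool / multiplier
}

/// How many times more expensive a macOS minute is than a Linux minute.
/// Prices are given in thousandths of a dollar.
pub fn price_ratio(macos_millidollars: u32, linux_millidollars: u32) -> u32 {
    macos_millidollars / linux_millidollars
}

fn main() {
    let pool = 2000;
    for (os, multiplier) in [("Linux", 1), ("Windows", 2), ("macOS", 10)] {
        println!("{}: {} minutes", os, effective_minutes(pool, multiplier));
    }
    // $0.16 per macOS minute vs $0.032 per Linux minute
    println!("macOS/Linux price ratio: {}", price_ratio(160, 32));
}
```

In other words, the free pool shrinks to 200 minutes on macOS runners, and each paid macOS minute then costs five times as much as a Linux one.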
Given those facts, cross-compiling for macOS and Windows becomes attractive. LLVM provides all the facilities to perform this cross-compilation [2] and, since we are only generating static libraries, we don’t even need the libraries for these platforms.
So yes, LIEF core and the Rust FFI library are cross-compiled for Windows (MT/MD CRT) and macOS (aarch64, x86_64) with a Docker container running on Linux :)
The Windows and macOS runners are only used to check that the cross-compilation worked well (i.e. ld64 and link.exe can link the cross-compiled libraries) and that the test suite also passes.
Long story short, we save resources and CI minutes by cross-compiling for Windows and macOS in a Docker container running on a Linux runner. As a side effect, we also get fully reproducible builds. The whole pipeline (LIEF core compilation, ASAN, coverage, S3 upload) takes less than 15 minutes (with cache optimizations).
LIEF Rust bindings might not be suitable for every project. In particular, if you are looking for a pure, safe Rust library or need a #![no_std] context, please consider using these alternatives, which are the standard libraries in Rust:
Thank you to Erynian for the initial introduction of autocxx, back in the days when I was working at Quarkslab 😉
[1] https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#minute-multipliers
[2] Including the generation of an ad-hoc signature for the Apple Silicon binaries (cf. ld/MachO/SyntheticSections.cpp)