TL;DR
LIEF is a library to parse and manipulate the ELF, PE and Mach-O formats. Source code is available on GitHub and use cases are here.
When dealing with executable files, the first layer of information is the format in which the code is wrapped. We can see an executable file format as an envelope: it contains the information that the postman (i.e. the operating system) needs to handle and deliver (i.e. execute) it. The message wrapped by this envelope is the machine code.
There are three mainstream formats, one per OS: ELF (Linux), PE (Windows) and Mach-O (macOS).
Other executable file formats, such as COFF, exist but they are less relevant.
Usually each format has a header which describes at least the target architecture, the program’s entry point and the type of the wrapped object (executable, library, etc.).
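To make the role of the header concrete, here is a minimal sketch that reads these three fields from an ELF header using only the Python standard library. It does not use LIEF; the function name `read_elf_header` and the two lookup tables are illustrative choices, and only a handful of common type and machine values are listed.

```python
import struct

# Small subsets of the values defined by the ELF specification.
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def read_elf_header(data: bytes) -> dict:
    """Extract object type, target architecture and entry point (hypothetical helper)."""
    if len(data) < 16 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big endian
    # e_type, e_machine, e_version, e_entry follow the 16-byte e_ident array.
    fmt = endian + ("HHIQ" if is_64 else "HHII")
    if len(data) < 16 + struct.calcsize(fmt):
        raise ValueError("truncated ELF header")
    e_type, e_machine, _e_version, e_entry = struct.unpack_from(fmt, data, 16)
    return {
        "class": "ELF64" if is_64 else "ELF32",
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": ELF_MACHINES.get(e_machine, hex(e_machine)),
        "entry_point": e_entry,
    }


# Synthetic header used for demonstration only: ELF64, little endian,
# executable, x86-64, entry point 0x401000.
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
sample += struct.pack("<HHIQ", 2, 62, 1, 0x401000)
info = read_elf_header(sample)
print(info["class"], info["type"], info["machine"], hex(info["entry_point"]))
```

On a real system the same function could be fed the first 64 bytes of a file such as /usr/bin/ls. PE and Mach-O headers carry equivalent fields but use different layouts.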
Then we have blocks of data that will be mapped by the OS’s loader. These blocks of data could hold machine code (.text), read-only data (.rodata) or other OS-specific information.
For PE there is only one kind of such block: the section. For ELF and Mach-O, a section has a different meaning. In these formats, sections are used by the linker at build time, whereas segments (the second type of block) are used by the OS’s loader at execution time. Thus sections are not mandatory in ELF and Mach-O executables and can be removed without affecting execution.
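For ELF, the split between the two kinds of block is visible in the header itself, which points to two independent tables: the program header table (segments) and the section header table (sections). The sketch below reads both locations with the standard library only. It assumes a little-endian ELF64 header, and the helper name `elf64_tables` is hypothetical, not part of LIEF.

```python
import struct


def elf64_tables(data: bytes) -> dict:
    """Locate the segment and section tables of a little-endian ELF64 header."""
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 header")
    # Fields that follow the 16-byte e_ident array, in specification order.
    (_type, _machine, _version, _entry, phoff, shoff, _flags,
     _ehsize, _phentsize, phnum, _shentsize, shnum, _shstrndx) = struct.unpack_from(
        "<HHIQQQIHHHHHH", data, 16)
    return {
        "segments": {"offset": phoff, "count": phnum},   # used by the loader
        "sections": {"offset": shoff, "count": shnum},   # used by the linker
        "has_section_table": shoff != 0,
    }


# Synthetic header for demonstration: two segments right after the header
# (offset 64) and no section table at all (offset 0, count 0).
ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
fields = struct.pack("<HHIQQQIHHHHHH",
                     2, 62, 1, 0x401000, 64, 0, 0, 64, 56, 2, 64, 0, 0)
tables = elf64_tables(ident + fields)
print(tables["segments"], tables["has_section_table"])
```

A header like the synthetic one above describes a file whose section table has been removed while the segments needed by the loader are still present.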
It turns out that many projects need to parse executable file formats but don’t use a standard library and re-implement their own parser (and the wheel). Moreover, these parsers are usually bound to one language.
On Unix systems one can find the objdump and objcopy utilities, but they are limited to Unix and their API is not user-friendly.
The purpose of LIEF is to fill this void:
The following snippets show how to obtain information about an executable using the different LIEF APIs, starting with the Python API:
import lief
# ELF
binary = lief.parse("/usr/bin/ls")
print(binary)

# PE
binary = lief.parse("C:\\Windows\\explorer.exe")
print(binary)

# Mach-O
binary = lief.parse("/usr/bin/ls")
print(binary)
With the C++ API:
#include <iostream>
#include <LIEF/LIEF.hpp>
int main(int argc, const char** argv) {
  LIEF::ELF::Binary* elf = LIEF::ELF::Parser::parse("/usr/bin/ls");
  LIEF::PE::Binary* pe = LIEF::PE::Parser::parse("C:\\Windows\\explorer.exe");
  LIEF::MachO::Binary* macho = LIEF::MachO::Parser::parse("/usr/bin/ls");

  std::cout << *elf << std::endl;
  std::cout << *pe << std::endl;
  std::cout << *macho << std::endl;

  delete elf;
  delete pe;
  delete macho;
}
And finally with the C API:
#include <stdio.h>
#include <LIEF/LIEF.h>
int main(int argc, const char** argv) {

  Elf_Binary_t* elf_binary = elf_parse("/usr/bin/ls");
  Pe_Binary_t* pe_binary = pe_parse("C:\\Windows\\explorer.exe");
  Macho_Binary_t** macho_binaries = macho_parse("/usr/bin/ls");

  Pe_Section_t** pe_sections = pe_binary->sections;
  Elf_Section_t** elf_sections = elf_binary->sections;
  Macho_Section_t** macho_sections = macho_binaries[0]->sections;

  /* Each section array is NULL-terminated. */
  for (size_t i = 0; pe_sections[i] != NULL; ++i) {
    printf("%s\n", pe_sections[i]->name);
  }

  for (size_t i = 0; elf_sections[i] != NULL; ++i) {
    printf("%s\n", elf_sections[i]->name);
  }

  for (size_t i = 0; macho_sections[i] != NULL; ++i) {
    printf("%s\n", macho_sections[i]->name);
  }

  elf_binary_destroy(elf_binary);
  pe_binary_destroy(pe_binary);
  macho_binaries_destroy(macho_binaries);
}
LIEF supports FAT-MachO and one can iterate over binaries as follows:
import lief
binaries = lief.MachO.parse("/usr/lib/libc++abi.dylib")
for binary in binaries:
    print(binary)
Note
The above script uses the lief.MachO.parse function instead of the lief.parse function because lief.parse returns a single lief.MachO.Binary object, whereas lief.MachO.parse returns a list of lief.MachO.Binary objects (according to the FAT-MachO format).
Along with standard format components like headers, sections, import tables, load commands, symbols, etc., LIEF is also able to parse PE Authenticode:
import lief
driver = lief.parse("driver.sys")

for crt in driver.signature.certificates:
    print(crt)
Version: 3
Serial Number: 61:07:02:dc:00:00:00:00:00:0b
Signature Algorithm: SHA1_WITH_RSA_ENCRYPTION
Valid from: 2005-9-15 21:55:41
Valid to: 2016-3-15 22:5:41
Issuer: DC=com, DC=microsoft, CN=Microsoft Root Certificate Authority
Subject: C=US, ST=Washington, L=Redmond, O=Microsoft Corporation, CN=Microsoft Windows Verification PCA
...
Full API documentation is available here
In the LIEF architecture, each format implements at least the following classes:
Binary class
To factor out the characteristics that the formats have in common, these characteristics are tied together by an inheritance relationship.
For symbols it gives the following diagram:
This makes it possible to write cross-format utilities like nm, a Unix utility that lists the symbols in an executable.
The source code is available here: binutils
With the given inheritance relationship one can write this utility for the three formats in a single script:
import lief
import sys

def nm(filename):
    # Parse the file first: lief.parse detects the format (ELF, PE or Mach-O)
    binary = lief.parse(filename)
    for symbol in binary.symbols:
        print(symbol)

    return 0

if __name__ == "__main__":
    r = nm(sys.argv[1])
    sys.exit(r)
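The inheritance idea behind this script can be illustrated without LIEF. In the sketch below, every class name and field is hypothetical and chosen for illustration only (they are not LIEF's actual classes): a base class holds what all symbols share, each format adds its own attributes, and a single function works on any of them.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    """Format-independent part: what every symbol has in common."""
    name: str
    value: int


@dataclass
class ElfSymbol(Symbol):
    binding: str = "GLOBAL"      # ELF-specific attribute (illustrative)


@dataclass
class PeSymbol(Symbol):
    storage_class: int = 2       # PE/COFF-specific attribute (illustrative)


@dataclass
class MachOSymbol(Symbol):
    n_type: int = 0x0F           # Mach-O-specific attribute (illustrative)


def nm(symbols):
    """List symbols using only the attributes defined on the base class."""
    return [f"{s.value:016x} {s.name}" for s in symbols]


mixed = [
    ElfSymbol("main", 0x401000),
    PeSymbol("WinMain", 0x140001000),
    MachOSymbol("_main", 0x100003F00),
]
for line in nm(mixed):
    print(line)
```

Because `nm` only touches `name` and `value`, it never needs to know which format a symbol came from; format-specific code can still reach the extra fields when it needs them.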
As LIEF is still a young project we hope to have feedback, ideas, suggestions and pull requests.
The source code is available here: https://github.com/lief-project (under the Apache 2.0 license), and the associated website is http://lief.quarkslab.com
If you are interested in use cases, you can take a look at these tutorials:
The project will be presented at the Third French Japanese Meeting on Cybersecurity
Thanks to Serge Guelton and Adrien Guinet for their advice about the design and their code review. Thanks to Quarkslab for making this project open-source.