Tl;DR
LIEF v0.8.3 is out. The main changelog is available here and packages can be downloaded on the official website.We attach a great importance to the automation of some development tasks like testing, distributing, packaging, etc. Here is a summary of these processes:
Each commits is tested on
The test suite includes:
If tests succeeds packages are automatically uploaded on the https://github.com/lief-project/packages repository.
For tagged version, packages are uploaded on the Github release page: https://github.com/lief-project/LIEF/releases.
To facilitate the compilation and the use of LIEF, we created the Dockerlief
repo which includes various Dockerfiles as well as
the dockerlief
utility. dockerlief
is basically a wrapper on docker build .
Among Dockerfiles, we provide a Dockerfile
to cross compile LIEF for Android (ARM
, AARCH64
, x86
, x86-64
)
To cross compile LIEF for Android ARM, one can run:
1$ dockerlief build --api-level 21 --arm lief-android
2
3[INFO] - Location of the Dockerfiles: ~/dockerfiles
4[INFO] - Building Dockerfile: 'lief-android'
5[INFO] - Target architecture: armeabi-v7a
6[INFO] - Target API Level: 21
The SDK package LIEF-0.8.3-Android_API21_armeabi-v7a.tar.gz
is automatically pulled from the Docker to the current directory.
Fuzzing our own library is a good way to detect bugs, memory leak, unsanitized inputs …
Thus, we integrated LibFuzzer in the project. Fuzzing the LIEF ELF, PE, Mach-O parser is as simple as:
1#include <LIEF/LIEF.hpp>
2#include <vector>
3#include <memory>
4
5extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
6 std::vector<uint8_t> raw = {data, data + size};
7 try {
8 std::unique_ptr<LIEF::Binary> b{LIEF::Parser::parse(raw)};
9 } catch (const LIEF::exception& e) {
10 std::cout << e.what() << std::endl;
11 }
12 return 0;
13}
To launch the fuzzer, one can run the following commands:
1$ make fuzz-elf # Launch ELF Fuzzer
2$ make fuzz-pe # Launch PE Fuzzer
3$ make fuzz-macho # Launch MachO Fuzzer
4$ make fuzz # Launch ELF, PE and MachO Fuzzer
In the tutorial #03 we demonstrated how to swap dynamic symbols between a binary and a library. In this part, we will see how we can rename these symbols.
Changing symbol names is not a trivial modification, since modifying the string table of the PT_DYNAMIC
segment has side effects:
DYNAMIC
part of the ELF format.The previous version of LIEF already implements the rebuilding of the hash table but not the extending of the DYNAMIC
part.
With the v0.8.3
we can extend the DYNAMIC
part. Therefore:
.dynamic
sectionDT_RUNPATH
and DT_RPATH
without length restrictionWe will rename all imported functions of gpg
that are imported from libgcrypt.so.20
into
a_very_long_name_of_function_XX
and all exported functions of libgcrypt.so.20
into the same name (XX is the symbol index). 1
1import lief
2
3# Load targets
4gpg = lief.parse("/usr/bin/gpg")
5libgcrypt = lief.parse("/usr/lib/libgcrypt.so.20")
6
7# Change names
8for idx, lsym in enumerate(filter(lambda e : e.exported, libgcrypt.dynamic_symbols)):
9 new_name = 'a_very_long_name_of_function_{:d}'.format(idx)
10 print("New name for '{}': {}".format(lsym.name, new_name))
11 for bsym in filter(lambda e : e.name == lsym.name, gpg.dynamic_symbols):
12 bsym.name = new_name
13 lsym.name = new_name
14
15# Write back
16binary.write(gpg.name)
17libgcrypt.write(libgcrypt.name)
By using readelf
we can check that function names have been modified:
1$ readelf -s ./gpg|grep "a_very_long_name"
2
3 2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND a_very_long_name_of_funct@GCRYPT_1.6 (2)
4 3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND a_very_long_name_of_funct@GCRYPT_1.6 (2)
5 11: 0000000000000000 0 FUNC GLOBAL DEFAULT UND a_very_long_name_of_funct@GCRYPT_1.6 (2)
6 13: 0000000000000000 0 FUNC GLOBAL DEFAULT UND a_very_long_name_of_funct@GCRYPT_1.6 (2)
7 ...
8
9$ readelf -s ./libgcrypt.so.20|grep "a_very_long_name"
10
11 88: 000000000000d050 6 FUNC GLOBAL DEFAULT 10 a_very_long_name_of_funct@@GCRYPT_1.6
12 89: 000000000000dcd0 69 FUNC GLOBAL DEFAULT 10 a_very_long_name_of_funct@@GCRYPT_1.6
13 90: 000000000000d310 34 FUNC GLOBAL DEFAULT 10 a_very_long_name_of_funct@@GCRYPT_1.6
14 91: 000000000000de70 81 FUNC GLOBAL DEFAULT 10 a_very_long_name_of_funct@@GCRYPT_1.6
15 ...
Now if we run the new gpg
binary, we get the following error:
1$ ./gpg --output bar.txt --symmetric ./foo.txt
2relocation error: ./gpg: symbol a_very_long_name_of_function_8, version GCRYPT_1.6 not defined in file libgcrypt.so.20 with link time reference
Because the Linux loader tries to resolve the function a_very_long_name_of_function_8
against /usr/lib/libgcrypt.so.20
and that library doesn’t include the updated names we get the error.
One way to fix this error is to set the environment variable LD_LIBRARY_PATH
to the current directory:
1$ LD_LIBRARY_PATH=. ./gpg --output bar.txt --symmetric ./foo.txt
2$ xxd ./bar.txt|head -n1
3
400000000: 8c0d 0407 0302 c5af 9fba cab1 9545 ebd2 .............E..
5
6$ LD_LIBRARY_PATH=. ./gpg --output foo_decrypted.txt --decrypt ./bar.txt
7$ xxd ./foo_decrypted.txt|head -n1
8
900000000: 4865 6c6c 6f20 576f 726c 640a Hello World.
Another way to fix it is to add a new entry in .dynamic
section.
As mentioned at the beginning, we can now add new entries in the .dynamic
so let’s add
a DT_RUNPATH
entry with the $ORIGIN
value so that the Linux loader resolves the modified libgcrypt.so.20
instead of
the system one:
1...
2# Add a DT_RUNPATH entry
3gpg += lief.ELF.DynamicEntryRunPath("$ORIGIN")
4
5# Write back
6binary.write(gpg.name)
7libgcrypt.write(libgcrypt.name)
And we don’t need the LD_LIBRARY_PATH
anymore:
1$ readelf -d ./gpg|grep RUNPATH
2
30x000000000000001d (RUNPATH) Library runpath: [$ORIGIN]
4
5$ ./gpg --decrypt ./bar.txt
6
7gpg: AES encrypted data
8gpg: encrypted with 1 passphrase
9Hello World
While IDA v7.0 has been released recently, among the changelog one can notice two changes:
These changes are partially true. Let’s see what go wrong in IDA with the following snippet:
1id = lief.parse("/usr/bin/id")
2dynsym = id.get_section(".dynsym")
3dynsym.entry_size = dynsym.size // 2
4id.write("id_test")
This snippet defines the size of one symbol as the entire size of .dynsym
section divided by 2.
The normal size of ELF symbols would be:
1>>> print(int(lief.ELF.ELF32.SIZES.SYM)) # For 32-bits
216
3>>> print(int(lief.ELF.ELF64.SIZES.SYM)) # For 64-bits
424
In the case of the 64-bits id
binary, we set this size to 924.
When opening id_test
in IDA and forcing to use Segment for parsing and not Sections we get the following imports :
Note that id_test
is still executable:
1$ id_test
2uid=1000(romain) gid=1000(romain) ...
By using readelf
we can still retrieve the symbols and we have an error indicating that symbol size is corrupted.
1$ readelf -s id_test
2readelf: Error: Section 5 has invalid sh_entsize of 000000000000039c
3readelf: Error: (Using the expected size of 24 for the rest of this dump)
4
5Symbol table '.dynsym' contains 77 entries:
6 Num: Value Size Type Bind Vis Ndx Name
7 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
8 1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND endgrent@GLIBC_2.2.5 (2)
9 2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __uflow@GLIBC_2.2.5 (2)
10 3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND getenv@GLIBC_2.2.5 (2)
11 4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND free@GLIBC_2.2.5 (2)
12 5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND abort@GLIBC_2.2.5 (2)
13 ...
In LIEF the (dynamic) symbol table address is computed through the DT_SYMTAB
from the PT_DYNAMIC
segment.
To compute the number of dynamic symbols LIEF uses three heuristics:
Malwares start to use this kind of corruption as we will see in the next part.
Rootnik is a malware targeting Android devices. It has been analyzed by Fortinet security researcher.
A full analysis of the malware is available on the Fortinet blog.
This part is focused on the ELF format analysis of one component: libshell
.
Actually there are two libraries libshella_2.10.3.1.so
and libshellx_2.10.3.1.so
. As they have the same purpose, we will use the x86 version.
First if we look at the ELF sections of libshellx_2.10.3.1.so
we can notice that the address, offset and size of some sections like .text
, .init_array
, .dynstr
, .dynsym
are set to 0.
This kind of modification is used to disturb tools that rely on sections to parse some ELF structures (like objdump, readelf, IDA …)
1$ readelf -S ./libshellx-2.10.3.1.so
2There are 21 section headers, starting at offset 0x2431c:
3
4Section Headers:
5 [Nr] Name Type Addr Off Size ES Flg Lk Inf Al
6 [ 0] NULL 00000000 000000 000000 00 0 0 0
7 [ 1] .dynsym DYNSYM 00000114 000114 000300 10 A 2 1 4
8 [ 2] .dynstr STRTAB 00000414 000414 0001e2 00 A 0 0 1
9 [ 3] .hash HASH 00000000 000000 000000 04 A 1 0 4
10 [ 4] .rel.dyn REL 00000000 000000 000000 08 A 1 0 4
11 [ 5] .rel.plt REL 00000000 000000 000000 08 AI 1 6 4
12 [ 6] .plt PROGBITS 00000000 000000 000000 04 AX 0 0 16
13 [ 7] .text PROGBITS 00000000 000000 000000 00 AX 0 0 16
14 [ 8] .code PROGBITS 00000000 000000 000000 00 AX 0 0 16
15 [ 9] .eh_frame PROGBITS 00000000 000000 000000 00 A 0 0 4
16 [10] .eh_frame_hdr PROGBITS 00000000 000000 000000 00 A 0 0 4
17 [11] .fini_array FINI_ARRAY 00000000 000000 000000 00 WA 0 0 4
18 [12] .init_array INIT_ARRAY 00000000 000000 000000 00 WA 0 0 4
19 [13] .dynamic DYNAMIC 0000ce50 00be50 0000f8 08 WA 2 0 4
20 [14] .got PROGBITS 00000000 000000 000000 00 WA 0 0 4
21 [15] .got.plt PROGBITS 00000000 000000 000000 00 WA 0 0 4
22 [16] .data PROGBITS 00000000 000000 000000 00 WA 0 0 16
23 [17] .bss NOBITS 0000d398 00c395 000000 00 WA 0 0 4
24 [18] .comment PROGBITS 00000000 00c395 000045 01 MS 0 0 1
25 [19] .note.gnu.gold-ve NOTE 00000000 00c3dc 00001c 00 0 0 4
26 [20] .shstrtab STRTAB 00000000 024268 0000b1 00 0 0 1
27Key to Flags:
28 W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
29 L (link order), O (extra OS processing required), G (group), T (TLS),
30 C (compressed), x (unknown), o (OS specific), E (exclude),
31 p (processor specific)
If we open the given library in IDA we have no exports, no imports and no sections:
Based on the segments and dynamic entries we can recover most of these information:
.init_array
address and size are available through the DT_INIT_ARRAY
and DT_INIT_ARRAYSZ
entries.dynstr
address and size are available through the DT_STRTAB
and DT_STRSZ
.dynsym
address is available through the DT_SYMTAB
The script recover_shellx.py recovers the missing values, patch sections and rebuild a fixed library.
Now if we open the new libshellx-2.10.3.1_FIXED.so
we have access to imports / exports and some sections. The .init_array
section contains 2 functions:
tencent652524168491435794009
sub_60C0
The tencent652524168491435794009
function basically do a stack alignment and the sub_60C0
is one of the decryption routines 2. This function is obfuscated with graph flattening
and looks like to O-LLVM graph flattening passe 3:
Fortunately there are few “relevant blocks” and there are not obfuscated.
The function sub_60C0
basically iterates over the program headers to find the encrypted one and decrypt it using a custom algorithm (based on shift, xor, etc).
The CVE-2017-1000249 is a stack based buffer overflow in the file
utility. It affects the versions
5.29
, 5.30
and 5.31
.
Basically the overflow occurs in the size of the note description.
Using LIEF we can trigger the overflow as follow:
1target = lief.parse("/usr/bin/id")
2note_build_id = target[lief.ELF.NOTE_TYPES.BUILD_ID]
3note_build_id.description = [0x41] * 30
4target.write("id_overflow")
1$ file --version
2file-5.29
3magic file from /usr/share/file/misc/magic
4
5$ id_overflow
6uid=1000(romain) gid=1000(romain) ...
7
8$ file id_overflow
9*** buffer overflow detected ***: file terminated
10./id_overflow: [1] 3418 abort (core dumped) file ./id_overflow
Here is the commit that introduced the bug: 9611f3
The Load Config directory is now parsed into the LoadConfiguration object. This structure evolves with the Windows versions and LIEF has been designed to support this evolution. You can take a look at LoadConfigurationV0, LoadConfigurationV6.
One can find the different versions of this structure in the following directories:
include/LIEF/PE/LoadConfigurations
src/PE/LoadConfigurations
The current version of LIEF is able to parse the structure up to Windows 10 build 15002 with the hotpatch table offset.
Here are some examples of the LoadConfiguration
API:
1>>> target = lief.parse("PE64_x86-64_binary_WinApp.exe")
2>>> target.has_configuration
3True
4>>> config = target.load_configuration
5>>> config.version
6WIN_VERSION.WIN10_0_15002
7>>> hex(config.guard_rf_failure_routine)
8'0x140001040'
LIEF also provides an API to serialize any ELF or PE objects into JSON 4
For examples to transform LoadConfiguration
object into Json:
1>>> from lief import to_json
2>>> to_json(config)
3'{"characteristics":248,"code_integrity":{"catalog":0,"catalog_offset":0 ... }}' # Not fully printed
One can also serialize the whole Binary object:
1>>> to_json(target)
2'{"data_directories":[{"RVA":0,"size":0,"type":"EXPORT_TABLE"},{"RVA":62584,"section" ...}}' # # Not fully printed
For Mach-O binary, dynamic executables embed the LC_DYLD_INFO
command which is associated with the dyld_info_command
structure.
The structure is basically a list of offsets and sizes pointing to other data structures.
From /usr/lib/mach-o/loader.h
the structure looks like this:
1struct dyld_info_command {
2 uint32_t cmd;
3 uint32_t cmdsize;
4 uint32_t rebase_off;
5 uint32_t rebase_size;
6 uint32_t bind_off;
7 uint32_t bind_size;
8 uint32_t weak_bind_off;
9 uint32_t weak_bind_size;
10 uint32_t lazy_bind_off;
11 uint32_t lazy_bind_size;
12 uint32_t export_off;
13 uint32_t export_size;
14};
The dyld
loader uses this structure to:
Whereas in the ELF and PE format relocations are basically a table, Mach-O format uses byte streams to rebase the image and to bind symbols with addresses. For exports it uses a trie as subjacent structure.
In the new version of LIEF, the Mach-O parser is able to handle these underlying structures to provide an user-friendly API:
The export trie is represented by the ExportInfo object which is usually tied to a Symbol. The binding byte stream is represented trough the BindingInfo object.
For the rebase byte stream, the parser create virtual relocations to model the rebasing process. These virtual relocations are represented by the RelocationDyld
object and among other attributes it contains address
, size
and type
5.
Here is an example using the Python API:
1>>> id = lief.parse("/usr/bin/id")
2>>> print(id.relocations[0])
3100002000 POINTER 64 DYLDINFO __DATA.__eh_frame dyld_stub_binder
4>>> print(id.has_dyld_info)
5True
6>>> dyldinfo = id.dyld_info
7>>> print(dyldinfo.bindings[0])
8Class: STANDARD
9Type: POINTER
10Address: 0x100002010
11Symbol: ___stderrp
12Segment: __DATA
13Library: /usr/lib/libSystem.B.dylib
14>>> print(dyldinfo.exports[0])
15Node Offset: 18
16Flags: 0
17Address: 0
18Symbol: __mh_execute_header
In this release we did a large improvement of the ELF builder. Mach-O and PE parts gain new objects and new functions. LIEF is now available on PyPI and can be added in the requirements of Python projects whatever the Python version and the target platform.
Since the v0.7.0
LIEF has been presented at RMLL
and the MISP project uses it for its PyMISP objects.
Some may complain about the C API. They are right! Until the v1.0.0
we will provide a minimal C API. Once C++ API is stable we plan to provide full APIs for Python, C, Java, OCaml 6, etc.
Next version should be focused on the Mach-O builder especially for adding sections and segments. We also plan to support PE .NET
headers and fix some performances issues.
For questions you can join the Gitter channel
All Python examples are done with the 3.5 version ↩︎
See the blog post about O-LLVM analysis: https://blog.quarkslab.com/deobfuscation-recovering-an-ollvm-protected-program.html ↩︎
As mentioned in the Fortinet blog post, the library is packed. ↩︎
This feature is not yet available for MachO objects ↩︎
Due to the inheritance relationship and abstraction these attributes are located in the MachO::Relocation and LIEF::Relocation objects. ↩︎