When reverse engineering binaries, we may at some point want to share the recovered information with others. The DWARF format, originally designed to hold debug information tied to the original source code, is also well suited to storing reverse-engineered information such as structures and function names.
This blog post introduces a new API in LIEF extended to create DWARF files. It also introduces two plugins for Ghidra and BinaryNinja to export binary analysis into DWARF.
LIEF extended now provides a comprehensive API to create DWARF files.
This API is available in Python, Rust, and C++ and it looks like this:
import lief

# Parse the binary that was reverse engineered
elf = lief.ELF.parse("./libd5A7BCF0524B8.so")

# Create a DWARF editor for this binary, then a compilation unit
editor: lief.dwarf.Editor = lief.dwarf.Editor.from_binary(elf)
unit: lief.dwarf.editor.CompilationUnit = editor.create_compilation_unit()
unit.set_producer("Generated by LIEF (LLVM backend)")

# Declare a function and its address
func: lief.dwarf.editor.Function = unit.create_function("vm_set_register")
func.set_address(0x1400023)

# Write the generated DWARF to a standalone file
editor.write("libd5A7BCF0524B8.dwarf")
Under the hood, LIEF uses LLVM’s DWARF backend to generate the final DWARF. In contrast to LLVM’s low-level API, LIEF provides an abstraction that hides the implementation details of the DWARF format.
For instance, if we want to create the DWARF for a function that contains a stack variable at stack offset 8, we can use the following API:
func: lief.dwarf.editor.Function = unit.create_function("vm_set_register")

var: lief.dwarf.editor.Variable = func.create_stack_variable("my_stack_variable")
var.set_stack_offset(8)
This code generates the following DWARF:
0x0000000c: DW_TAG_compile_unit
              DW_AT_producer ("Generated by LIEF (LLVM backend)")

0x00000011:   DW_TAG_subprogram
                DW_AT_name ("vm_set_register")
                DW_AT_entry_pc (0x0000000001400023)

0x0000001e:     DW_TAG_variable
                  DW_AT_name ("my_stack_variable")
                  DW_AT_location (DW_OP_fbreg -8)
Defining the DW_AT_location for a stack variable is not as simple as it sounds. It requires defining a DWARF expression. If you are curious about the actual implementation, you can check this GitHub Gist.
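To give an idea of what such an expression looks like at the byte level, here is a minimal Python sketch that encodes the DW_OP_fbreg -8 expression from the dump above. It is not LIEF's implementation (see the Gist for that). The helper names sleb128 and fbreg_expression are made up for this illustration; only the opcode value (0x91) and the SLEB128 operand encoding come from the DWARF specification.

```python
DW_OP_FBREG = 0x91  # opcode value defined by the DWARF specification


def sleb128(value: int) -> bytes:
    """Encode a signed integer as SLEB128 (the operand format of DW_OP_fbreg)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift: negative values converge to -1
        sign_bit_set = bool(byte & 0x40)
        if (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set):
            out.append(byte)  # last byte: continuation bit left clear
            return bytes(out)
        out.append(byte | 0x80)  # continuation bit: more bytes follow


def fbreg_expression(frame_offset: int) -> bytes:
    """Hypothetical helper: raw bytes of a `DW_OP_fbreg <frame_offset>` expression."""
    return bytes([DW_OP_FBREG]) + sleb128(frame_offset)


print(fbreg_expression(-8).hex())  # 9178
```

The operand is relative to the frame base of the enclosing function, so a consumer can only resolve it if that function also describes its frame base.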
Summary
LIEF exposes a high-level API to create DWARF, built on top of LLVM's low-level API.
Reverse engineering tools typically store information about analyzed binaries in their own formats, such as *.idb and *.bndb. Most of these tools are not compatible with each other, except Binary Ninja, which supports loading IDB files (Migrating from IDA). For Ghidra, importing IDA databases is a non-goal (cf. issue #2921).
One alternative is to export binary information using BinExport or quokka, but many tools lack support for importing the exported data.
In contrast, Binary Ninja, Ghidra, and IDA all have built-in support for loading DWARF, whether it is embedded in the binary or stored in an external file. The DWARF format is primarily designed to hold information about the original source code, and the purpose of reverse engineering is to recover the semantics of that source code from the binary.
Therefore, we could use DWARF as a shared reverse-engineering format to export types, functions, and variables from reverse-engineered binaries.
It’s worth noting that DWARF is compatible with PE binaries, even though this is not the default format for storing debug information on Windows.
PE / DWARF
If you compile a Windows executable with clang[-cl] and the flags -g -gdwarf-5, the final PE will contain DWARF information along with an external .pdb.
Currently, Binary Ninja is the only tool with a built-in plugin that can generate a DWARF file from a BinaryView representation. However, it cannot export stack-based variables, which can be crucial information.
The next section introduces two plugins for Ghidra and BinaryNinja to generate DWARF from these tools.
To provide some background on this feature, I initially developed the Binary Ninja DWARF exporter plugin for my own needs, before the Vector35 team released an official plugin in Binary Ninja 3.5. I use this plugin in my reverse engineering workflow to symbolize QBDI traces from DWARF information.
For instance, I used this process to reverse engineer the DroidGuard VM a few years ago.
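The symbolization step of such a workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual tooling: the function table is hard-coded where a real script would fill it from the DW_TAG_subprogram entries of the DWARF file, symbolize is a hypothetical helper, and only vm_set_register and its address come from the example earlier in this post.

```python
import bisect

# Hypothetical function table of (start address, name) pairs. A real script
# would collect these from the DWARF file; the second entry is invented.
FUNCTIONS = sorted([
    (0x1400023, "vm_set_register"),
    (0x1400100, "hypothetical_next_function"),
])
STARTS = [start for start, _ in FUNCTIONS]


def symbolize(address: int) -> str:
    """Return 'name+0xoffset' using the closest function start at or below address."""
    index = bisect.bisect_right(STARTS, address) - 1
    if index < 0:
        return hex(address)  # below the first known function: keep it raw
    start, name = FUNCTIONS[index]
    return f"{name}+{address - start:#x}"


# Addresses standing in for an execution trace
for address in (0x1400023, 0x140002B, 0x1400104):
    print(f"{address:#x} -> {symbolize(address)}")
```

Because this sketch only knows where functions start, an address past the real end of a function is still attributed to it. Function ranges, when the DWARF provides them, would make the lookup exact.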
I’ll take this blog post as an opportunity to share the DWARF associated with my reverse engineering of the VM libd5A7BCF0524B8.so:
As mentioned earlier, this functionality has been integrated into Binary Ninja since version 3.5, so I’ll focus more on the Ghidra plugin. For more details about the Binary Ninja plugin, you can visit this page: LIEF - BinaryNinja
The Ghidra plugin allows us to export Ghidra’s Program information into a DWARF file. This can be done from the Project Manager interface by selecting the DWARF format in the export section.
You can also use this plugin from the CodeBrowser tool, by left-clicking on the LIEF menu and selecting Export as DWARF.
The plugin is primarily written in Java (using the JNI), and you can also generate a DWARF file from a headless Java script:
import java.io.File;

import ghidra.app.script.GhidraScript;

import lief.ghidra.core.dwarf.export.Manager;
import lief.ghidra.core.NativeBridge;

public class LiefDwarfExportScript extends GhidraScript {
  @Override
  protected void run() throws Exception {
    // Initialize the bridge to LIEF's native code before using the exporter
    NativeBridge.init();
    Manager manager = new Manager(currentProgram);
    File output = new File("/home/romain/output.dwarf");
    // Export the current program's information as DWARF
    manager.export(output);
  }
}
IDA Support
I do not plan to support IDA for this functionality. However, if there is strong demand for it, feel free to reach out to me. You can also create your own IDA script using the Python or C++ API.
This DWARF export functionality is still in early development, so I cannot guarantee it is free of bugs. Additionally, the current version does not export comments, but I plan to support this feature in the future.
The source code for the plugins is available here:
Thank you for using LIEF.
Romain.