In this blog post, we will analyze a packed PE file and unpack it with the Qiling Framework. Once we understand how the encryption works, I will explain more about Qiling. In this article, we will use Cutter for analysis and Ghidra, which is included in Cutter, for decompilation.
Table Of Contents
1. Analysis Packed Sample
   1.1 Detect It Easy Output
   1.2 Packer Basics
   1.3 Cutter - Ghidra (Disassemble & Decompile)
2. Automate Unpacking with Qiling
   2.1 Qiling Framework Structure
   2.2 Layle's Emu Analysis
   2.3 Unpacking Script
Analysis Packed Sample
Detect It Easy Output
The first thing I do in an analysis is scan the file with Detect It Easy. Detect It Easy reports that the file contains well-known protections, but we know that the sample does not. To find out why, I'll look at how Detect It Easy makes this decision.
PE: protector: PELock(-)[-]
PE: protector: Unopix(0.94)[-]
PE: linker: Microsoft Linker(14.33**)[EXE32,console]
It is not difficult to understand how it reaches this result. In the Signatures tab, every kind of identifiable item has its own detection rule. Here we need to look at PELock: the signature is named PELock.2 and its code is below:
function detect(bShowType,bShowVersion,bShowOptions)
{
    if(PE.getNumberOfImports()==1)
    {
        if((PE.isLibraryFunctionPresent("KERNEL32.DLL", "LoadLibraryA"))!=-1&&
           (PE.isLibraryFunctionPresent("KERNEL32.DLL", "VirtualAlloc"))!=-1)
        {
            if(PE.getNumberOfResources()>=1)
            {
                if(PE.getNumberOfSections()>=4)
                {
                    if((PE.getSectionName(0)==PE.getSectionName(1))&&(PE.getSectionName(0)==PE.getSectionName(3)))
                    {
                        bDetected=1;
                    }
                }
            }
        }
    }
    return result(bShowType,bShowVersion,bShowOptions);
}
First, it checks that the import table contains exactly one DLL. Second, it checks that LoadLibraryA and VirtualAlloc are imported from KERNEL32.DLL. Third, it checks that the file has at least one resource and at least four sections, and that section 0 has the same name as sections 1 and 3.
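To make the rule concrete, here is a minimal Python sketch of the same checks. It assumes the import table, the resource count and the section names have already been extracted with a PE parser; the function name matches_pelock2 and its input format are hypothetical, and comparing DLL names case-insensitively is an assumption of the sketch. Note that if a packer blanks the original section names, as this sample's packer did, sections 0, 1 and 3 all share the empty name and the last check passes trivially.

```python
from typing import Dict, List, Sequence


def matches_pelock2(imports: Dict[str, List[str]], num_resources: int,
                    section_names: Sequence[str]) -> bool:
    """Same checks as the PELock.2 Detect It Easy rule, on pre-extracted PE data."""
    # getNumberOfImports() == 1: exactly one imported DLL
    if len(imports) != 1:
        return False
    # LoadLibraryA and VirtualAlloc must both come from KERNEL32.DLL
    # (DLL names compared case-insensitively: an assumption of this sketch)
    kernel32_funcs = set()
    for dll, funcs in imports.items():
        if dll.upper() == "KERNEL32.DLL":
            kernel32_funcs.update(funcs)
    if not {"LoadLibraryA", "VirtualAlloc"} <= kernel32_funcs:
        return False
    # At least one resource and at least four sections
    if num_resources < 1 or len(section_names) < 4:
        return False
    # Section 0 must have the same name as sections 1 and 3
    return section_names[0] == section_names[1] == section_names[3]
```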
The Unopix rule (the first time I've seen this protector) works like this:
function detect(bShowType,bShowVersion,bShowOptions)
{
    var nLastSection=PE.nLastSection;
    if(nLastSection>=2)
    {
        var nVirtualSize=PE.section[nLastSection].VirtualSize;
        if(nVirtualSize==0x1000)
        {
            var nRawSize=PE.section[nLastSection].FileSize;
            if(nVirtualSize==nRawSize)
            {
                var nFlags=PE.section[nLastSection].Characteristics;
                if((nFlags==0xe0000040)&&(PE.section[nLastSection].Name!=".!ep"))
                {
                    sVersion="0.94";
                    bDetected=1;
                }
            }
        }
    }
    return result(bShowType,bShowVersion,bShowOptions);
}
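I won't go through this rule in detail because it is straightforward: it requires at least three sections and checks that the last one has a virtual size and a raw size of exactly 0x1000, characteristics of exactly 0xE0000040 (readable, writable, executable, initialized data) and a name other than ".!ep". In short, PELock and Unopix are false positives: Detect It Easy reports them because our sample happens to match these simple structural patterns. What I mainly want to do here is compare the entry point with the section table in PE Info (opened with the PE button) and find the section that contains the address in AddressOfEntryPoint (00022000).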
If we look at the virtual address of the section named .shell, we see that it matches the entry point.
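Both the Unopix rule and the entry point lookup only need the section table. The sketch below mirrors them in Python over a hypothetical Section record; the record, the function names and the test values are illustrative, not taken from the sample. 0xE0000040 is the combination of the IMAGE_SCN_MEM_WRITE, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_EXECUTE and IMAGE_SCN_CNT_INITIALIZED_DATA flags. Since the decompiler later describes .shell as an rwx section of 4096 (0x1000) bytes, it is plausibly the section that triggers the Unopix rule.

```python
from dataclasses import dataclass
from typing import List, Optional

# Section characteristic flags from the PE format
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000
RWX_INITIALIZED = (IMAGE_SCN_MEM_WRITE | IMAGE_SCN_MEM_READ
                   | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_CNT_INITIALIZED_DATA)


@dataclass
class Section:
    name: str
    virtual_address: int  # RVA of the section
    virtual_size: int
    raw_size: int
    characteristics: int


def matches_unopix094(sections: List[Section]) -> bool:
    """Same checks as the Unopix(0.94) rule, applied to the last section."""
    if len(sections) < 3:  # nLastSection >= 2
        return False
    last = sections[-1]
    return (last.virtual_size == 0x1000
            and last.raw_size == last.virtual_size
            and last.characteristics == RWX_INITIALIZED
            and last.name != ".!ep")


def section_for_rva(sections: List[Section], rva: int) -> Optional[Section]:
    """Return the section whose virtual range contains rva, e.g. AddressOfEntryPoint."""
    for section in sections:
        if section.virtual_address <= rva < section.virtual_address + section.virtual_size:
            return section
    return None
```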
Packer Basics
Much has been written about this already, but it would be wrong to start the analysis without covering it. Most simple packers work the same way, so to make the analysis easier to follow I will briefly explain how they work first. The packer encrypts the .text section, which contains the executable code. It then adds a new section to the file and points the entry point at it. At run time, this new section decrypts the encrypted section (unpacking), and the decrypted code is then executed in memory.
For example, here is the data encryption function of the packer we are analyzing:
// Encrypt the load segment and transformed import table.
EncryptData(load_seg_base, load_seg_size + new_imp_table_size);
//-----------------------------------------------------------
// Adds 0xCC to every byte; BYTE arithmetic wraps modulo 256,
// so the unpacking stub undoes it by subtracting 0xCC.
void EncryptData(BYTE* const base, const DWORD size) {
    if (size == 0) {
        return;
    }
    assert(base != NULL);
    for (DWORD i = 0; i != size; ++i) {
        base[i] += 0xCC;
    }
}
My goal in this article is not to produce an executable dump (i.e. I will not fix the IAT after dumping). After decrypting with Qiling, we will take a look at the sections. For example, here is a section of the sample before it is unpacked:
After unpacking:
Cutter - Ghidra (Disassemble & Decompile)
We already suspected that the file was packed because its section names had been deleted, and its entry point is redirected to the section named .shell. After opening the file in Cutter, we see a single function in the Functions panel, and we start analyzing it. This is what the graph view shows when we open the file in Cutter (in read mode):
I will use Ghidra, since this kind of unpacking code is harder to interpret in assembly. Once we figure out how it works, we can move on to x32dbg and Scylla.
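We will see more of this while debugging, but there are a few things I want to point out first. Operands like [ebp + 24] in the shell's code are variables stored at the beginning of the shell itself, addressed relative to ebp. Here ebp holds the shell's base address (set by the pop ebp / sub ebp, 0x6 sequence), which is 0x00422000 in Cutter's static view; that is why [ebp + 0x2e] is the GetProcAddress slot at 0x0042202e. The variable definitions: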
;-- (0x0042202e) GetProcAddress:
0x0042202d add byte [ebx + 0x20], cl
0x00422030 .dword 0x205c0002
;-- GetModuleHandleA:
0x00422032 .dword 0x0002205c ; reloc.Kernel32.dll_GetModuleHandleA
;-- LoadLibraryA:
0x00422036 .dword 0x0002206f ; reloc.Kernel32.dll_LoadLibraryA
For example, the packer's source code uses the following instructions to get the address of a Windows API at run time:
; Get the address of `VirtualAlloc` API.
lea esi, [ebp + (dll_name - boot_seg_begin_lbl)]
push esi
call dword ptr [ebp + (second_thunk - boot_seg_begin_lbl)]
lea esi, [ebp + (virtual_alloc_name - boot_seg_begin_lbl)]
push esi
push eax
call dword ptr [ebp + (first_thunk - boot_seg_begin_lbl)]
mov dword ptr [ebp + (virtual_alloc_addr_boot - boot_seg_begin_lbl)], eax
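The same sequence in the disassembly is shown below. Cutter's listing is partly desynchronized here: the bytes of the 6-byte lea esi, [ebp + 0x9e] at 0x004220c1 are decoded again from 0x004220c2 as the bogus mov ch / add instructions, which hide push esi and push eax. The listing is back in sync at the call through [ebp + 0x2e], the GetProcAddress slot: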
0x004220c1 lea esi, [ebp + 0x9e]
0x004220c2 mov ch, 0x9e ; 158
0x004220c4 add byte [eax], al
0x004220c6 add byte [esi + 0x50], dl
0x004220c9 call dword [ebp + 0x2e] ; 46
0x004220cc mov dword [ebp + 0xab], eax
And here is Ghidra's decompilation of entry0:
uint64_t entry0(void)
{
    uint8_t uVar1;
    int32_t iVar2;
    char *pcVar3;
    char *pcVar4;
    uint64_t uVar5;

    // [09] -rwx section size 4096 named .shell
    (*_GetModuleHandleA)();
    *(code **)0x4220ab = (code *)(*_GetProcAddress)();
    *(int32_t *)0x4220af = (**(code **)0x4220ab)();
    // WARNING: Call to offcut address within same function
    func_0x00422129();
    *(int32_t *)0x422125 = *(int32_t *)0x4220af + -0x422129;
    uVar5 = (*(code *)0x0)();
    uVar1 = in((int16_t)(uVar5 >> 0x20));
    pcVar3 = *(char **)0x422008;
    pcVar4 = *(char **)0x42200c;
    for (iVar2 = *(int32_t *)0x422010; iVar2 != 0; iVar2 = iVar2 + -1) {
        *pcVar4 = *pcVar3 + '4';
        pcVar3 = pcVar3 + 1;
        pcVar4 = pcVar4 + 1;
    }
    return uVar5 & 0xffffffff00000000 | (uint64_t)((uint32_t)uVar5 & 0xffffff00 | (uint32_t)uVar1);
}
The decompiled code is incomplete: some parts are not recognized as functions, so the decompiler cannot see them. The junk code inserted by the packer also breaks the decompiler's output, as the bogus function func_0x00422129 shows. So let's go through the disassembly instead.
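Even this broken output is useful. The loop at the end adds '4' (0x34) to every byte it copies, and since 0x34 + 0xCC = 0x100, adding 0x34 modulo 256 is the same as subtracting 0xCC. So this loop appears to apply the same byte transform as the decryption routine in the disassembly (sub al, 0xCC), the inverse of EncryptData. A quick exhaustive check in Python; the function names are ours, not the packer's:

```python
def ghidra_copy_loop(src: bytes) -> bytes:
    """The loop at the end of entry0: *pcVar4 = *pcVar3 + '4' for each byte."""
    dst = bytearray(len(src))
    for i, byte in enumerate(src):
        dst[i] = (byte + ord("4")) & 0xFF  # char arithmetic wraps at 8 bits
    return bytes(dst)


def sub_0xcc(src: bytes) -> bytes:
    """The decryption transform from the disassembly: sub al, 0xCC on every byte."""
    return bytes((byte - 0xCC) & 0xFF for byte in src)


# 0x34 + 0xCC == 0x100, so "+ 0x34" and "- 0xCC" agree modulo 256 for every byte.
all_bytes = bytes(range(256))
assert ghidra_copy_loop(all_bytes) == sub_0xcc(all_bytes)
```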
Our goal is to dump the program from memory after its sections have been decrypted, so we will examine the decryption instructions in the disassembly.
; DecryptData proc src: dword, dest: dword, count: dword
0x0042212c pushal
0x0042212d mov ecx, dword [ebp + 0x10] ;count
0x00422130 mov esi, dword [ebp + 8] ;src
0x00422133 mov edi, dword [ebp + 0xc] ;dest
0x00422136 jmp 0x42213d
0x00422138 lodsb al, byte [esi]
0x00422139 sub al, 0xcc ; 204
0x0042213b stosb byte es:[edi], al
0x0042213c dec ecx
0x0042213d or ecx, ecx
0x0042213f jne 0x422138
This routine is the inverse of EncryptData(): it subtracts 0xCC from every byte. It runs several times because every encrypted region was encrypted with the same function, so the same routine is called once per region.
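To check that this routine really undoes EncryptData(), here is a small Python model of both loops: a bytearray stands in for process memory and integer offsets stand in for addresses. It is an illustration, not code from the packer. The section loop that follows passes the same pointer as src and dest, so the data is decrypted in place.

```python
def encrypt_data(mem: bytearray, base: int, size: int) -> None:
    """Python port of the packer's EncryptData(): base[i] += 0xCC."""
    for i in range(size):
        mem[base + i] = (mem[base + i] + 0xCC) & 0xFF


def decrypt_data(mem: bytearray, src: int, dest: int, count: int) -> None:
    """Instruction-by-instruction model of the DecryptData proc.

    Assumes the direction flag is clear, so lodsb/stosb increment esi/edi.
    """
    esi, edi, ecx = src, dest, count  # [ebp+8], [ebp+0xc], [ebp+0x10]
    while ecx != 0:                   # jmp to 'or ecx, ecx' first: count 0 is a no-op
        al = mem[esi]                 # lodsb
        esi += 1
        al = (al - 0xCC) & 0xFF       # sub al, 0xCC
        mem[edi] = al                 # stosb
        edi += 1
        ecx -= 1                      # dec ecx ; jne back to lodsb


# Round trip on a few arbitrary bytes: encrypt, then decrypt in place (src == dest).
memory = bytearray(b"\x55\x8b\xec\x83\xec\x10")
original = bytes(memory)
encrypt_data(memory, 0, len(memory))
decrypt_data(memory, 0, 0, len(memory))
assert bytes(memory) == original
```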
So our goal is to dump the sections from memory immediately after this decryption procedure. Fully unpacking the file is also quite easy with Qiling, but I won't cover rebuilding the IAT in this article. Instead, I'll point to an example project that does this: vacation3-emu, Layle's project that emulates VAC3 modules.
And here is the function that decrypts the sections:
; DecryptSections
mov edx, 0x22B ; ptr ORIGIN_PE_INFO
add edx, ebp
lea edx, dword ptr ds:[edx + 0x10] ; [edx].section_encry_info
mov eax, dword ptr ds:[edx] ; eax = [edx].sec_rva
jmp 0x9500C0
mov esi, dword ptr ss:[ebp + 0x45B]
add esi, eax
mov edi, esi
mov ecx, dword ptr ds:[edx + 0x4]
push ecx
push edi
push esi
call dword ptr ss:[ebp + 0x457]
add edx, 0x8
mov eax, dword ptr ds:[edx]
or eax, eax
jne 0x9500A5
The ORIGIN_PE_INFO structure whose section_encry_info array this loop walks is defined in the packer's source as:
typedef struct _ORIGIN_PE_INFO {
    //! The offset, relative to the shell.
    DWORD entry_point;
    //! The offset of the original import table, relative to the load segment.
    DWORD imp_table_offset;
    //! The relative virtual address of the relocation table.
    DWORD reloc_table_rva;
    //! The image base.
    VOID* image_base;
    //! The encryption information of sections, up to 0x40 sections and a blank structure.
    ENCRY_INFO section_encry_info[MAX_ENCRY_SECTION_COUNT + 1];
} ORIGIN_PE_INFO;
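The disassembly and the structure fit together: four 4-byte fields precede section_encry_info on 32-bit x86, which is why the code does lea edx, [edx + 0x10], and each entry is 8 bytes (add edx, 0x8) with the RVA first and the size second. Below is a ctypes mirror of the layout to check these offsets. The ENCRY_INFO definition is not shown in the source excerpt, so its layout is inferred from the disassembly and the sec_size field name is hypothetical.

```python
import ctypes

MAX_ENCRY_SECTION_COUNT = 0x40


class ENCRY_INFO(ctypes.Structure):
    # Inferred from the loop: sec_rva at +0, size at +4, stride 8.
    _fields_ = [("sec_rva", ctypes.c_uint32),
                ("sec_size", ctypes.c_uint32)]  # hypothetical field name


class ORIGIN_PE_INFO(ctypes.Structure):
    _fields_ = [("entry_point", ctypes.c_uint32),
                ("imp_table_offset", ctypes.c_uint32),
                ("reloc_table_rva", ctypes.c_uint32),
                ("image_base", ctypes.c_uint32),  # VOID* is 4 bytes on 32-bit x86
                ("section_encry_info", ENCRY_INFO * (MAX_ENCRY_SECTION_COUNT + 1))]


def encrypted_sections(info: ORIGIN_PE_INFO):
    """List (rva, size) pairs up to the blank entry, like the asm loop (or eax, eax)."""
    result = []
    for entry in info.section_encry_info:
        if entry.sec_rva == 0:
            break
        result.append((entry.sec_rva, entry.sec_size))
    return result
```

Parsing a raw copy of the structure is then a single ORIGIN_PE_INFO.from_buffer_copy(data) call.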
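If we analyze it dynamically in x32dbg, we see that after decrypting its data, the shell jumps to the address stored in [ebp+0xAF]. This is the buffer returned by VirtualAlloc (called through [ebp+0xAB]), which appears to be where the section decryption code runs. Here is the code that allocates the buffer and stores its address: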
008F20E3 | 6A 00 | push 0x0
008F20E5 | FF95 AB000000 | call dword ptr ss:[ebp+0xAB]
008F20EB | 8985 AF000000 | mov dword ptr ss:[ebp+0xAF],eax
........
Automate Unpacking with Qiling
In this section, we will look at some parts of the Qiling Framework and then write a script that automatically decrypts and dumps the sections. Since many Windows API implementations are missing from Qiling, I will also show how to hook functions (I may even open a PR for them later).
First, let’s take a look at what exactly the Qiling Framework is.
Qiling Framework Structure
Since the previous blog post on vx.zone covered the history of Qiling, I will focus on the technical side here. Unicorn is a CPU emulator: a framework that can only emulate processor instructions. Qiling is a higher-level framework built on top of Unicorn that also emulates the operating system around the program: executable loaders, system calls and APIs, and the file system. With this in mind, the structure of the Qiling project on GitHub shows how these features are implemented.
For example, the loaders for the best-known executable formats are in the loader folder:
qiling\qiling\loader
qiling\qiling\loader\macho_parser
qiling\qiling\loader\__init__.py
qiling\qiling\loader\blob.py
qiling\qiling\loader\dos.py
qiling\qiling\loader\elf.py
qiling\qiling\loader\evm.py
qiling\qiling\loader\loader.py
qiling\qiling\loader\macho.py
qiling\qiling\loader\mcu.py
qiling\qiling\loader\pe_uefi.py
qiling\qiling\loader\pe.py
This folder is important because we will use pe.py to implement some of the missing Windows APIs. A small spoiler: the Process class in pe.py holds what is needed to emulate a function like GetModuleHandleA, such as the dlls mapping from loaded module names to their base addresses.
class Process:
    # let linter recognize mixin members
    cmdline: bytes
    pe_image_address: int
    stack_address: int
    stack_size: int
    dlls: MutableMapping[str, int]
    import_address_table: MutableMapping[str, Mapping]
    import_symbols: MutableMapping[int, Dict[str, Any]]
    export_symbols: MutableMapping[int, Dict[str, Any]]
    libcache: Optional[QlPeCache]

    def __init__(self, ql: Qiling):
        self.ql = ql
        # .....................
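As an illustration of how these members can back such an API, here is the lookup logic of GetModuleHandleA as a plain Python function. It is a simplified sketch, not Qiling's implementation, and wiring it into Qiling as a hook is left out. It assumes that dlls maps lower-case module names (such as "kernel32.dll") to base addresses, and it follows the documented Win32 behavior that a NULL name returns the handle of the main executable and a name without an extension gets ".dll" appended.

```python
from typing import Mapping, Optional


def get_module_handle(dlls: Mapping[str, int], pe_image_address: int,
                      module_name: Optional[str]) -> int:
    """Resolve GetModuleHandleA(lpModuleName) against already-loaded modules."""
    if module_name is None:
        # NULL lpModuleName: handle (base address) of the main executable
        return pe_image_address
    key = module_name.lower()  # assumes dlls keys are lower-case names
    if "." not in key:
        # Simplified: Win32 appends the default ".dll" extension when none is given
        key += ".dll"
    # 0 stands for NULL: the module is not loaded
    return dlls.get(key, 0)
```

With the values from the output at the end of this post, get_module_handle({"kernel32.dll": 0x6b800000}, 0x400000, "Kernel32.dll") gives 0x6b800000.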
The os folder emulates operating-system features. It contains code shared between the supported operating systems as well as OS-specific parts; the Windows folder looks like this:
qiling\qiling\os\windows
qiling\qiling\os\windows\dlls
qiling\qiling\os\windows\__init__.py
qiling\qiling\os\windows\api.py
qiling\qiling\os\windows\clipboard.py
qiling\qiling\os\windows\const.py
qiling\qiling\os\windows\fiber.py
qiling\qiling\os\windows\fncc.py
qiling\qiling\os\windows\handle.py
qiling\qiling\os\windows\registry.py
qiling\qiling\os\windows\structs.py
qiling\qiling\os\windows\thread.py
qiling\qiling\os\windows\utils.py
qiling\qiling\os\windows\wdk_const.py
qiling\qiling\os\windows\windows.py
Finally, the arch folder contains the per-architecture setup done before emulation starts (registers, disassembler and assembler modes, and so on), with implementations for many architectures and modes. For example, x86.py:
from functools import cached_property

from unicorn import Uc, UC_ARCH_X86, UC_MODE_16, UC_MODE_32, UC_MODE_64
from capstone import Cs, CS_ARCH_X86, CS_MODE_16, CS_MODE_32, CS_MODE_64
from keystone import Ks, KS_ARCH_X86, KS_MODE_16, KS_MODE_32, KS_MODE_64

from qiling.arch.arch import QlArch
from qiling.arch.msr import QlMsrManager
from qiling.arch.register import QlRegisterManager
from qiling.arch import x86_const
from qiling.const import QL_ARCH, QL_ENDIAN


class QlArchIntel(QlArch):
    @property
    def endian(self) -> QL_ENDIAN:
        return QL_ENDIAN.EL

    @cached_property
    def msr(self) -> QlMsrManager:
        """Model-Specific Registers.
        """
        return QlMsrManager(self.uc)

    # .....................
Layle’s Emu Analysis (BONUS) <3
I got permission from Layle to cover this. It's a great example of using Qiling on Windows, so let's analyze it.
There is not much to look at in the project's structure, so let's go directly to the emu.py file. At the beginning of the script, two Windows APIs are defined. Qiling did not implement these functions when Layle wrote the script, so he implemented them himself.
def GetProcAddress(ql, address, params):
    global procs
    name = params["lpProcName"]
    dll_name = [key for key, value in ql.loader.dlls.items() if value == params["hModule"]][0]
    ql.loader.load_dll(dll_name.encode())
    try:
        addr = ql.loader.import_address_table[dll_name][name.encode()]
        procs[name] = addr
    except:
        pass

def LoadLibraryExA(ql, address, params):
    global modules
    name = params["lpLibFileName"]
    addr = ql.loader.load_dll(name.encode())
    modules[name] = addr
These functions rely on the loader structures mentioned in the “Qiling Framework Structure” section. For example, the GetProcAddress hook finds the DLL whose base address equals hModule in ql.loader.dlls, loads that DLL, and looks up the function's address in ql.loader.import_address_table, storing it in the global procs dictionary.
After creating the usual ql object, the hooks are installed with the set_api function:
ql.set_api("GetProcAddress", GetProcAddress, QL_INTERCEPT.EXIT)
ql.set_api("LoadLibraryExA", LoadLibraryExA, QL_INTERCEPT.EXIT)
QL_INTERCEPT:
POSIX system calls may be hooked to allow the user to modify their parameters, alter the return value or replace their functionality altogether. System calls may be hooked either by their name or number, and intercepted at one or more stages:
- QL_INTERCEPT.CALL: when the specified system call is about to be called; may be used to replace the system call functionality altogether
- QL_INTERCEPT.ENTER: before entering the system call; may be used to tamper with the system call parameters values
- QL_INTERCEPT.EXIT: after exiting the system call; may be used to tamper with the return value
The same stages apply to the Windows API hooks installed with set_api.
Another thing that caught my attention in this script is how simple it is to execute a specific range of code, something we would do in a debugger by moving the EIP register:
ql.run(begin=0x00402B6C, end=0x00404516) # set up routines
ql.dprint(D_INFO, "Finished setting up routines")
ql.run(begin=0x00404516, end=0x00404522) # decrypt packet with ICE
ql.dprint(D_INFO, "Finished decrypting packet with ICE key from module")
ql.run(begin=0x00404522, end=0x00406416) # decrypt section with ICE
ql.dprint(D_INFO, "Finished decryption data section with ICE key from decrypted packet")
When we pass the start and end addresses of a code range to ql.run(), only that range is executed in the Qiling sandbox. For example, Layle runs the module's own routines this way to rebuild the IAT of the module he emulates.
Unpacking Script
First, let's restate our purpose. We can't read the sections because their data is encrypted, so we will dump them after they are decrypted. Once the function that decrypts the sections has run, calling the dump_memory_region function is enough.
Since the shell stores the start addresses of the code it executes in its own variables, we first need to find where that address is stored. After the initialization in x32dbg, I find where the address of the decryption code is stored:
mov eax,0x7E
add eax,ebp
push dword ptr ds:[eax+0x4]
push 0x0
call dword ptr ss:[ebp+0xAB]
mov dword ptr ss:[ebp+0xAF],eax
Our target value is [EBP + 0xAF], which receives the address returned by VirtualAlloc.
Only after these parts have run (their addresses are known) do I change EIP and redirect execution to the address I want.
pop ebp
sub ebp, 0x6
lea esi,dword ptr ss:[ebp + 0x3E]
.......................
.......................
jmp testfile-packed.EA213D
lodsb
sub al, 0xCC
stosb
dec ecx
or ecx, ecx
jne testfile-packed.EA2138
popad
leave
ret 0xC
Python code:
from qiling import Qiling


def dump_memory_region(ql, address, size):
    """Read `size` bytes at `address` from emulated memory and write them to a file."""
    try:
        executed_mem = ql.mem.read(address, size)
    except Exception as err:
        print('Unable to read memory region at address: {}. Error: {}'.format(hex(address), err))
        return
    print("Dumped")
    with open("unpacked_" + hex(address) + ".bin", "wb") as f:
        f.write(executed_mem)


exe_file = "testFile-packed.exe"
ql = Qiling([exe_file], "qiling/examples/rootfs/x86_windows")

# Stop when execution first reaches 0x00422143, the `ret 0xC` of DecryptData.
ql.run(end=0x00422143)

# [EBP + 0xAF] holds the address of the VirtualAlloc buffer the shell jumps to.
ptr = ql.arch.regs.read("EBP") + 0xAF
address = int.from_bytes(ql.mem.read(ptr, 4), byteorder='little')
print("[EBP + 0xAF]: ", hex(address))

# Redirect execution (EIP) to that address.
ql.arch.regs.arch_pc = address
# Note: ql.run() without begin= appears to start again from the PE entry point
# (the repeated API calls in the output below suggest this);
# ql.run(begin=address) would start from the redirected address instead.
ql.run()

dump_memory_region(ql, 0x00419000, 0x100)
Output:
utku%> python main.py
[=] Initiate stack address at 0xfffdd000
[=] Loading testFile-packed.exe to 0x400000
[=] PE entry point at 0x422000
[=] TEB is at 0x6000
[=] PEB is at 0x61b0
[=] LDR is at 0x6630
[=] Loading ntdll.dll ...
[=] Done loading ntdll.dll
[=] Loading kernel32.dll ...
[=] Loading kernelbase.dll ...
[=] Done loading kernelbase.dll
[=] Done loading kernel32.dll
[=] Loading ucrtbase.dll ...
[=] Calling ucrtbase.dll DllMain at 0x10298260
[=] GetSystemTimeAsFileTime(lpSystemTimeAsFileTime = 0xffffcfcc)
[x] Error encountered while running ucrtbase.dll DllMain, bailing
[=] Done loading ucrtbase.dll
[=] GetModuleHandleA(lpModuleName = "Kernel32.dll") = 0x6b800000
[=] GetProcAddress(hModule = 0x6b800000, lpProcName = "VirtualAlloc") = 0x6b8181b0
[=] VirtualAlloc(lpAddress = 0, dwSize = 0xe0d, flAllocationType = 0x3000, flProtect = 0x40) = 0x50006f8
[EBP + 0xAF]: 0x50006f8
[=] GetModuleHandleA(lpModuleName = "Kernel32.dll") = 0x6b800000
[=] GetProcAddress(hModule = 0x6b800000, lpProcName = "VirtualAlloc") = 0x6b8181b0
[=] VirtualAlloc(lpAddress = 0, dwSize = 0xe0d, flAllocationType = 0x3000, flProtect = 0x40) = 0x5001505
Dumped
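Instead of dumping one hard-coded range, one could walk the section_encry_info table that the DecryptSections loop itself uses. The helper below is a hypothetical sketch, not part of the script above. It takes a read_mem(address, size) callable (with Qiling, ql.mem.read) and loader_ebp, the ebp value while the DecryptSections code runs, which you would have to capture yourself, for example by stopping the emulation inside that loop. The offsets 0x22B, 0x10, 0x8 and 0x45B come from the disassembly shown earlier; that [ebp + 0x45B] holds the image base is an assumption based on the code adding each section RVA to it. Call it only after the loop has finished, so that the regions are already decrypted.

```python
import struct
from typing import Callable, Dict

# Offsets taken from the DecryptSections disassembly
ORIGIN_PE_INFO_OFFSET = 0x22B  # mov edx, 0x22B ; ptr ORIGIN_PE_INFO
ENCRY_INFO_OFFSET = 0x10       # lea edx, [edx + 0x10] ; section_encry_info
ENCRY_INFO_SIZE = 8            # add edx, 0x8
IMAGE_BASE_SLOT = 0x45B        # mov esi, [ebp + 0x45B] ; assumed to hold the image base
MAX_ENTRIES = 0x40 + 1         # MAX_ENCRY_SECTION_COUNT + blank terminator


def read_decrypted_sections(read_mem: Callable[[int, int], bytes],
                            loader_ebp: int) -> Dict[int, bytes]:
    """Walk section_encry_info and read every listed region, keyed by RVA.

    read_mem(address, size) returns the bytes at address (with Qiling: ql.mem.read).
    loader_ebp is the ebp value while the DecryptSections code runs.
    """
    image_base = struct.unpack("<I", bytes(read_mem(loader_ebp + IMAGE_BASE_SLOT, 4)))[0]
    entry = loader_ebp + ORIGIN_PE_INFO_OFFSET + ENCRY_INFO_OFFSET
    regions = {}
    for _ in range(MAX_ENTRIES):
        sec_rva, sec_size = struct.unpack("<II", bytes(read_mem(entry, ENCRY_INFO_SIZE)))
        if sec_rva == 0:  # the blank structure ends the list
            break
        regions[sec_rva] = bytes(read_mem(image_base + sec_rva, sec_size))
        entry += ENCRY_INFO_SIZE
    return regions
```

Each returned region can then be written to disk the same way dump_memory_region does it.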