Add imports, inline-hook detection, function hashing, per-function imports

Wave 2 of the code-analysis layer:

- vmie_win32_imports resolves the import directory (INT/IAT) to {iat_rva, dll,
  name, ordinal} - named APIs, walking the name and slot thunks in lockstep so
  every import carries the IAT slot a call lands on.
- vmie_win32_inline_hooks decodes each .pdata function's entry and reports any
  whose first instruction is a direct jmp/call leaving the module image - the
  detour/trampoline shape.
- vmie_win32_func_imports records, in order, the IAT slots a function calls
  through (call qword [rip+disp] onto an import slot): the function's API-call
  sequence, named by correlating with vmie_win32_imports.
- func_hash (codeanalysis.h) hashes a function position-independently, zeroing
  the displacement bytes the decoder locates - one primitive for fingerprinting
  known code and for detecting a changed body across snapshots.

Devirtualization needs no new call and is documented as a composition: a
vtable's methods are gva_jumptable(vtable_va), its instances are
pmap_referrers(vtable_va), and func_hash names each method. Imports reuse the
shared data-directory accessor; the analyses reuse the function/section/decode
primitives - no second PE or instruction parser.
This commit is contained in:
2026-06-16 20:03:49 +03:00
parent 79e82ffc6a
commit 35c5dc06ba
5 changed files with 450 additions and 0 deletions
+119
View File
@@ -331,6 +331,125 @@ typedef struct { uint32_t from; uint32_t to; uint8_t kind; } call_edge;
int vmie_win32_callgraph(vmie_win32* v, uint64_t cr3, uint64_t module_base,
call_edge* out, int max);
/* One import: a function this module pulls from another DLL, recovered from the
* import directory (the INT/IAT pair of an IMAGE_IMPORT_DESCRIPTOR).
* iat_rva - RVA of the IAT slot that holds the resolved function pointer at
* run time (absolute VA = module_base + iat_rva). A call through
* this import is `call qword [rip+disp]` whose target lands on this
* slot - so iat_rva is exactly what vmie_win32_func_imports reports;
* correlate the two to name a function's API calls.
* dll - the exporting DLL name as written in the descriptor, NUL-
* terminated, TRUNCATED to 31 chars (e.g. "KERNEL32.dll"). A name
* longer than 31 bytes is cut; this is the documented limit.
* name - the imported function name, NUL-terminated, TRUNCATED to 63 chars
* (long C++ mangled names are cut); "" for a by-ordinal import.
* ordinal - the import ordinal for a by-ordinal import (name[0]=='\0'), else
* 0. By-ordinal imports set the high bit in the thunk and carry no
* name in the image. */
typedef struct { uint32_t iat_rva; char dll[32]; char name[64]; uint16_t ordinal; } import_sym;
/* Enumerate the module's imports from its import directory (IMAGE_DIRECTORY_
* ENTRY_IMPORT). For each IMAGE_IMPORT_DESCRIPTOR it reads the DLL name, then
* walks the parallel INT (OriginalFirstThunk: the name/ordinal hints) and IAT
* (FirstThunk: the resolved-pointer slots) in lockstep so every entry carries
* its own IAT-slot RVA. A by-name thunk points at an IMAGE_IMPORT_BY_NAME
* (hint+NUL-terminated name); a by-ordinal thunk has its top bit set and yields
* an ordinal instead. The INT is preferred when present (it survives binding);
* the IAT is the fallback.
*
* Returns the TOTAL number of imports (out=NULL => count only, so size then
* fill), or -1 if there is no import directory or the headers/directory are
* unreadable. Entries are reported descriptor by descriptor, and within a
* descriptor in thunk order.
*
* Example - list a module's imports and where each resolves:
* import_sym im[512];
* int n = vmie_win32_imports(v, pr->cr3, m.base, im, 512);
* for (int i = 0; i < n && i < 512; i++)
* if (im[i].name[0])
* printf("%s!%s -> IAT %#x\n", im[i].dll, im[i].name, im[i].iat_rva);
* else
* printf("%s!#%u -> IAT %#x\n", im[i].dll, im[i].ordinal, im[i].iat_rva); */
int vmie_win32_imports(vmie_win32* v, uint64_t cr3, uint64_t module_base,
import_sym* out, int max);
/* One inline-hook finding: a function whose FIRST instruction is a direct
* jmp/call leaving the module image - the classic detour / trampoline shape.
* func_rva - the hooked function's RVA (a .pdata function start). Absolute VA
* = module_base + func_rva.
* target - the absolute VA the entry redirects to. It lies OUTSIDE the
* module image [module_base, module_base + SizeOfImage); that is
* exactly what makes it a cross-module hook rather than an ordinary
* intra-module branch. */
typedef struct { uint32_t func_rva; uint64_t target; } inline_hook;
/* Detect inline (entry-redirect) hooks. For each function from .pdata
* (vmie_win32_functions) it decodes the FIRST instruction with x86_decode; if
* that instruction is a DIRECT jmp/call (has_rel) whose resolved target
* (x86_branch_target) lands OUTSIDE the module image
* [module_base, module_base + SizeOfImage), it records {func_rva, target}. An
* un-hooked function begins with its real prologue (push/sub/mov/endbr64...) or
* branches inside its own image, so it is not reported.
*
* Returns the TOTAL number of hooked functions (out=NULL => count only), or -1
* if the .pdata/.text directory or headers are missing/unreadable.
*
* Scope: this finds INLINE hooks (the function body's entry is patched). IAT
* hooks - an import SLOT redirected to point outside its resolving module - are
* a different shape that needs cross-module pointer resolution and are NOT
* covered here.
*
* Example - report any patched function entries in a module:
* inline_hook hk[64];
* int n = vmie_win32_inline_hooks(v, pr->cr3, m.base, hk, 64);
* for (int i = 0; i < n && i < 64; i++)
* printf("sub_%x hooked -> %#llx\n", hk[i].func_rva,
* (unsigned long long)hk[i].target); */
int vmie_win32_inline_hooks(vmie_win32* v, uint64_t cr3, uint64_t module_base,
inline_hook* out, int max);
/* Recover which IAT slots a function calls, in call order - the function's
* API-call sequence / behavioral fingerprint. It steps `func_rva`'s body with
* x86_decode and, for every `call/jmp qword [rip+disp]` (an indirect branch
* through memory: has_riprel) whose resolved memory target (x86_riprel_target)
* is an IAT slot of THIS module's import directory, it records that slot's RVA.
* Correlate the returned RVAs with vmie_win32_imports (same iat_rva) to turn the
* sequence into named API calls (e.g. CreateFileW, WriteFile, CloseHandle).
*
* func_rva - the function to analyze, as an RVA (e.g. from
* vmie_win32_functions or an export). Absolute VA = module_base +
* func_rva.
* iat_rvas - caller array receiving up to `max` IAT-slot RVAs in the order
* the calls appear; NULL to count only.
*
* Returns the TOTAL number of IAT-slot calls in the function (out=NULL =>
* count), or -1 if the headers / import directory / function bytes are
* unreadable. v1 resolves call/jmp THROUGH the IAT (rip-relative onto an import
* slot); other indirect forms are out of scope.
*
* Example - print the API sequence of a function:
* uint32_t slots[128];
* int n = vmie_win32_func_imports(v, pr->cr3, m.base, fn_rva, slots, 128);
* import_sym im[512];
* int ni = vmie_win32_imports(v, pr->cr3, m.base, im, 512);
* for (int i = 0; i < n && i < 128; i++)
* for (int j = 0; j < ni && j < 512; j++)
* if (im[j].iat_rva == slots[i]) { puts(im[j].name); break; } */
int vmie_win32_func_imports(vmie_win32* v, uint64_t cr3, uint64_t module_base,
uint32_t func_rva, uint32_t* iat_rvas, int max);
/* Devirtualization (C++ vtables) needs NO dedicated symbol - it is a
* COMPOSITION of primitives the engine already exposes:
* - a vtable at `vtable_va` is an array of code pointers, so its METHODS are
* gva_jumptable(mem, cr3, vtable_va, ...) (codeanalysis.h) - the same
* code-pointer-array walk that recovers switch tables;
* - its live INSTANCES are pmap_referrers(pm, vtable_va, ...) (pmap.h),
* because an object's first qword is its vtable pointer (who-points-here on
* the vtable VA enumerates the objects).
* Recover the method RVAs with gva_jumptable, then func_hash (codeanalysis.h)
* can name each method body against a known-hash table. No new call is added
* for this on purpose. */
/* One exported symbol from the module export directory (EAT).
* rva - export target RVA (absolute VA = module_base + rva). Forwarder
* exports report the forwarder-string RVA; see `forwarded`.