Add imports, inline-hook detection, function hashing, per-function imports

Wave 2 of the code-analysis layer:

- vmie_win32_imports resolves the import directory (INT/IAT) to {iat_rva, dll,
  name, ordinal} records, i.e. named APIs. It walks the name and slot thunks in
  lockstep so every import carries the IAT slot a call lands on.
- vmie_win32_inline_hooks decodes each .pdata function's entry and reports any
  whose first instruction is a direct jmp/call leaving the module image - the
  detour/trampoline shape.
- vmie_win32_func_imports records, in order, the IAT slots a function calls
  through (call qword [rip+disp] onto an import slot): the function's API-call
  sequence, named by correlating with vmie_win32_imports.
- func_hash (codeanalysis.h) hashes a function position-independently, zeroing
  the displacement bytes the decoder locates - one primitive for fingerprinting
  known code and for detecting a changed body across snapshots.
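
A minimal sketch of the inline-hook entry check described in the list above, for illustration only. It assumes the entry bytes are already in hand and handles just the rel32 forms (E9 jmp, E8 call) by hand, whereas the commit decodes the entry with the shared decoder. The name entry_leaves_image and its parameters are hypothetical, not part of the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the commit's API): returns 1 if the bytes at a
 * function entry begin with a direct rel32 jmp (E9) or call (E8) whose target
 * RVA falls outside [0, image_size), else 0. */
static int entry_leaves_image(const uint8_t *code, size_t len,
                              uint32_t fn_rva, uint32_t image_size)
{
    assert(code != NULL);
    if (len < 5)
        return 0;                       /* too short for opcode + rel32 */
    if (code[0] != 0xE9 && code[0] != 0xE8)
        return 0;                       /* not a direct rel32 jmp/call */

    int32_t rel;
    memcpy(&rel, code + 1, sizeof rel); /* assumes a little-endian host */

    /* rel32 is relative to the end of the 5-byte instruction. */
    int64_t target = (int64_t)fn_rva + 5 + rel;
    return target < 0 || target >= (int64_t)image_size;
}
```

A jmp that stays inside the image (a tail call or thunk) is not reported; only a transfer that leaves the module matches the detour shape.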

Devirtualization needs no new call and is documented as a composition: a
vtable's methods are gva_jumptable(vtable_va), its instances are
pmap_referrers(vtable_va), and func_hash names each method. Imports reuse the
shared data-directory accessor; the analyses reuse the function/section/decode
primitives - no second PE or instruction parser.
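
A rough sketch of the devirtualization composition above, under stated assumptions. The two helpers are hypothetical stand-ins that operate on plain qword arrays; they are not gva_jumptable or pmap_referrers, whose real signatures are not shown here. The first reads a vtable as consecutive code pointers, the second finds qwords equal to the vtable address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for "a vtable is an array of code pointers": copy
 * consecutive slots that point into [code_lo, code_hi) and stop at the first
 * slot that does not. Returns the number of methods written to out. */
static size_t vtable_methods(const uint64_t *slots, size_t n_slots,
                             uint64_t code_lo, uint64_t code_hi,
                             uint64_t *out, size_t max)
{
    assert(slots != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < n_slots && n < max; i++) {
        if (slots[i] < code_lo || slots[i] >= code_hi)
            break;                      /* first non-code pointer ends the table */
        out[n++] = slots[i];
    }
    return n;
}

/* Hypothetical stand-in for a referrer scan: record the index of every qword
 * equal to target (an object's first qword is its vtable pointer). Returns the
 * number of indices written to out_idx. */
static size_t find_referrers(const uint64_t *mem, size_t n_qwords,
                             uint64_t target, size_t *out_idx, size_t max)
{
    assert(mem != NULL && out_idx != NULL);
    size_t n = 0;
    for (size_t i = 0; i < n_qwords && n < max; i++)
        if (mem[i] == target)
            out_idx[n++] = i;
    return n;
}
```

Each recovered method address could then be hashed and looked up in a known-hash table, which is the naming step the message attributes to func_hash.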
2026-06-16 20:03:49 +03:00
parent 79e82ffc6a
commit 35c5dc06ba
5 changed files with 450 additions and 0 deletions
@@ -77,4 +77,40 @@ typedef struct { uint32_t start; uint32_t end; } code_block;
* printf("block %d: [%#x, %#x)\n", i, bb[i].start, bb[i].end); */
int cfg_blocks(mem_view_t fn, code_block* out, int max);
/* Position-independent hash of a function's bytes. `fn` is a view spanning
* exactly one function (e.g. a section-view sub-range covering a func_range from
* vmie_win32_functions): fn.data[0] is the function's first byte, fn.size its
* length. It steps `fn` with the decoder (x86_decode - no second decoder) and
* folds the opcode / ModRM / SIB / immediate bytes into a 64-bit hash while
* ZEROING the rel/RIP-relative displacement bytes of each instruction
* (in.disp_off .. in.disp_off + in.disp_len, exactly the span sig_generate
* wildcards). Those are the bytes that float with the load address and
* relocation, so zeroing them makes the hash STABLE across images and ASLR -
* the same function hashes identically wherever it is mapped.
*
* Returns a 64-bit hash, or 0 if `fn` is empty (no data / size 0) or does not
* decode cleanly (a desync stops the walk). 0 is therefore "no hash", never a
* valid fingerprint.
*
* Two uses on one primitive:
* - fingerprint / library-ID: compare against a table of known function hashes
* to auto-name recovered code (e.g. recognize a statically-linked CRT/SSL
* routine without symbols);
* - code diff: hash the same function in two snapshots - an unchanged hash
* means the body is identical apart from the zeroed displacement bytes
* (barring a hash collision), a changed hash means it was patched.
*
* Devirtualization needs NO new call - it is a composition of primitives the
* engine already has: a C++ vtable at `vtable_va` is an array of code pointers,
* so its METHODS are gva_jumptable(m, cr3, vtable_va, ...) (codeanalysis.h), and
* its live INSTANCES are pmap_referrers(pm, vtable_va, ...) (pmap.h) - every
* object's first qword is its vtable pointer. With the methods recovered,
* func_hash names each method body against a known-hash table. (See win32.h for
* the same note next to the indirect-call surface.)
*
* Example - diff a function across two snapshots (0 means "no hash", so check
* both before comparing):
* mem_view_t a, b; // same function, two captures (SECTION_LOCAL/RVA views)
* uint64_t ha = func_hash(a), hb = func_hash(b);
* if (ha && hb && ha != hb) puts("function body changed"); */
uint64_t func_hash(mem_view_t fn);
#endif /* VMIE_CODEANALYSIS_H */
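
To make the displacement-zeroing idea in the func_hash comment concrete, here is a minimal sketch under stated assumptions. It is not the commit's implementation: the caller supplies the instruction layout instead of running x86_decode, the insn_span struct and pi_hash name are hypothetical, and FNV-1a is an arbitrary choice, since the header does not say which hash function is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instruction layout, standing in for what a decoder would
 * report: total length plus the offset and length of the displacement. */
typedef struct {
    uint8_t len;       /* instruction length in bytes */
    uint8_t disp_off;  /* offset of the displacement within the instruction */
    uint8_t disp_len;  /* displacement length, 0 if there is none */
} insn_span;

/* FNV-1a over the code bytes with every displacement byte replaced by 0.
 * Returns 0 ("no hash") for empty input or a layout that does not cover the
 * buffer exactly. */
static uint64_t pi_hash(const uint8_t *code, size_t size,
                        const insn_span *insns, size_t n_insns)
{
    assert(code != NULL && insns != NULL);
    uint64_t h = 0xcbf29ce484222325ULL;          /* FNV-1a 64-bit offset basis */
    size_t pos = 0;

    for (size_t i = 0; i < n_insns; i++) {
        const insn_span *in = &insns[i];
        size_t disp_end = (size_t)in->disp_off + in->disp_len;
        if (in->len == 0 || in->len > size - pos || disp_end > in->len)
            return 0;                            /* desync: stop the walk */
        for (size_t j = 0; j < in->len; j++) {
            int in_disp = j >= in->disp_off && j < disp_end;
            h ^= in_disp ? 0u : code[pos + j];   /* zero floating bytes */
            h *= 0x100000001b3ULL;               /* FNV-1a 64-bit prime */
        }
        pos += in->len;
    }
    if (size == 0 || pos != size)
        return 0;
    return h ? h : 1;                            /* keep 0 reserved for "no hash" */
}
```

Two copies of `call qword [rip+disp]; ret` that differ only in the displacement hash the same under this sketch, while a changed opcode byte changes the hash. Remapping a computed 0 to 1 is one way to keep 0 free as the failure value; whether the commit does the same is not stated.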