Add code-structure analysis: call graph, jump tables, basic blocks, constant xref

Wave 1 of the code-analysis layer, built on the x86-64 decoder:

- vmie_win32_callgraph walks each .pdata function with the decoder and emits an
  edge for every direct call/jmp whose target lands inside the module, giving
  the intra-module call graph. Indirect call/jmp edges are not emitted; they
  are left to the IAT and to jump-table recovery.
- gva_jumptable recovers a switch's case targets from an indirect jump's table:
  consecutive pointer entries that land in an executable region.
- cfg_blocks splits one function view into basic blocks (a generic handler:
  leaders from intra-function branch targets, cut after jmp/jcc/ret).
- gva_imm_xref finds the instructions whose immediate operand equals a
  constant: the dual of code-xref, for magic values, error codes and syscall
  numbers.
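The per-instruction test behind a call-graph edge is small. The sketch below is an illustration, not the commit's code: it assumes instruction boundaries are already known (the commit gets them from its x86-64 decoder), handles only the E8/E9 rel32 and EB rel8 forms, and uses hypothetical helper names. The displacement is relative to the address of the next instruction.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: if the bytes at `p` (an instruction starting at `va`,
 * with `avail` bytes readable) are a direct near call/jmp, store the absolute
 * target in *target and return the instruction length; otherwise return 0.
 * Only E8 (call rel32), E9 (jmp rel32) and EB (jmp rel8) are handled here;
 * jcc and prefixed forms are left out of this sketch. */
static size_t direct_branch_target(const uint8_t* p, size_t avail,
                                   uint64_t va, uint64_t* target)
{
    if (avail >= 5 && (p[0] == 0xE8 || p[0] == 0xE9)) {
        /* little-endian rel32, sign-extended */
        int32_t rel = (int32_t)((uint32_t)p[1] | (uint32_t)p[2] << 8 |
                                (uint32_t)p[3] << 16 | (uint32_t)p[4] << 24);
        *target = va + 5 + (uint64_t)(int64_t)rel;   /* relative to next IP */
        return 5;
    }
    if (avail >= 2 && p[0] == 0xEB) {
        *target = va + 2 + (uint64_t)(int64_t)(int8_t)p[1];
        return 2;
    }
    return 0;
}

/* An edge is kept only when its target lands inside [mod_lo, mod_hi). */
static int is_intra_module_edge(uint64_t target, uint64_t mod_lo, uint64_t mod_hi)
{
    return target >= mod_lo && target < mod_hi;
}
```

Targets outside the module (for example a jmp into another image) fail the range check and produce no edge.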
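A minimal sketch of the jump-table heuristic described above, assuming the table has already been read out of guest memory as 8-byte pointer entries and that the caller supplies the executable ranges (exec_range and recover_jump_table are hypothetical names): walk the table and stop at the first entry that does not land in executable memory.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical executable-range descriptor, half-open [lo, hi). In the commit
 * the protection comes from guest memory; this sketch takes it as input. */
typedef struct { uint64_t lo, hi; } exec_range;

static int in_exec(uint64_t va, const exec_range* r, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        if (va >= r[i].lo && va < r[i].hi) return 1;
    return 0;
}

/* Copy consecutive entries into out[] while each points into an executable
 * range; the first entry that does not ends the table. Returns the number of
 * case targets recovered (at most `max`). */
static size_t recover_jump_table(const uint64_t* table, size_t n_entries,
                                 const exec_range* r, size_t nr,
                                 uint64_t* out, size_t max)
{
    size_t k = 0;
    while (k < n_entries && k < max && in_exec(table[k], r, nr)) {
        out[k] = table[k];
        k++;
    }
    return k;
}
```

Duplicate targets are kept on purpose, since several case values commonly share one handler.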
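The leader rule used by cfg_blocks can be sketched over an already-decoded function. The record layout (ins_t, ins_kind) and mark_leaders are hypothetical; only the rule comes from the description above: the entry, every branch target inside the function, and the instruction after each jmp/jcc/ret start a block.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical pre-decoded instruction record; names are illustrative only. */
typedef enum { INS_OTHER, INS_JMP, INS_JCC, INS_RET } ins_kind;
typedef struct {
    uint64_t va;         /* instruction start */
    uint32_t len;        /* instruction length in bytes */
    ins_kind kind;
    int      has_target; /* 1 if a direct branch target is known */
    uint64_t target;
} ins_t;

/* Set leader[i] to 1 for every instruction that starts a basic block.
 * `ins` must be sorted by va. Returns the number of blocks. */
static size_t mark_leaders(const ins_t* ins, size_t n, unsigned char* leader)
{
    if (n == 0) return 0;
    for (size_t i = 0; i < n; i++) leader[i] = 0;
    leader[0] = 1;                                   /* function entry */
    uint64_t lo = ins[0].va, hi = ins[n - 1].va + ins[n - 1].len;
    for (size_t i = 0; i < n; i++) {
        if (ins[i].kind == INS_OTHER) continue;
        if (i + 1 < n) leader[i + 1] = 1;            /* cut after jmp/jcc/ret */
        if (ins[i].has_target && ins[i].target >= lo && ins[i].target < hi) {
            for (size_t j = 0; j < n; j++)           /* linear search: sketch only */
                if (ins[j].va == ins[i].target) { leader[j] = 1; break; }
        }
    }
    size_t blocks = 0;
    for (size_t i = 0; i < n; i++) blocks += leader[i];
    return blocks;
}
```

A call does not end a block in this model, matching the jmp/jcc/ret cut list; a target that lands mid-instruction finds no matching start and marks nothing.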

The decoder now also reports imm_off/imm_len so a caller can read or match the
immediate operand. The generic primitives live in the new codeanalysis.h
(jump tables, basic blocks) and scan.h (constant xref); the .pdata-bound call
graph stays on the win32 surface and reuses the existing function/section/decode
primitives - no second PE or instruction parser.
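With imm_off/imm_len available, the constant-xref match rule documented for gva_imm_xref reduces to a little-endian compare of the immediate's low bytes. This is a sketch assuming imm_off is a byte offset from the instruction start; imm_matches is a hypothetical name, not part of the commit.

```c
#include <stdint.h>
#include <stddef.h>

/* Match rule as documented for gva_imm_xref: the instruction must carry an
 * immediate at least `width` bytes wide, and the immediate's low `width`
 * bytes must equal value & mask(width). Returns 1 on a match, 0 otherwise,
 * -1 for a width that is not 1/2/4/8. */
static int imm_matches(const uint8_t* insn, uint32_t imm_off, uint32_t imm_len,
                       uint64_t value, int width)
{
    if (width != 1 && width != 2 && width != 4 && width != 8) return -1;
    if (imm_len == 0 || imm_len < (uint32_t)width) return 0;
    uint64_t low = 0;
    for (int i = 0; i < width; i++)          /* x86 immediates are little-endian */
        low |= (uint64_t)insn[imm_off + i] << (8 * i);
    uint64_t mask = (width == 8) ? ~0ull : ((1ull << (8 * width)) - 1);
    return low == (value & mask);
}
```

For mov eax, 0xC0000022 (bytes B8 22 00 00 C0: imm_off 1, imm_len 4) the rule matches at width 4 and also at width 2, because only the low `width` bytes are compared; it does not match at width 8, since the immediate is narrower than the requested width.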
2026-06-16 19:52:25 +03:00
parent c4419964aa
commit 79e82ffc6a
9 changed files with 505 additions and 1 deletion
@@ -83,6 +83,34 @@ int gva_sig_scan_multi(vmie_mem* m, uintptr_t cr3, uint64_t lo, uint64_t hi,
int gva_code_xref(vmie_mem* m, uintptr_t cr3, uint64_t lo, uint64_t hi,
                  uint64_t target_va, uint64_t* out, int max);
/* immediate / constant xref: every instruction in [lo,hi] (kept by the
 * protection filter `prot_any`; pass VR_X to restrict to code) whose IMMEDIATE
 * operand equals `value`, compared over the low `width` bytes (width is 1, 2, 4,
 * or 8). Like gva_code_xref it brute-scans each byte offset with the light
 * x86-64 decoder (x86dec.h, NOT a full disassembler) and carries a >=15-byte
 * (max x86 instruction length) sweep overlap so no instruction is cut at a
 * window seam; the same SEAM and INTERIOR de-duplications apply (a match
 * starting in a non-last window's trailing overlap is left to the next window,
 * and an interior alias falling inside an already-accepted match is dropped).
 *
 * An instruction matches when it carries an immediate (imm_len > 0) at least
 * `width` bytes wide and its low `width` bytes equal `value & mask(width)`. The
 * rel/RIP-relative DISPLACEMENT of a branch is NOT an immediate and never
 * matches here - use gva_code_xref for displacement targets.
 *
 * Records each matching instruction-start VA in the view's coordinate space.
 * Writes up to `max` VAs to `out` (NULL to count only) and returns the TOTAL
 * number of matches, or -1 on bad input (a NULL m, an unswept range, or a width
 * that is not 1/2/4/8). Use it to answer "what code uses the constant N" - error
 * codes, magic values, syscall numbers, table sizes, struct sizes.
 *
 * Example - sites that load the NTSTATUS 0xC0000022 (ACCESS_DENIED) as a dword:
 *   uint64_t sites[64];
 *   int n = gva_imm_xref(m, cr3, lo, hi, VR_X, 0xC0000022ull, 4, sites, 64); */
int gva_imm_xref(vmie_mem* m, uintptr_t cr3, uint64_t lo, uint64_t hi,
                 uint32_t prot_any, uint64_t value, int width,
                 uint64_t* out, int max);
/* gva bridges to the signature matcher: build mem_view from guest memory and feed sigscan.h */
int gva_sig_scan (vmie_mem* m, uintptr_t cr3, uint64_t lo, uint64_t hi,
                  uint32_t prot_any, const sig_pattern_t* p, uint64_t* out, int max);
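The SEAM and INTERIOR de-duplication rules in the gva_imm_xref comment can be expressed as one acceptance test per candidate. This is a sketch, not the commit's implementation: it assumes candidates arrive in ascending VA order and that each non-last window owns `step` bytes plus a trailing overlap of at least 15 bytes shared with the next window. keep_candidate is a hypothetical helper.

```c
#include <stdint.h>
#include <stddef.h>

/* Decide whether a candidate match starting `off` bytes into the window at
 * `win_base` is recorded. `step` is the window's own (non-overlap) span;
 * `is_last` marks the final window, which has no successor to hand its tail
 * to. `*accepted_end` is the VA just past the last accepted match and is
 * carried across windows. */
static int keep_candidate(uint64_t win_base, size_t off, size_t step, int is_last,
                          uint64_t cand_len, uint64_t* accepted_end)
{
    if (!is_last && off >= step) return 0;   /* SEAM: the next window owns it */
    uint64_t va = win_base + off;
    if (va < *accepted_end) return 0;        /* INTERIOR alias of an accepted match */
    *accepted_end = va + cand_len;
    return 1;
}
```

Because accepted matches never overlap under this rule, the end of the last accepted match is enough state to reject interior aliases, and a candidate deferred at a seam is seen again at offset `off - step` of the following window.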