Add code-structure analysis: call graph, jump tables, basic blocks, constant xref

Wave 1 of the code-analysis layer, built on the x86-64 decoder:

- vmie_win32_callgraph walks each .pdata function with the decoder and emits an
  edge for every direct call/jmp whose target lands in the module - the
  intra-module call graph. Indirect edges are left to the IAT and jump tables.
- gva_jumptable recovers a switch's case targets from an indirect jump's table:
  consecutive pointer entries that land in an executable region.
- cfg_blocks splits one function view into basic blocks (a generic handler:
  leaders from intra-function branch targets, cut after jmp/jcc/ret).
- gva_imm_xref finds the instructions whose immediate operand equals a constant -
  the dual of code-xref for magic values, error codes, syscall numbers.
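
As an illustration of the jump-table rule above (consecutive pointer entries that land in an executable region), here is a minimal sketch. It is not the gva_jumptable implementation: the name jumptable_scan, its signature, the 8-byte little-endian entry format, and the single [exec_lo, exec_hi) range are all assumptions made for the example, and the table bytes are assumed to be already read into a local buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one 8-byte little-endian value, independent of host byte order. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper (not the real gva_jumptable signature).
 * Walks `table` as consecutive 8-byte pointers and stops at the first entry
 * that falls outside the executable range [exec_lo, exec_hi).
 * Stores up to `max` targets in `targets` (NULL to count only) and returns
 * the number of leading entries that passed the check. */
static size_t jumptable_scan(const uint8_t *table, size_t len,
                             uint64_t exec_lo, uint64_t exec_hi,
                             uint64_t *targets, size_t max)
{
    assert(exec_lo <= exec_hi);
    size_t n = 0;
    while ((n + 1) * 8 <= len) {
        uint64_t entry = read_le64(table + n * 8);
        if (entry < exec_lo || entry >= exec_hi)
            break;                 /* first non-code pointer ends the table */
        if (targets != NULL && n < max)
            targets[n] = entry;
        n++;
    }
    return n;
}
```

Stopping at the first out-of-range entry is the simplest termination rule; a real recovery pass would likely also bound the walk by the switch's case count when the preceding compare is available.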

The decoder now also reports imm_off/imm_len so a caller can read or match the
immediate operand. The generic primitives live in the new codeanalysis.h
(jump tables, basic blocks) and scan.h (constant xref); the .pdata-bound call
graph stays on the win32 surface and reuses the existing function/section/decode
primitives - no second PE or instruction parser.
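
A rough sketch of how a caller might use imm_off/imm_len to match a constant. The helper names read_imm and imm_matches are hypothetical, and the sketch assumes imm_off is a byte offset from the start of the instruction and that the immediate is compared zero-extended; the real decoder struct and gva_imm_xref may differ on both points.

```c
#include <assert.h>
#include <stdint.h>

/* Read the immediate operand: imm_len little-endian bytes starting at
 * insn + imm_off, zero-extended to 64 bits. */
static uint64_t read_imm(const uint8_t *insn, unsigned imm_off, unsigned imm_len)
{
    assert(imm_len <= 8);
    uint64_t v = 0;
    for (unsigned i = imm_len; i > 0; i--)
        v = (v << 8) | insn[imm_off + i - 1];
    return v;
}

/* Hypothetical matcher: returns 1 if the instruction has an immediate and it
 * equals `want`. A constant wider than the immediate can never match. */
static int imm_matches(const uint8_t *insn, unsigned imm_off, unsigned imm_len,
                       uint64_t want)
{
    if (imm_len == 0 || imm_len > 8)
        return 0;              /* no immediate operand to compare */
    return read_imm(insn, imm_off, imm_len) == want;
}
```

For example, `mov eax, 0xC0000005` encodes as B8 05 00 00 C0, so with imm_off = 1 and imm_len = 4 the matcher reports a hit for 0xC0000005. Because the comparison is zero-extended, a caller looking for a negative value in a sign-extended imm8/imm32 would need to extend it first.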
2026-06-16 19:52:25 +03:00
parent c4419964aa
commit 79e82ffc6a
9 changed files with 505 additions and 1 deletions
+36
@@ -295,6 +295,42 @@ typedef struct { uint32_t rva; uint32_t size; } func_range;
int vmie_win32_functions(vmie_win32* v, uint64_t cr3, uint64_t module_base,
func_range* out, int max);
/* One call-graph edge, with both endpoints as RVAs relative to the module base
* (absolute VA = module_base + rva).
* from - RVA of the function that contains the call/jmp site (a .pdata
* function start)
* to - RVA of the branch target (inside the same module image)
* kind - 0 = call (E8 / direct CALL), 1 = direct jmp (E9/EB, including a tail
* call to another function). */
typedef struct { uint32_t from; uint32_t to; uint8_t kind; } call_edge;
/* Build the intra-module call graph of the image at `module_base` (in the `cr3`
* address space). Reuses the existing primitives - vmie_win32_functions to
* enumerate the .pdata function starts, vmie_win32_section_view to gather the
* .text bytes, and x86_decode to step each function - and emits one edge for
* every DIRECT call/jmp (has_rel) whose resolved target lands inside the module
* image [module_base, module_base + SizeOfImage). `from` is the containing
* function's RVA, `to` is the target's RVA.
*
* INDIRECT calls/jmps (through a register or memory, e.g. `call [rip+disp]` or
* `jmp rax`) are SKIPPED here - they carry no static rel target. Resolve those
* separately: switch tables via gva_jumptable, import thunks via the IAT (a
* wave-2 concern). A direct branch whose target falls OUTSIDE the image (an
* inter-module jmp/call) is also skipped - the graph is intra-module by
* construction.
*
* Writes up to `max` edges to `out` (NULL to count only) and returns the TOTAL
* edge count (which may exceed `max`), or -1 if the .pdata directory or the
* .text section is missing or unreadable.
* Edges are grouped by source function (all of one function's edges are
* contiguous), in ascending function order.
*
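* Example - out-degree of each function:
* call_edge e[4096];
* int n = vmie_win32_callgraph(v, pr->cr3, m.base, e, 4096);
* if (n > 4096) n = 4096; // n is the TOTAL; only 4096 edges were written
* // edges are grouped by e[i].from, so each run of equal `from` values is one
* // function's callee list and the run length is its out-degree */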
int vmie_win32_callgraph(vmie_win32* v, uint64_t cr3, uint64_t module_base,
call_edge* out, int max);
/* One exported symbol from the module export directory (EAT).
* rva - export target RVA (absolute VA = module_base + rva). Forwarder
* exports report the forwarder-string RVA; see `forwarded`.