vatrog-vm-introspection-engine/include/codeanalysis.h
lirent 79e82ffc6a Add code-structure analysis: call graph, jump tables, basic blocks, constant xref
Wave 1 of the code-analysis layer, built on the x86-64 decoder:

- vmie_win32_callgraph walks each .pdata function with the decoder and emits an
  edge for every direct call/jmp whose target lands in the module - the
  intra-module call graph. Indirect edges are left to the IAT and jump tables.
- gva_jumptable recovers a switch's case targets from an indirect jump's table:
  consecutive pointer entries that land in an executable region.
- cfg_blocks splits one function view into basic blocks (a generic handler:
  leaders from intra-function branch targets, cut after jmp/jcc/ret).
- gva_imm_xref finds the instructions whose immediate operand equals a constant:
  the dual of code-xref for magic values, error codes, syscall numbers.

The decoder now also reports imm_off/imm_len so a caller can read or match the
immediate operand. The generic primitives live in the new codeanalysis.h
(jump tables, basic blocks) and scan.h (constant xref); the .pdata-bound call
graph stays on the win32 surface and reuses the existing function/section/decode
primitives - no second PE or instruction parser.
2026-06-16 19:52:25 +03:00
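
The jump-table walk summarized in the commit message (consecutive pointer entries, kept while they land in an executable region) can be sketched in isolation. This is a minimal illustration, not the engine's implementation: `toy_mem`, `toy_read_u64`, `toy_is_exec`, `toy_jumptable` and `toy_jumptable_demo` are hypothetical stand-ins for the real memory model, and the single executable range replaces the live region map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the guest memory model: one readable data
 * range that holds the table, and one executable range [code_lo, code_hi). */
typedef struct {
    uint64_t       data_va;   /* VA of data[0] */
    const uint8_t* data;      /* readable bytes */
    size_t         data_len;  /* number of readable bytes */
    uint64_t       code_lo;   /* first executable VA (inclusive) */
    uint64_t       code_hi;   /* end of executable range (exclusive) */
} toy_mem;

/* Read 8 bytes at `va`. Returns 0 on success, -1 when out of range.
 * Assumes a little-endian host, matching the x86-64 guest layout. */
static int toy_read_u64(const toy_mem* m, uint64_t va, uint64_t* out) {
    if (va < m->data_va) return -1;
    uint64_t off = va - m->data_va;
    if (m->data_len < 8 || off > m->data_len - 8) return -1;
    memcpy(out, m->data + off, 8);
    return 0;
}

static int toy_is_exec(const toy_mem* m, uint64_t va) {
    return va >= m->code_lo && va < m->code_hi;
}

/* Keep consecutive 8-byte entries while each one is a code pointer; stop at
 * the first non-code entry, at a read failure, or at `max`. `targets` may be
 * NULL to count only. Returns the number of entries recovered. */
static int toy_jumptable(const toy_mem* m, uint64_t table_va,
                         uint64_t* targets, int max) {
    int n = 0;
    while (n < max) {
        uint64_t entry;
        if (toy_read_u64(m, table_va + (uint64_t)n * 8, &entry) != 0) break;
        if (!toy_is_exec(m, entry)) break;
        if (targets) targets[n] = entry;
        n++;
    }
    return n;
}

/* Usage sketch: three code pointers followed by a value that is not code. */
static int toy_jumptable_demo(uint64_t out[4]) {
    static const uint64_t table[4] = { 0x401000, 0x401020, 0x401040, 0x1234 };
    toy_mem m = { 0x500000, (const uint8_t*)table, sizeof table,
                  0x401000, 0x402000 };
    int n = toy_jumptable(&m, 0x500000, out, 4);
    assert(n >= 0 && n <= 4);
    return n;
}
```

The walk has no explicit terminator to look for: the table ends wherever the next 8 bytes stop looking like a code pointer, which is why a return of 0 reads as "no table here" rather than as an error.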

81 lines
4.5 KiB
C

/* codeanalysis.h - generic (OS-agnostic) x86-64 code-structure analysis.
*
* Handler layer: built on the generic memory model (memmodel.h: cr3 + VA, the
* region map, gva_read) and the light x86-64 decoder (x86dec.h). It names no
* Windows object - jump-table recovery and basic-block splitting are properties
* of code and the address space, not of any particular OS. The win32-specific
* call graph (which needs .pdata) lives in win32.h instead.
*
* These are the structure-recovery primitives that sit above the decoder and
* gva_code_xref / gva_imm_xref (scan.h): given a function body or an indirect
* jump's table, reconstruct the control flow the linear scanners cannot see.
*/
#ifndef VMIE_CODEANALYSIS_H
#define VMIE_CODEANALYSIS_H
#include <stdint.h>
#include <stddef.h>
#include "memmodel.h" /* vmie_mem, cr3+VA, vregion/VR_*, gva_read/gva_regions */
#include "sigscan.h" /* mem_view_t (the single owner of the view type) */
#include "x86dec.h" /* x86_decode, x86_insn, x86_branch_target */
/* Jump-table recovery. From `table_va`, read consecutive 8-byte entries and
* keep those that point into an EXECUTABLE region under `cr3` (membership tested
* against the live region map, i.e. a VR_X run from gva_regions); stop at the
* first entry that is not a code pointer, at a read failure, or at `max`. The
* entries are read as absolute 64-bit code VAs (the switch/case jump-table form
* that stores full pointers; tables of 32-bit relative offsets do not match
* this layout). Writes up to `max` recovered targets to
* `targets` (NULL to count only) and returns the number recovered.
*
* Feed it the table address taken from an indirect jump's memory operand - e.g.
* `jmp qword [rip+disp]` => rip+disp (x86_riprel_target), or the base of a
* `jmp qword [base + idx*8]` SIB table - to recover a switch's case targets and
* complete the control-flow graph that the linear decoders (cfg_blocks,
* vmie_win32_callgraph) leave dangling at the indirect jump.
*
* Returns 0 when the first entry is already not a code pointer (an empty/absent
* table), so a 0 return is "no table here", not an error.
*
* Example - resolve a switch reached by `jmp qword [rip+disp]`:
*     x86_insn in; x86_decode(code, avail, &in);      // the indirect jmp
*     uint64_t tbl = x86_riprel_target(jmp_va, &in);  // table base VA
*     uint64_t cases[64];
*     int n = gva_jumptable(m, cr3, tbl, cases, 64);  // case target VAs */
int gva_jumptable(vmie_mem* m, uintptr_t cr3, uint64_t table_va,
                  uint64_t* targets, int max);
/* One basic block inside a function view. The offsets are in the VIEW's own
* coordinate space (mem_view_t.base_va + offset): for a SECTION_LOCAL view they
* are section-local byte offsets, for a MODULE_RVA view they are RVAs.
*   start - byte offset of the block's first instruction (inclusive)
*   end   - byte offset just past the block's last instruction (exclusive), so
*           the block spans [start, end) and its length is end - start. */
typedef struct { uint32_t start; uint32_t end; } code_block;
/* Split one function's bytes into basic blocks. `fn` is a view spanning exactly
* one function (e.g. a section-view sub-range covering a func_range from
* vmie_win32_functions): fn.data[0] is the function's first byte and fn.size its
* length. Two linear passes over the bytes with the decoder:
*   1. collect intra-function branch targets (the destinations of jmp/jcc
*      whose target lands inside [0, fn.size)) - these are leaders;
*   2. cut a block after every jmp/jcc/ret and before every leader. A CALL is
*      treated as fall-through (it returns), so it does NOT end a block. A
*      branch whose target is OUTSIDE `fn` (a tail call or inter-procedural
*      jmp) ends the block but starts no new one inside `fn`.
*
* Blocks are emitted in ascending start order, partition [0, fn.size) with no
* gaps or overlaps, and are reported in the view's coordinate space (start/end
* are offsets from fn.base_va). Writes up to `max` blocks to `out` (NULL to
* count only) and returns the TOTAL block count, or -1 if the bytes do not
* decode cleanly (a desync: the linear walk hit an undecodable byte). Pure: it
* touches only the view and the decoder, no vmie_mem / no I/O.
*
* Example - block count and extents of one function:
*     mem_view_t fn;  // a SECTION_LOCAL or MODULE_RVA sub-view of one function
*     code_block bb[256];
*     int n = cfg_blocks(fn, bb, 256);  // total count, or -1 on a desync
*     for (int i = 0; i < n && i < 256; i++)
*         printf("block %d: [%#x, %#x)\n", i, bb[i].start, bb[i].end); */
int cfg_blocks(mem_view_t fn, code_block* out, int max);
#endif /* VMIE_CODEANALYSIS_H */
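
The two-pass split documented for cfg_blocks can be illustrated with a self-contained sketch. It is not the engine's code: `toy_decode` is a hypothetical decoder that understands only five x86 encodings (nop 90, ret C3, call rel32 E8, jmp rel8 EB, jcc rel8 70..7F), `toy_block` mirrors `code_block`, and the function takes a raw byte buffer instead of a `mem_view_t`. The `TOY_MAX_FN` cap is likewise an assumption made to keep the leader map on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t start; uint32_t end; } toy_block;
enum { K_OTHER, K_JMP, K_JCC, K_RET };
#define TOY_MAX_FN 4096  /* assumed cap on function size for this sketch */

/* Toy decoder: returns the instruction length, or 0 if undecodable.
 * For jmp/jcc, *rel is the signed displacement from the next instruction. */
static size_t toy_decode(const uint8_t* p, size_t avail, int* kind, int64_t* rel) {
    if (avail == 0) return 0;
    uint8_t op = p[0];
    *kind = K_OTHER;
    *rel = 0;
    if (op == 0x90) return 1;                            /* nop */
    if (op == 0xC3) { *kind = K_RET; return 1; }         /* ret */
    if (op == 0xE8) return avail >= 5 ? 5 : 0;           /* call: falls through */
    if (op == 0xEB || (op >= 0x70 && op <= 0x7F)) {      /* jmp / jcc rel8 */
        if (avail < 2) return 0;
        *kind = (op == 0xEB) ? K_JMP : K_JCC;
        *rel = (int8_t)p[1];
        return 2;
    }
    return 0;
}

/* Returns the TOTAL block count, or -1 on a decode failure. Writes at most
 * `max` blocks to `out` (NULL to count only). */
static int toy_cfg_blocks(const uint8_t* code, size_t size, toy_block* out, int max) {
    uint8_t leader[TOY_MAX_FN + 1];
    if (size == 0 || size > TOY_MAX_FN) return -1;
    memset(leader, 0, sizeof leader);
    leader[0] = 1;  /* the function entry always starts a block */

    /* Pass 1: mark in-function branch targets and the byte after each
     * jmp/jcc/ret as leaders. */
    for (size_t off = 0; off < size; ) {
        int kind; int64_t rel;
        size_t len = toy_decode(code + off, size - off, &kind, &rel);
        if (len == 0) return -1;  /* desync */
        size_t next = off + len;
        if (kind == K_JMP || kind == K_JCC) {
            int64_t tgt = (int64_t)next + rel;
            if (tgt >= 0 && (uint64_t)tgt < size) leader[tgt] = 1;
        }
        if (kind != K_OTHER) leader[next] = 1;
        off = next;
    }

    /* Pass 2: walk again and close a block whenever the next instruction
     * is a leader or the function ends. */
    int n = 0;
    uint32_t start = 0;
    for (size_t off = 0; off < size; ) {
        int kind; int64_t rel;
        size_t len = toy_decode(code + off, size - off, &kind, &rel);
        assert(len != 0);  /* pass 1 already decoded this exact sequence */
        off += len;
        if (off == size || leader[off]) {
            if (out && n < max) { out[n].start = start; out[n].end = (uint32_t)off; }
            n++;
            start = (uint32_t)off;
        }
    }
    return n;
}

/* Usage sketch: nop; jz +2; nop; nop; call rel32; ret */
static int toy_cfg_demo(toy_block out[8]) {
    static const uint8_t fn[] = { 0x90, 0x74, 0x02, 0x90, 0x90,
                                  0xE8, 0, 0, 0, 0, 0xC3 };
    return toy_cfg_blocks(fn, sizeof fn, out, 8);
}
```

In the demo the `jz` ends the first block and its target (offset 5) starts the third, while the `call` at offset 5 stays inside its block because it is treated as fall-through. A branch target that lands mid-instruction is ignored by pass 2 in this sketch, since only instruction boundaries are tested against the leader map.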