mirror of
https://dev.lirent.ru/Vatrog/vm-introspection-engine.git
synced 2026-06-18 02:06:36 +03:00
35c5dc06ba
Wave 2 of the code-analysis layer:
- vmie_win32_imports resolves the import directory (INT/IAT) to {iat_rva, dll,
name, ordinal} - named APIs, walking the name and slot thunks in lockstep so
every import carries the IAT slot a call lands on.
- vmie_win32_inline_hooks decodes each .pdata function's entry and reports any
whose first instruction is a direct jmp/call leaving the module image - the
detour/trampoline shape.
- vmie_win32_func_imports records, in order, the IAT slots a function calls
through (call qword [rip+disp] onto an import slot): the function's API-call
sequence, named by correlating with vmie_win32_imports.
- func_hash (codeanalysis.h) hashes a function position-independently, zeroing
the displacement bytes the decoder locates - one primitive for fingerprinting
known code and for detecting a changed body across snapshots.
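
The position-independent hashing in the last bullet can be sketched as follows. This is a minimal illustration, not the engine's code: `toy_insn` and `toy_func_hash` are hypothetical names, the instruction layout is supplied by the caller instead of coming from `x86_decode`, and FNV-1a is an assumed hash (the source does not say which one `func_hash` uses).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for what a decoder reports per instruction:
 * total length plus where the rel/RIP-relative displacement sits. */
typedef struct {
    uint8_t len;      /* instruction length in bytes */
    uint8_t disp_off; /* offset of the displacement inside the instruction */
    uint8_t disp_len; /* displacement size in bytes, 0 if there is none */
} toy_insn;

/* Hash `code` instruction by instruction, feeding zeros in place of the
 * displacement bytes. Returns 0 for empty input or when the instruction
 * list does not cover exactly `size` bytes (a desync).
 * FNV-1a is an assumption made for this sketch only. */
static uint64_t toy_func_hash(const uint8_t *code, size_t size,
                              const toy_insn *insns, size_t count)
{
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
    size_t pos = 0;

    if (code == NULL || size == 0) return 0;
    for (size_t i = 0; i < count; i++) {
        const toy_insn *in = &insns[i];
        if (in->len == 0 || (size_t)in->len > size - pos) return 0;
        for (size_t k = 0; k < in->len; k++) {
            int in_disp = k >= in->disp_off &&
                          k < (size_t)in->disp_off + in->disp_len;
            uint8_t b = in_disp ? 0 : code[pos + k]; /* zero the displacement */
            h ^= b;
            h *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
        }
        pos += in->len;
    }
    return pos == size ? h : 0;
}
```

Two copies of `call rel32; ret` that differ only in the rel32 bytes hash identically under this scheme, while changing the opcode byte changes the hash.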
Devirtualization needs no new call and is documented as a composition: a
vtable's methods are gva_jumptable(vtable_va), its instances are
pmap_referrers(vtable_va), and func_hash names each method. Imports reuse the
shared data-directory accessor; the analyses reuse the function/section/decode
primitives - no second PE or instruction parser.
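
The vtable-as-jump-table idea above rests on one scan: read consecutive 8-byte entries and stop at the first one that is not a code pointer. A minimal sketch of that scan over a plain byte buffer follows. `toy_xregion`, `toy_put_u64le` and `toy_jumptable` are hypothetical names, and a single `[lo, hi)` range stands in for the live region map that `gva_jumptable` consults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one executable region [lo, hi). */
typedef struct { uint64_t lo, hi; } toy_xregion;

/* Store a 64-bit value little-endian, as an x86-64 guest would. */
static void toy_put_u64le(uint8_t *p, uint64_t v)
{
    for (int k = 0; k < 8; k++) p[k] = (uint8_t)(v >> (8 * k));
}

/* Scan 8-byte entries starting at `table_off`; keep those that fall inside
 * `x`, stop at the first entry outside it, at the end of the buffer, or at
 * `max`. `targets` may be NULL to count only. Returns the number kept. */
static int toy_jumptable(const uint8_t *mem, size_t mem_size, size_t table_off,
                         toy_xregion x, uint64_t *targets, int max)
{
    int n = 0;
    while (n < max && table_off <= mem_size && mem_size - table_off >= 8) {
        uint64_t entry = 0;
        for (int k = 7; k >= 0; k--)          /* little-endian load */
            entry = (entry << 8) | mem[table_off + k];
        if (entry < x.lo || entry >= x.hi) break; /* not a code pointer */
        if (targets != NULL) targets[n] = entry;
        n++;
        table_off += 8;
    }
    return n;
}
```

Pointing the same scan at a vtable address yields the method pointers; a return of 0 means the first entry was already not a code pointer.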
117 lines
6.6 KiB
C
/* codeanalysis.h - generic (OS-agnostic) x86-64 code-structure analysis.
 *
 * Handler layer: built on the generic memory model (memmodel.h: cr3 + VA, the
 * region map, gva_read) and the light x86-64 decoder (x86dec.h). It names no
 * Windows object - jump-table recovery and basic-block splitting are properties
 * of code and the address space, not of any particular OS. The win32-specific
 * call graph (which needs .pdata) lives in win32.h instead.
 *
 * These are the structure-recovery primitives that sit above the decoder and
 * gva_code_xref / gva_imm_xref (scan.h): given a function body or an indirect
 * jump's table, reconstruct the control flow the linear scanners cannot see.
 */
#ifndef VMIE_CODEANALYSIS_H
#define VMIE_CODEANALYSIS_H
#include <stdint.h>
#include <stddef.h>
#include "memmodel.h" /* vmie_mem, cr3+VA, vregion/VR_*, gva_read/gva_regions */
#include "sigscan.h"  /* mem_view_t (the single owner of the view type) */
#include "x86dec.h"   /* x86_decode, x86_insn, x86_branch_target */

/* Jump-table recovery. From `table_va`, read consecutive 8-byte entries and
 * keep those that point into an EXECUTABLE region under `cr3` (membership tested
 * against the live region map, i.e. a VR_X run from gva_regions); stop at the
 * first entry that is not a code pointer, at a read failure, or at `max`. The
 * entries are absolute 64-bit code VAs (the common /CASE jump-table form a
 * compiler emits for a switch). Writes up to `max` recovered targets to
 * `targets` (NULL to count only) and returns the number recovered.
 *
 * Feed it the table address taken from an indirect jump's memory operand - e.g.
 * `jmp qword [rip+disp]` => rip+disp (x86_riprel_target), or the base of a
 * `jmp qword [base + idx*8]` SIB table - to recover a switch's case targets and
 * complete the control-flow graph that the linear decoders (cfg_blocks,
 * vmie_win32_callgraph) leave dangling at the indirect jump.
 *
 * Returns 0 when the first entry is already not a code pointer (an empty/absent
 * table), so a 0 return is "no table here", not an error.
 *
 * Example - resolve a switch reached by `jmp qword [rip+disp]`:
 *   x86_insn in; x86_decode(code, avail, &in);        // the indirect jmp
 *   uint64_t tbl = x86_riprel_target(jmp_va, &in);    // table base VA
 *   uint64_t cases[64];
 *   int n = gva_jumptable(m, cr3, tbl, cases, 64);    // case target VAs */
int gva_jumptable(vmie_mem* m, uintptr_t cr3, uint64_t table_va,
                  uint64_t* targets, int max);

/* One basic block inside a function view. The offsets are in the VIEW's own
 * coordinate space (mem_view_t.base_va + offset): for a SECTION_LOCAL view they
 * are section-local byte offsets, for a MODULE_RVA view they are RVAs.
 *   start - byte offset of the block's first instruction (inclusive)
 *   end   - byte offset just past the block's last instruction (exclusive), so
 *           the block spans [start, end) and its length is end - start. */
typedef struct { uint32_t start; uint32_t end; } code_block;

/* Split one function's bytes into basic blocks. `fn` is a view spanning exactly
 * one function (e.g. a section-view sub-range covering a func_range from
 * vmie_win32_functions): fn.data[0] is the function's first byte and fn.size its
 * length. Two linear passes over the bytes with the decoder:
 *   1. collect intra-function branch targets (the destinations of jmp/jcc whose
 *      target lands inside [0, fn.size)) - these are leaders;
 *   2. cut a block after every jmp/jcc/ret and before every leader. A CALL is
 *      treated as fall-through (it returns), so it does NOT end a block. A
 *      branch whose target is OUTSIDE `fn` (a tail call or inter-procedural jmp)
 *      ends the block but starts no new one inside `fn`.
 *
 * Blocks are emitted in ascending start order, partition [0, fn.size) with no
 * gaps or overlaps, and are reported in the view's coordinate space (start/end
 * are offsets from fn.base_va). Writes up to `max` blocks to `out` (NULL to
 * count only) and returns the TOTAL block count, or -1 if the bytes do not
 * decode cleanly (a desync: the linear walk hit an undecodable byte). Pure: it
 * touches only the view and the decoder, no vmie_mem / no I/O.
 *
 * Example - block count and extents of one function:
 *   mem_view_t fn;   // a SECTION_LOCAL/RVA sub-view of one function
 *   code_block bb[256];
 *   int n = cfg_blocks(fn, bb, 256);
 *   for (int i = 0; i < n && i < 256; i++)
 *       printf("block %d: [%#x, %#x)\n", i, bb[i].start, bb[i].end); */
int cfg_blocks(mem_view_t fn, code_block* out, int max);

/* Position-independent hash of a function's bytes. `fn` is a view spanning
 * exactly one function (e.g. a section-view sub-range covering a func_range from
 * vmie_win32_functions): fn.data[0] is the function's first byte, fn.size its
 * length. It steps `fn` with the decoder (x86_decode - no second decoder) and
 * folds the opcode / ModRM / SIB / immediate bytes into a 64-bit hash while
 * ZEROING the rel/RIP-relative displacement bytes of each instruction
 * (in.disp_off .. in.disp_off + in.disp_len, exactly the span sig_generate
 * wildcards). Those are the bytes that float with the load address and
 * relocation, so zeroing them makes the hash STABLE across images and ASLR -
 * the same function hashes identically wherever it is mapped.
 *
 * Returns a 64-bit hash, or 0 if `fn` is empty (no data / size 0) or does not
 * decode cleanly (a desync stops the walk). 0 is therefore "no hash", never a
 * valid fingerprint.
 *
 * Two uses on one primitive:
 *  - fingerprint / library-ID: compare against a table of known function hashes
 *    to auto-name recovered code (e.g. recognize a statically-linked CRT/SSL
 *    routine without symbols);
 *  - code diff: hash the same function in two snapshots - an unchanged hash
 *    means the body is byte-identical (modulo relocation), a changed hash means
 *    it was patched.
 *
 * Devirtualization needs NO new call - it is a composition of primitives the
 * engine already has: a C++ vtable at `vtable_va` is an array of code pointers,
 * so its METHODS are gva_jumptable(m, cr3, vtable_va, ...) (codeanalysis.h), and
 * its live INSTANCES are pmap_referrers(pm, vtable_va, ...) (pmap.h) - every
 * object's first qword is its vtable pointer. With the methods recovered,
 * func_hash names each method body against a known-hash table. (See win32.h for
 * the same note next to the indirect-call surface.)
 *
 * Example - diff a function across two snapshots:
 *   mem_view_t a, b;   // same function, two captures (SECTION_LOCAL/RVA views)
 *   if (func_hash(a) != func_hash(b)) puts("function body changed"); */
uint64_t func_hash(mem_view_t fn);

#endif /* VMIE_CODEANALYSIS_H */
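
The two-pass leader scheme documented for `cfg_blocks` can be illustrated on a pre-decoded instruction list. This is a sketch under stated assumptions, not the engine's implementation: `toy_insn`, `toy_block`, `toy_kind` and `toy_cfg_blocks` are hypothetical, the instructions arrive already decoded instead of being stepped with `x86_decode`, and the function size is capped at 256 bytes to keep the leader map on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t start; uint32_t end; } toy_block; /* [start, end) */

enum toy_kind { TOY_PLAIN, TOY_CALL, TOY_JMP, TOY_JCC, TOY_RET };

/* Hypothetical pre-decoded instruction. */
typedef struct {
    uint32_t off;       /* offset of the instruction inside the function */
    uint32_t len;       /* length in bytes */
    enum toy_kind kind;
    uint32_t target;    /* branch destination offset (JMP/JCC only) */
} toy_insn;

#define TOY_MAX_SIZE 256

/* Returns the total block count, or -1 if the instructions do not tile
 * [0, size) exactly. Writes up to `max` blocks to `out` (NULL to count). */
static int toy_cfg_blocks(const toy_insn *insns, int count, uint32_t size,
                          toy_block *out, int max)
{
    uint8_t leader[TOY_MAX_SIZE] = {0};
    uint32_t pos = 0, start = 0;
    int total = 0;

    if (size == 0 || size > TOY_MAX_SIZE) return -1;

    /* Pass 1: validate the walk and mark intra-function branch targets. */
    for (int i = 0; i < count; i++) {
        const toy_insn *in = &insns[i];
        if (in->off != pos || in->len == 0 || in->len > size - pos) return -1;
        if ((in->kind == TOY_JMP || in->kind == TOY_JCC) && in->target < size)
            leader[in->target] = 1;
        pos += in->len;
    }
    if (pos != size) return -1;

    /* Pass 2: cut after jmp/jcc/ret and before every leader.
     * A call falls through, so it does not end a block on its own. */
    for (int i = 0; i < count; i++) {
        const toy_insn *in = &insns[i];
        uint32_t next = in->off + in->len;
        int ends = in->kind == TOY_JMP || in->kind == TOY_JCC ||
                   in->kind == TOY_RET;
        if (ends || next == size || leader[next]) {
            if (out != NULL && total < max) {
                out[total].start = start;
                out[total].end = next;
            }
            total++;
            start = next;
        }
    }
    return total;
}
```

For a five-instruction function whose `jcc` targets the final `ret`, the call and the instruction after it stay in one block, and the cut falls before the `ret` because it is a leader.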