12 Commits

Author SHA1 Message Date
lirent 50ed32b7dc Add function-level code diff over caller-supplied views
code_diff compares two views of the same code in one coordinate space - an
on-disk image section against the live in-memory section, or one .text across
two snapshots - and reports the functions whose body changed. For each function
extent it runs func_hash over the slice of each view and flags a mismatch: a patch, an
inline hook, or an unpacked/JIT-rewritten body. A thin handler over func_hash +
mem_sub, with no file I/O of its own - the caller owns reading the on-disk image.
The relocation limit (absolute-address immediates) is documented; two snapshots
at the same base diff exactly. Closes the non-starred reversing series.
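
A minimal sketch of the comparison described above, under stated assumptions:
fn_extent, fnv1a and diff_functions are hypothetical names, and a plain FNV-1a
stands in for func_hash. It is not the library's API, only the shape of the
check: hash the same extent in both views and report the indices that differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function extent: [start, end) offsets shared by both views. */
typedef struct { uint32_t start, end; } fn_extent;

/* Plain FNV-1a; a stand-in for func_hash in this sketch. */
static uint64_t fnv1a(const uint8_t *p, size_t n) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Compares each function's slice in view a against view b.
   Stores up to cap indices of changed functions in out and returns the
   total number of changed functions (which may exceed cap). */
size_t diff_functions(const uint8_t *a, const uint8_t *b, size_t view_len,
                      const fn_extent *fns, size_t nfns,
                      size_t *out, size_t cap) {
    assert(a && b && fns);
    size_t changed = 0;
    for (size_t i = 0; i < nfns; i++) {
        /* Skip empty extents and extents that fall outside the views. */
        if (fns[i].start >= fns[i].end || fns[i].end > view_len) continue;
        size_t len = fns[i].end - fns[i].start;
        if (fnv1a(a + fns[i].start, len) != fnv1a(b + fns[i].start, len)) {
            if (changed < cap) out[changed] = i;
            changed++;
        }
    }
    return changed;
}
```

Both views are plain caller-supplied buffers here, which mirrors the
"no file I/O of its own" split: whoever calls the diff decides where the bytes
come from.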
2026-06-16 20:21:36 +03:00
lirent 35c5dc06ba Add imports, inline-hook detection, function hashing, per-function imports
Wave 2 of the code-analysis layer:

- vmie_win32_imports resolves the import directory (INT/IAT) to {iat_rva, dll,
  name, ordinal} - named APIs, walking the name and slot thunks in lockstep so
  every import carries the IAT slot a call lands on.
- vmie_win32_inline_hooks decodes each .pdata function's entry and reports any
  whose first instruction is a direct jmp/call leaving the module image - the
  detour/trampoline shape.
- vmie_win32_func_imports records, in order, the IAT slots a function calls
  through (call qword [rip+disp] onto an import slot): the function's API-call
  sequence, named by correlating with vmie_win32_imports.
- func_hash (codeanalysis.h) hashes a function position-independently, zeroing
  the displacement bytes the decoder locates - one primitive for fingerprinting
  known code and for detecting a changed body across snapshots.

Devirtualization needs no new call and is documented as a composition: a
vtable's methods are gva_jumptable(vtable_va), its instances are
pmap_referrers(vtable_va), and func_hash names each method. Imports reuse the
shared data-directory accessor; the analyses reuse the function/section/decode
primitives - no second PE or instruction parser.
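
A minimal sketch of position-independent hashing as described for func_hash,
assuming the decoder has already produced the displacement spans. disp_span
and hash_masked are hypothetical names; only disp_off/disp_len echo fields
named in this log. Bytes inside a span are hashed as zero, so two copies of a
function that differ only in their rel/RIP-relative displacements hash alike.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one instruction's floating bytes:
   the displacement occupies [insn_off + disp_off, insn_off + disp_off + disp_len). */
typedef struct { uint32_t insn_off; uint8_t disp_off, disp_len; } disp_span;

/* FNV-1a over the function body, feeding 0 for every displacement byte.
   spans must be sorted by ascending insn_off and must not overlap. */
uint64_t hash_masked(const uint8_t *code, size_t len,
                     const disp_span *spans, size_t nspans) {
    assert(code || len == 0);
    uint64_t h = 0xcbf29ce484222325ULL;
    size_t s = 0;
    for (size_t i = 0; i < len; i++) {
        /* Advance past spans that end at or before byte i. */
        while (s < nspans &&
               (size_t)spans[s].insn_off + spans[s].disp_off + spans[s].disp_len <= i)
            s++;
        int masked = s < nspans &&
                     i >= (size_t)spans[s].insn_off + spans[s].disp_off;
        h ^= masked ? 0 : code[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}
```

For example, `E8 rel32; C3` hashes the same for any rel32 once the span
{insn_off 0, disp_off 1, disp_len 4} is supplied, while changing the opcode
byte changes the hash.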
2026-06-16 20:03:49 +03:00
lirent 79e82ffc6a Add code-structure analysis: call graph, jump tables, basic blocks, constant xref
Wave 1 of the code-analysis layer, built on the x86-64 decoder:

- vmie_win32_callgraph walks each .pdata function with the decoder and emits an
  edge for every direct call/jmp whose target lands in the module - the
  intra-module call graph. Indirect edges are left to the IAT and jump tables.
- gva_jumptable recovers a switch's case targets from an indirect jump's table:
  consecutive pointer entries that land in an executable region.
- cfg_blocks splits one function view into basic blocks (a generic handler:
  leaders from intra-function branch targets, cut after jmp/jcc/ret).
- gva_imm_xref finds the instructions whose immediate operand equals a constant
  - the dual of code-xref for magic values, error codes, syscall numbers.

The decoder now also reports imm_off/imm_len so a caller can read or match the
immediate operand. The generic primitives live in the new codeanalysis.h
(jump tables, basic blocks) and scan.h (constant xref); the .pdata-bound call
graph stays on the win32 surface and reuses the existing function/section/decode
primitives - no second PE or instruction parser.
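
A minimal sketch of the jump-table rule stated above (consecutive pointer
entries that land in an executable region), assuming 8-byte entries, a single
executable range, and a host with the guest's little-endian byte order.
va_range and jumptable_len are hypothetical names, not the gva_jumptable
signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical executable region [lo, hi). */
typedef struct { uint64_t lo, hi; } va_range;

/* Counts leading 8-byte entries of table whose value lands in exec.
   Stops at the first entry outside the region, at max_entries, or when
   the buffer runs out of whole entries. */
size_t jumptable_len(const uint8_t *table, size_t table_bytes,
                     const va_range *exec, size_t max_entries) {
    assert(table && exec);
    size_t n = 0;
    while (n < max_entries && (n + 1) * 8 <= table_bytes) {
        uint64_t target;
        memcpy(&target, table + n * 8, 8);  /* unaligned-safe load */
        if (target < exec->lo || target >= exec->hi) break;
        n++;
    }
    return n;
}
```

The max_entries cap matters in practice: without it a table that happens to be
followed by more code pointers would be over-read.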
2026-06-16 19:52:25 +03:00
lirent c4419964aa Add function inventory (.pdata), signature generation, and export/PDB symbols
Three reversing capabilities on the win32 surface plus a pure sig-gen handler:

- vmie_win32_functions enumerates a module's functions from the exception
  directory (.pdata RUNTIME_FUNCTION), folding unwind chain continuations into
  their primary - authoritative non-leaf boundaries, not prologue heuristics.
- vmie_win32_exports resolves the export table to {name, rva, ordinal,
  forwarded}: named functions with no PDB or network. vmie_win32_pdb_ref pulls
  the CodeView/RSDS {guid, age, pdb} from the debug directory - the symbol-server
  key for any module (full PDB parsing stays out of scope).
- sig_generate (siggen.h) builds a unique masked signature for a code span,
  wildcarding the rel/RIP-relative displacement bytes the x86 decoder locates and
  growing until it matches the scope exactly once - the dual of sigscan.

The decoder now also reports disp_off/disp_len so a caller can mask the floating
bytes. The MZ/PE walk gains one shared data-directory accessor and one shared
CodeView/RSDS parser; the kernel bootstrap is moved onto both, removing its
private copies - one PE parser in the tree.
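
A minimal sketch of the grow-until-unique loop behind signature generation,
assuming the wildcard mask is already known. count_matches and sig_grow are
hypothetical names, and this version grows one byte at a time; the real
handler may well grow per instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts positions in scope where pat matches; mask[j] == 0 is a wildcard. */
static size_t count_matches(const uint8_t *scope, size_t scope_len,
                            const uint8_t *pat, const uint8_t *mask, size_t n) {
    size_t hits = 0;
    if (n == 0 || n > scope_len) return 0;
    for (size_t i = 0; i + n <= scope_len; i++) {
        size_t j = 0;
        while (j < n && (mask[j] == 0 || scope[i + j] == pat[j])) j++;
        if (j == n) hits++;
    }
    return hits;
}

/* Grows a signature starting at scope[at] until it matches the scope
   exactly once. mask[i] applies to scope[at + i].
   Returns the signature length, or 0 if no unique length up to max_len exists. */
size_t sig_grow(const uint8_t *scope, size_t scope_len, size_t at,
                const uint8_t *mask, size_t max_len) {
    assert(scope && mask && at <= scope_len);
    for (size_t n = 1; n <= max_len && at + n <= scope_len; n++)
        if (count_matches(scope, scope_len, scope + at, mask, n) == 1)
            return n;
    return 0;
}
```

With two `E8 rel32` calls in scope and the rel32 bytes wildcarded, the
signature has to extend past the displacement to the first byte that differs
before it becomes unique.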
2026-06-16 19:27:42 +03:00
lirent 06230ac680 Add PE section enumeration and section views (section-local / RVA / absolute)
vmie_win32_sections lists a module's PE sections (name, RVA, virtual size,
VR_* protection) for any image base in a process address space - including a
base found by scanning, not only loader-list modules. vmie_win32_section_view
gathers a section's bytes into a caller buffer and returns a mem_view_t whose
base_va is chosen by view_base: SECTION_LOCAL (0, section-relative offsets),
MODULE_RVA (ASLR-stable module RVAs), or ABSOLUTE_VA (live VA). Because the pure
scanners report base_va + offset, the mode directly selects the coordinate space
of every hit - feeding a view to sig_all or x86_decode yields section-relative,
RVA, or absolute results with no extra work.

The MZ/PE header walk is factored into one helper that both pe_find_section and
the new enumerator share - no second parser. The whole public surface is
documented with the operational nuances (coordinate stability, borrowed-buffer
lifetime, truncation, residency) and worked examples.
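
A minimal sketch of how the view mode selects the coordinate space of a hit,
assuming the three modes map to base_va as described above. The enum constant
names come from this message; the view_base type spelling, view_base_va and
hit_address are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SECTION_LOCAL, MODULE_RVA, ABSOLUTE_VA } view_base;

/* base_va a view would carry for each mode. */
uint64_t view_base_va(view_base mode, uint64_t image_base, uint32_t section_rva) {
    switch (mode) {
    case SECTION_LOCAL: return 0;                          /* section-relative */
    case MODULE_RVA:    return section_rva;                /* ASLR-stable RVA  */
    case ABSOLUTE_VA:   return image_base + section_rva;   /* live VA          */
    }
    return 0;
}

/* A scanner hit at byte offset off within the view is reported as base_va + off. */
uint64_t hit_address(view_base mode, uint64_t image_base,
                     uint32_t section_rva, uint64_t off) {
    uint64_t base = view_base_va(mode, image_base, section_rva);
    assert(off <= UINT64_MAX - base);  /* guard against wraparound */
    return base + off;
}
```

For a section at RVA 0x1000 in an image based at 0x7ff600000000, a hit at view
offset 0x24 comes back as 0x24, 0x1024, or 0x7ff600001024 depending on the mode.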
2026-06-16 19:06:59 +03:00
lirent 3199fbf258 Add a light x86-64 decoder; back code-xref with it
The reversing keystone: a length-disassembly decoder with control-flow and
RIP-relative target extraction (x86dec.h), pure over a byte buffer - no vmie_mem,
no cr3, no Windows. Length decoding is table-driven over the 1-byte / 0F / 0F38 /
0F3A opcode maps and covers legacy + REX + VEX prefixes, ModRM/SIB, displacements
and immediates (66 and REX.W operand-size aware). It reports the instruction
length plus the rel and
RIP-relative targets of near call/jmp/jcc and any RIP-relative memory operand.
EVEX is a documented gap (decodes as length 0). This is the primitive the rest
of the static-reversing layer builds on (function inventory, call graph, xref).

gva_code_xref now brute-scans with the decoder instead of its own ad-hoc E8/E9
and REX.W-lea heuristic, which is removed - one decoder in the tree. Because a
brute scan can re-enter a prefixed instruction one byte in and decode a shorter
aliased form with the same target, the scan drops a match that starts inside the
extent of an already-accepted one; real, non-overlapping instructions are
unaffected.
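
A minimal sketch of the overlap rule described above, assuming hits arrive in
ascending start order from the byte-by-byte scan. xref_hit and
drop_overlapping are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scan hit: where the instruction starts and its decoded length. */
typedef struct { uint64_t start; uint8_t len; } xref_hit;

/* Drops any hit that starts inside the extent of the last accepted hit.
   Compacts hits in place and returns the number kept. */
size_t drop_overlapping(xref_hit *hits, size_t n) {
    assert(hits || n == 0);
    size_t kept = 0;
    uint64_t covered_end = 0;  /* one past the last byte of the last accepted hit */
    for (size_t i = 0; i < n; i++) {
        if (kept > 0 && hits[i].start < covered_end) continue;
        covered_end = hits[i].start + hits[i].len;
        hits[kept++] = hits[i];
    }
    return kept;
}
```

The motivating case: `48 8D 05 disp32` (a 7-byte lea) at 0x1000 also decodes
from 0x1001 as the 6-byte `8D 05 disp32`. Both end at the same address and
share the displacement, so they resolve to the same target; the second is the
alias this filter removes.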
2026-06-16 18:11:29 +03:00
lirent c36ffe295d Add process-scoped scanning algorithms: multi-pattern, code-xref, pointer-map, dissection, snapshot diff
All are OS-agnostic handlers keyed by vmie_mem* + cr3, built on the windowed
sweep / region walk / matcher; none names a Windows concept and each compiles
against include/ alone.

Scanning: a compiled multi-pattern automaton (Aho-Corasick over each pattern's
longest literal anchor, then a masked verify) finds N signatures in one sweep
pass (sigscan.h sigset; scan.h gva_sig_scan_multi). gva_code_xref decodes
rel32 call/jmp and RIP-relative lea/mov to find every instruction targeting a
given VA.

Pointer graph (pmap.h): one sweep indexes every qword whose value lands in a
mapped region into reverse + forward edges. pmap_referrers is the keystone -
it answers who-points-here, class-instance enumeration (referrers of a vtable
VA), and string xref (referrers of a string VA) from the same index;
pmap_paths is the indexed counterpart to scan_pointer's one-shot DFS;
struct_dissect classifies the qwords of an instance (pointer/vtable/float/
int/string) into a field map.

Temporal (snapdiff.h): snap_take captures a window's bytes, snap_diff reports
the changed runs against a later read.
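
A minimal sketch of reporting changed runs between a captured snapshot and a
later read of the same window. diff_run and diff_runs are hypothetical names,
not the snapdiff.h interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One maximal run of differing bytes: [off, off + len). */
typedef struct { size_t off, len; } diff_run;

/* Writes up to cap runs into out and returns the total number of runs found. */
size_t diff_runs(const uint8_t *before, const uint8_t *after, size_t n,
                 diff_run *out, size_t cap) {
    assert((before && after) || n == 0);
    size_t count = 0, i = 0;
    while (i < n) {
        if (before[i] == after[i]) { i++; continue; }
        size_t start = i;
        while (i < n && before[i] != after[i]) i++;  /* extend the run */
        if (count < cap) {
            out[count].off = start;
            out[count].len = i - start;
        }
        count++;
    }
    return count;
}
```

Returning the total even when it exceeds cap lets a caller size a second pass
instead of silently losing runs.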
2026-06-16 17:38:10 +03:00
lirent dc09d7f2a4 Keep the arch layer's prose consumer-agnostic; note the x86-64 binding
The generic address-space layer no longer names win32 in its comments: the
khalf_score and gva_translate doc-comments described themselves in terms of
their current Windows consumer, a downward coupling from the stable layer to a
specific, swappable one. Reworded to describe what each primitive does, not who
calls it. Also drop a dangling reference to the renamed engine handle.

State the contract's real scope in memmodel.h: OS-agnostic but architecture-
bound. The address-space key is the x86-64 CR3 (the PML4 base), shared by any
guest OS on x86-64 - CR3 is an ISA register, not a Windows concept; only its
per-process storage (DirectoryTableBase) is win32-specific and stays in the
win32 engine.
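
A minimal sketch of why CR3 works as an OS-independent address-space key,
assuming 4-level paging with 4 KiB pages. The helper names are hypothetical;
the bit layout is the architectural one: CR3 bits 51:12 hold the PML4 physical
base, and a virtual address splits into four 9-bit table indices plus a 12-bit
page offset.

```c
#include <assert.h>
#include <stdint.h>

#define PHYS_ADDR_MASK 0x000FFFFFFFFFF000ULL  /* bits 51:12 */

/* Physical base of the PML4 table; low bits (flags or PCID) are dropped. */
uint64_t cr3_pml4_base(uint64_t cr3) { return cr3 & PHYS_ADDR_MASK; }

/* Canonical 48-bit addresses have bits 63:47 all equal. */
int va_is_canonical(uint64_t va) {
    uint64_t top = va >> 47;
    return top == 0 || top == 0x1FFFF;
}

typedef struct { unsigned pml4, pdpt, pd, pt, offset; } va_parts;

/* Splits a canonical VA into its paging-structure indices. */
va_parts va_split(uint64_t va) {
    assert(va_is_canonical(va));
    va_parts p;
    p.pml4   = (unsigned)((va >> 39) & 0x1FF);
    p.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    p.pd     = (unsigned)((va >> 21) & 0x1FF);
    p.pt     = (unsigned)((va >> 12) & 0x1FF);
    p.offset = (unsigned)(va & 0xFFF);
    return p;
}
```

Nothing in this arithmetic refers to a guest OS, which is the point of keeping
it in the generic layer.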
2026-06-15 12:07:43 +03:00
lirent 93966c3df2 Define the win32 engine; add a dump source and physical sigscan
Name and isolate the Windows engine as one of potentially several. The
public surface moves to include/win32.h with an opaque vmie_win32 handle
(vmie_win32_open/close/mem); the engine's Windows internals - host bring-up,
the struct-offset profile, process/module/PE/text decode - live under
src/engine/win32. The generic address-space layer stays in src/engine
(gva.c + engine-arch.h, carrying no offset table): gva.c is de-profiled, and
CR3 bring-up reaches the hot translator through a cold gva_translate bridge
so the zero-copy hot path stays private and inlinable.

A memory source is now first-class and public: vmie_mem_open/_open_segs/
_close open a flat dump (or an explicit segment map) as a vmie_mem, with
gpa_seg promoted to the public contract. The physical signature scan is
exposed source-agnostically: sig_scan_mem returns GPAs for any vmie_mem,
sig_scan_sources scans several sources with per-source attribution, and
sig_from_bytes builds an exact needle from a byte span. The pure matcher is
unchanged; dumps and the live engine image are scanned uniformly, neither
needing the other.
2026-06-15 08:20:50 +03:00
lirent b3441dd6f6 Split the library into CORE / ENGINE / HANDLERS layers
CORE (src/core): vmie_mem - guest-physical substrate with a data-driven
segment map (replaces the hardcoded 4 GiB PCI-hole topology). ENGINE
(src/engine): x86-64 paging + Windows bring-up; produces the generic memory
model. HANDLERS (src/handlers): the signature/value/pointer scanners, which
now consume an OS-agnostic contract.

Keystone: gva_ctx is split into vmie_mem (core) + vmie (engine); the generic
access functions take vmie_mem* + cr3 and no longer compile in the Windows
offset table. New public contract include/memmodel.h (vmie_mem, mem_view_t,
vregion, task, range, the gva_* access); win32 surface in include/vmie.h.
Layering leaks relocated: the PE parser, UTF-16 decode and CR3-recovery heuristics
move engine-side; the matcher stays a pure, source-agnostic handler, and the
pointer scanner takes a generic range[] instead of reaching into the process
enumerator.
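
A minimal sketch of a data-driven segment map, assuming each segment records a
guest-physical base, a length, and an offset into the backing store. seg and
gpa_to_offset are hypothetical names, and the layout in the note below is
illustrative only, not the project's default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment: GPAs [gpa, gpa + len) live at backing offset off. */
typedef struct { uint64_t gpa, len, off; } seg;

/* Translates a guest-physical address to a backing-store offset.
   Returns 1 and writes *off_out when the address is backed, else 0. */
int gpa_to_offset(const seg *segs, size_t n, uint64_t gpa, uint64_t *off_out) {
    assert(off_out && (segs || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (gpa >= segs[i].gpa && gpa - segs[i].gpa < segs[i].len) {
            *off_out = segs[i].off + (gpa - segs[i].gpa);
            return 1;
        }
    }
    return 0;
}
```

A map with one segment below 3 GiB and a second starting at 4 GiB expresses a
PCI hole as data: addresses inside the hole simply match no segment.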
2026-06-15 02:57:46 +03:00
lirent 7c0995a4f2 Rename project w32ms -> vmi-engine
Library vmie (libvmie.a), CLI vmie_cli, guest agent vmie-startup.exe,
symbol prefix VMIE_ (header guards, the LTO build option). No behavior change.
2026-06-15 01:49:16 +03:00
lirent 1ec70b7ede Windows guest VMI core: host library, CLI, guest agent
Static library over a flat RW mmap of guest RAM: GPA/GVA paging walks,
beacon-driven bootstrap, dynamic struct-offset profiling, process and
module enumeration, a region map, and value/pointer/signature scanners on
a shared windowed sweep. Public API in include/; internals under src/.

Thin CLI demonstrator over the public API. Guest agent cross-compiled to
Windows x86-64 via mingw-w64. CMake: static library + CLI + guest target,
C17.
2026-06-14 21:47:56 +03:00