Commit Graph

14 Commits

Author SHA1 Message Date
lirent 35c5dc06ba Add imports, inline-hook detection, function hashing, per-function imports
Wave 2 of the code-analysis layer:

- vmie_win32_imports resolves the import directory (INT/IAT) to {iat_rva, dll,
  name, ordinal} - named APIs, walking the name and slot thunks in lockstep so
  every import carries the IAT slot a call lands on.
- vmie_win32_inline_hooks decodes each .pdata function's entry and reports any
  whose first instruction is a direct jmp/call leaving the module image - the
  detour/trampoline shape.
- vmie_win32_func_imports records, in order, the IAT slots a function calls
  through (call qword [rip+disp] onto an import slot): the function's API-call
  sequence, named by correlating with vmie_win32_imports.
- func_hash (codeanalysis.h) hashes a function position-independently, zeroing
  the displacement bytes the decoder locates - one primitive for fingerprinting
  known code and for detecting a changed body across snapshots.

Devirtualization needs no new call and is documented as a composition: a
vtable's methods are gva_jumptable(vtable_va), its instances are
pmap_referrers(vtable_va), and func_hash names each method. Imports reuse the
shared data-directory accessor; the analyses reuse the function/section/decode
primitives - no second PE or instruction parser.
2026-06-16 20:03:49 +03:00
lirent 79e82ffc6a Add code-structure analysis: call graph, jump tables, basic blocks, constant xref
Wave 1 of the code-analysis layer, built on the x86-64 decoder:

- vmie_win32_callgraph walks each .pdata function with the decoder and emits an
  edge for every direct call/jmp whose target lands in the module - the
  intra-module call graph. Indirect edges are left to the IAT and jump tables.
- gva_jumptable recovers a switch's case targets from an indirect jump's table:
  consecutive pointer entries that land in an executable region.
- cfg_blocks splits one function view into basic blocks (a generic handler:
  leaders from intra-function branch targets, cut after jmp/jcc/ret).
- gva_imm_xref finds the instructions whose immediate operand equals a constant
  - the dual of code-xref for magic values, error codes, syscall numbers.

The decoder now also reports imm_off/imm_len so a caller can read or match the
immediate operand. The generic primitives live in the new codeanalysis.h
(jump tables, basic blocks) and scan.h (constant xref); the .pdata-bound call
graph stays on the win32 surface and reuses the existing function/section/decode
primitives - no second PE or instruction parser.
2026-06-16 19:52:25 +03:00
lirent c4419964aa Add function inventory (.pdata), signature generation, and export/PDB symbols
Three reversing capabilities on the win32 surface plus a pure sig-gen handler:

- vmie_win32_functions enumerates a module's functions from the exception
  directory (.pdata RUNTIME_FUNCTION), folding unwind chain continuations into
  their primary - authoritative non-leaf boundaries, not prologue heuristics.
- vmie_win32_exports resolves the export table to {name, rva, ordinal,
  forwarded}: named functions with no PDB or network. vmie_win32_pdb_ref pulls
  the CodeView/RSDS {guid, age, pdb} from the debug directory - the symbol-server
  key for any module (full PDB parsing stays out of scope).
- sig_generate (siggen.h) builds a unique masked signature for a code span,
  wildcarding the rel/RIP-relative displacement bytes the x86 decoder locates and
  growing until it matches the scope exactly once - the dual of sigscan.

The decoder now also reports disp_off/disp_len so a caller can mask the floating
bytes. The MZ/PE walk gains one shared data-directory accessor and one shared
CodeView/RSDS parser; the kernel bootstrap is moved onto both, removing its
private copies - one PE parser in the tree.
2026-06-16 19:27:42 +03:00
lirent 06230ac680 Add PE section enumeration and section views (section-local / RVA / absolute)
vmie_win32_sections lists a module's PE sections (name, RVA, virtual size,
VR_* protection) for any image base in a process address space - including a
base found by scanning, not only loader-list modules. vmie_win32_section_view
gathers a section's bytes into a caller buffer and returns a mem_view_t whose
base_va is chosen by view_base: SECTION_LOCAL (0, section-relative offsets),
MODULE_RVA (ASLR-stable module RVAs), or ABSOLUTE_VA (live VA). Because the pure
scanners report base_va + offset, the mode directly selects the coordinate space
of every hit - feeding a view to sig_all or x86_decode yields section-relative,
RVA, or absolute results with no extra work.

The MZ/PE header walk is factored into one helper that both pe_find_section and
the new enumerator share - no second parser. The whole public surface is
documented with the operational nuances (coordinate stability, borrowed-buffer
lifetime, truncation, residency) and worked examples.
2026-06-16 19:06:59 +03:00
lirent 3199fbf258 Add a light x86-64 decoder; back code-xref with it
The reversing keystone: a length-disassembly decoder with control-flow and
RIP-relative target extraction (x86dec.h), pure over a byte buffer - no vmie_mem,
no cr3, no Windows. Table-driven length over the 1-byte / 0F / 0F38 / 0F3A maps,
legacy + REX + VEX prefixes, ModRM/SIB, displacements and immediates (66 and
REX.W operand-size aware). It reports the instruction length plus the rel and
RIP-relative targets of near call/jmp/jcc and any RIP-relative memory operand.
EVEX is a documented gap (decodes as length 0). This is the primitive the rest
of the static-reversing layer builds on (function inventory, call graph, xref).

gva_code_xref now brute-scans with the decoder instead of its own ad-hoc E8/E9
and REX.W-lea heuristic, which is removed - one decoder in the tree. Because a
brute scan can re-enter a prefixed instruction one byte in and decode a shorter
aliased form with the same target, the scan drops a match that starts inside the
extent of an already-accepted one; real, non-overlapping instructions are
unaffected.
2026-06-16 18:11:29 +03:00
lirent c36ffe295d Add process-scoped scanning algorithms: multi-pattern, code-xref, pointer-map, dissection, snapshot diff
All are OS-agnostic handlers keyed by vmie_mem* + cr3, built on the windowed
sweep / region walk / matcher; none names a Windows concept and each compiles
against include/ alone.

Scanning: a compiled multi-pattern automaton (Aho-Corasick over each pattern's
longest literal anchor, then a masked verify) finds N signatures in one sweep
pass (sigscan.h sigset; scan.h gva_sig_scan_multi). gva_code_xref decodes
rel32 call/jmp and RIP-relative lea/mov to find every instruction targeting a
given VA.

Pointer graph (pmap.h): one sweep indexes every qword whose value lands in a
mapped region into reverse + forward edges. pmap_referrers is the keystone -
it answers who-points-here, class-instance enumeration (referrers of a vtable
VA), and string xref (referrers of a string VA) from the same index;
pmap_paths is the indexed counterpart to scan_pointer's one-shot DFS;
struct_dissect classifies the qwords of an instance (pointer/vtable/float/
int/string) into a field map.

Temporal (snapdiff.h): snap_take captures a window's bytes, snap_diff reports
the changed runs against a later read.
2026-06-16 17:38:10 +03:00
lirent 25b8ed8ca9 Add a dump-scan demonstrator (vmie_scan)
A thin CLI proving the OS-agnostic dump path end to end: open one or more raw
memory dumps as flat identity images (vmie_mem) and scan them all for an
IDA-style pattern, printing each hit as source:gpa. Two-pass (count, then size
the buffer exactly) so nothing is silently truncated.

Kept separate from vmie_cli rather than folded in as a subcommand: vmie_cli
demonstrates live win32 bring-up, this demonstrates the source-agnostic scan.
Its source includes only the public memmodel/sigscan/scan headers and names no
Windows symbol - it compiles against include/ alone.
2026-06-16 16:25:27 +03:00
lirent dc09d7f2a4 Keep the arch layer's prose consumer-agnostic; note the x86-64 binding
The generic address-space layer no longer names win32 in its comments: the
khalf_score and gva_translate doc-comments described themselves in terms of
their current Windows consumer, a downward coupling from the stable layer to a
specific, swappable one. Reworded to describe what each primitive does, not who
calls it. Also drop a dangling reference to the renamed engine handle.

State the contract's real scope in memmodel.h: OS-agnostic but architecture-
bound. The address-space key is the x86-64 CR3 (the PML4 base), shared by any
guest OS on x86-64 - CR3 is an ISA register, not a Windows concept; only its
per-process storage (DirectoryTableBase) is win32-specific and stays in the
win32 engine.
2026-06-15 12:07:43 +03:00
lirent 303a52527f Make the guest agent wait until ack; collapse the contract magics
The agent no longer self-terminates on a 120s deadline: a VMI host may attach
at any time, so the beacon now polls its ack flag indefinitely and exits only
once the host sets it. The fixed lifetime was an artifact, not a requirement.

The contract drops from three constants to one. The companion magic is derived
as the byte-reversed primary (__builtin_bswap64, folded to an immediate at -O2),
giving the same 16-byte beacon signature from a single source of truth; the host
acknowledges by echoing that value into the ack slot instead of carrying a
separate ack constant.
2026-06-15 09:02:55 +03:00
lirent 93966c3df2 Define the win32 engine; add a dump source and physical sigscan
Name and isolate the Windows engine as one of potentially several. The
public surface moves to include/win32.h with an opaque vmie_win32 handle
(vmie_win32_open/close/mem); the engine's Windows internals — host bring-up,
the struct-offset profile, process/module/PE/text decode — live under
src/engine/win32. The generic address-space layer stays in src/engine
(gva.c + engine-arch.h, carrying no offset table): gva.c is de-profiled, and
CR3 bring-up reaches the hot translator through a cold gva_translate bridge
so the zero-copy hot path stays private and inlinable.

A memory source is now first-class and public: vmie_mem_open/_open_segs/
_close open a flat dump (or an explicit segment map) as a vmie_mem, with
gpa_seg promoted to the public contract. The physical signature scan is
exposed source-agnostically: sig_scan_mem returns GPAs for any vmie_mem,
sig_scan_sources scans several sources with per-source attribution, and
sig_from_bytes builds an exact needle from a byte span. The pure matcher is
unchanged; dumps and the live engine image are scanned uniformly, neither
needing the other.
2026-06-15 08:20:50 +03:00
lirent b3441dd6f6 Split the library into CORE / ENGINE / HANDLERS layers
CORE (src/core): vmie_mem — guest-physical substrate with a data-driven
segment map (replaces the hardcoded 4 GiB PCI-hole topology). ENGINE
(src/engine): x86-64 paging + Windows bring-up; produces the generic memory
model. HANDLERS (src/handlers): the signature/value/pointer scanners, which
now consume an OS-agnostic contract.

Keystone: gva_ctx is split into vmie_mem (core) + vmie (engine); the generic
access functions take vmie_mem* + cr3 and no longer compile in the Windows
offset table. New public contract include/memmodel.h (vmie_mem, mem_view_t,
vregion, task, range, the gva_* access); win32 surface in include/vmie.h.
Leak relocations: the PE parser, UTF-16 decode and CR3-recovery heuristics
move engine-side; the matcher stays a pure, source-agnostic handler, and the
pointer scanner takes a generic range[] instead of reaching into the process
enumerator.
2026-06-15 02:57:46 +03:00
lirent 7c0995a4f2 Rename project w32ms -> vmi-engine
Library vmie (libvmie.a), CLI vmie_cli, guest agent vmie-startup.exe,
symbol prefix VMIE_ (header guards, the LTO build option). No behavior change.
2026-06-15 01:49:16 +03:00
lirent 4015e839eb Zero-copy hot path, correctness hardening
gva_ptr: leaf-bounded zero-copy guest reads. gva_sweep redesigned to drive
on it — large-page leaves are lent to the callback while 4K runs stay
buffered, and the run loop is guarded against wrap at the top of the address
space. gva_gpa fetches PTEs zero-copy; optional W32MS_LTO build option folds
the per-fetch call boundary (shipped -O2 default unchanged).

Correctness: subtract-form bounds check (no add overflow), memcpy decode in
place of type-punned wide loads, zero-init PDB name before compare,
PCI-hole-crossing range rejection, single-sourced VA_CANON and USER bounds.
hot/cold attributes audited across the translation and scan path.
2026-06-15 01:05:00 +03:00
lirent 1ec70b7ede Windows guest VMI core: host library, CLI, guest agent
Static library over a flat RW mmap of guest RAM: GPA/GVA paging walks,
beacon-driven bootstrap, dynamic struct-offset profiling, process and
module enumeration, a region map, and value/pointer/signature scanners on
a shared windowed sweep. Public API in include/; internals under src/.

Thin CLI demonstrator over the public API. Guest agent cross-compiled to
Windows x86-64 via mingw-w64. CMake: static library + CLI + guest target,
C17.
2026-06-14 21:47:56 +03:00