Add a light x86-64 decoder; back code-xref with it

The reversing keystone: a length-disassembly decoder with control-flow and
RIP-relative target extraction (x86dec.h), pure over a byte buffer - no vmie_mem,
no cr3, no Windows. Table-driven length over the 1-byte / 0F / 0F38 / 0F3A maps,
legacy + REX + VEX prefixes, ModRM/SIB, displacements and immediates (66 and
REX.W operand-size aware). It reports the instruction length plus the rel and
RIP-relative targets of near call/jmp/jcc and any RIP-relative memory operand.
EVEX is a documented gap (decodes as length 0). This is the primitive the rest
of the static-reversing layer builds on (function inventory, call graph, xref).
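
The target arithmetic described above can be sketched in a few lines. This is only an illustration, not the code in x86dec.h: the helper names (`rd_disp32`, `next_rip_target`, and the two `demo_*` functions) are hypothetical, and the instruction length and displacement offset are passed in by hand where the real decoder would derive them from its tables.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian 32-bit field and sign-extend it. */
static int64_t rd_disp32(const uint8_t *p) {
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int64_t)v - ((v & 0x80000000u) ? 0x100000000LL : 0);
}

/* Both near rel32 branches and RIP-relative operands resolve against the
 * address of the NEXT instruction: target = va + len + sign_extend(disp32).
 * `disp_off` is where the 4-byte field sits inside the instruction. */
static uint64_t next_rip_target(const uint8_t *insn, unsigned len,
                                unsigned disp_off, uint64_t va) {
    return va + len + (uint64_t)rd_disp32(insn + disp_off);
}

/* E8 FB FF FF FF at 0x1000: call rel32 = -5, so the call targets itself. */
static uint64_t demo_call_target(void) {
    static const uint8_t call[] = {0xE8, 0xFB, 0xFF, 0xFF, 0xFF};
    return next_rip_target(call, 5, 1, 0x1000);
}

/* 48 8D 05 00 01 00 00 at 0x2000: lea rax, [rip+0x100]
 * (ModRM 05: mod=00, rm=101), length 7, so the operand is 0x2107. */
static uint64_t demo_lea_target(void) {
    static const uint8_t lea[] = {0x48, 0x8D, 0x05, 0x00, 0x01, 0x00, 0x00};
    return next_rip_target(lea, 7, 3, 0x2000);
}
```

In this sketch `demo_call_target()` yields 0x1000 and `demo_lea_target()` yields 0x2107. A negative displacement works because unsigned 64-bit addition wraps.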

gva_code_xref now brute-scans with the decoder instead of its own ad-hoc E8/E9
and REX.W-lea heuristic, which is removed - one decoder in the tree. Because a
brute scan can re-enter a prefixed instruction one byte in and decode a shorter
aliased form with the same target, the scan drops a match that starts inside the
extent of an already-accepted one; real, non-overlapping instructions are
unaffected.
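
The overlap rule can be illustrated with a toy scan. Everything here is an assumption made for the example: `toy_decode` understands only an optional REX prefix followed by E8/E9 rel32 or a RIP-relative 8D lea, and `toy_xref_scan` is a stand-in for the real sweep, not its implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { unsigned len; int has_target; uint64_t target; } toy_insn;

static int64_t toy_rd32(const uint8_t *p) {
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int64_t)v - ((v & 0x80000000u) ? 0x100000000LL : 0);
}

/* Toy decoder: len == 0 means "not recognized". */
static toy_insn toy_decode(const uint8_t *p, size_t n, uint64_t va) {
    toy_insn r = {0, 0, 0};
    size_t i = (n > 0 && (p[0] & 0xF0) == 0x40) ? 1 : 0;  /* one REX prefix */
    size_t disp;
    if (n - i >= 5 && (p[i] == 0xE8 || p[i] == 0xE9)) {
        disp = i + 1;                         /* call/jmp rel32 */
    } else if (n - i >= 6 && p[i] == 0x8D && (p[i + 1] & 0xC7) == 0x05) {
        disp = i + 2;                         /* lea r, [rip+disp32] */
    } else {
        return r;
    }
    r.len = (unsigned)(disp + 4);
    r.has_target = 1;
    r.target = va + r.len + (uint64_t)toy_rd32(p + disp);
    return r;
}

/* Brute scan of every byte offset. A match that starts inside the extent of
 * the last accepted match is dropped as an aliased re-decode. */
static int toy_xref_scan(const uint8_t *buf, size_t n, uint64_t base,
                         uint64_t target_va, uint64_t *out, int max) {
    int total = 0;
    size_t accepted_end = 0;
    for (size_t off = 0; off < n; off++) {
        toy_insn d = toy_decode(buf + off, n - off, base + off);
        if (d.len == 0 || !d.has_target || d.target != target_va) continue;
        if (off < accepted_end) continue;     /* starts inside a prior match */
        if (out && total < max) out[total] = base + off;
        total++;
        accepted_end = off + d.len;
    }
    return total;
}

/* lea rax,[rip+0x100] at 0x1000, then call 0x1107 at 0x1007. Offset 1 also
 * decodes (8D 05 ..., a 6-byte lea) to the same target and must be dropped. */
static int toy_demo(void) {
    static const uint8_t buf[] = {0x48, 0x8D, 0x05, 0x00, 0x01, 0x00, 0x00,
                                  0xE8, 0xFB, 0x00, 0x00, 0x00};
    uint64_t out[4] = {0};
    int total = toy_xref_scan(buf, sizeof buf, 0x1000, 0x1107, out, 4);
    return (out[0] == 0x1000 && out[1] == 0x1007) ? total : -1;
}
```

Without the `accepted_end` check the demo would report three hits (0x1000, 0x1001, 0x1007). With it, the aliased start at 0x1001 is dropped and the adjacent call at 0x1007 is kept.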
2026-06-16 18:11:29 +03:00
parent c36ffe295d
commit 3199fbf258
5 changed files with 560 additions and 52 deletions
+10 -8
@@ -70,14 +70,16 @@ int gva_sig_scan_multi(vmie_mem* m, uintptr_t cr3, uint64_t lo, uint64_t hi,
uint32_t prot_any, const sigset* s,
sig_multi_hit* out, int max);
-/* code-xref: every instruction in the X-regions of [lo,hi] whose rel32 operand
- * targets `target_va`. Heuristic decoder (NOT a full disassembler): recognizes
- * E8 call / E9 jmp (next_rip + disp32) and the RIP-relative ModRM forms
- * (mod=00, rm=101) of lea/mov (REX.W 8D / 8B) where target = next_rip +
- * (int32)disp. Records each matching instruction-start VA. The sweep forces
- * VR_X and carries a >=15-byte overlap (max x86 instruction length) so no
- * instruction is cut at a window seam. Writes up to `max` VAs to `out` (NULL to
- * count only) and returns the TOTAL number of matches, or -1 on bad input. */
+/* code-xref: every instruction in the X-regions of [lo,hi] whose near rel
+ * branch or RIP-relative memory operand resolves to `target_va`. Brute-scans
+ * each byte offset with the light x86-64 decoder (x86dec.h, NOT a full
+ * disassembler): an E8/E9/EB/Jcc rel branch matches when next_rip + rel ==
+ * target_va, and any RIP-relative operand (ModRM mod=00, rm=101) matches when
+ * next_rip + disp32 == target_va (this covers lea/mov and any other rip-rel
+ * form). Records each matching instruction-start VA. The sweep forces VR_X and
+ * carries a >=15-byte overlap (max x86 instruction length) so no instruction is
+ * cut at a window seam. Writes up to `max` VAs to `out` (NULL to count only) and
+ * returns the TOTAL number of matches, or -1 on bad input. */
int gva_code_xref(vmie_mem* m, uintptr_t cr3, uint64_t lo, uint64_t hi,
uint64_t target_va, uint64_t* out, int max);
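
Because the function returns the total match count and accepts a NULL `out` to count only, a caller can size its buffer with a first pass and fill it with a second. The sketch below assumes that contract as documented in the comment. `stub_code_xref` is a hypothetical stand-in with canned hits so the example compiles on its own; treating `lo > hi` or a negative `max` as bad input is also an assumption, and `collect_xrefs` is not part of the tree.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct vmie_mem vmie_mem;  /* opaque here; the real type is in the tree */

/* Stand-in for gva_code_xref: writes up to `max` VAs, returns the TOTAL. */
static int stub_code_xref(vmie_mem *m, uintptr_t cr3, uint64_t lo, uint64_t hi,
                          uint64_t target_va, uint64_t *out, int max) {
    static const uint64_t hits[] = {0x1000, 0x1007, 0x1040};  /* canned data */
    int total = 0;
    (void)m; (void)cr3; (void)target_va;
    if (lo > hi || max < 0) return -1;        /* assumed bad-input cases */
    for (size_t i = 0; i < sizeof hits / sizeof hits[0]; i++) {
        if (hits[i] < lo || hits[i] > hi) continue;
        if (out && total < max) out[total] = hits[i];
        total++;
    }
    return total;
}

/* Pass 1 counts with out == NULL; pass 2 fills a buffer of exactly that size.
 * The caller owns the returned buffer and frees it. */
static uint64_t *collect_xrefs(uint64_t lo, uint64_t hi, uint64_t target_va,
                               int *count) {
    *count = 0;
    int total = stub_code_xref(NULL, 0, lo, hi, target_va, NULL, 0);
    if (total <= 0) return NULL;
    uint64_t *out = malloc((size_t)total * sizeof *out);
    if (!out) return NULL;
    int again = stub_code_xref(NULL, 0, lo, hi, target_va, out, total);
    if (again < 0) { free(out); return NULL; }
    /* Live memory can change between passes, so clamp to what was written. */
    *count = again < total ? again : total;
    return out;
}

static int demo_collect_count(void) {
    int n = 0;
    uint64_t *v = collect_xrefs(0x1000, 0x1FFF, 0x1107, &n);
    int ok = v && n == 3 && v[0] == 0x1000 && v[2] == 0x1040;
    free(v);
    return ok ? n : -1;
}
```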