gva_ptr: leaf-bounded zero-copy guest reads. gva_sweep redesigned to drive
on it — large-page leaves are lent to the callback while 4K runs stay
buffered, and the run loop is guarded against wrap at the top of the address
space. gva_gpa fetches PTEs zero-copy; optional W32MS_LTO build option folds
the per-fetch call boundary (shipped -O2 default unchanged).
Correctness: subtract-form bounds check (no add overflow), memcpy decode in
place of type-punned wide loads, zero-init PDB name before compare,
PCI-hole-crossing range rejection, single-sourced VA_CANON and USER bounds.
hot/cold attributes audited across the translation and scan path.