mirror_zfs/module
Robert Evans fdc683e863 dnode_next_offset: backtrack if lower level does not match
This changes the basic search algorithm from a single search up and down
the tree to a full depth-first traversal to handle conditions where the
tree matches at a higher level but not a lower level.

Normally higher level blocks always point to matching blocks, but there
are cases where this does not happen:

1. Racing block pointer updates from dbuf_write_ready.

   Before f664f1ee7f (#8946), both dbuf_write_ready and
   dnode_next_offset held dn_struct_rwlock which protected against
   pointer writes from concurrent syncs.

   This no longer applies, so sync context can f.e. clear or fill all
   L1->L0 BPs before the L2->L1 BP and higher BP's are updated.

   dnode_free_range in particular can reach this case and skip over L1
   blocks that need to be dirtied. Later, sync will panic in
   free_children when trying to clear a non-dirty indirect block.

   This case was found with ztest.

2. txg > 0, non-hole case. This is #11196.

   Freeing blocks/dnodes breaks the assumption that a match at a higher
   level implies a match at a lower level when filtering txg > 0.

   Whenever some but not all L0 blocks are freed, the parent L1 block is
   rewritten. Its updated L2->L1 BP reflects a newer birth txg.

   Later when searching by txg, if the L1 block matches since the txg is
   newer, it is possible that none of the remaining L1->L0 BPs match if
   none have been updated.

   The same behavior is possible with dnode search at L0.

   This is reachable from dsl_destroy_head for synchronous freeing.
   When this happens open context fails to free objects leaving sync
   context stuck freeing potentially many objects.

   This is also reachable from traverse_pool for extreme rewind where it
   is theoretically possible that datasets not dirtied after txg are
   skipped if the MOS has high enough indirection to trigger this case.

In both of these cases, without backtracking the search ends prematurely
as ESRCH result implies no more matches in the entire object.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16025
Closes #11196
2025-10-21 11:31:06 -07:00
..
avl Suppress Clang Static Analyzer false positive in the AVL tree code. 2023-03-08 13:51:21 -08:00
icp Linux build: silence objtool warnings 2025-06-04 17:40:56 -07:00
lua lua: add flex array field to TString type 2024-11-05 15:43:52 -08:00
nvpair xdr: header cleanup 2024-04-29 13:50:05 -07:00
os [zfs-2.2.9] Fix zpl_ctldir.c checkstyle 2025-10-20 14:56:20 -07:00
unicode Illumos #15286: do_composition() needs sign awareness 2023-01-05 11:16:21 -08:00
zcommon GCC 15: Fix unterminated-string-initialization (#17244) 2025-05-27 15:03:11 -07:00
zfs dnode_next_offset: backtrack if lower level does not match 2025-10-21 11:31:06 -07:00
zstd Resolve WS-2021-0184 vulnerability in zstd 2023-02-02 15:12:51 -08:00
.gitignore FreeBSD: Ignore symlink to i386 includes 2022-08-02 16:34:23 -07:00
Kbuild.in Linux build: always use objtool 2025-05-30 15:08:01 -07:00
Makefile.bsd zfs_znode: lift common code to a single shared file 2024-11-15 10:15:01 -08:00
Makefile.in Linux build: handle CONFIG_OBJTOOL_WERROR=y 2025-10-16 16:50:20 -07:00