Prevent range tree corruption race by updating dnode_sync()

Switch to incremental range tree processing in dnode_sync() to avoid
unsafe lock dropping during zfs_range_tree_walk(). This also ensures
the free ranges remain visible to dnode_block_freed() throughout the
sync process, preventing potential stale data reads.

This patch:
 - Keeps the range tree attached during processing for visibility.
 - Processes segments one-by-one by restarting from the tree head.
 - Uses zfs_range_tree_clear() to safely handle ranges that may have
   been modified while the lock was dropped.
 - Adds ASSERT()s to document that we don't expect dn_free_ranges
   modification outside of sync context.
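
The restart-from-head pattern described in the list above can be sketched
in isolation. This is a minimal illustration, not the dnode_sync() code:
the toy_* names, the linked-list "tree", and the callback are all
hypothetical stand-ins, and the points where dn_mtx would be dropped and
retaken are only marked by comments. It shows why clearing the processed
range afterwards is safe even if part of it was removed in the meantime.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for a range tree: a sorted list of [start, end) segments. */
typedef struct seg {
	uint64_t start, end;
	struct seg *next;
} seg_t;

typedef struct toy_tree { seg_t *head; } toy_tree_t;

/* Insert a segment in sorted order; callers pass non-overlapping ranges. */
static void
toy_add(toy_tree_t *t, uint64_t start, uint64_t size)
{
	seg_t **pp = &t->head;
	while (*pp != NULL && (*pp)->start < start)
		pp = &(*pp)->next;
	seg_t *s = malloc(sizeof (*s));
	assert(s != NULL);
	s->start = start;
	s->end = start + size;
	s->next = *pp;
	*pp = s;
}

/* Remove whatever part of [start, start + size) is still present. */
static void
toy_clear(toy_tree_t *t, uint64_t start, uint64_t size)
{
	uint64_t end = start + size;
	seg_t **pp = &t->head;
	while (*pp != NULL) {
		seg_t *s = *pp;
		if (s->end <= start || s->start >= end) {
			pp = &s->next;		/* no overlap */
		} else if (s->start >= start && s->end <= end) {
			*pp = s->next;		/* fully covered: unlink */
			free(s);
		} else if (s->start < start && s->end > end) {
			seg_t *r = malloc(sizeof (*r));	/* split in two */
			assert(r != NULL);
			r->start = end;
			r->end = s->end;
			r->next = s->next;
			s->end = start;
			s->next = r;
			pp = &r->next;
		} else if (s->start < start) {
			s->end = start;		/* trim the tail */
			pp = &s->next;
		} else {
			s->start = end;		/* trim the head */
			pp = &s->next;
		}
	}
}

/* Hypothetical per-segment work, run where the lock would be dropped. */
typedef void (*toy_work_fn)(toy_tree_t *, uint64_t, uint64_t, void *);

/* Example work: count blocks and simulate a concurrent one-block clear. */
static void
toy_work(toy_tree_t *t, uint64_t start, uint64_t size, void *arg)
{
	*(uint64_t *)arg += size;
	toy_clear(t, start, 1);
}

/* Process segments one at a time, always restarting from the head. */
static int
toy_process(toy_tree_t *t, toy_work_fn fn, void *arg)
{
	int n = 0;
	while (t->head != NULL) {
		/* Copy the bounds; the node may change once we "unlock". */
		uint64_t start = t->head->start;
		uint64_t size = t->head->end - start;
		fn(t, start, size, arg);	/* lock dropped around this */
		toy_clear(t, start, size);	/* lock held again */
		n++;
	}
	return (n);
}
```

Because the loop never holds a pointer into the tree across the unlocked
section and only removes a range after it has been handled, the remaining
ranges stay visible to readers for the whole pass.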

Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Issue #18186
Closes #18235
Alek P
2026-03-23 21:34:19 -04:00
committed by Tony Hutter
parent b06caaeec4
commit 7590972f76
4 changed files with 87 additions and 45 deletions
@@ -2206,6 +2206,17 @@ dbuf_dirty_lightweight(dnode_t *dn, uint64_t blkid, dmu_tx_t *tx)
mutex_enter(&dn->dn_mtx);
int txgoff = tx->tx_txg & TXG_MASK;
/*
* Assert that we are not modifying the range tree for the syncing
* TXG from a non-syncing thread. We verify that the tx's
* transaction group is strictly newer than the one currently
* syncing (meaning we are in open context). If this triggers,
* it indicates a race where the syncing txg's dn_free_ranges tree
* is being modified while dnode_sync() may be iterating over it.
*/
ASSERT(tx->tx_txg > spa_syncing_txg(dn->dn_objset->os_spa));
if (dn->dn_free_ranges[txgoff] != NULL) {
zfs_range_tree_clear(dn->dn_free_ranges[txgoff], blkid, 1);
}
@@ -2393,6 +2404,7 @@ dbuf_dirty(dmu_buf_impl_t *db, dmu_tx_t *tx)
db->db_blkid != DMU_SPILL_BLKID) {
mutex_enter(&dn->dn_mtx);
if (dn->dn_free_ranges[txgoff] != NULL) {
FREE_RANGE_VERIFY(tx, dn);
zfs_range_tree_clear(dn->dn_free_ranges[txgoff],
db->db_blkid, 1);
}