Allow physical rewrite without logical

During regular block writes ZFS sets both logical and physical
birth times equal to the current TXG.  During dedup and block
cloning logical birth time is still set to the current TXG, but
physical may be copied from the original block that was used.
This represents the fact that logically user data has changed,
but physically it is the same old block.
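
The two cases above can be sketched as follows.  This is only an
illustration: sketch_bp_t and both helpers are hypothetical names,
not the real blkptr_t or any ZFS API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the birth fields of a block pointer. */
typedef struct sketch_bp {
	uint64_t logical_birth;		/* TXG in which the user data changed */
	uint64_t physical_birth;	/* TXG in which the on-disk copy was born */
} sketch_bp_t;

/* Regular write: both birth times are the current TXG. */
static sketch_bp_t
sketch_regular_write(uint64_t txg)
{
	sketch_bp_t bp = { txg, txg };
	return (bp);
}

/*
 * Dedup or block cloning: the logical birth is the current TXG, while
 * the physical birth is copied from the block that is being reused.
 */
static sketch_bp_t
sketch_clone(const sketch_bp_t *src, uint64_t txg)
{
	sketch_bp_t bp = { txg, src->physical_birth };
	return (bp);
}
```

For example, cloning in TXG 25 a block written in TXG 10 gives a
logical birth of 25 and a physical birth of 10.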

But block rewrite introduces a new situation, where a block is not
changed logically, but is stored in a different place in the pool.
From the ARC, scrub and some other perspectives this is a new block,
but for user applications or incremental replication, for example,
it is not.  A somewhat similar thing happens during the remap phase
of device removal, but in that case the blocks' space is still
accounted as allocated at their logical birth times.

This patch introduces a new "rewrite" flag in the block pointer
structure, which makes it possible to differentiate a physical
rewrite (when the block is actually reallocated at the physical
birth time) from the device removal case (when the logical birth
time is used).
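
A minimal sketch of that distinction, assuming the flag simply selects
which birth time counts as the allocation time.  The struct and the
helper are hypothetical and do not reproduce the real macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified block pointer with the new flag. */
typedef struct sketch_rbp {
	uint64_t logical_birth;
	uint64_t physical_birth;
	bool rewrite;		/* set only for physical rewrites */
} sketch_rbp_t;

/*
 * TXG at which the space is considered allocated: the physical birth
 * for a rewritten block, the logical birth otherwise (which covers
 * blocks remapped by device removal).
 */
static uint64_t
sketch_alloc_birth(const sketch_rbp_t *bp)
{
	return (bp->rewrite ? bp->physical_birth : bp->logical_birth);
}
```

With logical birth 10 and physical birth 50, a remapped block
(rewrite unset) is accounted at TXG 10, while a rewritten block
(rewrite set) is accounted at TXG 50.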

The new functionality is not used at this point, and the only
expected change is that the error log is now kept in terms of
physical birth times, rather than logical, since if a block with
a logged error was somehow rewritten, then the previous error does
not matter any more.

This change also introduces a new TRAVERSE_LOGICAL flag to the
traverse code, allowing zfs send, redact and diff to work in the
context of logical birth times, ignoring physical-only rewrites.
This also changes nothing at this point, since no such rewrites
exist yet, but they will come in a following patch.
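
A rough sketch of the intended effect on a traversal that skips
blocks not born after some minimum TXG.  The flag value, the struct
and the helper are hypothetical, and the way the real traverse code
picks the birth time is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value; only the name mirrors the real flag. */
#define	SKETCH_TRAVERSE_LOGICAL	(1 << 0)

typedef struct sketch_tbp {
	uint64_t logical_birth;
	uint64_t physical_birth;
	bool rewrite;
} sketch_tbp_t;

/* Decide whether a block is new enough to be visited. */
static bool
sketch_should_visit(const sketch_tbp_t *bp, uint64_t min_txg, int flags)
{
	uint64_t birth;

	if (flags & SKETCH_TRAVERSE_LOGICAL) {
		/* Logical view: physical-only rewrites are invisible. */
		birth = bp->logical_birth;
	} else {
		/* Assumed default: a rewrite makes the block look new. */
		birth = bp->rewrite ? bp->physical_birth : bp->logical_birth;
	}
	return (birth > min_txg);
}
```

A block last changed in TXG 10 and rewritten in TXG 50 would be
visited by a traversal starting from TXG 20 without the flag, but
skipped with it, so an incremental send would not resend it.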

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17565
Alexander Motin
2025-07-17 12:50:54 -04:00
committed by Brian Behlendorf
parent 894edd084e
commit 4ae8bf406b
29 changed files with 205 additions and 144 deletions
+18 -4
@@ -5603,7 +5603,21 @@ remap_blkptr_cb(uint64_t inner_offset, vdev_t *vd, uint64_t offset,
vdev_indirect_births_t *vib = oldvd->vdev_indirect_births;
uint64_t physical_birth = vdev_indirect_births_physbirth(vib,
DVA_GET_OFFSET(&bp->blk_dva[0]), DVA_GET_ASIZE(&bp->blk_dva[0]));
-BP_SET_PHYSICAL_BIRTH(bp, physical_birth);
+/*
+ * For rewritten blocks, use the old physical birth as the new logical
+ * birth (representing when the space was allocated) and the removal
+ * time as the new physical birth (representing when it was actually
+ * written).
+ */
+if (BP_GET_REWRITE(bp)) {
+	uint64_t old_physical_birth = BP_GET_PHYSICAL_BIRTH(bp);
+	ASSERT3U(old_physical_birth, <, physical_birth);
+	BP_SET_BIRTH(bp, old_physical_birth, physical_birth);
+	BP_SET_REWRITE(bp, 0);
+} else {
+	BP_SET_PHYSICAL_BIRTH(bp, physical_birth);
+}
DVA_SET_VDEV(&bp->blk_dva[0], vd->vdev_id);
DVA_SET_OFFSET(&bp->blk_dva[0], offset);
@@ -5972,7 +5986,7 @@ metaslab_alloc_range(spa_t *spa, metaslab_class_t *mc, uint64_t psize,
int error = 0;
ASSERT0(BP_GET_LOGICAL_BIRTH(bp));
-ASSERT0(BP_GET_PHYSICAL_BIRTH(bp));
+ASSERT0(BP_GET_RAW_PHYSICAL_BIRTH(bp));
spa_config_enter(spa, SCL_ALLOC, FTAG, RW_READER);
@@ -6034,7 +6048,7 @@ metaslab_free(spa_t *spa, const blkptr_t *bp, uint64_t txg, boolean_t now)
int ndvas = BP_GET_NDVAS(bp);
ASSERT(!BP_IS_HOLE(bp));
-ASSERT(!now || BP_GET_LOGICAL_BIRTH(bp) >= spa_syncing_txg(spa));
+ASSERT(!now || BP_GET_BIRTH(bp) >= spa_syncing_txg(spa));
/*
* If we have a checkpoint for the pool we need to make sure that
@@ -6052,7 +6066,7 @@ metaslab_free(spa_t *spa, const blkptr_t *bp, uint64_t txg, boolean_t now)
* normally as they will be referenced by the checkpointed uberblock.
*/
boolean_t checkpoint = B_FALSE;
-if (BP_GET_LOGICAL_BIRTH(bp) <= spa->spa_checkpoint_txg &&
+if (BP_GET_BIRTH(bp) <= spa->spa_checkpoint_txg &&
spa_syncing_txg(spa) > spa->spa_checkpoint_txg) {
/*
* At this point, if the block is part of the checkpoint