mirror of https://git.proxmox.com/git/mirror_zfs.git
synced 2026-05-23 19:04:45 +03:00
draid: allow seq resilver reads from degraded vdevs
When sequentially resilvering, allow a dRAID child to be read as long as the DTLs indicate it should have a good copy of the data and the leaf isn't being rebuilt. The previous check was slightly too broad and would skip dRAID spare and replacing vdevs if one of their children was being replaced. As long as enough additional redundancy exists this is fine, but when it does not, this vdev must be read in order to correctly reconstruct the missing data.

A new test case has been added which exhausts the available redundancy, faults another device causing it to be degraded, and then performs a sequential resilver for the degraded device. In such a situation, enough redundancy exists to perform the replacement, and a scrub should detect no checksum errors.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18405
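The redundancy argument above can be made concrete with a small counting sketch. It is a hypothetical model, not OpenZFS code: `col_state_t`, `old_rule_skips()`, `new_rule_skips()`, `count_missing()` and `reconstructable()` are invented for illustration, and each column of a row is reduced to two flags. It assumes a row with `parity` parity columns can be reconstructed as long as no more than `parity` columns are treated as missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of one column of a dRAID row as seen by a
 * sequential rebuild. These names do not exist in OpenZFS.
 */
typedef struct {
	bool rebuilding; /* this vdev, or one of its children, is being rebuilt */
	bool good_copy;  /* a readable, non-rebuilding leaf has the data per its DTL */
} col_state_t;

/* Old rule: skip the column if anything beneath it is being rebuilt. */
bool
old_rule_skips(const col_state_t *c)
{
	return (c->rebuilding);
}

/* New rule: skip the column only if no leaf can supply a good copy. */
bool
new_rule_skips(const col_state_t *c)
{
	return (!c->good_copy);
}

/* Count the columns a given rule treats as missing. */
size_t
count_missing(const col_state_t *cols, size_t ncols,
    bool (*skips)(const col_state_t *))
{
	size_t missing = 0;

	for (size_t i = 0; i < ncols; i++) {
		if (skips(&cols[i]))
			missing++;
	}
	return (missing);
}

/* Assumed: a row is reconstructable if at most `parity` columns are missing. */
bool
reconstructable(const col_state_t *cols, size_t ncols, size_t parity,
    bool (*skips)(const col_state_t *))
{
	assert(parity < ncols);
	return (count_missing(cols, ncols, skips) <= parity);
}
```

With a single-parity row in which one column is the leaf being rebuilt and another is a spare whose other child still holds the data, the old rule treats two columns as missing and the row cannot be reconstructed, while the new rule treats only one as missing.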
committed by Tony Hutter
parent 63b8da8ff7
commit e9a8c6e080
+9 -28
@@ -1191,7 +1191,7 @@ vdev_draid_min_alloc(vdev_t *vd)
 }
 
 /*
- * Returns true if the txg range does not exist on any leaf vdev.
+ * Returns false if the txg range exists on any leaf vdev, true otherwise.
  *
  * A dRAID spare does not fit into the DTL model. While it has child vdevs
  * there is no redundancy among them, and the effective child vdev is
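A minimal sketch of the semantics stated in the revised comment, under stated assumptions: `node_t`, `range_missing()` and the single `[miss_lo, miss_hi)` interval standing in for a leaf's DTL are hypothetical simplifications. The sketch ignores the offset-based mapping of a dRAID spare to its effective child that the comment goes on to describe; it only shows the recursion through spare and replacing nodes, where one readable child holding the range is enough for the data to count as present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical vdev tree node for illustration only; this is not the
 * OpenZFS vdev_t. A leaf records one "missing" txg interval
 * [miss_lo, miss_hi) standing in for its DTL.
 */
typedef enum { NODE_LEAF, NODE_SPARE, NODE_REPLACING } node_kind_t;

typedef struct node {
	node_kind_t kind;
	bool readable;
	uint64_t miss_lo, miss_hi;  /* leaves only */
	struct node **children;     /* spare/replacing nodes only */
	size_t nchildren;
} node_t;

/* True if the leaf's missing interval covers all of [txg, txg + size). */
static bool
leaf_dtl_contains(const node_t *leaf, uint64_t txg, uint64_t size)
{
	return (leaf->miss_lo <= txg && txg + size <= leaf->miss_hi);
}

/*
 * Returns false if the txg range exists on any readable leaf beneath
 * `n`, true otherwise. Spare and replacing nodes recurse into their
 * readable children; the top-level node's own `readable` flag is not
 * consulted for non-leaf nodes.
 */
bool
range_missing(const node_t *n, uint64_t txg, uint64_t size)
{
	if (n->kind == NODE_LEAF)
		return (!n->readable || leaf_dtl_contains(n, txg, size));

	for (size_t c = 0; c < n->nchildren; c++) {
		const node_t *cn = n->children[c];

		if (!cn->readable)
			continue;
		if (!range_missing(cn, txg, size))
			return (false);
	}
	return (true);
}
```

For example, a spare node whose first child is faulted but whose second child has no missing txgs reports the range as present, because the readable second child holds it.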
@@ -1932,34 +1932,15 @@ vdev_draid_io_start_read(zio_t *zio, raidz_row_t *rr)
 		vdev_t *svd;
 
 		/*
-		 * Sequential rebuilds need to always consider the data
-		 * on the child being rebuilt to be stale. This is
-		 * important when all columns are available to aid
-		 * known reconstruction in identifing which columns
-		 * contain incorrect data.
-		 *
-		 * Furthermore, all repairs need to be constrained to
-		 * the devices being rebuilt because without a checksum
-		 * we cannot verify the data is actually correct and
-		 * performing an incorrect repair could result in
-		 * locking in damage and making the data unrecoverable.
+		 * Repairs need to be constrained to the devices being
+		 * rebuilt since without a checksum we cannot verify the
+		 * data is actually correct and performing an incorrect
+		 * repair could result in locking in the damage and
+		 * making the data unrecoverable.
 		 */
-		if (zio->io_priority == ZIO_PRIORITY_REBUILD) {
-			if (vdev_draid_rebuilding(cvd)) {
-				if (c >= rr->rr_firstdatacol)
-					rr->rr_missingdata++;
-				else
-					rr->rr_missingparity++;
-				rc->rc_error = SET_ERROR(ESTALE);
-				rc->rc_skipped = 1;
-				rc->rc_allow_repair = 1;
-				continue;
-			} else {
-				rc->rc_allow_repair = 0;
-			}
-		} else {
-			rc->rc_allow_repair = 1;
-		}
+		if (zio->io_priority == ZIO_PRIORITY_REBUILD &&
+		    !vdev_draid_rebuilding(cvd))
+			rc->rc_allow_repair = 0;
 
 		/*
 		 * If this child is a distributed spare then the
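The two versions of the repair decision in the hunk above can be compared with a small truth-table sketch. `col_decision_t`, `old_decision()` and `new_decision()` are hypothetical; `rebuild_io` stands for `zio->io_priority == ZIO_PRIORITY_REBUILD` and `child_rebuilding` for `vdev_draid_rebuilding(cvd)`. The sketch assumes the column's repair flag is already enabled before this code runs, since that initialization is not part of the diff. Under that assumption both versions produce the same `rc_allow_repair` value; what the new code stops doing at this point is marking a rebuilding child `ESTALE` and skipping it. How that case is handled afterwards lies outside the lines shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-column outcome of the code in the hunk above. */
typedef struct {
	bool skipped;       /* column marked ESTALE and skipped (old code only) */
	bool allow_repair;  /* models rc->rc_allow_repair */
} col_decision_t;

/* Mirrors the removed branch structure. */
col_decision_t
old_decision(bool rebuild_io, bool child_rebuilding)
{
	col_decision_t d = { .skipped = false, .allow_repair = true };

	if (rebuild_io) {
		if (child_rebuilding) {
			d.skipped = true;       /* counted as missing data/parity */
			d.allow_repair = true;
		} else {
			d.allow_repair = false;
		}
	} else {
		d.allow_repair = true;
	}
	return (d);
}

/* Mirrors the added lines; the initial `true` is an assumption. */
col_decision_t
new_decision(bool rebuild_io, bool child_rebuilding)
{
	col_decision_t d = { .skipped = false, .allow_repair = true };

	if (rebuild_io && !child_rebuilding)
		d.allow_repair = false;
	return (d);
}
```

Enumerating all four input combinations shows the repair flag agrees everywhere, and the only difference is the `skipped` outcome when a rebuild IO meets a child that is being rebuilt.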
@@ -674,9 +674,14 @@ vdev_mirror_io_start(zio_t *zio)
 
 		/*
 		 * When sequentially resilvering only issue write repair
-		 * IOs to the vdev which is being rebuilt since performance
-		 * is limited by the slowest child. This is an issue for
-		 * faster replacement devices such as distributed spares.
+		 * IOs to the vdev which is being rebuilt for two reasons:
+		 * 1. The repair IO data calculated from parity has no checksum
+		 * to validate and could be incorrect. Existing data must
+		 * never be overwritten with unconfirmed data to ensure we
+		 * never lock in unrecoverable damage to the pool.
+		 * 2. Performance is limited by the slowest child device. We
+		 * don't want a slower device to limit the rebuild rate for
+		 * faster replacement devices such as distributed spares.
 		 */
 		if ((zio->io_priority == ZIO_PRIORITY_REBUILD) &&
 		    (zio->io_flags & ZIO_FLAG_IO_REPAIR) &&
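A sketch of the child filter the revised comment describes, in which repair writes issued by a sequential resilver reach only the child being rebuilt. `mirror_child_t` and `repair_write_targets()` are hypothetical. The real condition continues past the lines shown in the diff, so this model uses only the two visible tests (rebuild priority and `ZIO_FLAG_IO_REPAIR`) plus an assumed guard that filters only when some child is actually being rebuilt, so that a repair write is never dropped for every child.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror child; only the flag this sketch needs. */
typedef struct {
	bool rebuilding;  /* child is the target of the sequential rebuild */
} mirror_child_t;

/*
 * Fills issue[c] with whether child c receives the write and returns
 * how many children receive it.
 */
size_t
repair_write_targets(const mirror_child_t *kids, size_t n,
    bool rebuild_io, bool repair_io, bool *issue)
{
	bool any_rebuilding = false;
	size_t count = 0;

	for (size_t c = 0; c < n; c++) {
		if (kids[c].rebuilding)
			any_rebuilding = true;
	}

	/* Assumed guard: only filter when a rebuild target exists. */
	bool filter = rebuild_io && repair_io && any_rebuilding;

	for (size_t c = 0; c < n; c++) {
		/* When filtering, skip every child that is not being rebuilt. */
		issue[c] = !filter || kids[c].rebuilding;
		if (issue[c])
			count++;
	}
	return (count);
}
```

Keeping repair writes off the other children serves both reasons in the comment: reconstructed data without a checksum never overwrites existing copies, and slower children do not throttle a fast target such as a distributed spare.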