draid: fix cksum errors after rebuild with degraded disks

Currently, when more than nparity disks get faulted during a rebuild, only the first nparity disks go to the faulted state, and all the remaining disks go to the degraded state. When a hot spare is attached to such a degraded disk for rebuild, creating a spare mirror, only the hot spare gets rebuilt, not the degraded device. So when, during a later scrub, some other attached draid spare happens to map to that spare, it ends up with cksum errors. Moreover, if the user clears the errors on the degraded disk, the data won't be resilvered to it, the hot spare will be detached almost immediately, and the data that was resilvered only to the spare will be lost.

Solution: write to all mirrored devices during rebuild, similar to traditional/healing resilvering, but only if we can verify the integrity of the data, or when the device we are writing to is a draid spare. In the latter case we are writing to reserved spare space, so there is no danger of overwriting any good data. The argument that writing only to the rebuilding draid spare vdev is faster than also writing to the normal device doesn't hold, since at a specific offset being rebuilt the draid spare is mapped to a normal device anyway.

The redundancy_draid_degraded2 automation test is also added to cover this scenario.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Tkachuk <atkachuk@wasabi.com>
Closes #18414
2026-05-24 11:18:52 +03:00 · 2026-04-15 22:48:00 +01:00
parent eec8b9b929
commit da44040bbb
8 changed files with 235 additions and 19 deletions
@@ -23,6 +23,7 @@
 * Copyright (c) 2018 Intel Corporation.
 * Copyright (c) 2020 by Lawrence Livermore National Security, LLC.
 * Copyright (c) 2025, Klara, Inc.
+ * Copyright (c) 2026, Wasabi Technologies, Inc.
 */

 #include <sys/zfs_context.h>
@@ -1249,8 +1250,7 @@ vdev_draid_missing(vdev_t *vd, uint64_t physical_offset, uint64_t txg,
 		if (vd == NULL)
 			return (B_TRUE);

-		return (vdev_draid_missing(vd, physical_offset,
-		    txg, size));
+		return (vdev_draid_missing(vd, physical_offset, txg, size));
 	}

 	return (vdev_dtl_contains(vd, DTL_MISSING, txg, size));
@@ -1909,12 +1909,34 @@ vdev_draid_io_start_read(zio_t *zio, raidz_row_t *rr)
 		}

 		if (vdev_draid_missing(cvd, rc->rc_offset, zio->io_txg, 1)) {
+			vdev_t *svd;
+
 			if (c >= rr->rr_firstdatacol)
 				rr->rr_missingdata++;
 			else
 				rr->rr_missingparity++;
 			rc->rc_error = SET_ERROR(ESTALE);
 			rc->rc_skipped = 1;
+
+			/*
+			 * If this child has a draid spare attached, and that
+			 * spare by rc_offset maps to another spare, the repair
+			 * would go to that spare, and we want all mirrored
+			 * children on it to be updated with the repaired data,
+			 * even when we cannot vouch for it during rebuilds
+			 * (which don't have checksums). Otherwise, we will have
+			 * a lot of checksum errors on those spares during scrub.
+			 * The worst thing that can happen in this case is that
+			 * we will update the reserved spare column on some
+			 * device with unverified data, which is harmless.
+			 */
+			if ((svd = vdev_draid_find_spare(cvd)) != NULL) {
+				svd = vdev_draid_spare_get_child(svd,
+				    rc->rc_offset);
+				if (svd && (svd->vdev_ops == &vdev_spare_ops ||
+				    svd->vdev_ops == &vdev_replacing_ops))
+					rc->rc_tgt_is_dspare = 1;
+			}
 			continue;
 		}