Multiple DVA Scrubbing Fix

Currently, there is an issue in the sequential scrub code which
prevents self healing from working in some cases. The scrub code
will split up all DVA copies of a bp and issue each of them
separately. The problem is that, since each of the DVAs is no
longer associated with the others, the self healing code doesn't
have the opportunity to repair problems that show up in one of the
DVAs with the data from the others.

This patch fixes this issue by ensuring that all IOs issued by the
sequential scrub code include all DVAs. Initially, only the first
DVA of each is attempted. If an issue arises, the IO is retried
with all available copies, giving the self healing code a chance
to correct the issue.

To test this change, this patch also adds the ability for zinject
to specify individual DVAs to inject read errors into. We then
add a new test case that utilizes this functionality to ensure
scrubs and self-healing reads can handle and transparently fix
issues with individual copies of blocks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8453
This commit is contained in:
Tom Caputi
2019-03-15 17:14:31 -04:00
committed by Brian Behlendorf
parent 2bbec1c910
commit ab7615d92c
10 changed files with 415 additions and 95 deletions
+25
View File
@@ -254,10 +254,35 @@ vdev_mirror_map_init(zio_t *zio)
if (vd == NULL) {
dva_t *dva = zio->io_bp->blk_dva;
spa_t *spa = zio->io_spa;
dsl_scan_t *scn = spa->spa_dsl_pool->dp_scan;
dva_t dva_copy[SPA_DVAS_PER_BP];
c = BP_GET_NDVAS(zio->io_bp);
/*
* The sequential scrub code sorts and issues all DVAs
* of a bp separately. Each of these IOs includes all
* original DVA copies so that repairs can be performed
* in the event of an error, but we only actually want
* to check the first DVA since the others will be
* checked by their respective sorted IOs. Only if we
* hit an error will we try all DVAs upon retrying.
*
* Note: This check is safe even if the user switches
* from a legacy scrub to a sequential one in the middle
* of processing, since scn_is_sorted isn't updated until
* all outstanding IOs from the previous scrub pass
* complete.
*/
if ((zio->io_flags & ZIO_FLAG_SCRUB) &&
!(zio->io_flags & ZIO_FLAG_IO_RETRY) &&
dsl_scan_scrubbing(spa->spa_dsl_pool) &&
scn->scn_is_sorted) {
c = 1;
} else {
c = BP_GET_NDVAS(zio->io_bp);
}
/*
* If we do not trust the pool config, some DVAs might be
* invalid or point to vdevs that do not exist. We skip them.