Verify dRAID empty sectors

Verify that all empty sectors are zero filled before using them to
calculate parity.  Failure to do so can result in incorrect parity
columns being generated and written to disk if the contents of an
empty sector are non-zero.  This was possible because the checksum
only protects the data portions of the buffer, not the empty sector
padding.

This issue has been addressed by updating raidz_parity_verify() to
check that all dRAID empty sectors are zero filled.  Any non-zero
empty sectors are zeroed, repair IO is issued for them, and a checksum
error is logged.  The sectors can then be safely used to verify the
parity.
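
As a rough illustration of the check described above (not the actual
raidz_parity_verify() code), the C sketch below zeroes any non-zero
empty sector before computing XOR parity.  The helper names
sector_is_zero(), repair_empty_sectors(), and xor_parity() are
hypothetical, and the repair IO and checksum error logging are only
noted in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns true if every byte in buf is zero. */
static bool
sector_is_zero(const uint8_t *buf, size_t len)
{
	assert(buf != NULL);
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != 0)
			return (false);
	}
	return (true);
}

/*
 * Hypothetical helper: zero any non-zero empty (padding) sector in
 * place and return how many needed repair.  A real implementation
 * would also issue repair IO and log a checksum error for each one.
 */
static int
repair_empty_sectors(uint8_t *sectors[], size_t nsectors, size_t len)
{
	int repaired = 0;

	for (size_t i = 0; i < nsectors; i++) {
		if (!sector_is_zero(sectors[i], len)) {
			memset(sectors[i], 0, len);
			repaired++;
		}
	}
	return (repaired);
}

/* Simple XOR parity over all columns; stands in for the P column. */
static void
xor_parity(uint8_t *parity, uint8_t *cols[], size_t ncols, size_t len)
{
	memset(parity, 0, len);
	for (size_t c = 0; c < ncols; c++) {
		for (size_t i = 0; i < len; i++)
			parity[i] ^= cols[c][i];
	}
}
```

With a data column {1, 2, 3, 4} and a padding sector {0, 0, 9, 0},
parity computed before the repair differs from the data in byte 2.
After repair_empty_sectors() zeroes the padding, the parity equals
the data column again.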

This specific type of damage is unlikely to occur since it requires
a disk to have silently returned bad data for an empty sector while
performing a scrub.  However, if a pool has been damaged in this way,
scrubbing the pool with this change applied will repair both the
empty sector and parity columns as long as the data checksum is
valid.  Checksum errors will be reported in the `zpool status` output
for any repairs which are made.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12857
Brian Behlendorf
2022-01-04 16:46:32 -08:00
committed by Tony Hutter
parent 5d8c081193
commit 6575defc52
5 changed files with 67 additions and 10 deletions
@@ -28,11 +28,12 @@
# in zpool status.
#
# STRATEGY:
-# 1. Create a raidz or mirror pool
+# 1. Create a mirror, raidz, or draid pool
# 2. Inject read/write IO errors or checksum errors
# 3. Verify the number of errors in zpool status match the corresponding
# number of error events.
-# 4. Repeat for all combinations of raidz/mirror and io/checksum errors.
+# 4. Repeat for all combinations of mirror/raidz/draid and io/checksum
+# errors.
#
. $STF_SUITE/include/libtest.shlib
@@ -74,7 +75,7 @@ log_must mkdir -p $MOUNTDIR
# Run error test on a specific type of pool
#
-# $1: pool - raidz, mirror
+# $1: pool - mirror, raidz, draid
# $2: test type - corrupt (checksum error), io
# $3: read, write
function do_test
@@ -142,8 +143,8 @@ function do_test
log_must zpool destroy $POOL
}
-# Test all types of errors on mirror and raidz pools
-for pooltype in mirror raidz ; do
+# Test all types of errors on mirror, raidz, and draid pools
+for pooltype in mirror raidz draid; do
do_test $pooltype corrupt read
do_test $pooltype io read
do_test $pooltype io write