mirror_zfs/module
Matthew Ahrens 62d4287f27
RAIDZ2/3 fails to heal silently corrupted parity w/2+ bad disks
When scrubbing, (non-sequential) resilvering, or correcting a checksum
error using RAIDZ parity, ZFS should heal any incorrect RAIDZ parity by
overwriting it.  For example, if P disks are silently corrupted (P being
the number of failures tolerated; e.g. RAIDZ2 has P=2), `zpool scrub`
should detect and heal all the bad state on these disks, including
parity.  This way if there is a subsequent failure we are fully
protected.

With RAIDZ2 or RAIDZ3, a block can have silent damage to a parity
sector, and also damage (silent or known) to a data sector.  In this
case the parity should be healed but it is not.

The problem can be noticed by scrubbing the pool twice.  Assuming there
was no damage concurrent with the scrubs, the first scrub should fix all
silent damage, and the second scrub should be "clean" (`zpool status`
should not report checksum errors on any disks).  If the bug is
encountered, then the second scrub will repair the silently-damaged
parity that the first scrub failed to repair, and these checksum errors
will be reported after the second scrub.  Since the first scrub repaired
all the damaged data, the bug can not be encountered during the second
scrub, so subsequent scrubs (more than two) are not necessary.

The root cause of the problem is some code that was inadvertently added
to `raidz_parity_verify()` by the DRAID changes.  The incorrect code
causes the parity healing to be aborted if there is damaged data
(`rc_error != 0`) or the data disk is not present (`!rc_tried`).  These
checks are not necessary, because we only call `raidz_parity_verify()`
if we have the correct data (which may have been reconstructed using
parity, and which was verified by the checksum).

This commit fixes the problem by removing the incorrect checks in
`raidz_parity_verify()`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11489 
Closes #11510
2021-01-26 16:05:05 -08:00
..
avl Links in Source Files 2020-09-02 09:42:12 -07:00
icp Extending FreeBSD UIO Struct 2021-01-20 21:27:30 -08:00
lua lua: avoid gcc -Wreturn-local-addr bug 2020-12-15 09:20:48 -08:00
nvpair Links in Source Files 2020-09-02 09:42:12 -07:00
os spl-taskq: Make sure thread tsd hash entry is cleared 2021-01-25 11:18:28 -08:00
spl Cleanup linux module kbuild files 2020-06-10 09:24:15 -07:00
unicode Throw const on some strings 2020-10-02 17:44:10 -07:00
zcommon Linux 5.10 compat: use iov_iter in uio structure 2020-12-18 08:48:26 -08:00
zfs RAIDZ2/3 fails to heal silently corrupted parity w/2+ bad disks 2021-01-26 16:05:05 -08:00
zstd Optimize locking checks in mempool allocator 2020-11-02 12:10:07 -08:00
.gitignore Cleanup linux module kbuild files 2020-06-10 09:24:15 -07:00
Kbuild.in Add zstd support to zfs 2020-08-20 10:30:06 -07:00
Makefile.bsd FreeBSD: decouple ZFS_DEBUG from kernel debug settings 2020-11-24 09:16:46 -08:00
Makefile.in Fix Linux modules uninstall 2020-10-08 20:07:10 -07:00