Illumos 4390 - I/O errors can corrupt space map when deleting fs/vol

4390 i/o errors when deleting filesystem/zvol can lead to space map corruption
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/4390
  https://github.com/illumos/illumos-gate/commit/7fd05ac

Porting notes:

Previous stack-reduction efforts in traverse_visitb() caused a fair
number of un-mergable pieces of code.  This patch should reduce its
stack footprint a bit more.

The new local bptree_entry_phys_t in bptree_add() is dynamically-allocated
using kmem_zalloc() for the purpose of stack reduction.

The new global zfs_free_leak_on_eio has been defined as an integer
rather than a boolean_t as was the case with the related zfs_recover
global.  Also, zfs_free_leak_on_eio's definition has been inserted into
zfs_debug.c for consistency with the existing definition of zfs_recover.
Illumos placed it in spa_misc.c.

Ported by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2545
This commit is contained in:
Matthew Ahrens
2014-06-05 13:20:08 -08:00
committed by Brian Behlendorf
parent 9b67f60560
commit fbeddd60b7
17 changed files with 339 additions and 157 deletions
+37
View File
@@ -696,6 +696,43 @@ Set additional debugging flags
Default value: \fB1\fR.
.RE
.sp
.ne 2
.na
\fBzfs_free_leak_on_eio\fR (int)
.ad
.RS 12n
If destroy encounters an EIO while reading metadata (e.g. indirect
blocks), space referenced by the missing metadata can not be freed.
Normally this causes the background destroy to become "stalled", as
it is unable to make forward progress. While in this stalled state,
all remaining space to free from the error-encountering filesystem is
"temporarily leaked". Set this flag to cause it to ignore the EIO,
permanently leak the space from indirect blocks that can not be read,
and continue to free everything else that it can.
The default, "stalling" behavior is useful if the storage partially
fails (i.e. some but not all i/os fail), and then later recovers. In
this case, we will be able to continue pool operations while it is
partially failed, and when it recovers, we can continue to free the
space, with no leaks. However, note that this case is actually
fairly rare.
Typically pools either (a) fail completely (but perhaps temporarily,
e.g. a top-level vdev going offline), or (b) have localized,
permanent errors (e.g. disk returns the wrong data due to bit flip or
firmware bug). In case (a), this setting does not matter because the
pool will be suspended and the sync thread will not be able to make
forward progress regardless. In case (b), because the error is
permanent, the best we can do is leak the minimum amount of space,
which is what setting this flag will do. Therefore, it is reasonable
for this flag to normally be set, but we chose the more conservative
approach of not setting it, so that there is no possibility of
leaking space in the "partial temporary" failure case.
.sp
Default value: \fB0\fR.
.RE
.sp
.ne 2
.na