Extend deadman logic

The intent of this patch is extend the existing deadman code such that it's flexible enough to be used by both ztest and on production systems. The proposed changes include: * Added a new `zfs_deadman_failmode` module option which is used to dynamically control the behavior of the deadman. It's loosely modeled after, but independant from, the pool failmode property. It can be set to wait, continue, or panic. * wait - Wait for the "hung" I/O (default) * continue - Attempt to recover from a "hung" I/O * panic - Panic the system * Added a new `zfs_deadman_ziotime_ms` module option which is analogous to `zfs_deadman_synctime_ms` except instead of applying to a pool TXG sync it applies to zio_wait(). A default value of 300s is used to define a "hung" zio. * The ztest deadman thread has been re-enabled by default, aligned with the upstream OpenZFS code, and then extended to terminate the process when it takes significantly longer to complete than expected. * The -G option was added to ztest to print the internal debug log when a fatal error is encountered. This same option was previously added to zdb in commit fa603f82. Update zloop.sh to unconditionally pass -G to obtain additional debugging. * The FM_EREPORT_ZFS_DELAY event which was previously posted when the deadman detect a "hung" pool has been replaced by a new dedicated FM_EREPORT_ZFS_DEADMAN event. * The proposed recovery logic attempts to restart a "hung" zio by calling zio_interrupt() on any outstanding leaf zios. We may want to further restrict this to zios in either the ZIO_STAGE_VDEV_IO_START or ZIO_STAGE_VDEV_IO_DONE stages. Calling zio_interrupt() is expected to only be useful for cases when an IO has been submitted to the physical device but for some reasonable the completion callback hasn't been called by the lower layers. This shouldn't be possible but has been observed and may be caused by kernel/driver bugs. * The 'zfs_deadman_synctime_ms' default value was reduced from 1000s to 600s. * Depending on how ztest fails there may be no cache file to move. This should not be considered fatal, collect the logs which are available and carry on. * Add deadman test cases for spa_deadman() and zio_wait(). * Increase default zfs_deadman_checktime_ms to 60s. Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed by: Thomas Caputi <tcaputi@datto.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6999
2026-05-22 02:27:36 +03:00 · 2017-12-18 17:06:07 -05:00
parent 1b18c6d791
commit 8fb1ede146
21 changed files with 583 additions and 58 deletions
@@ -129,6 +129,10 @@ Total test run time.
 .BI "\-z" " zil_failure_rate" " (default: fail every 2^5 allocs)
 .IP
 Injected failure rate.
+.HP
+.BI "\-G"
+.IP
+Dump zfs_dbgmsg buffer before exiting.
 .SH "EXAMPLES"
 .LP
 To override /tmp as your location for block files, you can use the -f
@@ -55,7 +55,7 @@ part here.
 \fBchecksum\fR
 .ad
 .RS 12n
-Issued when a checksum error have been detected.
+Issued when a checksum error has been detected.
 .RE

 .sp
@@ -76,14 +76,27 @@ Issued when there is an I/O error in a vdev in the pool.
 Issued when there have been data errors in the pool.
 .RE

+.sp
+.ne 2
+.na
+\fBdeadman\fR
+.ad
+.RS 12n
+Issued when an I/O is determined to be "hung", this can be caused by lost
+completion events due to flaky hardware or drivers.  See the
+\fBzfs_deadman_failmode\fR module option description for additional
+information regarding "hung" I/O detection and configuration.
+.RE
+
 .sp
 .ne 2
 .na
 \fBdelay\fR
 .ad
 .RS 12n
-Issued when an I/O was slow to complete as defined by the zio_delay_max module
-option.
+Issued when a completed I/O exceeds the maximum allowed time specified
+by the \fBzio_delay_max\fR module option.  This can be an indicator of
+problems with the underlying storage device.
 .RE

 .sp
@@ -823,14 +823,36 @@ Default value: \fB0\fR.
 .ad
 .RS 12n
 When a pool sync operation takes longer than \fBzfs_deadman_synctime_ms\fR
-milliseconds, a "slow spa_sync" message is logged to the debug log
-(see \fBzfs_dbgmsg_enable\fR).  If \fBzfs_deadman_enabled\fR is set,
-all pending IO operations are also checked and if any haven't completed
-within \fBzfs_deadman_synctime_ms\fR milliseconds, a "SLOW IO" message
-is logged to the debug log and a "delay" system event with the details of
-the hung IO is posted.
+milliseconds, or when an individual I/O takes longer than
+\fBzfs_deadman_ziotime_ms\fR milliseconds, then the operation is considered to
+be "hung".  If \fBzfs_deadman_enabled\fR is set then the deadman behavior is
+invoked as described by the \fBzfs_deadman_failmode\fR module option.
+By default the deadman is enabled and configured to \fBwait\fR which results
+in "hung" I/Os only being logged.  The deadman is automatically disabled
+when a pool gets suspended.
 .sp
-Use \fB1\fR (default) to enable the slow IO check and \fB0\fR to disable.
+Default value: \fB1\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_deadman_failmode\fR (charp)
+.ad
+.RS 12n
+Controls the failure behavior when the deadman detects a "hung" I/O.  Valid
+values are \fBwait\fR, \fBcontinue\fR, and \fBpanic\fR.
+.sp
+\fBwait\fR - Wait for a "hung" I/O to complete.  For each "hung" I/O a
+"deadman" event will be posted describing that I/O.
+.sp
+\fBcontinue\fR - Attempt to recover from a "hung" I/O by re-dispatching it
+to the I/O pipeline if possible.
+.sp
+\fBpanic\fR - Panic the system.  This can be used to facilitate an automatic
+fail-over to a properly configured fail-over partner.
+.sp
+Default value: \fBwait\fR.
 .RE

 .sp
@@ -839,11 +861,10 @@ Use \fB1\fR (default) to enable the slow IO check and \fB0\fR to disable.
 \fBzfs_deadman_checktime_ms\fR (int)
 .ad
 .RS 12n
-Once a pool sync operation has taken longer than
-\fBzfs_deadman_synctime_ms\fR milliseconds, continue to check for slow
-operations every \fBzfs_deadman_checktime_ms\fR milliseconds.
+Check time in milliseconds. This defines the frequency at which we check
+for hung I/O and potentially invoke the \fBzfs_deadman_failmode\fR behavior.
 .sp
-Default value: \fB5,000\fR.
+Default value: \fB60,000\fR.
 .RE

 .sp
@@ -853,12 +874,25 @@ Default value: \fB5,000\fR.
 .ad
 .RS 12n
 Interval in milliseconds after which the deadman is triggered and also
-the interval after which an IO operation is considered to be "hung"
-if \fBzfs_deadman_enabled\fR is set.
-
-See \fBzfs_deadman_enabled\fR.
+the interval after which a pool sync operation is considered to be "hung".
+Once this limit is exceeded the deadman will be invoked every
+\fBzfs_deadman_checktime_ms\fR milliseconds until the pool sync completes.
 .sp
-Default value: \fB1,000,000\fR.
+Default value: \fB600,000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_deadman_ziotime_ms\fR (ulong)
+.ad
+.RS 12n
+Interval in milliseconds after which the deadman is triggered and an
+individual IO operation is considered to be "hung".  As long as the I/O
+remains "hung" the deadman will be invoked every \fBzfs_deadman_checktime_ms\fR
+milliseconds until the I/O completes.
+.sp
+Default value: \fB300,000\fR.
 .RE

 .sp