mirror_zfs/tests/zfs-tests/tests/functional
Brian Behlendorf 8fb1ede146 Extend deadman logic
The intent of this patch is to extend the existing deadman code
so that it is flexible enough to be used both by ztest and
on production systems.  The proposed changes include:

* Added a new `zfs_deadman_failmode` module option which is
  used to dynamically control the behavior of the deadman.  It's
  loosely modeled after, but independent from, the pool failmode
  property.  It can be set to wait, continue, or panic.

    * wait     - Wait for the "hung" I/O (default)
    * continue - Attempt to recover from a "hung" I/O
    * panic    - Panic the system

* Added a new `zfs_deadman_ziotime_ms` module option which is
  analogous to `zfs_deadman_synctime_ms` except instead of
  applying to a pool TXG sync it applies to zio_wait().  A
  default value of 300s is used to define a "hung" zio.

* The ztest deadman thread has been re-enabled by default,
  aligned with the upstream OpenZFS code, and then extended
  to terminate the process when it takes significantly longer
  to complete than expected.

* The -G option was added to ztest to print the internal debug
  log when a fatal error is encountered.  This same option was
  previously added to zdb in commit fa603f82.  Update zloop.sh
  to unconditionally pass -G to obtain additional debugging.

* The FM_EREPORT_ZFS_DELAY event which was previously posted
  when the deadman detected a "hung" pool has been replaced by
  a new dedicated FM_EREPORT_ZFS_DEADMAN event.

* The proposed recovery logic attempts to restart a "hung"
  zio by calling zio_interrupt() on any outstanding leaf zios.
  We may want to further restrict this to zios in either the
  ZIO_STAGE_VDEV_IO_START or ZIO_STAGE_VDEV_IO_DONE stages.
  Calling zio_interrupt() is expected to only be useful for
  cases when an IO has been submitted to the physical device
  but for some reason the completion callback hasn't been
  called by the lower layers.  This shouldn't be possible but
  has been observed and may be caused by kernel/driver bugs.

* The `zfs_deadman_synctime_ms` default value was reduced from
  1000s to 600s.

* Depending on how ztest fails there may be no cache file to
  move.  This should not be considered fatal; collect the logs
  which are available and carry on.

* Add deadman test cases for spa_deadman() and zio_wait().

* Increase default zfs_deadman_checktime_ms to 60s.
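
As a rough illustration of the option semantics listed above, the
sketch below maps a `zfs_deadman_failmode` string to an action and
checks whether a zio has exceeded the "hung" threshold.  It is not
the code in this patch: every identifier in it is hypothetical, and
the strict "longer than" comparison is an assumption.  Only the
three mode names and the default timeouts come from the list above.

```c
#include <stdint.h>
#include <string.h>

/* Defaults quoted in the change list above, converted to milliseconds. */
#define	DEADMAN_SYNCTIME_MS_DEFAULT	600000ULL	/* 600s */
#define	DEADMAN_ZIOTIME_MS_DEFAULT	300000ULL	/* 300s */
#define	DEADMAN_CHECKTIME_MS_DEFAULT	60000ULL	/* 60s */

/* Hypothetical enum; the names are illustrative only. */
typedef enum {
	DEADMAN_WAIT,		/* wait for the "hung" I/O (default) */
	DEADMAN_CONTINUE,	/* attempt to recover from a "hung" I/O */
	DEADMAN_PANIC,		/* panic the system */
	DEADMAN_INVALID		/* unrecognized string */
} deadman_failmode_t;

/* Map a failmode string ("wait", "continue", "panic") to the enum. */
static deadman_failmode_t
deadman_parse_failmode(const char *s)
{
	if (s == NULL)
		return (DEADMAN_INVALID);
	if (strcmp(s, "wait") == 0)
		return (DEADMAN_WAIT);
	if (strcmp(s, "continue") == 0)
		return (DEADMAN_CONTINUE);
	if (strcmp(s, "panic") == 0)
		return (DEADMAN_PANIC);
	return (DEADMAN_INVALID);
}

/*
 * Return 1 if a zio started at start_ms has been outstanding for
 * longer than ziotime_ms at time now_ms, else 0.  Assumption: the
 * comparison is strict, and a clock that appears to run backwards
 * is treated as "not hung".
 */
static int
deadman_zio_is_hung(uint64_t now_ms, uint64_t start_ms, uint64_t ziotime_ms)
{
	if (now_ms < start_ms)
		return (0);
	return ((now_ms - start_ms) > ziotime_ms);
}
```

With the 300s default, `deadman_zio_is_hung(300001, 0,
DEADMAN_ZIOTIME_MS_DEFAULT)` evaluates to 1, while a zio that has
been outstanding for exactly 300000 ms is not yet treated as hung
under the strict comparison assumed here.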

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6999
2018-01-25 13:40:38 -08:00
acl Fix shellcheck v0.4.6 warnings 2018-01-17 10:17:16 -08:00
atime OpenZFS 7290 - ZFS test suite needs to control what utilities it can run 2017-04-06 09:25:36 -07:00
bootfs Fix some ZFS Test Suite issues 2017-09-25 10:32:34 -07:00
cache Fix some ZFS Test Suite issues 2017-09-25 10:32:34 -07:00
cachefile Remove vn_rename and vn_remove dependency 2017-10-19 10:06:55 -07:00
casenorm OpenZFS 7290 - ZFS test suite needs to control what utilities it can run 2017-04-06 09:25:36 -07:00
chattr Fix chattr_001_pos 2017-07-07 15:45:29 -07:00
checksum Support -fsanitize=address with --enable-asan 2018-01-10 10:49:27 -08:00
clean_mirror Implemented zpool sync command 2017-05-19 12:33:11 -07:00
cli_root Support -fsanitize=address with --enable-asan 2018-01-10 10:49:27 -08:00
cli_user Handle broken pipes in arc_summary 2017-12-19 13:19:24 -08:00
compression OpenZFS 7290 - ZFS test suite needs to control what utilities it can run 2017-04-06 09:25:36 -07:00
ctime Add configure option to enable gcov analysis 2017-09-15 10:24:13 -07:00
deadman Extend deadman logic 2018-01-25 13:40:38 -08:00
delegate Enable remaining tests 2017-05-22 12:34:32 -04:00
devices Enable remaining tests 2017-05-22 12:34:32 -04:00
events Sequential scrub and resilvers 2017-11-15 17:27:01 -08:00
exec Enable remaining tests 2017-05-22 12:34:32 -04:00
fault Various ZED fixes 2017-12-08 16:58:41 -08:00
features Fix ARC behavior on 32-bit systems 2017-10-10 15:19:19 -07:00
grow_pool Fix ZTS grow_pool/setup 2017-08-15 16:40:04 -07:00
grow_replicas Enable remaining tests 2017-05-22 12:34:32 -04:00
history Disable history_004_pos 2018-01-10 10:41:30 -08:00
hkdf Support -fsanitize=address with --enable-asan 2018-01-10 10:49:27 -08:00
inheritance OpenZFS 7290 - ZFS test suite needs to control what utilities it can run 2017-04-06 09:25:36 -07:00
inuse Enable remaining tests 2017-05-22 12:34:32 -04:00
large_files Enable remaining tests 2017-05-22 12:34:32 -04:00
largest_pool Enable remaining tests 2017-05-22 12:34:32 -04:00
libzfs Add libtpool (thread pools) 2017-08-09 15:31:08 -07:00
link_count Enable remaining tests 2017-05-22 12:34:32 -04:00
migration OpenZFS 7290 - ZFS test suite needs to control what utilities it can run 2017-04-06 09:25:36 -07:00
mmap Enable remaining tests 2017-05-22 12:34:32 -04:00
mmp Fix multihost stale cache file import 2017-12-18 10:28:27 -08:00
mount Enable remaining tests 2017-05-22 12:34:32 -04:00
mv_files OpenZFS 7629 - Fix for 7290 neglected to remove some escape sequences 2017-04-07 09:30:05 -07:00
nestedfs OpenZFS 7290 - ZFS test suite needs to control what utilities it can run 2017-04-06 09:25:36 -07:00
no_space Fix size inflation in spa_get_worst_case_asize() 2017-04-10 15:28:21 -07:00
nopwrite Enable remaining tests 2017-05-22 12:34:32 -04:00
online_offline Enable remaining tests 2017-05-22 12:34:32 -04:00
pool_names Fix some ZFS Test Suite issues 2017-09-25 10:32:34 -07:00
poolversion Fix some ZFS Test Suite issues 2017-09-25 10:32:34 -07:00
privilege ZTS: replace su commands by run_user function 2017-07-05 10:46:52 -07:00
quota OpenZFS 7290 - ZFS test suite needs to control what utilities it can run 2017-04-06 09:25:36 -07:00
raidz OpenZFS 7290 - ZFS test suite needs to control what utilities it can run 2017-04-06 09:25:36 -07:00
redundancy Enable remaining tests 2017-05-22 12:34:32 -04:00
refquota OpenZFS 7290 - ZFS test suite needs to control what utilities it can run 2017-04-06 09:25:36 -07:00
refreserv Relax (ref)reservation constraints on ZVOLs 2017-09-12 11:33:22 -07:00
rename_dirs Enable remaining tests 2017-05-22 12:34:32 -04:00
replacement OpenZFS 7290 - ZFS test suite needs to control what utilities it can run 2017-04-06 09:25:36 -07:00
reservation Fix some ZFS Test Suite issues 2017-09-25 10:32:34 -07:00
rootpool OpenZFS 8076 - zfs-tests suite fails rootpool_002_neg 2017-05-25 17:29:08 -07:00
rsend Fix intra-pool resumable 'zfs send -t <token>' 2017-10-10 15:22:05 -07:00
scrub_mirror OpenZFS 7290 - ZFS test suite needs to control what utilities it can run 2017-04-06 09:25:36 -07:00
slog OpenZFS 8909 - 8585 can cause a use-after-free kernel panic 2017-12-28 10:18:04 -08:00
snapshot Fix some ZFS Test Suite issues 2017-09-25 10:32:34 -07:00
snapused OpenZFS 7290 - ZFS test suite needs to control what utilities it can run 2017-04-06 09:25:36 -07:00
sparse OpenZFS 7290 - ZFS test suite needs to control what utilities it can run 2017-04-06 09:25:36 -07:00
threadsappend Enable remaining tests 2017-05-22 12:34:32 -04:00
tmpfile Add tmpfile_003_pos to .gitignore 2017-02-03 13:42:49 -08:00
truncate Fix truncate(2) mtime and ctime handling 2017-11-13 09:24:26 -08:00
upgrade Implemented zpool sync command 2017-05-19 12:33:11 -07:00
userquota Fix 'zfs get {user|group}objused@' functionality 2017-11-29 11:59:22 -08:00
vdev_zaps Disable vdev_zaps_004_pos 2017-12-07 16:43:59 -08:00
write_dirs Skip tests that are slow on 32-bit builders 2017-06-06 19:04:01 -07:00
xattr Fix volume WR_INDIRECT log replay 2017-09-08 15:07:00 -07:00
zvol Disable zvol_ENOSPC_001_pos on 32-bit systems 2017-11-13 16:26:15 -08:00
Makefile.am Extend deadman logic 2018-01-25 13:40:38 -08:00