Go to file
Brian Behlendorf 8fb1ede146 Extend deadman logic
The intent of this patch is extend the existing deadman code
such that it's flexible enough to be used by both ztest and
on production systems.  The proposed changes include:

* Added a new `zfs_deadman_failmode` module option which is
  used to dynamically control the behavior of the deadman.  It's
  loosely modeled after, but independant from, the pool failmode
  property.  It can be set to wait, continue, or panic.

    * wait     - Wait for the "hung" I/O (default)
    * continue - Attempt to recover from a "hung" I/O
    * panic    - Panic the system

* Added a new `zfs_deadman_ziotime_ms` module option which is
  analogous to `zfs_deadman_synctime_ms` except instead of
  applying to a pool TXG sync it applies to zio_wait().  A
  default value of 300s is used to define a "hung" zio.

* The ztest deadman thread has been re-enabled by default,
  aligned with the upstream OpenZFS code, and then extended
  to terminate the process when it takes significantly longer
  to complete than expected.

* The -G option was added to ztest to print the internal debug
  log when a fatal error is encountered.  This same option was
  previously added to zdb in commit fa603f82.  Update zloop.sh
  to unconditionally pass -G to obtain additional debugging.

* The FM_EREPORT_ZFS_DELAY event which was previously posted
  when the deadman detect a "hung" pool has been replaced by
  a new dedicated FM_EREPORT_ZFS_DEADMAN event.

* The proposed recovery logic attempts to restart a "hung"
  zio by calling zio_interrupt() on any outstanding leaf zios.
  We may want to further restrict this to zios in either the
  ZIO_STAGE_VDEV_IO_START or ZIO_STAGE_VDEV_IO_DONE stages.
  Calling zio_interrupt() is expected to only be useful for
  cases when an IO has been submitted to the physical device
  but for some reasonable the completion callback hasn't been
  called by the lower layers.  This shouldn't be possible but
  has been observed and may be caused by kernel/driver bugs.

* The 'zfs_deadman_synctime_ms' default value was reduced from
  1000s to 600s.

* Depending on how ztest fails there may be no cache file to
  move.  This should not be considered fatal, collect the logs
  which are available and carry on.

* Add deadman test cases for spa_deadman() and zio_wait().

* Increase default zfs_deadman_checktime_ms to 60s.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6999
2018-01-25 13:40:38 -08:00
.github Reduce codecov PR comments 2018-01-09 11:15:55 -08:00
cmd Extend deadman logic 2018-01-25 13:40:38 -08:00
config Fix Debian packaging on ARMv7/ARM64 2018-01-18 10:15:41 -08:00
contrib Run zfs load-key if needed in dracut 2018-01-18 10:20:34 -08:00
etc Run zfs load-key if needed in dracut 2018-01-18 10:20:34 -08:00
include Extend deadman logic 2018-01-25 13:40:38 -08:00
lib OpenZFS 8652 - Tautological comparisons with ZPROP_INVAL 2018-01-19 09:22:37 -08:00
man Extend deadman logic 2018-01-25 13:40:38 -08:00
module Extend deadman logic 2018-01-25 13:40:38 -08:00
rpm Force ztest to always use /dev/urandom 2018-01-12 09:36:26 -08:00
scripts Extend deadman logic 2018-01-25 13:40:38 -08:00
tests Extend deadman logic 2018-01-25 13:40:38 -08:00
udev Fix spelling 2017-01-03 11:31:18 -06:00
.gitignore Add configure option to enable gcov analysis 2017-09-15 10:24:13 -07:00
.gitmodules Add zimport.sh compatibility test script 2014-02-21 12:10:31 -08:00
.travis.yml Add .travis.yml 2017-11-13 09:18:18 -08:00
AUTHORS Add a missing > to AUTHORS 2014-09-02 14:18:53 -07:00
autogen.sh build: do not call boilerplate ourself 2013-04-02 10:55:20 -07:00
configure.ac Extend deadman logic 2018-01-25 13:40:38 -08:00
copy-builtin Fix copy-builtin to work with ASAN patch 2018-01-12 09:39:36 -08:00
COPYRIGHT Encryption patch follow-up 2017-10-11 16:54:48 -04:00
DISCLAIMER Fix minor typos and update marketing copy. 2013-03-21 12:51:06 -07:00
Makefile.am Update for cppcheck v1.80 2017-11-18 14:08:00 -08:00
META Tag zfs-0.7.0 2017-07-26 10:13:25 -07:00
OPENSOLARIS.LICENSE Add CDDL license file 2008-12-01 14:49:34 -08:00
README.markdown Add scan.coverity.com badge to README 2017-10-30 16:21:24 -07:00
TEST Refresh TEST file to include new variables 2017-11-08 11:09:30 -08:00
zfs.release.in Move zfs.release generation to configure step 2012-07-12 12:22:51 -07:00

img

ZFS on Linux is an advanced file system and volume manager which was originally developed for Solaris and is now maintained by the OpenZFS community.

codecov coverity

Official Resources

Installation

Full documentation for installing ZoL on your favorite Linux distribution can be found at our site.

Contribute & Develop

We have a separate document with contribution guidelines.