mirror of
https://git.proxmox.com/git/mirror_zfs.git
synced 2026-05-22 02:27:36 +03:00
Add slow disk diagnosis to ZED
Slow disk response times can be indicative of a failing drive. ZFS
currently tracks slow I/Os (slower than zio_slow_io_ms) and generates
events (ereport.fs.zfs.delay). However, no action is taken by ZED,
like is done for checksum or I/O errors. This change adds slow disk
diagnosis to ZED which is opt-in using new VDEV properties:
  VDEV_PROP_SLOW_IO_N
  VDEV_PROP_SLOW_IO_T

If multiple VDEVs in a pool are undergoing slow I/Os, then it skips
the zpool_vdev_degrade().

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15469
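The diagnosis described above amounts to an "N events within T seconds" check per vdev, plus a pool-wide guard that skips the degrade when more than one vdev is slow. The following is only a minimal sketch of that idea, not ZED's actual implementation (which is written in C). The names SlowIoWindow and vdevs_to_degrade are hypothetical, and the parameters n and t merely stand in for the slow_io_n and slow_io_t properties.

```python
from collections import deque


class SlowIoWindow:
    """Hypothetical N-events-in-T-seconds detector (not ZED's actual code)."""

    def __init__(self, n, t):
        self.n = n              # stand-in for the slow_io_n property
        self.t = t              # stand-in for the slow_io_t property, in seconds
        self.events = deque()   # timestamps of recent delay events

    def record(self, now):
        """Record one delay event; return True when the threshold is reached."""
        self.events.append(now)
        # Drop events that fall outside the T-second window.
        while self.events and now - self.events[0] > self.t:
            self.events.popleft()
        return len(self.events) >= self.n


def vdevs_to_degrade(tripped):
    """Return the vdevs to degrade, or none when several are slow at once.

    Follows the rule in the commit message: if multiple vdevs in a pool
    are undergoing slow I/Os, the degrade step is skipped.
    """
    return list(tripped) if len(tripped) == 1 else []


# Three events within 10 seconds are needed to trip the detector.
window = SlowIoWindow(n=3, t=10)
results = [window.record(ts) for ts in (0, 4, 20, 22, 25)]
```

Here only the last event trips the detector, because the events at 0 and 4 seconds have aged out of the window by the time the events at 20, 22, and 25 seconds arrive.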
@@ -44,7 +44,7 @@ section, below.
 Every vdev has a set of properties that export statistics about the vdev
 as well as control various behaviors.
 Properties are not inherited from top-level vdevs, with the exception of
-checksum_n, checksum_t, io_n, and io_t.
+checksum_n, checksum_t, io_n, io_t, slow_io_n, and slow_io_t.
 .Pp
 The values of numeric properties can be specified using human-readable suffixes
 .Po for example,
@@ -117,7 +117,7 @@ If this device is currently being removed from the pool
 .Pp
 The following native properties can be used to change the behavior of a vdev.
 .Bl -tag -width "allocating"
-.It Sy checksum_n , checksum_t , io_n , io_t
+.It Sy checksum_n , checksum_t , io_n , io_t , slow_io_n , slow_io_t
 Tune the fault management daemon by specifying checksum/io thresholds of <N>
 errors in <T> seconds, respectively.
 These properties can be set on leaf and top-level vdevs.
@@ -260,8 +260,8 @@ sufficient replicas exist to continue functioning.
 The underlying conditions are as follows:
 .Bl -bullet -compact
 .It
-The number of checksum errors exceeds acceptable levels and the device is
-degraded as an indication that something may be wrong.
+The number of checksum errors or slow I/Os exceeds acceptable levels and the
+device is degraded as an indication that something may be wrong.
 ZFS continues to use the device as necessary.
 .It
 The number of I/O errors exceeds acceptable levels.
@@ -69,6 +69,7 @@ Force a vdev into the DEGRADED or FAULTED state.
.Nm zinject
.Fl d Ar vdev
.Fl D Ar latency : Ns Ar lanes
.Op Fl T Ar read|write
.Ar pool
.Xc
Add an artificial delay to I/O requests on a particular