Mirror of https://git.proxmox.com/git/mirror_zfs.git
Synced 2026-05-22 02:27:36 +03:00
zed: Add deadman-slot_off.sh zedlet
Optionally turn off a disk's enclosure slot if an I/O is hung, triggering the deadman.

It's possible for outstanding I/O to a misbehaving SCSI disk to neither promptly complete nor return an error. This can occur due to retry and recovery actions taken by the SCSI layer, driver, or disk. When it occurs the pool will be unresponsive even though there may be sufficient redundancy configured to proceed without this single disk.

When a hung I/O is detected by the kmods it will be posted as a deadman event. By default an I/O is considered to be hung after 5 minutes. This value can be changed with the zfs_deadman_ziotime_ms module parameter. If ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN is set, the disk's enclosure slot will be powered off, causing the outstanding I/O to fail. The ZED will then handle this like a normal disk failure. By default ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN is not set.

As part of this change `zfs_deadman_events_per_second` is added to control the rate limiting of deadman events independently of delay events. In practice, a single deadman event is sufficient and more aren't particularly useful.

Alphabetize the zfs_deadman_* entries in zfs.4.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16226
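As a rough illustration of the tunables named in the message above, the following sketch reads the deadman parameters on a running system. It assumes the usual Linux sysfs location for module parameters, /sys/module/zfs/parameters. The helper names read_zfs_param and ms_to_minutes and the ZFS_PARAM_DIR override are hypothetical and are not part of this commit.

```shell
#!/bin/sh
# Sketch only: inspect the deadman tunables mentioned in the commit message.
# ZFS_PARAM_DIR is a hypothetical override (useful for testing); by default
# the standard Linux sysfs directory for zfs module parameters is assumed.

# Print the current value of one module parameter, or return 1 if absent.
read_zfs_param() {
    param_file="${ZFS_PARAM_DIR:-/sys/module/zfs/parameters}/$1"
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        echo "parameter $1 not readable at $param_file" >&2
        return 1
    fi
}

# Convert a millisecond value to whole minutes (integer division).
ms_to_minutes() {
    echo $(( $1 / 60000 ))
}

# Report whichever of the two tunables are present on this host.
for p in zfs_deadman_ziotime_ms zfs_deadman_events_per_second; do
    if value=$(read_zfs_param "$p" 2>/dev/null); then
        echo "$p=$value"
    fi
done
```

On a host without the zfs module loaded the loop prints nothing. Passing the value of zfs_deadman_ziotime_ms to ms_to_minutes gives the hang threshold in minutes, which makes it easy to compare against the 5 minute default described above.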
Committed by: Tony Hutter
Parent: 8ca4319f60
Commit: b0cfb480ca
Changes: +12 -9
@@ -889,6 +889,13 @@ Historically used for controlling what reporting was available under
 .Pa /proc/spl/kstat/zfs .
 No effect.
 .
+.It Sy zfs_deadman_checktime_ms Ns = Ns Sy 60000 Ns ms Po 1 min Pc Pq u64
+Check time in milliseconds.
+This defines the frequency at which we check for hung I/O requests
+and potentially invoke the
+.Sy zfs_deadman_failmode
+behavior.
+.
 .It Sy zfs_deadman_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
 When a pool sync operation takes longer than
 .Sy zfs_deadman_synctime_ms ,
@@ -904,6 +911,10 @@ By default, the deadman is enabled and set to
 which results in "hung" I/O operations only being logged.
 The deadman is automatically disabled when a pool gets suspended.
 .
+.It Sy zfs_deadman_events_per_second Ns = Ns Sy 1 Ns /s Pq int
+Rate limit deadman zevents (which report hung I/O operations) to this many per
+second.
+.
 .It Sy zfs_deadman_failmode Ns = Ns Sy wait Pq charp
 Controls the failure behavior when the deadman detects a "hung" I/O operation.
 Valid values are:
@@ -921,13 +932,6 @@ This can be used to facilitate automatic fail-over
 to a properly configured fail-over partner.
 .El
 .
-.It Sy zfs_deadman_checktime_ms Ns = Ns Sy 60000 Ns ms Po 1 min Pc Pq u64
-Check time in milliseconds.
-This defines the frequency at which we check for hung I/O requests
-and potentially invoke the
-.Sy zfs_deadman_failmode
-behavior.
-.
 .It Sy zfs_deadman_synctime_ms Ns = Ns Sy 600000 Ns ms Po 10 min Pc Pq u64
 Interval in milliseconds after which the deadman is triggered and also
 the interval after which a pool sync operation is considered to be "hung".
@@ -985,8 +989,7 @@ will result in objects waiting when there is not actually contention on the
 same object.
 .
 .It Sy zfs_slow_io_events_per_second Ns = Ns Sy 20 Ns /s Pq int
-Rate limit delay and deadman zevents (which report slow I/O operations) to this
-many per
+Rate limit delay zevents (which report slow I/O operations) to this many per
 second.
 .
 .It Sy zfs_unflushed_max_mem_amt Ns = Ns Sy 1073741824 Ns B Po 1 GiB Pc Pq u64