mirror of
https://git.proxmox.com/git/mirror_zfs.git
synced 2026-05-22 18:40:43 +03:00
Detect a slow raidz child during reads
A single slow-responding disk can degrade the overall read performance of a raidz group. When a raidz child disk is determined to be a persistent slow outlier, have it sit out during reads for a period of time. The raidz group can use parity to reconstruct the data that was skipped.

Each time a slow disk is placed into a sit-out period, its `vdev_stat.vs_slow_ios` count is incremented and a zevent of class `ereport.fs.zfs.delay` is posted.

The length of the sit-out period can be changed using the `vdev_read_sit_out_secs` module parameter. Setting it to zero disables slow outlier detection.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Contributions-by: Don Brady <don.brady@klarasystems.com>
Contributions-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17227
Committed by: Brian Behlendorf
Parent: 0df85ec27c
Commit: df55ba7c49
@@ -4,6 +4,7 @@
.\" Copyright (c) 2019, 2021 by Delphix. All rights reserved.
.\" Copyright (c) 2019 Datto Inc.
.\" Copyright (c) 2023, 2024, 2025, Klara, Inc.
.\"
.\" The contents of this file are subject to the terms of the Common Development
.\" and Distribution License (the "License"). You may not use this file except
.\" in compliance with the License. You can obtain a copy of the license at
@@ -601,6 +602,42 @@ new format when enabling the
feature.
The default is to convert all log entries.
.
.It Sy vdev_read_sit_out_secs Ns = Ns Sy 600 Ns s Po 10 min Pc Pq ulong
When a slow disk outlier is detected, it is placed in a sit-out state.
While sitting out, the disk does not participate in normal reads; instead, its
data is reconstructed from parity as needed.
Scrub operations always read from a disk, even if it is sitting out.
Multiple disks in a RAID-Z or dRAID vdev may sit out at the same time, up
to the number of parity devices.
Writes are still issued to a disk that is sitting out to maintain full
redundancy.
Defaults to 600 seconds.
A value of zero disables disk sit-outs in general, including slow disk outlier
detection.
.
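The sit-out rules above (normal reads skip a sitting-out disk while scrubs still read it, no more disks sit out than there are parity devices, and a period of zero disables sit-outs) can be sketched in C as follows. This is a minimal illustration under assumed names: `child_t`, `raidz_group_t`, `try_sit_out()`, and `should_read_child()` are hypothetical and do not reflect the actual OpenZFS data structures or code paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	MAX_CHILDREN	64

/* Hypothetical per-child state; not the OpenZFS vdev_t. */
typedef struct {
	uint64_t sit_out_until;	/* seconds; sitting out while now < this */
} child_t;

/* Hypothetical RAID-Z/dRAID group. */
typedef struct {
	int nparity;
	int nchildren;
	child_t children[MAX_CHILDREN];
} raidz_group_t;

static bool
child_sitting_out(const child_t *c, uint64_t now)
{
	return (c->sit_out_until > now);
}

static int
count_sitting_out(const raidz_group_t *g, uint64_t now)
{
	int n = 0;

	for (int i = 0; i < g->nchildren; i++) {
		if (child_sitting_out(&g->children[i], now))
			n++;
	}
	return (n);
}

/*
 * Try to start a sit-out for child idx. Returns true only if a new sit-out
 * began. sit_out_secs == 0 disables sit-outs entirely.
 */
static bool
try_sit_out(raidz_group_t *g, int idx, uint64_t now, uint64_t sit_out_secs)
{
	assert(idx >= 0 && idx < g->nchildren);
	if (sit_out_secs == 0)
		return (false);
	if (child_sitting_out(&g->children[idx], now))
		return (false);
	/* Never let more children sit out than parity can cover. */
	if (count_sitting_out(g, now) >= g->nparity)
		return (false);
	g->children[idx].sit_out_until = now + sit_out_secs;
	return (true);
}

/* Normal reads skip a sitting-out child; scrubs always read it. */
static bool
should_read_child(const child_t *c, uint64_t now, bool is_scrub)
{
	return (is_scrub || !child_sitting_out(c, now));
}
```

With `nparity = 1`, once one child is sitting out, a second `try_sit_out()` call fails until the first sit-out expires. Writes are not gated at all in this sketch, matching the rule that a sitting-out disk still receives writes.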
.It Sy vdev_raidz_outlier_check_interval_ms Ns = Ns Sy 1000 Ns ms Po 1 sec Pc Pq ulong
How often each RAID-Z and dRAID vdev will check for slow disk outliers.
Increasing this interval will reduce the sensitivity of detection (since all
I/Os since the last check are included in the statistics), but will slow the
response to a disk developing a problem.
Defaults to once per second; setting extremely small values may cause negative
performance effects.
.
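The trade-off described for `vdev_raidz_outlier_check_interval_ms` comes from aggregating every I/O completed since the previous check. A minimal C sketch of that windowing, assuming a simple per-child sum and count of read latencies, is shown below; `outlier_stats_t`, `stats_record()`, and `stats_maybe_check()` are hypothetical names, and the real OpenZFS statistics may be gathered differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	NCHILDREN_MAX	64

/* Hypothetical per-vdev accumulator; not an OpenZFS structure. */
typedef struct {
	int nchildren;
	uint64_t last_check_ms;
	uint64_t lat_sum_us[NCHILDREN_MAX];	/* latency summed since last check */
	uint64_t lat_count[NCHILDREN_MAX];	/* reads completed since last check */
} outlier_stats_t;

/* Record one completed read for a child. */
static void
stats_record(outlier_stats_t *s, int child, uint64_t latency_us)
{
	assert(child >= 0 && child < s->nchildren);
	s->lat_sum_us[child] += latency_us;
	s->lat_count[child]++;
}

/*
 * If at least interval_ms has passed since the last check, store each
 * child's mean latency over the window in avg_us[] (0 if it completed no
 * reads), start a new window, and return true. Otherwise return false.
 */
static bool
stats_maybe_check(outlier_stats_t *s, uint64_t now_ms, uint64_t interval_ms,
    uint64_t avg_us[])
{
	if (now_ms - s->last_check_ms < interval_ms)
		return (false);
	for (int c = 0; c < s->nchildren; c++) {
		avg_us[c] = (s->lat_count[c] != 0) ?
		    s->lat_sum_us[c] / s->lat_count[c] : 0;
		s->lat_sum_us[c] = 0;
		s->lat_count[c] = 0;
	}
	s->last_check_ms = now_ms;
	return (true);
}
```

A longer interval averages more reads into each per-child value, which smooths out brief spikes (lower sensitivity) but delays the point at which a newly slow disk shows up in the averages.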
.It Sy vdev_raidz_outlier_insensitivity Ns = Ns Sy 50 Pq uint
When performing slow outlier checks for RAID-Z and dRAID vdevs, this value
determines how far out an outlier must be before it counts as an event
worth considering.
This is phrased as "insensitivity" because larger values result in fewer
detections.
Smaller values result in more aggressive sitting out of disks that may have
problems, but may significantly increase the rate of spurious sit-outs.
.Pp
To provide a more technical definition of this parameter, this is the multiple
of the interquartile range (IQR) used in a Tukey's fence
detection algorithm.
This is much higher than a typical Tukey's fence k-value (commonly 1.5 or 3)
because the distribution under consideration is probably an extreme-value
distribution, rather than a more typical Gaussian distribution.
.
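The Tukey's fence test described above flags a value as a slow outlier when it exceeds Q3 + k * IQR, where IQR = Q3 - Q1 and k plays the role of `vdev_raidz_outlier_insensitivity`. The following C sketch applies that upper fence to one latency figure per child. It assumes floating-point latencies and linearly interpolated quartiles; the function names are hypothetical, and the actual OpenZFS implementation (which may use integer arithmetic and a different quartile estimate) is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define	MAX_CHILDREN	64

static int
cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;

	return ((x > y) - (x < y));
}

/* Quantile q (0..1) of a sorted array, using linear interpolation. */
static double
quantile_sorted(const double *v, size_t n, double q)
{
	double pos = q * (double)(n - 1);
	size_t lo = (size_t)pos;
	size_t hi = (lo + 1 < n) ? lo + 1 : lo;

	return (v[lo] + (v[hi] - v[lo]) * (pos - (double)lo));
}

/*
 * Return the index of the slowest child whose latency lies above the upper
 * Tukey fence Q3 + k * IQR, or -1 if no child is that far out.
 */
static int
tukey_slow_outlier(const double *lat, size_t n, double k)
{
	double sorted[MAX_CHILDREN];

	assert(k >= 0.0);
	if (n < 2 || n > MAX_CHILDREN)
		return (-1);
	memcpy(sorted, lat, n * sizeof (double));
	qsort(sorted, n, sizeof (double), cmp_double);

	double q1 = quantile_sorted(sorted, n, 0.25);
	double q3 = quantile_sorted(sorted, n, 0.75);
	double fence = q3 + k * (q3 - q1);

	int worst = -1;
	for (size_t i = 0; i < n; i++) {
		if (lat[i] > fence && (worst < 0 || lat[i] > lat[worst]))
			worst = (int)i;
	}
	return (worst);
}
```

For the latencies {10, 11, 12, 13, 30, 12, 11, 10}, the quartiles are 10.75 and 12.25, so the fence is 14.5 at k = 1.5 (the child at 30 is flagged) but 87.25 at k = 50 (nothing is flagged). When most latencies are identical the IQR collapses to zero, so a production version would likely need an extra guard against flagging trivial differences.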
.It Sy vdev_removal_max_span Ns = Ns Sy 32768 Ns B Po 32 KiB Pc Pq uint
During top-level vdev removal, chunks of data are copied from the vdev
which may include free space in order to trade bandwidth for IOPS.
+35
-4
@@ -19,7 +19,7 @@
.\"
.\" CDDL HEADER END
.\"
.\" Copyright (c) 2021 Klara, Inc.
.\" Copyright (c) 2021, 2025, Klara, Inc.
.\"
.Dd July 23, 2024
.Dt VDEVPROPS 7
@@ -106,11 +106,17 @@ The number of children belonging to this vdev
.It Sy read_errors , write_errors , checksum_errors , initialize_errors , trim_errors
The number of errors of each type encountered by this vdev
.It Sy slow_ios
The number of slow I/Os encountered by this vdev,
These represent I/O operations that didn't complete in
This indicates the number of slow I/O operations encountered by this vdev.
A slow I/O is defined as an operation that did not complete within the
.Sy zio_slow_io_ms
milliseconds
threshold in milliseconds
.Pq Sy 30000 No by default .
For
.Sy RAIDZ
and
.Sy DRAID
configurations, this value also represents the number of times the vdev was
identified as an outlier and excluded from participating in read I/O operations.
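The definition above amounts to a threshold on completion latency, with RAID-Z and dRAID sit-outs feeding the same counter. A minimal C sketch, assuming millisecond timestamps and using hypothetical names (`vdev_counters_t`, `account_io()`, `account_sit_out()`) rather than the real `vdev_stat` plumbing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter block; not the OpenZFS vdev_stat_t layout. */
typedef struct {
	uint64_t slow_ios;
} vdev_counters_t;

/*
 * Count an I/O as slow when it did not complete within slow_io_ms
 * (the role played by zio_slow_io_ms, 30000 by default).
 */
static bool
account_io(vdev_counters_t *vc, uint64_t issued_ms, uint64_t done_ms,
    uint64_t slow_io_ms)
{
	assert(done_ms >= issued_ms);
	if (done_ms - issued_ms > slow_io_ms) {
		vc->slow_ios++;
		return (true);
	}
	return (false);
}

/* For RAID-Z/dRAID children, each sit-out is counted as well. */
static void
account_sit_out(vdev_counters_t *vc)
{
	vc->slow_ios++;
}
```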
.It Sy null_ops , read_ops , write_ops , free_ops , claim_ops , trim_ops
The number of I/O operations of each type performed by this vdev
.It Xo
@@ -150,6 +156,31 @@ The amount of space to reserve for the EFI system partition
.It Sy failfast
If this device should propagate BIO errors back to ZFS, used to disable
failfast.
.It Sy sit_out
Only valid for
.Sy RAIDZ
and
.Sy DRAID
vdevs.
True when a slow disk outlier was detected and the vdev is currently in a
sit-out state.
This property can be set manually to make a vdev sit out.
It will also be set automatically by the
.Sy autosit
logic if that is enabled.
While sitting out, the vdev does not participate in normal reads; instead, its
data is reconstructed from parity as needed.
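Reconstructing a sitting-out child's data from parity is easiest to see for single parity, where the parity column is the XOR of the data columns, so any one missing column equals the parity XORed with the remaining data columns. The C sketch below covers only that single-parity case; double and triple parity in RAID-Z and dRAID use additional Galois-field parity columns that it does not model, and `reconstruct_xor()` is a hypothetical helper, not an OpenZFS function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Single-parity rebuild: with P = D0 ^ D1 ^ ... ^ D(n-1), the skipped
 * column is P XORed with every other data column. data[skip] is never
 * read, so it may be NULL for a child that is sitting out.
 */
static void
reconstruct_xor(uint8_t *out, const uint8_t *parity,
    const uint8_t *const data[], size_t ndata, size_t skip, size_t len)
{
	assert(skip < ndata);
	memcpy(out, parity, len);
	for (size_t d = 0; d < ndata; d++) {
		if (d == skip)
			continue;
		for (size_t i = 0; i < len; i++)
			out[i] ^= data[d][i];
	}
}
```

If the child at index 1 is treated as sitting out, its column is never read, yet `out` ends up equal to its original contents.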
.It Sy autosit
Only valid for
.Sy RAIDZ
and
.Sy DRAID
vdevs.
If set, this enables the kernel-level slow disk detection logic.
This logic automatically causes any vdevs that are significant negative
performance outliers to sit out, as described in the
.Sy sit_out
property.
.It Sy path
The path to the device for this vdev
.It Sy allocating
@@ -190,6 +190,16 @@ Issued when a scrub is resumed on a pool.
.It Sy scrub.paused
Issued when a scrub is paused on a pool.
.It Sy bootfs.vdev.attach
.It Sy sitout
Issued when a
.Sy RAIDZ
or
.Sy DRAID
vdev triggers the
.Sy autosit
logic.
This logic detects when a disk in such a vdev is significantly slower than its
peers and sits it out temporarily to preserve the performance of the pool.
.El
.
.Sh PAYLOADS