Detect a slow raidz child during reads

A single slow-responding disk can degrade the overall read
performance of a raidz group.  When a raidz child disk is
determined to be a persistent slow outlier, have it sit
out reads for a period of time.  The raidz group can use
parity to reconstruct the data that was skipped.
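
As a rough illustration of why skipping one child is safe, the sketch
below reconstructs a skipped data block from the remaining blocks plus
parity. It assumes a simplified single-parity (XOR) layout; the helper
names `xor_blocks` and `reconstruct_skipped` are hypothetical and this
is not the ZFS implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_skipped(surviving_data, parity):
    """Rebuild the one data block that was not read (hypothetical helper).

    With single XOR parity, parity = d0 ^ d1 ^ ... ^ dn, so XOR-ing the
    parity with every block that was read yields the missing block.
    """
    return xor_blocks(list(surviving_data) + [parity])


# Three data blocks; the middle one lives on the disk that is sitting out.
data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_blocks(data)
rebuilt = reconstruct_skipped([data[0], data[2]], parity)
```

Wider parity (raidz2, raidz3) tolerates more skipped children, but needs
more than plain XOR to reconstruct them.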

Each time a slow disk is placed into a sit out period, its
`vdev_stat.vs_slow_ios` count is incremented and a zevent
class `ereport.fs.zfs.delay` is posted.

The length of the sit out period can be changed using the
`vdev_read_sit_out_secs` module parameter.  Setting it to
zero disables slow outlier detection.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Contributions-by: Don Brady <don.brady@klarasystems.com>
Contributions-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17227
Paul Dagnelie
2025-08-27 16:41:48 -07:00
committed by Brian Behlendorf
parent 0620c979a5
commit d64711c202
28 changed files with 1399 additions and 13 deletions
+37
@@ -4,6 +4,7 @@
.\" Copyright (c) 2019, 2021 by Delphix. All rights reserved.
.\" Copyright (c) 2019 Datto Inc.
.\" Copyright (c) 2023, 2024, 2025, Klara, Inc.
.\"
.\" The contents of this file are subject to the terms of the Common Development
.\" and Distribution License (the "License"). You may not use this file except
.\" in compliance with the License. You can obtain a copy of the license at
@@ -601,6 +602,42 @@ new format when enabling the
feature.
The default is to convert all log entries.
.
.It Sy vdev_read_sit_out_secs Ns = Ns Sy 600 Ns s Po 10 min Pc Pq ulong
When a slow disk outlier is detected it is placed in a sit out state.
While sitting out the disk will not participate in normal reads, instead its
data will be reconstructed as needed from parity.
Scrub operations will always read from a disk, even if it's sitting out.
A number of disks in a RAID-Z or dRAID vdev may sit out at the same time, up
to the number of parity devices.
Writes will still be issued to a disk which is sitting out to maintain full
redundancy.
Defaults to 600 seconds; a value of zero disables disk sit-outs in general,
including slow disk outlier detection.
.
.It Sy vdev_raidz_outlier_check_interval_ms Ns = Ns Sy 1000 Ns ms Po 1 sec Pc Pq ulong
How often each RAID-Z and dRAID vdev will check for slow disk outliers.
Increasing this interval will reduce the sensitivity of detection (because all
I/Os since the last check are included in the statistics), but will slow the
response to a disk developing a problem.
Defaults to once per second; setting extremely small values may cause negative
performance effects.
.
.It Sy vdev_raidz_outlier_insensitivity Ns = Ns Sy 50 Pq uint
When performing slow outlier checks for RAID-Z and dRAID vdevs, this value is
used to determine how far out an outlier must be before it counts as an event
worth considering.
This is phrased as "insensitivity" because larger values result in fewer
detections.
Smaller values will result in more aggressive sitting out of disks that may have
problems, but may significantly increase the rate of spurious sit-outs.
.Pp
To provide a more technical definition of this parameter, this is the multiple
of the inter-quartile range (IQR) that is being used in a Tukey's Fence
detection algorithm.
This is much higher than a normal Tukey's Fence k-value, because the
distribution under consideration is probably an extreme-value distribution,
rather than a more typical Gaussian distribution.
.
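
The Tukey's Fence test described for `vdev_raidz_outlier_insensitivity`
can be sketched as follows. This is a minimal illustration, not the
kernel code: it assumes one latency sample per child disk and only the
upper (slow) fence, and the function name `slow_outliers` is
hypothetical. The default `k=50` mirrors the documented default.

```python
import statistics


def slow_outliers(latencies_ms, k=50):
    """Return indices of children whose latency exceeds Q3 + k * IQR.

    latencies_ms: one representative latency per child disk (assumed input).
    k: multiple of the inter-quartile range used as the fence; larger
       values mean fewer detections, hence "insensitivity".
    """
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(latencies_ms, n=4, method="inclusive")
    iqr = q3 - q1
    upper_fence = q3 + k * iqr
    return [i for i, lat in enumerate(latencies_ms) if lat > upper_fence]


# Eight children; the last one is far slower than its peers.
samples = [10, 11, 12, 11, 10, 12, 11, 900]
flagged = slow_outliers(samples)              # -> [7]
relaxed = slow_outliers(samples, k=1000)      # -> []
```

A conventional Tukey's Fence uses k = 1.5; the much larger default here
reflects the heavy-tailed latency distribution noted above.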
.It Sy vdev_removal_max_span Ns = Ns Sy 32768 Ns B Po 32 KiB Pc Pq uint
During top-level vdev removal, chunks of data are copied from the vdev
which may include free space in order to trade bandwidth for IOPS.
+35 -4
@@ -19,7 +19,7 @@
.\"
.\" CDDL HEADER END
.\"
-.\" Copyright (c) 2021 Klara, Inc.
+.\" Copyright (c) 2021, 2025, Klara, Inc.
.\"
.Dd July 23, 2024
.Dt VDEVPROPS 7
@@ -106,11 +106,17 @@ The number of children belonging to this vdev
.It Sy read_errors , write_errors , checksum_errors , initialize_errors , trim_errors
The number of errors of each type encountered by this vdev
.It Sy slow_ios
-The number of slow I/Os encountered by this vdev,
-These represent I/O operations that didn't complete in
+This indicates the number of slow I/O operations encountered by this vdev.
+A slow I/O is defined as an operation that did not complete within the
.Sy zio_slow_io_ms
-milliseconds
+threshold in milliseconds
.Pq Sy 30000 No by default .
For
.Sy RAIDZ
and
.Sy DRAID
configurations, this value also represents the number of times the vdev was
identified as an outlier and excluded from participating in read I/O operations.
.It Sy null_ops , read_ops , write_ops , free_ops , claim_ops , trim_ops
The number of I/O operations of each type performed by this vdev
.It Xo
@@ -150,6 +156,31 @@ The amount of space to reserve for the EFI system partition
.It Sy failfast
If this device should propagate BIO errors back to ZFS, used to disable
failfast.
.It Sy sit_out
Only valid for
.Sy RAIDZ
and
.Sy DRAID
vdevs.
True when a slow disk outlier was detected and the vdev is currently in a sit
out state.
This property can be manually set to cause vdevs to sit out.
It will also be automatically set by the
.Sy autosit
logic if that is enabled.
While sitting out, the vdev will not participate in normal reads, instead its
data will be reconstructed as needed from parity.
.It Sy autosit
Only valid for
.Sy RAIDZ
and
.Sy DRAID
vdevs.
If set, this enables the kernel-level slow disk detection logic.
This logic automatically causes any vdevs that are significant negative
performance outliers to sit out, as described in the
.Sy sit_out
property.
.It Sy path
The path to the device for this vdev
.It Sy allocating
+10
@@ -190,6 +190,16 @@ Issued when a scrub is resumed on a pool.
.It Sy scrub.paused
Issued when a scrub is paused on a pool.
.It Sy bootfs.vdev.attach
.It Sy sitout
Issued when a
.Sy RAIDZ
or
.Sy DRAID
vdev triggers the
.Sy autosit
logic.
This logic detects when a disk in such a vdev is significantly slower than its
peers, and sits it out temporarily to preserve the performance of the pool.
.El
.
.Sh PAYLOADS