Change checksum & IO delay ratelimit values

Change checksum & IO delay ratelimit thresholds from 5/sec to 20/sec. This allows zed to actually trigger if a bunch of these events arrive in a short period of time (zed has a threshold of 10 events in 10 sec). Previously, if you had, say, 100 checksum errors in 1 sec, it would get ratelimited to 5/sec which wouldn't trigger zed to fault the drive. Also, convert the checksum and IO delay thresholds to module params for easy testing. Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7252
2026-05-22 10:37:35 +03:00 · 2018-03-04 17:34:51 -08:00
parent 792f88131c
commit 6dc40e2ada
5 changed files with 71 additions and 9 deletions
@@ -739,6 +739,34 @@ Disable pool import at module load by ignoring the cache file (typically \fB/etc
 Use \fB1\fR for yes (default) and \fB0\fR for no.
 .RE

+.sp
+.ne 2
+.na
+\fBzfs_checksums_per_second\fR (int)
+.ad
+.RS 12n
+Rate limit checksum events to this many per second.  Note that this should
+not be set below the zed thresholds (currently 10 checksums over 10 sec)
+or else zed may not trigger any action.
+.sp
+Default value: 20
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_commit_timeout_pct\fR (int)
+.ad
+.RS 12n
+This controls the amount of time that a ZIL block (lwb) will remain "open"
+when it isn't "full", and it has a thread waiting for it to be committed to
+stable storage.  The timeout is scaled based on a percentage of the last lwb
+latency to avoid significantly impacting the latency of each individual
+transaction record (itx).
+.sp
+Default value: \fB5\fR%.
+.RE
+
 .sp
 .ne 2
 .na
@@ -866,6 +894,17 @@ Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64.
 Default value: \fB500,000\fR.
 .RE

+.sp
+.ne 2
+.na
+\fBzfs_delays_per_second\fR (int)
+.ad
+.RS 12n
+Rate limit IO delay events to this many per second.
+.sp
+Default value: 20
+.RE
+
 .sp
 .ne 2
 .na