Change checksum & IO delay ratelimit values

Change checksum & IO delay ratelimit thresholds from 5/sec to 20/sec.
This allows zed to actually trigger if a bunch of these events arrive in
a short period of time (zed has a threshold of 10 events in 10 sec).
Previously, if you had, say, 100 checksum errors in 1 sec, it would get
ratelimited to 5/sec which wouldn't trigger zed to fault the drive.

Also, convert the checksum and IO delay thresholds to module params for
easy testing.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7252
This commit is contained in:
Tony Hutter
2018-03-04 17:34:51 -08:00
parent 792f88131c
commit 6dc40e2ada
5 changed files with 71 additions and 9 deletions
+39
View File
@@ -739,6 +739,34 @@ Disable pool import at module load by ignoring the cache file (typically \fB/etc
Use \fB1\fR for yes (default) and \fB0\fR for no.
.RE
.sp
.ne 2
.na
\fBzfs_checksums_per_second\fR (int)
.ad
.RS 12n
Rate limit checksum events to this many per second. Note that this should
not be set below the zed thresholds (currently 10 checksums over 10 sec)
or else zed may not trigger any action.
.sp
Default value: 20
.RE
.sp
.ne 2
.na
\fBzfs_commit_timeout_pct\fR (int)
.ad
.RS 12n
This controls the amount of time that a ZIL block (lwb) will remain "open"
when it isn't "full", and it has a thread waiting for it to be committed to
stable storage. The timeout is scaled based on a percentage of the last lwb
latency to avoid significantly impacting the latency of each individual
transaction record (itx).
.sp
Default value: \fB5\fR%.
.RE
.sp
.ne 2
.na
@@ -866,6 +894,17 @@ Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64.
Default value: \fB500,000\fR.
.RE
.sp
.ne 2
.na
\fBzfs_delays_per_second\fR (int)
.ad
.RS 12n
Rate limit IO delay events to this many per second.
.sp
Default value: 20
.RE
.sp
.ne 2
.na