Change checksum & IO delay ratelimit values

Raise the checksum and IO delay event ratelimits from 5/sec to 20/sec.
This allows zed to actually trigger when a burst of these events arrives
in a short period of time (zed has a threshold of 10 events in 10 sec).
Previously, a burst of, say, 100 checksum errors in 1 sec was ratelimited
to 5 events, which is below the zed threshold, so zed would not fault the
drive. At 20/sec the same burst posts 20 events and crosses the threshold.
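
The arithmetic behind this change can be sketched as follows. This is a deliberately simplified model, not the kernel's ratelimit code: it assumes the whole burst arrives inside a single one-second window and that the ratelimit simply caps how many events are posted in that window. The function names are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Simplified model (an assumption, not the actual ZFS implementation):
# a burst of errors arrives within one second, and the ratelimit lets
# at most `limit` events through in that second.
passed_events() {
	burst=$1
	limit=$2
	if [ "$burst" -lt "$limit" ]; then
		echo "$burst"
	else
		echo "$limit"
	fi
}

# zed threshold taken from the commit message: 10 events in 10 sec.
ZED_THRESHOLD=10

# Succeeds when the events that survive the ratelimit reach the threshold.
would_trigger_zed() {
	passed=$(passed_events "$1" "$2")
	[ "$passed" -ge "$ZED_THRESHOLD" ]
}

# Compare the old (5/sec) and new (20/sec) limits for a 100-error burst.
for limit in 5 20; do
	posted=$(passed_events 100 "$limit")
	if would_trigger_zed 100 "$limit"; then
		echo "limit=$limit/sec: $posted events posted, zed threshold reached"
	else
		echo "limit=$limit/sec: $posted events posted, below zed threshold"
	fi
done
```

Under this model the old limit lets only 5 of the 100 events through, while the new limit lets 20 through, which is enough to cross the 10-event threshold on its own.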

Also, expose the checksum and IO delay ratelimits as module params for
easy testing.
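
As a rough sketch of how such module params might be inspected on Linux: the parameter names `zfs_checksums_per_second` and `zfs_delays_per_second` and the `/sys/module/zfs/parameters` path are assumptions here, since the commit message does not name them, and `show_ratelimit` is a hypothetical helper. Check the module documentation for your ZFS version before relying on them.

```shell
#!/bin/sh
# Assumed location of ZFS module params on Linux; override PARAM_DIR
# if your system differs.
PARAM_DIR=${PARAM_DIR:-/sys/module/zfs/parameters}

# Print the current value of one param, or note that it is unavailable
# (for example, when the zfs module is not loaded).
show_ratelimit() {
	param="$PARAM_DIR/$1"
	if [ -r "$param" ]; then
		echo "$1=$(cat "$param")"
	else
		echo "$1: not available"
	fi
}

# Assumed parameter names, not confirmed by the commit message.
show_ratelimit zfs_checksums_per_second
show_ratelimit zfs_delays_per_second

# To lower a limit for a test run (as root, and only if the param is
# writable at runtime), something like:
#   echo 5 > "$PARAM_DIR/zfs_checksums_per_second"
```

Keeping the limit at or above the zed threshold matters for the same reason given above: a limit below 10/sec can prevent a single burst from ever reaching the threshold.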

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7252
Tony Hutter
2018-03-04 17:34:51 -08:00
committed by Brian Behlendorf
parent 5666a994f2
commit 80d52c3919
6 changed files with 56 additions and 15 deletions
@@ -68,14 +68,8 @@ for type in "mirror" "raidz" "raidz2"; do
 log_must dd if=/dev/urandom of=$TESTFILE bs=1M count=16
 # 4. Inject CHECKSUM ERRORS on read with a zinject error handler
-# NOTE: checksum events are ratelimited to max 5 per second, ZED needs
-# 10 to kick in a spare
 log_must zinject -d $FAULT_FILE -e corrupt -f 50 -T read $TESTPOOL
 log_must cp $TESTFILE /dev/null
-log_must sleep 1
-log_must cp $TESTFILE /dev/null
-log_must sleep 1
-log_must cp $TESTFILE /dev/null
 # 5. Verify the ZED kicks in a hot spare and expected pool/device status
 log_note "Wait for ZED to auto-spare"