Reflect f2330bd156
change in our man pages and add some context.
Wording is primarily copy-pasted from code comments.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #16581
Since dsl_crypto_key_open() references the key, 0d23f5e2e4 should
have called dsl_crypto_key_rele() to drop it first instead of
calling dsl_crypto_key_free() directly. The final result should
actually be the same, but without triggering the dck_holds assertion.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16567
A couple of places in the code depend on 0 being returned only if the
task was actually cancelled. Doing otherwise could lead to extra
references being dropped. The race window may be small, but I believe
CI hit it from time to time.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16565
This adds the HAVE_KERNEL_NEON and HAVE_KERNEL_FPU_INTERNAL
guards to simd_stat.c, defaulting to 0, to make it build again.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Shengqi Chen <harry-chen@outlook.com>
Signed-off-by: Sebastian Wuerl <s.wuerl@mailbox.org>
Closes #16558
Evidently while reworking it on aarch64, I broke it on x86 and
didn't notice.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #16556
Too many times, people's performance problems have amounted to
"somehow your SIMD support isn't working", and determining that
at runtime is difficult to describe to people.
This adds a /proc/spl/kstat/zfs/simd node, which exposes
metadata about which instructions ZFS thinks it can use,
on AArch64 and x86_64 Linux, to make investigating things
like this much easier.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #16530
Without updating 'm' we evict from MFU metadata all that we wanted
to evict from all metadata, including already evicted MRU metadata
('m' is the total amount of metadata we had at the beginning,
and 'w' is the total amount of metadata we want to have).
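As a rough illustration of the arithmetic only (a simplified sketch, not
the actual ARC eviction code; the helper name and the sample numbers are
assumptions), the MFU metadata target with and without updating 'm'
could look like this:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how much metadata the MFU pass should evict.
 * m:           total metadata we had at the beginning
 * w:           total metadata we want to have
 * mru_evicted: metadata the MRU pass has already evicted
 * update_m:    nonzero to account for the MRU pass first
 */
static int64_t
mfu_meta_target(int64_t m, int64_t w, int64_t mru_evicted, int update_m)
{
	assert(mru_evicted >= 0);
	if (update_m)
		m -= mru_evicted;	/* subtract what is already gone */
	return (m > w ? m - w : 0);
}
```

With m = 100, w = 60 and 30 already evicted from MRU, the un-updated
variant asks MFU for 40 more (70 in total), while the updated one asks
for the remaining 10.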
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Theera K. <tkittich@hotmail.com>
Closes #16521
Closes #16546
At least FreeBSD has a limit of 256 simultaneous AIO requests per
process. Attempting to issue more results in EAGAIN errors. Since we
issue 4 requests per disk/partition from 2xCPUs threads, it is
quite easy to reach that limit on large systems, which results in
random pool import failures. It annoyed me for quite a while on
a system with 64 CPUs and 70+ partitioned disks.
This patch on one hand limits the number of threads to avoid the
error, and on the other should softly fall back to sync reads in
case of error. It takes into account _SC_AIO_MAX as a system-wide
AIO limit and _SC_AIO_LISTIO_MAX as the closest value to a per-process
limit. The latter is not exactly right, but it is the best I found.
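The thread cap described above could be sketched roughly as follows.
This is an illustration under assumptions, not the code from the patch:
the helper names are hypothetical, and it simply keeps threads x 4
requests within the smaller of the two sysconf(3) limits.

```c
#include <assert.h>
#include <unistd.h>

#define	REQS_PER_THREAD	4	/* requests issued per disk/partition */

/* Hypothetical helper: smaller of two limits, ignoring unknown (<= 0). */
static long
min_known(long a, long b)
{
	if (a <= 0)
		return (b);
	if (b <= 0)
		return (a);
	return (a < b ? a : b);
}

/* Hypothetical helper: cap 2 x CPUs threads to fit the AIO limit. */
static long
label_threads(long ncpus, long aio_max, long listio_max)
{
	long threads = 2 * ncpus;
	long limit = min_known(aio_max, listio_max);

	assert(ncpus > 0);
	if (limit > 0 && threads > limit / REQS_PER_THREAD)
		threads = limit / REQS_PER_THREAD;
	return (threads > 0 ? threads : 1);
}

/* Feed in the live values; sysconf() returns -1 when a limit is unknown. */
static long
label_threads_for_system(void)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

	if (ncpus < 1)
		ncpus = 1;
	return (label_threads(ncpus, sysconf(_SC_AIO_MAX),
	    sysconf(_SC_AIO_LISTIO_MAX)));
}
```

For the 64-CPU system mentioned above and a limit of 256, this yields
64 threads instead of 128. The fallback to sync reads on error is not
shown.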
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16551
GRUB is not able to detect a ZFS pool if a snapshot of the top-level
boot pool is created. This issue is observed with GRUB versions up to
v2.06 if the extensible_dataset feature is enabled on the ZFS boot pool.
compatibility=grub2-2.06 would enable all read-only compatible
zpool features except extensible_dataset and other features that
depend on it.
The existing grub2 compatibility file is now renamed to grub2-2.12 to
reflect the appropriate grub2 version. grub2-2.12 lists all read-only
features that can be enabled on boot pool for grub2 with version 2.12
onwards.
A new symlink grub2 is created that currently points to the grub2-2.12
compatibility file.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13873
Closes #15261
Closes #15909
An old FreeBSD bugzilla report PR#168158 notes that DNS
names with '-'s in them cannot be used for the sharenfs
property. This patch fixes the parsing of these DNS names.
The only negative effect this patch might have is that,
if a user has incorrectly separated options with a '-',
the sharenfs setting will no longer work once this patch
is applied.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #16529
Off by one, confused me for a while!
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16500
zfs_acl_node_alloc allocates an uninitialized data buffer, but upstack
zfs_acl_chmod only partially initializes it. KMSAN reported that this
memory remained uninitialized at the point when it was read by
lzjb_compress, which suggests a possible kernel memory disclosure bug.
The full KMSAN warning may be found in the PR.
https://github.com/openzfs/zfs/pull/16511
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored by: Axcient
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
BRT refcounts are stored as eight uint8_ts rather than a single
uint64_t. This means that za_first_integer is only the first byte, so
it can never exceed 255. This fixes it by doing a lookup for the whole
value.
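To illustrate the difference (a standalone sketch, not the ZAP API;
the type and helper names are hypothetical), reading only the first
one-byte integer loses everything above the low byte, while copying
all eight bytes back recovers the refcount:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a value stored as eight 1-byte integers. */
typedef struct {
	uint8_t bytes[8];
} brt_raw_t;

static brt_raw_t
brt_raw_store(uint64_t refcount)
{
	brt_raw_t raw;

	memcpy(raw.bytes, &refcount, sizeof (raw.bytes));
	return (raw);
}

/* What a "first integer" shortcut sees: a single byte. */
static uint64_t
brt_first_integer(const brt_raw_t *raw)
{
	return (raw->bytes[0]);
}

/* Looking up the whole value: copy all eight bytes back. */
static uint64_t
brt_whole_value(const brt_raw_t *raw)
{
	uint64_t v;

	assert(sizeof (v) == sizeof (raw->bytes));
	memcpy(&v, raw->bytes, sizeof (v));
	return (v);
}
```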
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Fedora 41 was released 10/29/24, and Fedora 39 will be EOL on 11/12/24.
Update Fedora runners in the test suite. Some minor tweaks were also
needed to support ksh 1.0.10.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16700
Add a LUKS sanity test to trigger: #16631
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16681
Adding cryptsetup breaks some dialog things on Debian 11.
Apply a workaround for it.
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
The following tests have been observed to occasionally fail when
running under the CI. Updated our exceptions list to track them.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16670
With CPU pinning, we should get some speedup because of better
cpu cache re-use.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16641
Kernel Same-page Merging (KSM) allows KVM guests to share identical
memory pages. These shared pages are usually common libraries or other
identical, high-use data.
The current configuration was a bit too lazy, so KSM didn't work very
well. With the new configuration I could run 3 Linux VMs in parallel.
FreeBSD can't benefit from it. But FreeBSD is not so memory hungry in
general, so there is no need for it ;)
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16641
The ubuntu-latest alias now refers to ubuntu-24.04 instead of
ubuntu-22.04, which causes CodeQL's autobuild to fail with:
cpp/autobuilder: deptrace not supported in ubuntu 24.04
Until deptrace is supported by ubuntu-24.04 hosted runners, request
ubuntu-22.04, which is supported.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Closes #16639
In PR #16599 I used 'return' like in C - which is wrong :/
This fix generates the summary as needed.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16611
Current CI is failing on FreeBSD 13.4-STABLE, because samba4 can't be
installed there. Let's remove it for now.
Also update the FreeBSD version definitions a bit.
The naming is like this now:
FreeBSD variants:
- freebsd13-3r, freebsd13-4r, freebsd14-0r, freebsd14-1r (RELEASE)
- freebsd13-4s, freebsd14-1s (STABLE)
- freebsd15-0c (CURRENT)
RHL based distros:
- almalinux8, almalinux9, centos-stream9, fedora39, fedora40
Debian based:
- debian11, debian12, ubuntu20, ubuntu22, ubuntu24
Misc Linux distros:
- archlinux, tumbleweed
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16610
There are cases where some files needed for the summary page aren't
created. Currently the whole summary page creation then fails.
Sample run: https://github.com/openzfs/zfs/actions/runs/11148248072/job/30999748588
Fix this by properly checking for the existence of the needed files.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16599
For data integrity checks as done in ZTS, the verification for
unintended data corruption with xxhash128 should be a lot faster
and perfectly usable.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16577
Update the test case to freeze the pool then export it to better
simulate a hard failure. This is preferable to copying the vdev
while the pool's imported since with a copy we're not guaranteed
the on-disk state will be consistent. That can in turn result
in a pool import failure and a spurious test failure.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16578
Lower the minimum number of expected deadman events from 4 to 3. All
that is strictly required is a single event to consider the test a
pass. However, since I've never seen a count of less than 3 reported
by the CI that should be sufficient.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16575
On failure, attempt to include the most relevant portions of the
ztest logs in the CI output. The full logs are still available
for download, but often a backtrace and the last output is enough.
Install libunwind to improve the odds of a useful backtrace.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16573
Update the test case to freeze the pool then export it to better
simulate a hard failure. This is preferable to copying the vdev
while the pool's imported since with a copy we're not guaranteed
the on-disk state will be consistent. That can in turn result
in a pool import failure and a spurious test failure.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16570
The commit uses heuristics to determine whether a PR is behavioral:
It runs "quick" CI (i.e., only uses sanity.run on fewer OSes)
if (explicitly requested by user):
- the *last* commit message contains a line 'ZFS-CI-Type: quick',
or if (by heuristics):
- the files changed are not in the list of specified directories, and
- no commit message contains 'ZFS-CI-Type: full'.
It runs "full" CI otherwise.
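The decision rules above can be sketched as a small function. This is
only an illustration of the logic as described; the real check lives in
the CI scripts, and the names has_line, ci_is_quick and the
touches_listed_dirs flag are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if msg contains a line equal to tag. */
static bool
has_line(const char *msg, const char *tag)
{
	size_t n = strlen(tag);
	const char *p = msg;

	assert(n > 0);
	while ((p = strstr(p, tag)) != NULL) {
		bool at_start = (p == msg || p[-1] == '\n');
		bool at_end = (p[n] == '\0' || p[n] == '\n');

		if (at_start && at_end)
			return (true);
		p += n;
	}
	return (false);
}

/* msgs[] holds the PR's commit messages, oldest first. */
static bool
ci_is_quick(const char **msgs, size_t nmsgs, bool touches_listed_dirs)
{
	assert(nmsgs > 0);
	/* Explicit request: only the last commit message counts. */
	if (has_line(msgs[nmsgs - 1], "ZFS-CI-Type: quick"))
		return (true);
	/* Heuristics: listed directories or any "full" tag force full CI. */
	if (touches_listed_dirs)
		return (false);
	for (size_t i = 0; i < nmsgs; i++) {
		if (has_line(msgs[i], "ZFS-CI-Type: full"))
			return (false);
	}
	return (true);
}
```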
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16564
For checkstyle, zloop, zfs-qemu, and codeql workflows cancel
in-progress jobs when the PR is updated.
Relevant GitHub Actions documentation:
The following concurrency group cancels in-progress jobs or runs
on pull_request events only; if github.head_ref is undefined, the
concurrency group will fallback to the run ID, which is guaranteed
to be both unique and defined for the run.
https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#example-using-a-fallback-value
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16562
Update the CONTRIBUTING.md documentation to refer to the GitHub Actions
workflows which have replaced the buildbot infrastructure.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16561
Switch from v2 to v3 CodeQL Actions. The v2 actions will no longer
be supported as of Dec '24 so we need to move to v3. According to
the release notes they should be functionally equivalent.
Note that the only difference between v2 and v3 of the CodeQL
Action is the node version they support, ... For example 3.22.11
was the first v3 release and is functionally identical to 2.22.11.
https://github.com/github/codeql-action/blob/main/CHANGELOG.md
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16560
The following tests have been observed to occasionally fail when
running under the CI. Updated our exceptions list to track them.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16553
All supported Linux kernels, 4.18 and newer, provide O_TMPFILE.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16553
There is no longer a need for the ci_reason exception with
the updated GitHub Actions CI infrastructure. Retire it.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16553
The qemu-9-summary-page.sh script reads the file env.txt in the
first lines. When the module didn't build, this file was not copied
into the tarfile, causing the script to abort.
Fix: copy needed files into the tarfile in case of module build
failures. The fix also ignores empty tarfiles in the future.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16555
In zpool_create.shlib, check_feature_set iterates over all features
mentioned in the provided compatibility file to check that only those
features are enabled on the pool.
This commit fixes the skipping of comment lines. Otherwise,
the test case fails, as comment lines are also treated as feature names
by the check_feature_set function.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15909
This commit changes the workflow of the github actions.
- Ubuntu 20.04, 22.04, 24.04 will be tested via QEMU now
- remove unused scripts of this commit: b7bc334d1
- re-add the standalone zloop testing via zloop.yml
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16549
Fix this error: "cat /tmp/failed.txt: No such file or directory"
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16549
On larger files this should improve the speed.
Sample values from my system:
[mcmilk@xz]$ time dd if=/dev/zero bs=128k count=1k | sha256sum
254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e452b97453917 -
real 0m1,050s
user 0m0,985s
sys 0m0,153s
[mcmilk@xz]$ time dd if=/dev/zero bs=128k count=1k | openssl sha256 -r
254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e452b97453917 *stdin
real 0m0,254s
user 0m0,206s
sys 0m0,160s
I think cli_root/zdb/zdb_backup.ksh also runs on FreeBSD, and I needed to
include the sysutils/coreutils package for the FreeBSD tests within the
QEMU patchset.
This could be reverted when this pull request gets upstream.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16543
This commit adds functional tests for these systems:
- AlmaLinux 8, AlmaLinux 9, ArchLinux
- CentOS Stream 9, Fedora 39, Fedora 40
- Debian 11, Debian 12
- FreeBSD 13, FreeBSD 14, FreeBSD 15
- Ubuntu 20.04, Ubuntu 22.04, Ubuntu 24.04
- enabled by default:
- AlmaLinux 8, AlmaLinux 9
- Debian 11, Debian 12
- Fedora 39, Fedora 40
- FreeBSD 13, FreeBSD 14
Workflow for each operating system:
- install qemu on the github runner
- download current cloud image of operating system
- start and init that image via cloud-init
- install dependencies and poweroff system
- start system and build openzfs and then poweroff again
- clone build system and start 2 instances of it
- run functional testings and complete in around 3h
- when tests are done, do some logfile preparing
- show detailed results for each system
- in the end, generate the job summary
Real-world benefits from this PR:
1. The github runner scripts are in the zfs repo itself. That means
you can just open a PR against zfs, like "Add Fedora 41 tester", and
see the results directly in the PR. ZFS admins no longer need
to manually log in to the buildbot server to update the buildbot config
with new versions of Fedora/Almalinux.
2. Github runners allow you to run the entire test suite against your
private branch before submitting a formal PR to openzfs. Just open a
PR against your private zfs repo, and the exact same
Fedora/Alma/FreeBSD runners will fire up and run ZTS. This can be
useful if you want to iterate on a ZTS change before submitting a
formal PR.
3. buildbot is incredibly cumbersome. Our buildbot config files alone
are ~1500 lines (not including any build/setup scripts)!
It's a huge pain to set up.
4. We're running the super ancient buildbot 0.8.12. It's so ancient
it requires python2. We actually have to build python2 from source
for almalinux9 just to get it to run. Upgrading to a more modern
buildbot is a huge undertaking, and the UI on the newer versions is
worse.
5. Buildbot uses EC2 instances. EC2 is a pain because:
* It costs money
* They throttle IOPS and CPU usage, leading to mysterious,
hard-to-diagnose failures and timeouts in ZTS.
* EC2 is high maintenance. We have to set up security groups, SSH
keys, networking, users, etc, in AWS and it's a pain. We also
have to periodically go in and kill zombie EC2 instances that
buildbot is unable to kill off.
6. Buildbot doesn't always handle failures well. One of the things we
saw in the past was the FreeBSD builders would often die, and each
builder death would take up a "slot" in buildbot. So we would
periodically have to restart buildbot via a cron job to get the slots
back.
7. This PR divides up the ZTS test list into two parts, launches two
VMs, and on each VM runs half the test suite. The test results are
then merged and shown in the summary page. So we're basically
parallelizing ZTS on the same github runner. This leads to lower
overall ZTS runtimes (2.5-3 hours vs 4+ hours on buildbot), and one
unified set of results per runner, which is nice.
8. Since the tests are running on a VM, we have much more control over
what happens. We can capture the serial console output even if the
test completely brings down the VM. In the future, we could also
restart the test on the VM where it left off, so that if a single test
panics the VM, we can just restart it and run the remaining ZTS tests
(this functionality is not yet implemented though, just an idea).
9. Using the runners, users can manually kill or restart a test run
via the github UI. That really isn't possible with buildbot unless
you're an admin.
10. Anecdotally, the tests seem to be more stable and consistent under
the QEMU runners.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16537
Under load, the test sometimes needs a bit more time than just one second.
Doubling the time will help with the QEMU-based tests.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16537
The test needs some adjusting within the timings.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Co-authored-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16537
The report generator expects the log to be clean and tidy UTF-8. That
can be a problem if you use some of the verbose/debug test runner
options, which send all sorts of weird output from arbitrary programs
to the log.
This just makes Python a little more relaxed about such things. It
shouldn't matter in practice, as those lines didn't match the test
result regex anyway, and are discarded immediately.
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16432
Some libcs, like uClibc, lack the proper definitions of SEEK_DATA
and SEEK_HOLE. Since we have only two files in ZTS which use
these definitions, let's define them by hand:
```
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif
```
There should be no failures, because:
- FreeBSD has support for SEEK_DATA/SEEK_HOLE since FreeBSD 8
- Linux has it since Linux 3.1
- the libc will submit the parameters unchanged to the kernel
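A minimal sketch of how a file using the fallback might call lseek()
with these values. The helper names first_hole and temp_file_with_data
are hypothetical and not taken from the ZTS sources:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Same fallback as above, for libcs without the definitions. */
#ifndef SEEK_DATA
#define	SEEK_DATA	3
#endif
#ifndef SEEK_HOLE
#define	SEEK_HOLE	4
#endif

/* Hypothetical helper: offset of the first hole at or after start. */
static off_t
first_hole(int fd, off_t start)
{
	assert(start >= 0);
	return (lseek(fd, start, SEEK_HOLE));
}

/* Hypothetical helper: temp file holding len bytes of data, or NULL. */
static FILE *
temp_file_with_data(size_t len)
{
	FILE *fp = tmpfile();

	if (fp == NULL)
		return (NULL);
	for (size_t i = 0; i < len; i++) {
		if (fputc('x', fp) == EOF) {
			fclose(fp);
			return (NULL);
		}
	}
	if (fflush(fp) != 0) {
		fclose(fp);
		return (NULL);
	}
	return (fp);
}
```

For a file without holes, the first hole reported is the implicit one
at end of file.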
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Optionally turn off the disk's enclosure slot if an I/O is hung,
triggering the deadman.
It's possible for outstanding I/O to a misbehaving SCSI disk to
neither promptly complete nor return an error. This can occur due
to retry and recovery actions taken by the SCSI layer, driver, or
disk. When it occurs the pool will be unresponsive even though
there may be sufficient redundancy configured to proceed without
this single disk.
When a hung I/O is detected by the kmods it will be posted as a
deadman event. By default an I/O is considered to be hung after
5 minutes. This value can be changed with the zfs_deadman_ziotime_ms
module parameter. If ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN is set
the disk's enclosure slot will be powered off causing the outstanding
I/O to fail. The ZED will then handle this like a normal disk failure.
By default ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN is not set.
As part of this change `zfs_deadman_events_per_second` is added
to control the rate limiting of deadman events independently of
delay events. In practice, a single deadman event is sufficient
and more aren't particularly useful.
Alphabetize the zfs_deadman_* entries in zfs.4.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16226
Last commit should fix the underlying problem, so these should be
passing reliably again.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16364