Commit Graph

9222 Commits

Author SHA1 Message Date
Tony Hutter
f2ebbe46f6
Linux 6.9 compat: META (#16358)
Update the META file to reflect compatibility with the 6.9
kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-16 16:27:54 -07:00
Rob Norris
7ca7bb7fd7 Linux 5.16: use bdev_nr_bytes() to get device capacity
This helper was introduced long ago, in 5.16. Since 6.10, bd_inode no
longer exists, but the helper has been updated, so detect it and use it
in all versions where it is available.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-15 17:10:06 -07:00
Rob Norris
e951dba48a Linux 6.10: work harder to avoid kmem_cache_alloc reuse
Linux 6.10 change kmem_cache_alloc to be a macro, rather than a
function, such that the old #undef for it in spl-kmem-cache.c would
remove its definition completely, breaking the build.

This inverts the model used before. Rather than always defining the
kmem_cache_* macro, then undefining then inside spl-kmem-cache.c,
instead we make a special tag to indicate we're currently inside
spl-kmem-cache.c, and not defining those in macros in the first place,
so we can use the kernel-supplied kmem_cache_* functions to implement
spl_kmem_cache_*, as we expect.

For all other callers, we create the macros as normal and remove access
to the kernel's own conflicting names.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-15 17:10:02 -07:00
Rob Norris
b409892ae5 Linux 6.10: rework queue limits setup
Linux has started moving to a model where instead of applying block
queue limits through individual modification functions, a complete
limits structure is built up and applied atomically, either when the
block device or open, or some time afterwards. As of 6.10 this
transition appears only partly completed.

This commit matches that model within OpenZFS in a way that should work
for past and future kernels. We set up a queue limits structure with any
limits that have had their modification functions removed. For newer
kernels that can have limits applied at block device open
(HAVE_BLK_ALLOC_DISK_2ARG), we have a conversion function to turn the
OpenZFS queue limits structure into Linux's queue_limits structure,
which can then be passed in. For older kernels, we provide an
application function that just calls the old functions for each limit in
the structure.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-15 17:09:55 -07:00
Zhao Yongming
4ee66cdf4e
Add building support for Artix Linux (#16265)
Artix Linux is systemd free distribution based on Arch Linux, with
openrc dinit runit s6 as init alternatives. This patch will make
init scripts installation work the way Gentoo Linux with openrc.

The scripts tweaking for other init will be left to packager.

Signed-off-by: Yongming Zhao <ming.zym@gmail.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-15 16:58:00 -07:00
Mateusz Guzik
a7fc4c85e3
zstd: don't call zstd_mempool_reap if there are no buffers (#16302)
zfs_zstd_cache_reap_now is issued every second.

zstd_mempool_reap checks for both pool existence and buffer count, but
that's still 2 func calls which are trivially avoidable.

With clang it even avoids pushing the stack pointer (but still suffers
the mispredict due to a forward jump, not modified in case someone is
using zstd):

<+0>:     cmpq   $0x0,0x0(%rip)        # <zfs_zstd_cache_reap_now+8>
<+8>:     je     0x217de4 <zfs_zstd_cache_reap_now+36>
<+10>:    push   %rbp
<+11>:    mov    %rsp,%rbp
<+14>:    mov    0x0(%rip),%rdi        # <zfs_zstd_cache_reap_now+21>
<+21>:    call   0x217df0 <zstd_mempool_reap>
<+26>:    mov    0x0(%rip),%rdi        # <zfs_zstd_cache_reap_now+33>
<+33>:    pop    %rbp
<+34>:    jmp    0x217df0 <zstd_mempool_reap>
<+36>:    ret

Preferably the call would not be made to begin with if zstd is not used,
but this retains all the logic confined to zstd code.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-15 14:51:37 -07:00
George Amanakis
c87cb22ba9
head_errlog: fix use-after-free
In the commit of the head_errlog feature we introduced a bug in
dsl_dataset_promote_sync(): we may dereference origin_head and hds, both
dereferencing ddpa after calling promote_sync() on ddpa.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #16272
Closes #16273
2024-07-15 09:05:42 -07:00
Daniel Berlin
f7d8b13336
Fix missing semicolon in trace_dbuf.h (#16281)
On fedora 40, on the 6.9.4 kernel (in updates-testing), assign_str
expands to a "do {<stuff> } while(0)" loop.  Without this semicolon,
the while(0) is unterminated, causing a cascade of useless errors.
With this semicolon, it compiles fine.  It also compiles fine on 6.8.11
(the previous kernel).  I have not tested earlier kernels than that, but
at worst it should add a pointless semicolon.

All other instances in the source tree are already terminated with
semicolons.

Signed-off-by: Daniel Berlin <dberlin@dberlin.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-12 17:44:10 -07:00
a1ea321
398e675f58
one-word manpage correction: snapshot->rollback (#16294)
This commit fixes what is probably a copy-paste mistake. The
`dracut.zfs` manpage claims that the `bootfs.rollback` option executes
`zfs snapshot -Rf`. `zfs snapshot` does not have a `-R` option. `zfs
rollback` does.

Signed-off-by: Alphan Yılmaz <alphanyilmaz@gmail.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-12 16:27:12 -07:00
Rob Norris
cbd95a950a
ZTS: handle FreeBSD version numbers correctly (#16340)
FreeBSD patchlevel versions are optional and, if present, in a different
location in the version string.

Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-12 10:58:03 -07:00
Mark Johnston
a10faf5ce6
FreeBSD: Use the new freeuio() helper to free dynamically allocated UIOs (#16300)
This freeuio() interface was introduced to FreeBSD recently.  For now
it simply calls free(), so this change has no effect.  However, this
may not always be true, and in CheriBSD this change is required.

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brooks Davis <brooks.davis@sri.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-11 16:52:51 -07:00
Tony Hutter
156a64161b
Linux 6.9: Fix UBSAN errors in zap_micro.c
You can use the UBSAN_SANITIZE_* Kbuild options to exclude certain
kernel objects from the UBSAN checks.  We previously excluded
zap_micro.o with:

UBSAN_SANITIZE_zap_micro.o := n

For some reason that didn't work for the 6.9 kernel, which wants us
to use:

UBSAN_SANITIZE_zfs/zap_micro.o := n

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16278
Closes #16330
2024-07-11 16:41:26 -07:00
Mark Johnston
4367312760
zvol: Fix suspend lock leaks (#16270)
In several functions, we use a flag variable to track whether
zv_suspend_lock is held.  This flag was not getting reset in a
particular case where we need to retry the underlying operation,
resulting in a lock leak.  Make sure to update the flag where necessary.

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-10 14:27:44 -07:00
Peter Doherty
326040b285
Fix the name of the zfs_prefetch_disable parameter (#16319)
The ZFS module parameter name is zfs_prefetch_disable, not
zfs_disable_prefetch.

Signed-off-by: Peter Doherty <peterd@acranox.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-09 09:59:55 -07:00
Tino Reichardt
9ffe441361
Fix zdb "Memory fault" found on FreeBSD ZTS (#16332)
Reason: nvlist_free() tries to free sth. which isn't allocted
Solution: init this variable with NULL

Closes #16311
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-09 09:36:17 -07:00
Mark Johnston
f72e081fbf
FreeBSD: Use a statement expression to implement SET_ERROR() (#16284)
This way we can avoid making assumptions about the SDT probe
implementation.  No functional change intended.

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-08 17:59:08 -07:00
Mateusz Piotrowski
fd51786f86
zfs.4: Document the actual default for zfs_txg_history (#16305)
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-06-28 11:21:08 -07:00
Allan Jude
5f220c62e1
Fix a mis-merge in the zdb man page (#16304)
Sponsored-by: Klara, Inc.
Sponsored-By: Wasabi Technology, Inc.

Signed-off-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-06-28 10:38:22 -07:00
Tony Hutter
49f3ce3385
Linux 6.9: Call add_disk() from workqueue to fix zfs_allow_010_pos (#16282)
The 6.9 kernel behaves differently in how it releases block devices.  In
the common case it will async release the device only after the return
to userspace.  This is different from the 6.8 and older kernels which
release the block devices synchronously.  To get around this, call
add_disk() from a workqueue so that the kernel uses a different
codepath to release our zvols in the way we expect.  This stops
zfs_allow_010_pos from hanging.

Fixes: #16089

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
2024-06-28 09:52:03 -07:00
Martin Wagner
c98295eed2
disable automatic dependency tracking for dkms builds
Previously the dkms build left some unwanted files
in `/usr/lib/modules` which could cause package
managers to not properly clean up old kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Wagner <martin.wagner.dev@gmail.com>
Closes #16221 
Closes #16241
2024-06-13 18:08:49 -07:00
Mateusz Guzik
121a2d3354
FreeBSD: unregister mountroot eventhandler on unload
Otherwise if zfs is unloaded and reroot is being used it trips over a
stale pointer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #16242
2024-06-13 17:49:50 -07:00
bnovkov
20c8bdd85e
FreeBSD: Update use of UMA-related symbols in arc_available_memory
Recent UMA changes repurposed the use of UMA_MD_SMALL_ALLOC in a way
that breaks arc_available_memory on -CURRENT. This change
ensures that arc_available_memory uses the new symbol
while maintaining compatibility with older FreeBSD releases.
    
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Bojan Novković <bnovkov@FreeBSD.org>
Closes #16230
2024-06-06 18:11:00 -07:00
Derek Schrock
4de260efe3
contrib/bash_completion.d: squelch FreeBSD seq when first < last
With seq x -1 z and x is less than z FreeBSD seq will print the error:

	$ seq 1 -1 2
	seq: needs positive increment

Hide this error.  Alternatively $COMP_CWORD could be checked for < 2.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Derek Schrock <dereks@lifeofadishwasher.com>
Closes #16234
2024-06-06 17:37:26 -07:00
Ameer Hamza
b558f0a9d6
zdb: fix FreeBSD build failure
This fixes FreeBSD build failure with clang-18 after 23a489a got merged.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16252
2024-06-06 17:01:26 -07:00
Ameer Hamza
23a489a411
zdb: detect cachefile automatically otherwise force import
If a pool is created with the cache file located in a non-default 
path /etc/default/zpool.cache, removed, or the cachefile property 
is set to none, zdb fails to show the pool unless we specify the 
cache file or use the -e option. This PR automates this process.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16071
2024-06-03 16:28:43 -07:00
Rob Norris
a72751a342 icp: remove redundant FreeBSD check
We don't build illumos-crypto for FreeBSD.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:13:59 -07:00
Rob Norris
4e714c0be1 icp: remove unused headers
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:13:51 -07:00
Rob Norris
ae512620d0 icp: remove skein module
Nothing calls it through the KCF interface, so this is all unused.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:13:39 -07:00
Rob Norris
f39241aeb3 icp: remove unused SHA2 HMAC mechanisms
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:13:30 -07:00
Rob Norris
10de12e9ed icp: reorganise SHA2 digest mechanisms
sha2_mech_type_t serves double-duty, as the list of MAC providers and
also the algo type for direct callers to SHA2Init. Until we disentangle
that, reorganise it to make the separation more clear. While we're
there, remove the digest mechs we don't use.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:13:23 -07:00
Rob Norris
1291c46ea4 icp: remove digest entry points
For whatever reason, we call digest mechanisms directly, not through the
KCF digest provider. So we can remove those entry points entirely.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:13:16 -07:00
Rob Norris
94f1e56e41 icp: remove unused KCF_ macros
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:13:06 -07:00
Rob Norris
4ed91dc26e icp: remove unusued incremental cipher methods
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:12:59 -07:00
Rob Norris
57249bcddc icp: brutally remove unused AES modes
Still retaining the struture, for now.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:12:51 -07:00
Rob Norris
4185179190 icp: remove unused blowfish_ctx and des_ctx
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:12:31 -07:00
Tony Hutter
a301dc364c
ZTS: Fix redacted_send failures on FreeBSD
We're seeing failures for redacted_deleted and redacted_mount
on FreeBSD 13-15:

    09:58:34.74 diff: /dev/fd/3: No such file or directory
    09:58:34.74 ERROR: diff /dev/fd/3 /dev/fd/4 exited 2

The test was trying to diff the file listings between two directories to
see if they are the same.  The workaround is to do a string comparison
of the directory listings instead of using `diff`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16224
2024-05-31 15:11:00 -07:00
Zhenlei Huang
e2357561b9
FreeBSD: Add const qualifier to members of struct opensolaris_utsname
These members have directly references to the global variables
exposed by the kernel. They are not going to be changed by this
kernel module.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Zhenlei Huang <zlei@FreeBSD.org>
Closes #16210
2024-05-30 09:58:20 -07:00
Pawel Jakub Dawidek
5137c132a5
zpool import output is not formated properly.
The 'zpool status' output assumes that the longest prefix is six
character long plus colon plus space, eg. 'status: ', 'action: '
or 'config: ' (so eight in total). This works well even when we have
messages that requires more than one line, as '\t' is exactly eight
characters, just like the longest prefix.

The 'zpool import' output is a bit different, as it may display the
comment pool property, then the longest prefix is 'comment: ', which is
nine characters long, not eight.
All the prefixes were given an extra space in front, but:
- 'status: ' did not get an extra space.
- Messages that require more than one line should use nine spaces of
  indentation, not eight.
- The extra space in front looks redundant if there is no comment
  property set on the given pool.

Fix it by adding an extra space to all prefixes, but only if the comment
property is defined. Also, when we need to continue the message in a new
line, use '\t ' for indentation.

While here, apply small corrections to a couple messages.

Before:

   pool: tank
     id: 7412636063178848859
  state: ONLINE
status: Some supported features are not enabled on the pool.
	(Note that they may be intentionally disabled if the
	'compatibility' property is set.)
 action: The pool can be imported using its name or numeric identif[...]
	some features will not be available without an explicit 'zp[...]
comment: Example comment.
 config:

	bclone      ONLINE
	  ada0      ONLINE

After:

  pool: tank
    id: 10180960571062436759
 state: ONLINE
status: Some supported features are not enabled on the pool.
	(Note that they may be intentionally disabled if the
	'compatibility' property is set.)
action: The pool can be imported using its name or numeric identifi[...]
	some features will not be available without an explicit 'zp[...]
config:

	tank        ONLINE
	  ada3      ONLINE

   pool: dozer
     id: 11028319538368222579
  state: ONLINE
 status: Some supported features are not enabled on the pool.
	 (Note that they may be intentionally disabled if the
	 'compatibility' property is set.)
 action: The pool can be imported using its name or numeric identif[...]
	 some features will not be available without an explicit 'z[...]
comment: Example comment.
 config:

	dozer       ONLINE
	  ada1      ONLINE

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Dawidek <pawel@dawidek.net>
Closes #16128
2024-05-29 13:34:59 -07:00
Martin Matuška
ae22044da9
spl: fix compilation without HAVE_BACKTRACE
The __maybe_unused macro is defined in spl/sys/debug.h

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16229
2024-05-29 10:51:01 -07:00
Pawel Jakub Dawidek
01c8efdd59
Simplify issig().
We always call it twice with JUSTLOOKING and then FORREAL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #16225
2024-05-29 10:49:11 -07:00
Brian Behlendorf
6b95031f56
zed: Add deadman-slot_off.sh zedlet
Optionally turn off disk's enclosure slot if an I/O is hung
triggering the deadman.

It's possible for outstanding I/O to a misbehaving SCSI disk to
neither promptly complete or return an error.  This can occur due
to retry and recovery actions taken by the SCSI layer, driver, or
disk.  When it occurs the pool will be unresponsive even though
there may be sufficient redundancy configured to proceeded without
this single disk.

When a hung I/O is detected by the kmods it will be posted as a
deadman event.  By default an I/O is considered to be hung after
5 minutes.  This value can be changed with the zfs_deadman_ziotime_ms
module parameter.  If ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN is set
the disk's enclosure slot will be powered off causing the outstanding
I/O to fail.  The ZED will then handle this like a normal disk failure.
By default ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN is not set.

As part of this change `zfs_deadman_events_per_second` is added
to control the ratelimitting of deadman events independantly of
delay events.  In practice, a single deadman event is sufficient
and more aren't particularly useful.

Alphabetize the zfs_deadman_* entries in zfs.4.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16226
2024-05-29 10:46:41 -07:00
Alexander Motin
800d59d577
Some improvements to metaslabs eviction
- Add old eviction for special and dedup metaslab classes. Those
vdevs may be potentially big and fragmented with large metaslabs,
while their asynchronous write pattern is not really different
from normal class. It seems an omission to not evict old metaslabs
from them.
 - If we have metaslab preload enabled, which means we are not too
low on memory, do not evict active metaslabs even if they are not
used for some time.  Eviction of active metaslabs means we won't
be able to write anything until we load them, that may take some
time, that is straight opposite to metaslab preload goals.  For
small systems the memory saving should be less important after
recent reduction in number of allocators and so open metaslabs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16214
2024-05-29 08:53:31 -07:00
Alexander Motin
02c5aa9b09
Destroy ARC buffer in case of fill error
In case of error dmu_buf_fill_done() returns the buffer back into
DB_UNCACHED state.  Since during transition from DB_UNCACHED into
DB_FILL state dbuf_noread() allocates an ARC buffer, we must free
it here, otherwise it will be leaked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15665
Closes #15802
Closes #16216
2024-05-24 19:11:18 -07:00
George Amanakis
8865dfbcaa
Fix assertion in Persistent L2ARC
At the end of l2arc_evict() fix an assertion in the case that l2ad_hand
+ distance == l2ad_end.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #16202
Closes #16207
2024-05-24 19:02:58 -07:00
Rob N
d0aa9dbccf
Use memset to zero stack allocations containing unions
C99 6.7.8.17 says that when an undesignated initialiser is used, only
the first element of a union is initialised. If the first element is not
the largest within the union, how the remaining space is initialised is
up to the compiler.

GCC extends the initialiser to the entire union, while Clang treats the
remainder as padding, and so initialises according to whatever
automatic/implicit initialisation rules are currently active.

When Linux is compiled with CONFIG_INIT_STACK_ALL_PATTERN,
-ftrivial-auto-var-init=pattern is added to the kernel CFLAGS. This flag
sets the policy for automatic/implicit initialisation of variables on
the stack.

Taken together, this means that when compiling under
CONFIG_INIT_STACK_ALL_PATTERN on Clang, the "zero" initialiser will only
zero the first element in a union, and the rest will be filled with a
pattern. This is significant for aes_ctx_t, which in
aes_encrypt_atomic() and aes_decrypt_atomic() is initialised to zero,
but then used as a gcm_ctx_t, which is the fifth element in the union,
and thus gets pattern initialisation. Later, it's assumed to be zero,
resulting in a hang.

As confusing and undiscoverable as it is, by the spec, we are at fault
when we initialise a structure containing a union with the zero
initializer. As such, this commit replaces these uses with an explicit
memset(0).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16135
Closes #16206
2024-05-24 19:00:29 -07:00
Rob N
34906f8bbe
zap: reuse zap_leaf_t on dbuf reuse after shrink
If a shrink or truncate had recently freed a portion of the ZAP, the
dbuf could still be sitting on the dbuf cache waiting for eviction. If
it is then allocated for a new leaf before it can be evicted, the
zap_leaf_t is still attached as userdata, tripping the VERIFY.

Instead, just check for the userdata, and if we find it, reuse it.

Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16157.
Closes #16204
2024-05-24 18:55:47 -07:00
Rob N
708be0f415
Linux 6.7 compat: detect if kernel defines intptr_t
Since Linux 6.7 the kernel has defined intptr_t. Clang has
-Wtypedef-redefinition by default, which causes the build to fail
because we also have a typedef for intptr_t.

Since its better to use the kernel's if it exists, detect it and skip
our own.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16201
2024-05-24 18:54:24 -07:00
Brooks Davis
7572e8ca04
Avoid a gcc -Wint-to-pointer-cast warning
On 32-bit platforms long long is generally 64-bits.  Sufficiently modern
versions of gcc (13 in my testing) complains when casting a pointer to
an integer of a different width so cast to uintptr_t first to avoid the
warning.

Fixes: c183d164aa Parallel pool import

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #16203
2024-05-24 18:45:58 -07:00
Pawel Jakub Dawidek
08648cf0da
Allow block cloning to be interrupted by a signal.
Even though block cloning is much faster than regular copying,
it is not instantaneous - the file might be large and the recordsize
small. It would be nice to be able to interrupt it with a signal
(e.g., SIGINFO on FreeBSD to see the progress).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #16208
2024-05-24 18:45:09 -07:00
Alexander Motin
efbef9e6cc
FreeBSD: Add zfs_link_create() error handling
Originally Solaris didn't expect errors there, but they may happen
if we fail to add entry into ZAP.  Linux fixed it in #7421, but it
was never fully ported to FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13215
Closes #16138
2024-05-16 17:56:55 -07:00