Commit Graph

10583 Commits

Author SHA1 Message Date
Rob Norris
143f410e99 libspl/mnttab: make mnttab source filenames consistent
FreeBSD's getextmntent.c is only separate because it has a different
license to mnttab.c, otherwise it would go there too.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
c0ea89db9f libzfs/mnttab: shorten names, reorg a bit
We can't change the public interface, but internally we don't need so
much redundant naming.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
f43cb1fef6 libzfs/mnttab: lift node alloc/free
Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
0ecf5e3f62 libzfs/mnttab: always enable the cache
There's no real reason not to enable it always; the `zfs` command always
enables it anyway, and right now there are multiple places that do mount
work that don't go through the cache anyway. Having it always be on lets
us remove a bunch of the fallback code.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
b5637fba1c libzfs/mnttab: use SPL mutexes
More consistent, less typing, and we can check ownership.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
02224bca40 libzfs/mnttab: lift mnttab cache into separate file
Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Alek P
ae7fcd5f92 fix libzfs diff mem leak in an error path
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #18301
2026-03-10 12:39:49 -07:00
Ameer Hamza
5b93d1a218 L2ARC: Fix prev_hdr use-after-free in l2arc_write_sublist
prev_hdr is dereferenced after the sublist lock is dropped for write I/O
but nothing prevents it from being freed during that window. Eliminate
prev_hdr entirely and simplify persistent marker repositioning logic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 11:00:23 -07:00
Ameer Hamza
be5d36919a man: Update L2ARC documentation for depth cap and write budget fairness
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 11:00:18 -07:00
Ameer Hamza
b27a87f399 L2ARC: Write budget fairness for metadata monopolization
Under heavy metadata load, metadata passes can monopolize the write
budget every cycle while data passes get nothing written. Track
consecutive monopolized cycles per device in l2ad_meta_cycles. After
l2arc_meta_cycles (default 2) consecutive cycles where metadata fills
the write budget, skip metadata for one cycle to let data run.  Reset
the counter when nothing is written.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 11:00:14 -07:00
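The fairness rule described in b27a87f399 can be sketched roughly as below. This is a minimal illustration, not the actual L2ARC code: only the names `l2ad_meta_cycles` and `l2arc_meta_cycles` come from the commit message. The struct, the two helper functions, and the choice to reset the counter after a skipped cycle or a cycle where metadata did not fill the budget are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device state; only the counter name is from the commit. */
typedef struct l2dev_sketch {
	uint64_t l2ad_meta_cycles;	/* consecutive monopolized cycles */
} l2dev_sketch_t;

/* Tunable named in the commit message (default 2). */
static uint64_t l2arc_meta_cycles = 2;

/* Before a feed cycle: should the metadata passes be skipped this time? */
static bool
l2dev_skip_metadata(const l2dev_sketch_t *dev)
{
	return (dev->l2ad_meta_cycles >= l2arc_meta_cycles);
}

/* After a feed cycle: update the consecutive-cycle counter. */
static void
l2dev_account_cycle(l2dev_sketch_t *dev, bool skipped_meta,
    uint64_t meta_written, uint64_t data_written, uint64_t budget)
{
	assert(budget > 0);
	if (skipped_meta || meta_written + data_written == 0) {
		/* Data had its turn, or nothing was written: start over. */
		dev->l2ad_meta_cycles = 0;
	} else if (meta_written >= budget) {
		/* Metadata filled the whole write budget this cycle. */
		dev->l2ad_meta_cycles++;
	} else {
		/* Assumption: a non-monopolized cycle breaks the streak. */
		dev->l2ad_meta_cycles = 0;
	}
}
```

With the default of 2, two monopolized cycles in a row make the third cycle skip metadata, after which the counter starts again from zero.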
Ameer Hamza
62ca8f721b L2ARC: Scan-based depth cap for persistent markers
With persistent markers and inclusive scanning, the marker traverses the
entire ARC state across many feed cycles, writing buffers far from the
tail that may no longer be relevant.

Track cumulative bytes scanned per pass in l2arc_ext_scanned. When scans
reach l2arc_ext_headroom_pct (default 25%) of the ARC state size, reset
the pass markers to the tail via lazy reset flags. This keeps markers
focused on the tail zone where buffers soon to be evicted have the most
value for L2ARC.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 11:00:08 -07:00
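A rough sketch of the depth cap from 62ca8f721b, under stated assumptions: the tunable name `l2arc_ext_headroom_pct` and its 25% default come from the commit message, while the struct, its fields, and the function are hypothetical stand-ins for the real per-pass bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tunable named in the commit message (default 25%). */
static uint64_t l2arc_ext_headroom_pct = 25;

/* Hypothetical per-pass state. */
typedef struct pass_sketch {
	uint64_t scanned;	/* stands in for l2arc_ext_scanned */
	bool reset_pending;	/* lazy "move marker back to tail" flag */
} pass_sketch_t;

/*
 * Account for bytes scanned in one feed cycle. Once the cumulative total
 * reaches the configured share of the ARC state size, request a marker
 * reset and start counting again.
 */
static void
pass_account_scan(pass_sketch_t *p, uint64_t bytes, uint64_t state_size)
{
	assert(l2arc_ext_headroom_pct <= 100);
	/* Divide first so very large state sizes cannot overflow. */
	uint64_t cap = state_size / 100 * l2arc_ext_headroom_pct;

	p->scanned += bytes;
	if (p->scanned >= cap) {
		p->reset_pending = true;
		p->scanned = 0;
	}
}
```

The reset is only requested here, not performed, which matches the commit's description of resetting "via lazy reset flags".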
Ameer Hamza
15fc3d64c8 L2ARC: Lazy sublist reset flags for persistent markers
Replace direct marker-to-tail manipulation with per-sublist boolean
flags consumed lazily by feed threads.  Each scanning thread resets its
own marker when it sees the flag, rather than having another thread
manipulate the marker directly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 11:00:01 -07:00
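The lazy-flag idea in 15fc3d64c8 might look something like the following. All names here are hypothetical, and C11 atomics are used only to keep the sketch self-contained; the real code may well rely on the existing sublist locks instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define	NUM_SUBLISTS	4

/* Hypothetical per-sublist state. */
typedef struct sublist_sketch {
	atomic_bool reset_flag;	/* set by anyone, consumed by the owner */
	size_t marker_pos;	/* 0 means "at the tail" */
} sublist_sketch_t;

/* Any thread: ask every sublist to reset, without touching the markers. */
static void
request_reset_all(sublist_sketch_t *sl, size_t n)
{
	assert(sl != NULL);
	for (size_t i = 0; i < n; i++)
		atomic_store(&sl[i].reset_flag, true);
}

/*
 * Scanning thread: consume the flag for the sublist it is about to scan
 * and move its own marker back to the tail. Returns true if a reset
 * happened.
 */
static bool
consume_reset(sublist_sketch_t *sl)
{
	if (atomic_exchange(&sl->reset_flag, false)) {
		sl->marker_pos = 0;
		return (true);
	}
	return (false);
}
```

The point of the design is ownership: only the thread scanning a sublist ever moves that sublist's marker, so no other thread has to reach into marker state it does not own.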
Ameer Hamza
22fdaf0b1f L2ARC: Even sublist headroom distribution with round-robin selection
The dynamic headroom redistribution formula gave later sublists
progressively larger scanning budgets, and random sublist selection
caused uneven coverage across sublists. For depth cap to work
effectively, each sublist should be equally and fairly treated.
Use equal per-sublist headroom (headroom / num_sublists) for even
distribution and deterministic round-robin selection for fair
coverage across cycles.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 10:59:41 -07:00
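The two changes in 22fdaf0b1f, equal per-sublist headroom and round-robin selection, reduce to very little arithmetic. A minimal sketch, with hypothetical names throughout (only the `headroom / num_sublists` expression is taken from the commit message):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feed state: remembers where the last cycle stopped. */
typedef struct feed_sketch {
	unsigned next_sublist;
} feed_sketch_t;

/* Every sublist gets the same share of the scanning budget. */
static uint64_t
per_sublist_headroom(uint64_t headroom, unsigned num_sublists)
{
	assert(num_sublists > 0);
	return (headroom / num_sublists);
}

/* Deterministic round-robin: return the current sublist, then advance. */
static unsigned
pick_sublist(feed_sketch_t *f, unsigned num_sublists)
{
	assert(num_sublists > 0);
	unsigned cur = f->next_sublist % num_sublists;

	f->next_sublist = (cur + 1) % num_sublists;
	return (cur);
}
```

Compared with random selection, every sublist is visited exactly once per `num_sublists` picks, which is what lets a scan-based depth cap treat them evenly.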
Rob Norris
0b0971f82f README: describe specific kernels/distros we target
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18295
2026-03-10 09:55:18 -07:00
Rob Norris
97e080c496 config: remove minimum kernel version check
The autoconf checks are more than enough to decide whether we can
work with this kernel.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18295
2026-03-10 09:55:01 -07:00
Ivan Shapovalov
8531621aba zfs_main: create, clone, rename: accept -pp for non-mountable parents
Teach `zfs {create,clone,rename}` to accept a doubled `-p` flag (`-pp`)
to create non-existing ancestor datasets with `canmount=off`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #17000
2026-03-09 14:50:18 -07:00
Ivan Shapovalov
2f3f1ab1ba libzfs: teach zfs_create_ancestors() to accept properties
This will be used to support creating non-mountable ancestors in zfs(8).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #17000
2026-03-09 14:49:52 -07:00
Ameer Hamza
1eace59060 libzfs: use mount_setattr for selective remount including legacy mounts
When a namespace property is changed via zfs set, libzfs remounts the
filesystem to propagate the new VFS mount flags. The current approach
uses mount(2) with MS_REMOUNT, which reads all namespace properties
from ZFS and applies them together. This has two problems:

1. Linux VFS resets unspecified per-mount flags on remount. If an
   administrator sets a temporary flag (e.g. mount -o remount,noatime),
   a subsequent zfs set on any namespace property clobbers it.

2. Two concurrent zfs set operations on different namespace properties
   can overwrite each other's mount flags.

Additionally, legacy datasets (mountpoint=legacy) were never remounted
on namespace property changes since zfs_is_mountable() returns false
for them.

Add zfs_mount_setattr() which uses mount_setattr(2) to selectively
update only the mount flags that correspond to the changed property.
For legacy datasets, /proc/mounts is iterated to update all
mountpoints. On kernels without mount_setattr (ENOSYS), non-legacy
datasets fall back to a full remount; legacy mounts are skipped to
avoid clobbering temporary flags.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18257
2026-03-09 11:06:22 -07:00
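To illustrate the "selective" part of 1eace59060: mount_setattr(2) takes separate set and clear masks, so a single property change can touch a single flag. The sketch below only computes those masks. The function name, struct, and the property-to-flag table are assumptions for illustration, not the contents of `zfs_mount_setattr()`. The constants are local copies mirroring the `MOUNT_ATTR_*` values in `<linux/mount.h>`, used so the example builds without kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the MOUNT_ATTR_* values from <linux/mount.h>. */
#define	SK_MOUNT_ATTR_RDONLY	0x00000001ULL
#define	SK_MOUNT_ATTR_NOSUID	0x00000002ULL
#define	SK_MOUNT_ATTR_NODEV	0x00000004ULL
#define	SK_MOUNT_ATTR_NOEXEC	0x00000008ULL

/* Stand-in for the attr_set/attr_clr members of struct mount_attr. */
typedef struct attr_sketch {
	uint64_t attr_set;
	uint64_t attr_clr;
} attr_sketch_t;

/*
 * Translate one changed on/off property into masks that affect only the
 * matching mount flag. Returns 0 on success, -1 for a property this
 * sketch does not handle.
 */
static int
prop_to_attr(const char *prop, bool on, attr_sketch_t *out)
{
	uint64_t flag;
	bool flag_when_on;	/* does "on" mean the flag is set? */

	assert(prop != NULL && out != NULL);
	if (strcmp(prop, "readonly") == 0) {
		flag = SK_MOUNT_ATTR_RDONLY; flag_when_on = true;
	} else if (strcmp(prop, "exec") == 0) {
		flag = SK_MOUNT_ATTR_NOEXEC; flag_when_on = false;
	} else if (strcmp(prop, "setuid") == 0) {
		flag = SK_MOUNT_ATTR_NOSUID; flag_when_on = false;
	} else if (strcmp(prop, "devices") == 0) {
		flag = SK_MOUNT_ATTR_NODEV; flag_when_on = false;
	} else {
		return (-1);
	}
	out->attr_set = (on == flag_when_on) ? flag : 0;
	out->attr_clr = (on == flag_when_on) ? 0 : flag;
	/*
	 * A caller would copy these into a struct mount_attr and pass it to
	 * mount_setattr(dirfd, path, flags, &attr, sizeof (attr)).
	 */
	return (0);
}
```

Because flags outside the two masks are left alone, a temporary `noatime` set by the administrator would survive a later `zfs set exec=off`, which is the first problem the commit message lists.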
Alexander Ziaee
d45c8d6489 FreeBSD: Improve dmesg kernel message prefix
Provide intuitive log search keywords and increased system consistency.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Ziaee <ziaee@FreeBSD.org>
Closes #18290
2026-03-09 10:17:23 -07:00
Christos Longros
304de7f19b libzfs: handle EDOM error in zpool_create
When creating a pool with devices that have incompatible block sizes,
the kernel returns EDOM. However, zpool_create() did not handle this
errno, falling through to zpool_standard_error() which produced a
confusing message about invalid property values.

Add a case EDOM handler in zpool_create() to return EZFS_BADDEV with
a descriptive auxiliary message, consistent with the existing EDOM
handler in zpool_vdev_add().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18268
2026-03-08 12:59:10 -07:00
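The shape of the fix in 304de7f19b is a dedicated errno case ahead of the generic fallback. A minimal sketch, assuming a simplified error enum and a made-up auxiliary message; the real code returns `EZFS_BADDEV` through the libzfs error helpers, which are not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the libzfs error codes. */
typedef enum sk_err {
	SK_ERR_STANDARD,	/* handled by the generic errno mapping */
	SK_ERR_BADDEV		/* stands in for EZFS_BADDEV */
} sk_err_t;

/*
 * Map the errno from a failed pool create to an error class plus an
 * optional auxiliary message. The message text is illustrative only.
 */
static sk_err_t
create_errno_to_error(int err, const char **auxp)
{
	assert(auxp != NULL);
	switch (err) {
	case EDOM:
		*auxp = "devices have incompatible block sizes";
		return (SK_ERR_BADDEV);
	default:
		*auxp = NULL;
		return (SK_ERR_STANDARD);
	}
}
```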
Andrew Walker
c5905b2cb7 Implement lzc_send_progress
This commit adds an implementation of lzc_send_progress, which
existed in the libzfs_core header, but not in ABI and lacked
an actual implementation. The libzfs_send_progress function
is altered so that it wraps around the lzc operation. This
fills a functional gap in libzfs core.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Andrew Walker <andrew.walker@truenas.com>
Closes #18288
2026-03-06 11:05:58 -08:00
Juhyung Park
c58b8b7dc2 Fix check for .cfi_negate_ra_state on aarch64
Checking for LD_VERSION is unreliable as not all distros define it on
the compiler's preprocessor.

Explicitly check it via autoconf.

This fixes support for Ubuntu 18.04 on arm64.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Closes #18262
2026-03-06 11:04:37 -08:00
Rob Norris
e73ada771d libzpool: lift zfs_file ops out to separate source file
So it's easier to remove and replace on non-Unix platforms.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18281
2026-03-05 18:07:46 -08:00
Garth Snyder
d979457760 zstream: consolidate shared code
zstream currently contains three identical copies of dump_record(),
which appear to all be drawn from libzfs_sendrecv.c. The original
is marked internal.

This PR adds zstream_util.[hc] and puts the shared code there along with
a couple of other items in common.

No functional changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Garth Snyder <garth@garthsnyder.com>
Closes #18284
2026-03-05 15:33:03 -08:00
Idefix2020
5dad9459d5 Add --no-preserve-encryption flag
* Add an option to send datasets with params or replicate
without preserving encryption
* Add a test case for the new functionality

Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Chris Jacobs <idefix2020dev@gmail.com>
Closes #18240
2026-03-05 15:08:17 -08:00
Rob Norris
c329530e6b Add simd_config.h and HAVE_SIMD() selector
We need to select which SIMD variable to check based on the compilation
target: HAVE_KERNEL_xxx for the Linux kernel, HAVE_TOOLCHAIN_xxx for
other platforms.

This adds a HAVE_SIMD() macro that returns the right result depending on the
definedness or value of the variable for this target.

The macro is in simd_config.h, which is forcibly included in every
compiler call (like zfs_config.h), to ensure that it can be used
directly without further includes.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:01:42 -08:00
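One way a selector like the `HAVE_SIMD()` macro from c329530e6b could work is plain token pasting, relying on the preprocessor rule that an undefined identifier evaluates to 0 inside `#if`. This is a guess at the general technique, not the contents of simd_config.h: `SK_KERNEL_BUILD` and the configure results below are invented for the example.

```c
#include <assert.h>

/* Pretend configure results (hypothetical values). */
#define	HAVE_TOOLCHAIN_AVX2	1
#define	HAVE_KERNEL_AVX2	1
/* HAVE_TOOLCHAIN_AVX512BW and HAVE_KERNEL_AVX512BW are left undefined. */

/* Pick the variable family for this compilation target. */
#if defined(SK_KERNEL_BUILD)
#define	HAVE_SIMD(name)	(HAVE_KERNEL_##name)
#else
#define	HAVE_SIMD(name)	(HAVE_TOOLCHAIN_##name)
#endif

/* Defined and nonzero: the gate is open. */
#if HAVE_SIMD(AVX2)
static const int sketch_have_avx2 = 1;
#else
static const int sketch_have_avx2 = 0;
#endif

/* Undefined: the identifier evaluates to 0 in #if, so the gate is closed. */
#if HAVE_SIMD(AVX512BW)
static const int sketch_have_avx512bw = 1;
#else
static const int sketch_have_avx512bw = 0;
#endif
```

Call sites then write `#if HAVE_SIMD(AVX2)` once and get the right answer whether the file is being built for the kernel or for userspace.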
Rob Norris
35f74f84e6 Convert all HAVE_<name> SIMD gates to HAVE_SIMD(<name>)
The original names no longer exist, and the new ones will need to be
selectable based on the current compilation target.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:01:37 -08:00
Rob Norris
92a6ab405f config: also do SIMD checks on the kernel toolchain
The kernel may be built with a different compiler, and also includes
objtool, which may fail on unknown instruction sequences. So, we want
to run the checks a second time for that toolchain too.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:01:32 -08:00
Rob Norris
c183268019 config: generate SIMD checks from table
No need to repeat all that boilerplate each time!

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:01:26 -08:00
Rob Norris
23bd583830 config: remove checks for unused SIMD gates
Specifically, we don't have any code gated on:

    HAVE_SSE
    HAVE_SSE3
    HAVE_SSE4_2
    HAVE_AVX512CD
    HAVE_AVX512DQ
    HAVE_AVX512IFMA
    HAVE_AVX512VBMI
    HAVE_AVX512PF
    HAVE_AVX512ER

So we can remove them and the checks that probe and generate them.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:01:20 -08:00
Rob Norris
e4b8d6a56f linux/simd_x86: remove obsolete kernel feature gates
Most of the X86_FEATURE_* defines we use were introduced in kernels much
older than those we support, so there's no need to check for them.

For the history, these are the ones being removed, and the kernel
versions/commits where they were introduced:

    <4.6  torvalds/linux@cd4d09ec6f (refactor/consolidation commit)
        OSXSAVE
        BMI1
        BMI2
        AES
        PCLMULQDQ
        MOVBE
        SHA_NI
        AVX512F
        AVX512CD
        AVX512ER
        AVX512PF

    4.6   torvalds/linux@d050049442
        AVX512BW
        AVX512DQ
        AVX512VL

    4.10  torvalds/linux@a8d9df5a50
        AVX512IFMA
        AVX512VBMI

    4.15  torvalds/linux@c128dbfa0f
        VAES
        VPCLMULQDQ

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:00:43 -08:00
Alexander Motin
1e1d64d665 Fix log vdev removal issues
When we clear the log, we should clear all the fields, not only
zh_log.  Otherwise remaining ZIL_REPLAY_NEEDED will prevent the
vdev removal.  Handle it also from the other side, when zh_log
is already cleared, while zh_flags is not.

spa_vdev_remove_log() asserts that allocated space on removed log
device is zero.  While it should be so in a perfect world, it might
not be if space leaked at any point.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18277
2026-03-04 09:12:14 -05:00
Brian Behlendorf
f6205fdf64 ZTS: Adjust mmp_on_uberblocks threshold
Slightly decrease the number of uberblock writes required, due
to observed variation when running in the CI.  This should help
avoid future false positives.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18280
2026-03-03 13:11:51 -08:00
Brian Behlendorf
75659a4e50 ZTS: Add additional exceptions
The following tests have been observed to occasionally fail when
running under the CI.  Update our exceptions list to track them.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18274
2026-03-03 11:18:46 -08:00
Rob Norris
1e2c94a043 More consistent use of TREE_* macros in AVL comparators
Where it is appropriate and obvious, use TREE_CMP(), TREE_ISIGN() and
TREE_PCMP() instead of direct comparisons. It can make the code a lot
smaller, less error prone, and easier to read.

Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18259
2026-03-03 09:08:23 -08:00
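A before-and-after sketch of the kind of comparator cleanup 1e2c94a043 describes. The `SK_TREE_*` macros are local stand-ins written to return -1, 0 or 1, which is the contract an AVL comparator needs. The tree's own `TREE_CMP()` and `TREE_ISIGN()` live in its AVL header and are not reproduced here, and the node type is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the tree's macros; each yields -1, 0 or 1. */
#define	SK_TREE_ISIGN(a)	(((a) > 0) - ((a) < 0))
#define	SK_TREE_CMP(a, b)	(((a) > (b)) - ((a) < (b)))

/* Hypothetical node, ordered by txg and then by name. */
typedef struct node_sketch {
	uint64_t txg;
	char name[16];
} node_sketch_t;

/* Before: direct comparisons, easy to get subtly wrong. */
static int
node_compare_verbose(const void *a, const void *b)
{
	const node_sketch_t *l = a, *r = b;

	if (l->txg < r->txg)
		return (-1);
	if (l->txg > r->txg)
		return (1);
	int c = strcmp(l->name, r->name);
	if (c < 0)
		return (-1);
	if (c > 0)
		return (1);
	return (0);
}

/* After: same ordering, expressed with the helper macros. */
static int
node_compare(const void *a, const void *b)
{
	const node_sketch_t *l = a, *r = b;
	int cmp = SK_TREE_CMP(l->txg, r->txg);

	if (cmp != 0)
		return (cmp);
	return (SK_TREE_ISIGN(strcmp(l->name, r->name)));
}
```

The sign-normalizing macro matters because `strcmp()` only promises a negative, zero or positive result, not exactly -1, 0 or 1.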
Brian Behlendorf
0f90a797dd Fix vdev_rebuild_range() tx commit
The spa_sync thread waits on ->spa_txg_zio and will set ZIO_WAIT_DONE
before running the sync tasks.  The dmu_tx_commit() call must be done
after we add the child zio to the ->spa_txg_zio parent; otherwise it's
possible the child is added after txg_sync has waited.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18276
2026-03-03 09:05:34 -08:00
Ryan Moeller
ac0fd40c8c Add zpool properties for allocation class space
The existing zpool properties accounting pool space (size, allocated,
fragmentation, expandsize, free, capacity) are based on the normal
metaslab class or are cumulative properties of several classes combined.

Add properties reporting the space accounting metrics for each metaslab
class individually.

Also introduce pool-wide AVAIL, USABLE, and USED properties reporting
values corresponding to FREE, SIZE, and ALLOC deflated for raidz.

Update ZTS to recognize the new properties and validate reported values.

While in zpool_get_parsable.cfg, add "fragmentation" to the list of
parsable properties.

Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Ryan Moeller <ryan.moeller@klarasystems.com>
Closes #18238
2026-03-02 15:50:23 -08:00
Ryan Moeller
6ba3f915d0 zcommon: Fix description of vdev capacity format
Capacity is reported as a percentage not a size.

Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Ryan Moeller <ryan.moeller@klarasystems.com>
Closes #18238
2026-03-02 15:49:23 -08:00
Akash B
f8e5af53e9 Fix redundant declaration of dsl_pool_t
Remove redundant dsl_pool variable and duplicate spa_get_dsl()
call in vdev_rebuild_thread.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #18263
2026-02-27 10:39:52 -08:00
Andriy Tkachuk
f8457fbdc4 Fix deadlock on dmu_tx_assign() from vdev_rebuild()
vdev_rebuild() is always called with spa_config_lock held in
RW_WRITER mode. However, when it tries to call dmu_tx_assign()
the latter may hang on dmu_tx_wait() waiting for an available txg.
But that txg may never become available because txg_sync takes
spa_config_lock in order to process the current txg. So we have
a deadlock case here:

 - dmu_tx_assign() waits for txg holding spa_config_lock;
 - txg_sync waits for spa_config_lock not progressing with txg.

Here are the stacks:

    __schedule+0x24e/0x590
    schedule+0x69/0x110
    cv_wait_common+0xf8/0x130 [spl]
    __cv_wait+0x15/0x20 [spl]
    dmu_tx_wait+0x8e/0x1e0 [zfs]
    dmu_tx_assign+0x49/0x80 [zfs]
    vdev_rebuild_initiate+0x39/0xc0 [zfs]
    vdev_rebuild+0x84/0x90 [zfs]
    spa_vdev_attach+0x305/0x680 [zfs]
    zfs_ioc_vdev_attach+0xc7/0xe0 [zfs]

    cv_wait_common+0xf8/0x130 [spl]
    __cv_wait+0x15/0x20 [spl]
    spa_config_enter+0xf9/0x120 [zfs]
    spa_sync+0x6d/0x5b0 [zfs]
    txg_sync_thread+0x266/0x2f0 [zfs]

The solution is to pass txg returned by spa_vdev_enter(spa)
at the top of spa_vdev_attach() to vdev_rebuild() and call
dmu_tx_create_assigned(txg) which doesn't wait for txg.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
Closes #18210
Closes #18258
2026-02-26 11:18:02 -08:00
Rob Norris
f3d4c79496 zpl_super: prefer "new" mount API when available
This API has been available since kernel 5.2, and having it available
(almost) everywhere should give us a lot more flexibility for mount
management in the future.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18260
2026-02-25 13:17:33 -08:00
Rob Norris
09c27a14a3 icp: add SHA512 implementation using Intel SHA512 extensions
Generated from crypto/sha/asm/sha512-x86_64.pl in
openssl/openssl@241d4826f8.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18233
2026-02-25 12:48:30 -08:00
Rob Norris
3547a358fd simd: detect and surface support for Intel SHA512 extensions
Recent Intel CPUs (starting with Arrow Lake and Lunar Lake) include new
vectorised SHA512 instructions. Detect them and make them available to
the rest of the system.

Note the internal name "sha512ext". This is to disambiguate from other
uses of "sha512".

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18233
2026-02-25 12:47:48 -08:00
clefru
6495dafd58 range_tree: use zfs_panic_recover() for partial-overlap remove
zfs_range_tree_remove_impl() used a bare panic() when a segment to be
removed was not completely overlapped by an existing tree entry.  Every
other consistency check in range_tree.c uses zfs_panic_recover(), which
respects the zfs_recover tunable and allows pools with on-disk
corruption to be imported and recovered.  This one call was
inconsistent, making the partial-overlap case unrecoverable regardless
of zfs_recover.

Replace panic() with zfs_panic_recover() so that operators can set
zfs_recover=1 to import a corrupted pool and reclaim data, consistent
with all other range tree error paths.

Related-to: https://github.com/openzfs/zfs/issues/13483
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #18255
2026-02-25 11:26:10 -08:00
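The difference between `panic()` and `zfs_panic_recover()` that 6495dafd58 relies on can be sketched as follows. Everything here is a simplified stand-in: the real function is variadic and lives in the SPA code, and the overlap check is reduced to a single segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the zfs_recover tunable. */
static int zfs_recover_sketch = 0;
static int sketch_warnings = 0;

/* Warn and continue if recovery is enabled, otherwise stop hard. */
static void
panic_recover_sketch(const char *msg)
{
	if (zfs_recover_sketch) {
		fprintf(stderr, "WARNING: %s\n", msg);
		sketch_warnings++;
		return;
	}
	fprintf(stderr, "PANIC: %s\n", msg);
	abort();
}

/*
 * Check that [start, start + size) lies completely inside the existing
 * segment [seg_start, seg_end). Returns false, after reporting, when the
 * overlap is only partial.
 */
static bool
range_remove_check(uint64_t seg_start, uint64_t seg_end,
    uint64_t start, uint64_t size)
{
	uint64_t end = start + size;

	assert(seg_start <= seg_end);
	if (!(seg_start <= start && end <= seg_end)) {
		panic_recover_sketch("removing segment not completely "
		    "overlapped by existing one");
		return (false);
	}
	return (true);
}
```

With the tunable at its default of 0 the partial-overlap case still stops the system, so the change only affects operators who have explicitly opted in to recovery.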
Tony Hutter
4da3f059a3 CI: Remove deprecated Fedora 41
Fedora 41 was deprecated on Dec 15 2025.  Remove it from CI tests.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18261
2026-02-25 11:20:23 -08:00
Alexander Motin
991fc56fae Introduce dedupused/dedupsaved pool properties
Currently there is only a dedup ratio reported via pool properties.
If dedup is enabled only for some datasets, it is impossible to say
how much space the ratio actually covers.  Fix this by introducing
dedupused/dedupsaved pool properties, similar to earlier added
block cloning ones.  Combined with work to expose allocation classes
stats, it should give user-space enough visibility to correlate
`zpool list` and `zfs list` space numbers.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ryan Moeller <ryan.moeller@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18245
2026-02-25 09:41:38 -05:00
Mateusz Piotrowski
3408332d71 zhack: Fix importing large allocation profiles on small pools (#18256)
This patch fixes a segmentation fault in zhack metaslab leak which might
be triggered by feeding zhack with a fragmentation profile that's
exported from a pool larger than the target pool.

Fixes: 8f15d2e4d5
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>

Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
2026-02-24 10:24:22 -08:00
Rob Norris
0f608aa6ca Linux 7.0: add shims for the fs_context-based mount API
The traditional mount API has been removed, so detect when it's not
available and instead use a small adapter to allow our existing mount
functions to keep working.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18216
2026-02-23 09:45:12 -08:00
Rob Norris
d34fd6cff3 Linux 7.0: posix_acl_to_xattr() now allocates memory
Kernel devs noted that almost all callers to posix_acl_to_xattr() would
check the ACL value size and allocate a buffer before making the call. To
reduce the repetition, they've changed it to allocate this buffer
internally and return it.

Unfortunately that's not true for us; most of our calls are from
xattr_handler->get() to convert a stored ACL to an xattr, and that call
provides a buffer. For now we have no other option, so this commit
detects the new version and wraps to copy the value back into the
provided buffer and then free it.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18216
2026-02-23 09:44:48 -08:00
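The wrapper described in d34fd6cff3 ("copy the value back into the provided buffer and then free it") follows a common adapter pattern. The userspace sketch below shows that pattern only: both functions are hypothetical, a string stands in for the ACL, and `malloc()`/`free()` stand in for the kernel allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the new-style call: allocates and returns the value. */
static void *
acl_to_xattr_alloc_sketch(const char *acl, size_t *sizep)
{
	size_t n = strlen(acl);
	void *buf = malloc(n > 0 ? n : 1);

	if (buf == NULL)
		return (NULL);
	memcpy(buf, acl, n);
	*sizep = n;
	return (buf);
}

/*
 * Old-style interface on top of the new one: fill the caller's buffer and
 * return the value size, or a negative errno. A NULL buffer is treated as
 * a size query.
 */
static long
acl_to_xattr_compat_sketch(const char *acl, void *buffer, size_t size)
{
	size_t n = 0;
	void *tmp;

	assert(acl != NULL);
	tmp = acl_to_xattr_alloc_sketch(acl, &n);
	if (tmp == NULL)
		return (-ENOMEM);
	if (buffer != NULL) {
		if (size < n) {
			free(tmp);
			return (-ERANGE);
		}
		memcpy(buffer, tmp, n);
	}
	free(tmp);	/* the temporary never escapes the wrapper */
	return ((long)n);
}
```

The cost is one extra allocation and copy per call, which is the trade-off the commit message accepts "for now".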
Rob Norris
204de946eb Linux 7.0: blk_queue_nonrot() renamed to blk_queue_rot()
It does exactly the same thing, just inverts the return. Detect its
presence or absence and call the right one.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18216
2026-02-23 09:44:20 -08:00
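The "detect and call the right one" approach in 204de946eb usually comes down to one compatibility macro that hides the inverted return value. A userspace sketch, assuming a configure-style symbol `SK_HAVE_BLK_QUEUE_ROT` and stand-in functions in place of the kernel's `blk_queue_rot()` and `blk_queue_nonrot()`:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a block device request queue. */
typedef struct queue_sketch {
	bool rotational;
} queue_sketch_t;

#if defined(SK_HAVE_BLK_QUEUE_ROT)
/* Newer interface: true for rotational media. */
static bool
blk_queue_rot_sketch(const queue_sketch_t *q)
{
	return (q->rotational);
}
#define	sketch_queue_is_nonrot(q)	(!blk_queue_rot_sketch(q))
#else
/* Older interface: true for non-rotational media. */
static bool
blk_queue_nonrot_sketch(const queue_sketch_t *q)
{
	return (!q->rotational);
}
#define	sketch_queue_is_nonrot(q)	(blk_queue_nonrot_sketch(q))
#endif
```

Callers use only `sketch_queue_is_nonrot()`, so the inversion is handled in exactly one place regardless of which kernel interface was detected.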