Commit Graph

10465 Commits

Author SHA1 Message Date
Brian Behlendorf e9a8c6e080 draid: allow seq resilver reads from degraded vdevs
When sequentially resilvering allow a dRAID child to be read
as long as the DTLs indicate it should have a good copy of the
data and the leaf isn't being rebuilt.  The previous check was
slightly too broad and would skip dRAID spare and replacing
vdevs if one of their children was being replaced.  As long
as there exists enough additional redundancy this is fine, but
when there isn't this vdev must be read in order to correctly
reconstruct the missing data.

A new test case has been added which exhausts the available
redundancy, faults another device causing it to be degraded,
and then performs a sequential resilver for the degraded device.
In such a situation enough redundancy exists to perform the
replacement and a scrub should detect no checksum errors.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18405
2026-04-23 14:59:47 -07:00
Alexander Motin 63b8da8ff7 Linux: Refactor zpl_fadvise()
Similar to FreeBSD stop issuing prefetches on POSIX_FADV_SEQUENTIAL.
It should not have this semantics, only hint speculative prefetcher,
if access ever happen later.  Instead after POSIX_FADV_WILLNEED
handling call generic_fadvise(), if available, to do all the generic
stuff, including setting f_mode in struct file, that we could later
use to control prefetcher as part of read/write operations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18395
2026-04-23 14:59:39 -07:00
Tony Hutter 26e9a69fea CI: Free 35GB of unused files on the runner
Free 35GB of unused files, mostly from unused development environments.
This helps with the out of disk space problems we were seeing on
FreeBSD runners.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18400
2026-04-23 14:59:35 -07:00
Rob Norris fc285caa84 linux/vfsops: remove zfs_mnt_t, pass directly
A cleanup of opportunity. Since we already are modifying the contents of
zfs_mnt_t, we've broken any API guarantee, so we might as well go the
rest of the way and get rid of it, and just pass the osname and/or the
vfs_t directly.

It seems like zfs_mnt_t was never really needed anyway; it was added in
1c2555ef92 (March 2017) to minimise the difference to illumos, but
zfs_vfsops was made platform-specific anyway in 7b4e27232d.

We also remove setting SB_RDONLY on the caller's flags when failing a
read-write remount on a read-only snapshot or pool. Since 0f608aa6ca
the caller's flags have been a pointer back to fc->sb_flags, which are
discarded without further ceremony when the operation fails, so the
change is unnecessary and we can simplify the call further.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18377
2026-04-23 14:59:31 -07:00
Rob Norris a8942fdb89 linux/super: work around kernels that enforce "forbidden" mount options
Before Linux 5.8 (include RHEL8), a fixed set of "forbidden" options
would be rejected outright. For those, we work around it by providing
our own option parser to avoid the codepath in the kernel that would
trigger it.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18377
2026-04-23 14:59:27 -07:00
Rob Norris 0b223ef577 linux/super: implement new mount params parser
Adds zpl_parse_param and wires it up to the fs_context. This uses the
kernel's standard mount option parsing infrastructure to keep the work
we need to do to a minimum. We simply fill in the vfs_t we attached to
the fs_context in the previous commit, ready to go for the mount/remount
call.

Here we also document all the options we need to support, and why. It's
a lot of history but in the end the implementation is straightforward.

Finally, if we get SB_RDONLY on the proposed superblock flags, we record
that as the readonly mount option, because we haven't necessarily seen a
"ro" param and we still need to know for remount, the `readonly` dataset
property, etc.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18377
2026-04-23 14:59:22 -07:00
Rob Norris 43eed9ee41 linux/super: match vfs_t lifetime to fs_context
vfs_t is initially just parameters for the mount or remount operation,
so match them to the lifetime of the fs_context that represents that
operation.

When we actually execute the operation (calling .get_tree or .reconfigure),
transfer ownership of those options to the associated zfsvfs_t.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18377
2026-04-23 14:59:18 -07:00
Rob Norris f5a60b6cae linux/super: remove zpl_parse_monolithic
Final bit of cleanup of the old method.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18377
2026-04-23 14:59:14 -07:00
Rob Norris 36ae5a65aa linux/vfsops: remove old options parser
We're working to replace this, and its easier to drop it outright while
we get set up.

To keep things compiling, the calls to zfsvfs_parse_options() are
replaced with zfsvfs_vfs_alloc(), though without any option parsing at
all nothing will work. That's ok, next commits are working towards it.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18377
2026-04-23 14:59:09 -07:00
Rob Norris 7843c42b27 linux/vfsops: add vfs_t allocator, make public
In a few commits, we're going  to need to allocate and free vfs_t from
zpl_super.c as well, so lets keep them uniform.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18377
2026-04-23 14:59:02 -07:00
Andriy Tkachuk 9b8ccbd2cb draid: fix import failure after disks replacements
Currently, it's possible that draid vdev asize would decrease
after disks replacements when the disk size is a little less than
all other disks in the pool. In such situations, import would
fail on this check in vdev_open():

        /*
         * Make sure the allocatable size hasn't shrunk too much.
         */
        if (asize < vd->vdev_min_asize) {
                vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,
                    VDEV_AUX_BAD_LABEL);
                return (SET_ERROR(EINVAL));
        }

Solution: fix vdev_draid_min_asize() so that it would round up
the required minimal disk capacity to the VDEV_DRAID_ROWHEIGHT.
This would refuse replacements with the disks whose size is less
than minimally required to avoid draid asize decrement.

Note: we also use VDEV_DRAID_ROWHEIGHT in vdev_draid_open() when
calculating asize, and thats why we need to round up min_size at
vdev_draid_min_asize() to avoid asize drops.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
Closes #18380
2026-04-23 14:58:57 -07:00
Rob Norris 3ca81f610b Linux 7.0: ensure LSMs get to process mount options
Normally, kernel gives any LSM registering a `sb_eat_lsm_opts` hook a
first look at mount options coming in from a userspace mount request.
The LSM may process and/or remove any options. Whatever is left is
passed to the filesystem.

This is how the dataset properties `context`, `fscontext`, `defcontext`
and `rootcontext` are used to configure ZFS mounts for SELinux. libzfs
will fetch those properties from the dataset, then add them to the mount
options.

In 0f608aa6ca (#18216) we added our own mount shims to cover the loss of
the kernel-provided ones. It turns out that if a filesystem provides a
`.parse_monolithic callback`, it is expected to do _all_ mount option
parameter processing - the kernel will not get involved at all. Because
of that, LSMs are never given a chance to process mount options. The
`context` properties are never seen by SELinux, nor are any other
options targetting other LSMs.

Fix this by calling `security_sb_eat_lsm_opts()` in
`zpl_parse_monolithic()`, before we stash the remaining options for
`zfs_domount()`.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18376
2026-04-23 14:58:50 -07:00
Christos Longros 74052404c6 ci: update FreeBSD CI images from 14.3 to 14.4
Update FreeBSD CI targets from 14.3 to 14.4 in both the QEMU
start script and the workflow configuration.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18362
2026-04-23 14:58:44 -07:00
John Cabaj 6756fd4740 Linux 7.0: autoconf: Remove copy-from-user-inatomic API checks (#18348) (#18354)
This function was removed in c6442bd3b6: "Removing old code outside
of 4.18 kernsls", but fails at present on PowerPC builds due to the
recent inclusion of 6bc9c0a90522: "powerpc: fix KUAP warning in VMX
usercopy path" in the upstream kernel, which introduces a use of
cpu_feature_keys[], which is a GPL-only symbol. Removing the API
check as it doesn't appear necessary.

Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
2026-04-23 14:58:39 -07:00
Tony Hutter 0d42a6c357 CI: Add ARM builder
Do a ZFS build inside of an ARM runner.  This only does a simple
build, it does not run the test suite.  The build runs on the
runner itself rather than in a VM, since nesting is not supported on
Github ARM runners.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18343
2026-04-23 14:58:34 -07:00
Ameer Hamza 2c861ebcde CI: Support repository variable override for ZTS OS selection
Allow restricting ZTS OS targets by setting the vars.ZTS_OS_OVERRIDE
repository variable (e.g. '["debian13"]') to reduce shared runner
contention when running the full OS matrix is unnecessary. When unset,
the existing ci_type-based OS selection is used unchanged.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18342
2026-04-23 14:58:28 -07:00
Rob Norris 20b8936c1a linux/super: flatten zpl_fill_super into zpl_get_tree
Target of opportunity; with no other callers, there's no need for it to
be a static function.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18339
2026-04-23 14:57:44 -07:00
Rob Norris 04692b29da linux/super: flatten zpl_mount_impl into zpl_get_tree
Target of opportunity; with no other callers, there's no need for it to
be a static function.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18339
2026-04-23 14:57:37 -07:00
Rob Norris 7c3f75af2f linux/super: flatten mount/remount into get_tree/reconfigure
With the old API gone, there's no need to massage new-style calls into
its shape and call another function; we can just make those handlers
work directly.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18339
2026-04-23 14:57:29 -07:00
Rob Norris 0edbfbfb2d linux/super: remove support for old mount API
Removing the HAVE_FS_CONTEXT gates and anything that would be used if it
wasn't set.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18339
2026-04-23 14:57:23 -07:00
Rob Norris bec56a4c10 config: refuse to build without fs_context
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18339
2026-04-23 14:57:18 -07:00
Rob Norris 59185c5691 Linux 7.0: also set setlease handler on directories (#18331)
It turns out the kernel can also take directory leases, most notably in
the NFS server. Without a setlease handler on the directory file ops,
attempts to open a directory over NFS can fail with EINVAL.

Adding a directory setlease handler was missed in 168023b603. This fixes
that, allowing directories to be properly accessed over NFS.

Sponsored-by: TrueNAS
Reported-by: Satadru Pramanik <satadru@gmail.com>

Signed-off-by: Rob Norris <rob.norris@truenas.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2026-04-23 14:57:13 -07:00
Brian Behlendorf 1bc922516e ZTS: Add back redundancy_draid_spare3 exception
Observed again in the CI.  Put the maybe exception back in place
and reference a newly created issue for this sporadic failure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18320
2026-04-23 14:57:08 -07:00
Brian Behlendorf 7894a5e884 ZTS: redundancy_draid_spare{1,3} exceptions
Update the redundancy_draid_spare1 exception to reference an issue
which describes the failure.

Remove the exception for the redundancy_draid_spare3 test.  I have
not observed it in local testing.  If it reproduces in the CI we
can create a new issue for it and put back the exception.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18308
2026-04-23 14:57:00 -07:00
Rob Norris 97949da709 config: fix STATX_MNT_ID detection
statx(2) requires _GNU_SOURCE to be defined in order for sys/stat.h to
produce a definition for struct statx and the STATX_* defines. We get
that at compile time because we pass -D_GNU_SOURCE through to
everything, but in the configure check we aren't setting _GNU_SOURCE, so
we don't find STATX_MNT_ID, and so don't set HAVE_STATX_MNT_ID.

(This was fine before ccf5a8a6fc, because linux/stat.h does not require
_GNU_SOURCE).

Simple fix: in the check, define _GNU_SOURCE before including
sys/stat.h.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18312
2026-04-23 14:56:54 -07:00
Andriy Tkachuk 938c8c98b1 draid: fix data corruption after disk clear
Currently, when there there are several faulted disks with attached
dRAID spares, and one of those disks is cleared from errors (zpool
clear), followed by its spare being detached, the data in all the
remaining spares that were attached while the cleared disk was in
FAULTED state might get corrupted (which can be seen by running scrub).
In some cases, when too many disks get cleared at a time, this can
result in data corruption/loss.

dRAID spare is a virtual device whose blocks are distributed among
other disks. Those disks can be also in FAULTED state with attached
spares on their own. When a disk gets sequentially resilvered (rebuilt),
the changes made by that resilvering won't get captured in the DTL
(Dirty Time Log) of other FAULTED disks with the attached spares to
which the data is written during the resilvering (as it would normally
be done for the changes made by the user if a new file is written or
some existing one is deleted). It is because sequential resilvering
works on the block level, without touching or looking into metadata,
so it doesn't know anything about the old BPs or transactions groups
that it is resilvering. So later on, when that disk gets cleared
from errors and healing resilvering is trying to sync all the data
from its spare onto it, all the changes made on its spare during the
resilvering of other disks will be missed because they won't be
captured in its DTL. That's why other dRAID spares may get corrupted.

Here's another way to explain it that might be helpful. Imagine a
scenario:

1. d1 fails and gets resilvered to some spare s1 - OK.
2. d2 fails and gets sequentially resilvered on draid spare s2. Now,
   in some slices, s2 would map to d1, which is failed. But d1 has s1
   spare attached, so the data from that resilvering goes to s1, but
   not recorded in d1's DTL.
3. Now, d1 gets cleared and its s1 gets detached. All the changes
   done by the user (writes or deletions) have their txgs captured
   in d1's DTL, so they will be resilvered by the healing resilver
   from its spare (s1) - that part works fine. But the data which
   was written during resilvering of d2 and went to s1 - that one
   will be missed from d1's DTL and won't get resilvered to it. So
   here we are:
4. s2 under d2 is corrupted in the slices which map to d1, because
   d1 doesn't have that data resilvered from s1.

Now, if there are more failed disks with draid spares attached which
were sequentially resilvered while d1 was failed, d3+s3, d4+s4 and
so on - all their spares will be corrupted. Because, in some slices,
each of them will map to d1 which will miss their data.

Solution: add all known txgs starting from TXG_INITIAL to DTLs of
non-writable devices during sequential resilvering so when healing
resilver starts on disk clear, it would be able to check and heal
blocks from all txgs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
Closes #18286
Closes #18294
2026-04-23 14:54:23 -07:00
Andriy Tkachuk 33961142a2 Fix deadlock on dmu_tx_assign() from vdev_rebuild()
vdev_rebuild() is always called with spa_config_lock held in
RW_WRITER mode. However, when it tries to call dmu_tx_assign()
the latter may hang on dmu_tx_wait() waiting for available txg.
But that available txg may not happen because txg_sync takes
spa_config_lock in order to process the current txg. So we have
a deadlock case here:

 - dmu_tx_assign() waits for txg holding spa_config_lock;
 - txg_sync waits for spa_config_lock not progressing with txg.

Here are the stacks:

    __schedule+0x24e/0x590
    schedule+0x69/0x110
    cv_wait_common+0xf8/0x130 [spl]
    __cv_wait+0x15/0x20 [spl]
    dmu_tx_wait+0x8e/0x1e0 [zfs]
    dmu_tx_assign+0x49/0x80 [zfs]
    vdev_rebuild_initiate+0x39/0xc0 [zfs]
    vdev_rebuild+0x84/0x90 [zfs]
    spa_vdev_attach+0x305/0x680 [zfs]
    zfs_ioc_vdev_attach+0xc7/0xe0 [zfs]

    cv_wait_common+0xf8/0x130 [spl]
    __cv_wait+0x15/0x20 [spl]
    spa_config_enter+0xf9/0x120 [zfs]
    spa_sync+0x6d/0x5b0 [zfs]
    txg_sync_thread+0x266/0x2f0 [zfs]

The solution is to pass txg returned by spa_vdev_enter(spa)
at the top of spa_vdev_attach() to vdev_rebuild() and call
dmu_tx_create_assigned(txg) which doesn't wait for txg.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
Closes #18210
Closes #18258
2026-04-23 14:54:14 -07:00
Rob Norris 12cd6ffa39 README: describe specific kernels/distros we target
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18295
2026-04-23 14:33:35 -07:00
Rob Norris 5445c3720b config: remove minimum kernel version check
The autoconf checks are more than enough to decide whether or not we can
work with this kernel or not.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18295
2026-04-23 14:33:28 -07:00
Ameer Hamza cb2e2f9c4f libzfs: use mount_setattr for selective remount including legacy mounts
When a namespace property is changed via zfs set, libzfs remounts the
filesystem to propagate the new VFS mount flags. The current approach
uses mount(2) with MS_REMOUNT, which reads all namespace properties
from ZFS and applies them together. This has two problems:

1. Linux VFS resets unspecified per-mount flags on remount. If an
   administrator sets a temporary flag (e.g. mount -o remount,noatime),
   a subsequent zfs set on any namespace property clobbers it.

2. Two concurrent zfs set operations on different namespace properties
   can overwrite each other's mount flags.

Additionally, legacy datasets (mountpoint=legacy) were never remounted
on namespace property changes since zfs_is_mountable() returns false
for them.

Add zfs_mount_setattr() which uses mount_setattr(2) to selectively
update only the mount flags that correspond to the changed property.
For legacy datasets, /proc/mounts is iterated to update all
mountpoints. On kernels without mount_setattr (ENOSYS), non-legacy
datasets fall back to a full remount; legacy mounts are skipped to
avoid clobbering temporary flags.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18257
2026-04-23 14:33:23 -07:00
Alexander Ziaee a94b137aac FreeBSD: Improve dmesg kernel message prefix
Provide intuitive log search keywords and increased system consistency.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Ziaee <ziaee@FreeBSD.org>
Closes #18290
2026-04-23 14:33:15 -07:00
Juhyung Park 02ed091060 Fix check for .cfi_negate_ra_state on aarch64
Checking for LD_VERSION in unreliable as not all distros define it on
the compiler's preprocessor.

Explicitly check it via autoconf.

This fixes support for Ubuntu 18.04 on arm64.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Closes #18262
2026-04-23 14:33:00 -07:00
Rob Norris 1ace2bf889 zpl_super: prefer "new" mount API when available
This API has been available since kernel 5.2, and having it available
(almost) everywhere should give us a lot more flexibility for mount
management in the future.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18260
2026-04-23 14:31:33 -07:00
Tony Hutter 04daeffe7c CI: Remove deprecated Fedora 41
Fedora 41 was deprecated on Dec 15 2025.  Remove it from CI tests.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18261
2026-04-23 14:31:21 -07:00
Rob Norris 20a30acc54 Linux 7.0: add shims for the fs_context-based mount API
The traditional mount API has been removed, so detect when its not
available and instead use a small adapter to allow our existing mount
functions to keep working.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18216
2026-04-23 14:31:15 -07:00
Rob Norris ffa0a5af30 Linux 7.0: posix_acl_to_xattr() now allocates memory
Kernel devs noted that almost all callers to posix_acl_to_xattr() would
check the ACL value size and allocate a buffer before make the call. To
reduce the repetition, they've changed it to allocate this buffer
internally and return it.

Unfortunately that's not true for us; most of our calls are from
xattr_handler->get() to convert a stored ACL to an xattr, and that call
provides a buffer. For now we have no other option, so this commit
detects the new version and wraps to copy the value back into the
provided buffer and then free it.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18216
2026-04-23 14:31:09 -07:00
Rob Norris 786b7c2a90 Linux 7.0: blk_queue_nonrot() renamed to blk_queue_rot()
It does exactly the same thing, just inverts the return. Detect its
presence or absence and call the right one.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18216
2026-04-23 14:31:04 -07:00
Louis Leseur ca18f1ad5f build: get objtool from $kernelbuild
On systems where `$kernelsrc` is different than `$kernelbuild`, the
objtool binary will be located in `$kernelbuild` as it's the result of
running `make prepare` during kernel build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Louis Leseur <louis.leseur@gmail.com>
Closes #18248
Closes #18249
2026-04-23 14:30:58 -07:00
Rob Norris faddb7f5ca Linux 7.0: explicitly set setlease handler to kernel implementation
The upcoming 7.0 kernel will no longer fall back to generic_setlease(),
instead returning EINVAL if .setlease is NULL. So, we set it explicitly.

To ensure that we catch any future kernel change, adds a sanity test for
F_SETLEASE and F_GETLEASE too. Since this is a Linux-specific test,
also a small adjustment to the test runner to allow OS-specific helper
programs.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18215
2026-04-23 14:30:53 -07:00
Rob Norris 423466063d spdxcheck: enforce SPDX license tags on build system files
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18077
2026-04-23 14:30:23 -07:00
Rob Norris fc44c73021 build: add SPDX license tags to build system files
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18077
2026-04-23 14:29:46 -07:00
Tony Hutter 1c702dda34 Tag zfs-2.4.1
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
zfs-2.4.1
2026-02-19 11:14:37 -08:00
Alexander Motin 3dcd071b51 Fix available space accounting for special/dedup (#18222)
Currently, spa_dspace (base to calculate dataset AVAIL) only includes
the normal allocation class capacity, but dd_used_bytes tracks space
allocated across all classes.  Since we don't want to report free
space of other classes as available (we can't promise new allocations
will be able to use it), report only allocated space, similar to how
we report space saved by dedup and block cloning.

Since we need deflated space here, make allocation classes track
deflated allocated space also.  While here, make mc_deferred also
deflated, matching its use contexts.  Also while there, use
atomic_load() to read the allocation class stats.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18190
Closes #18222
2026-02-19 11:14:37 -08:00
Tony Hutter 46500a0803 CI: Test & fix Linux ZFS built-in build
ZFS can be built directly into the Linux kernel.  Add a test build
of this to the CI to verify it works.  The test build is only enabled
on Fedora runners (since they run the newest kernels) and is done in
parallel with ZTS.  The test build is done on vm2, since it typically
finishes ~15min before vm1 and thus has time to spare.

In addition:

- Update 'copy-builtin' to check that $1 is a directory
- Fix some VERIFYs that were causing the built-in build to fail

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18234
2026-02-19 11:14:37 -08:00
Attila Fülöp c629e594e4 Linux 6.19 compat: in-tree build: fix duplicate GCM assembly functions
Linux 6.19 added an AES-GCM VAES-AVX2 assembly implementation. It's
basically a translation from the BoringSSL perlasm syntax to macro
assembly. We're using the same source but the perlasm generated flat
assembly which shares some global function names with the former.
When  building in-tree this results in the linker failing due to the
duplicate symbols.

To avoid the error we prepend `icp_` via a macro to our function
names.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Moch <mail@alexmoch.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #18204
Closes #18224
2026-02-17 13:52:43 -08:00
rmacklem f83a7864aa zfs_vnops_os.c: Move a vput() to after zfs_setattr_dir()
Without this patch, the following crash can occur when
a file system is configured with "xattr=dir".

VNASSERT failed: locked not true at
 /posix-acl/freebsd-rdma/sys/kern/vfs_subr.c:5786 (assert_vop_locked)
    hold count flags ()
    flags ()
    lock type zfs: UNLOCKED
panic: zfs_dirent_lookup: vnode is not locked but should be
cpuid = 3
time = 1770520763
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b
vpanic() at vpanic+0x136/frame 0xfffffe00914c8270
panic() at panic+0x43/frame 0xfffffe00914c82d0
assert_vop_locked() at assert_vop_locked+0x78
zfs_dirent_lookup() at zfs_dirent_lookup+0x41
zfs_setattr_dir() at zfs_setattr_dir+0x123
zfs_setattr() at zfs_setattr+0x1389
zfs_freebsd_setattr() at zfs_freebsd_setattr+0x56b
VOP_SETATTR_APV() at VOP_SETATTR_APV+0x5d
setfown() at setfown+0xb1
kern_fchownat() at kern_fchownat+0x192

This patch fixes the problem by moving the vput() call for
attrzp to after the zfs_setattr_dir() call that takes it as
an argument.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes: #18188
2026-02-17 11:54:58 -08:00
Austin Wise 612d4019f1 Fix activating large_microzap on receive
This ensures that the in-memory state of the feature is recorded and
that `dsl_dataset_activate_feature` is not called when the feature
is already active.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Austin Wise <AustinWise@gmail.com>
Closes #18143
Closes #18144
2026-02-17 11:54:58 -08:00
Alexander Motin 25327ed7ce Improve caching for dbuf prefetches
To avoid read errors with transaction open dmu_tx_check_ioerr()
is used to read everything required in advance.  But there seems
to be a chance for the buffer to evicted from dbuf cache in
between, which result in immediate eviction from ARC, which may
require additional disk read later in a place where error handling
is problematic.

To partially workaround this introduce a new flag DMU_IS_PREFETCH,
relayed to ARC as ARC_FLAG_PREFETCH | ARC_FLAG_PRESCIENT_PREFETCH,
making ARC delay eviction by at least several seconds, or till the
actual read inside the transaction, that will promote it to demand
access.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18160
2026-02-17 11:54:58 -08:00
Mariusz Zaborski 11647c669e Flush RRD only when TXGs contain data
This change modifies the behavior of spa_sync_time_logger when
flushing the RRD database.

Previously, once the sync interval elapsed, a flush would always
be generated. On solid-state devices, especially when the pool was
otherwise idle, this caused disks to wake up solely to write RRD
data. Since RRD is best-effort telemetry, this behavior is
unnecessary and wasteful.

With this change, spa_sync_time_logger delays flushing until a TXG
that already contains data is being synced. The RRD update is
appended to that TXG instead of forcing the creation of
a new write-only TXG.

During pool export, flushing is forced regardless of whether
the TXG contains user data. At that stage, data durability takes
precedence and a write must be issued.

Sponsored by: [Wasabi Technology, Inc.; Klara, Inc.]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Closes #18082
Closes #18138
2026-02-11 11:41:13 -08:00
Marc Sladek a0350f61c4 Fix send:raw permission for send -w -I
When performing an incremental raw send with intermediates (-w -I),
the standard 'send' permission was incorrectly required instead of
allowing 'send:raw'. This was due to a strict boolean comparison on
the 'rawok' flag in zfs_secpolicy_send() with non-boolean value.

This change normalizes the 'rawok' variable to be strictly 0/1 and
updates the test suite to properly verify delegated raw send behavior.

Introduced-by: https://github.com/openzfs/zfs/pull/17543
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Marc Sladek <marc@sladek.dev>
Closes #18198
Closes #18193
2026-02-11 11:41:13 -08:00