Compare commits

...

101 Commits

Author SHA1 Message Date
Tony Hutter ab38521f31 Tag zfs-2.3.5
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2025-11-13 09:13:28 -08:00
Alexander Motin 12d3e1fc61 FreeBSD: Satisfy ASSERT_VOP_IN_SEQC()
zfs_aclset_common() might be called for newly created or not even
created vnodes, that triggers assertions on newer FreeBSD versions
with DEBUG_VFS_LOCKS included into INVARIANTS.  In the first case
make sure to call vn_seqc_write_begin()/_end(), in the second just
skip the assertion.

The similar has to be done for project management IOCTL and file-
bases extended attributes, since those are not going through VFS.

Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17722
2025-11-13 09:13:28 -08:00
Tino Reichardt a68a9c726d CI: Update FreeBSD versions and ci-type handling
Update FreeBSD versions:
- add FreeBSD 15.0-STABLE
- add FreeBSD 16.0-CURRENT

So we use the latest versions of each line now:
  - Freebsd 14.3 (RELEASE)
  - FreeBSD 15.0 (STABLE)
  - FreeBSD 16.0 (CURRENT)

In commits - you may specify which type of CI should run:
- ZFS-CI-Type: quick
- ZFS-CI-Type: linux
- ZFS-CI-Type: freebsd
- ZFS-CI-Type: full

Reviewed-by: Alexx Saver <lzsaver@users.noreply.github.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17896
2025-11-12 10:59:16 -08:00
Alexander Motin 19b9d93970 BRT: Fix ranges to blocks conversion math
BRT_RANGESIZE_TO_NBLOCKS() takes number of ranges as its argument.
To get number of blocks we should multiply it by the entry size,
not divide by it, as it was due to missing parentheses.

Before #17875 this could cause small memory corruptions for vdevs
bigger than 64TB, but the change made the bug more noticeable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17886
Closes #17915
2025-11-12 09:52:16 -08:00
Brian Behlendorf 54d76c8d1e zstd: disable intrinsics
Disable the aarch64 NEON SIMD intrinsics for kernel builds.  Safely
using them in the kernel context requires saving/restoring the FPU
registers which is not currently done.

Additionally, remove the aarch64 optimized PREFETCH_L1 and PREFETCH_L2
instruction.  Rely on the more portable compiler built ins.

This lets us remove the problematic workaround in the aarch64_compat.h
header which undefines the __aarch64__ macro.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17904
Closes #17852
2025-11-12 09:52:16 -08:00
Tony Hutter b8ee796945 Linux 6.17 compat: Fix broken projectquota on 6.17
We need to specifically use the FX_XFLAG_* macros in zpl_ioctl_*attr()
codepaths, and the FS_*_FL macros in the zpl_ioctl_*flags() codepaths.
The earlier code just assumes the FS_*_FL macros for both codepaths.
The 6.17 kernel add a bitmask check in copy_fsxattr_from_user() that
exposed this error via failing 'projectquota' ZTS tests.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17884
Closes #17869
2025-11-12 09:52:16 -08:00
youzhongyang 8e7a310860 Synchronize the update of feature refcount
The concurrent execution of feature_sync() can lead to a panic due 
to an unprotected update of the feature refcount.  Resolve this by
using the spa->spa_feat_stats_lock to synchronize the update of the 
refcount.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #17184
Closes #17632
2025-11-05 08:53:06 -08:00
Alexander Motin 9f2cbea1dc zdb: Fix asize overflow in verify_livelist_allocs()
Spacemap entry might be too big to fit into a block pointer ashift.
We hit an assertion trying to run `zdb -bvy` on a large pool.  But
it seems the code does not really need size there, since we only
need to search for a range of offsets, so setting it to zero should
just make btree return position just before the first entry.  I
suspect the previous code could actually miss the first entry
due to this if its size was smaller.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17764
2025-10-22 10:36:30 -07:00
Alexander Motin 81ceee0cff Fix two infinite loops if dmu_prefetch_max set to zero
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17692
Closes #17729
2025-10-22 10:36:30 -07:00
Robert Evans a60214e792 dnode_next_offset: backtrack if lower level does not match
This changes the basic search algorithm from a single search up and down
the tree to a full depth-first traversal to handle conditions where the
tree matches at a higher level but not a lower level.

Normally higher level blocks always point to matching blocks, but there
are cases where this does not happen:

1. Racing block pointer updates from dbuf_write_ready.

   Before f664f1ee7f (#8946), both dbuf_write_ready and
   dnode_next_offset held dn_struct_rwlock which protected against
   pointer writes from concurrent syncs.

   This no longer applies, so sync context can f.e. clear or fill all
   L1->L0 BPs before the L2->L1 BP and higher BP's are updated.

   dnode_free_range in particular can reach this case and skip over L1
   blocks that need to be dirtied. Later, sync will panic in
   free_children when trying to clear a non-dirty indirect block.

   This case was found with ztest.

2. txg > 0, non-hole case. This is #11196.

   Freeing blocks/dnodes breaks the assumption that a match at a higher
   level implies a match at a lower level when filtering txg > 0.

   Whenever some but not all L0 blocks are freed, the parent L1 block is
   rewritten. Its updated L2->L1 BP reflects a newer birth txg.

   Later when searching by txg, if the L1 block matches since the txg is
   newer, it is possible that none of the remaining L1->L0 BPs match if
   none have been updated.

   The same behavior is possible with dnode search at L0.

   This is reachable from dsl_destroy_head for synchronous freeing.
   When this happens open context fails to free objects leaving sync
   context stuck freeing potentially many objects.

   This is also reachable from traverse_pool for extreme rewind where it
   is theoretically possible that datasets not dirtied after txg are
   skipped if the MOS has high enough indirection to trigger this case.

In both of these cases, without backtracking the search ends prematurely
as ESRCH result implies no more matches in the entire object.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16025
Closes #11196
2025-10-21 11:30:30 -07:00
Tony Hutter 2a5349bf93 zvol: verify IO type is supported
ZVOLs don't support all block layer IO request types.  Add a check for
the IO types we do support.  Also, remove references to
io_is_secure_erase() since they are not supported on ZVOLs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17803
2025-10-21 11:02:42 -07:00
Tony Hutter 0bb5950e72 zvol: Fix blk-mq sync
The zvol blk-mq codepaths would erroneously send FLUSH and TRIM
commands down the read codepath, rather than write.  This fixes
the issue, and updates the zvol_misc_fua test to verify that
sync writes are actually happening.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17761
Closes #17765
2025-10-21 11:02:42 -07:00
Brian Behlendorf 1baecd3a78 Linux 6.17 compat: META
Update the META file to reflect compatibility with the 6.17
kernel.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17789
2025-10-21 11:02:42 -07:00
Brian Behlendorf 2b0502c578 Add interface to interface spa_get_worst_case_min_alloc() function
Provide an interface to retrieve the lowest and highest minimum
allocation size for the normal allocation class.  This can be used
by external consumers of the DMU to estimate potential wasted
capacity when setting the recordsize for an object.

The new "min_alloc" and "max_alloc" keys are added to the pool
configuration and used by default_volblocksize() to warn when
an ineffecient block size is requested.  For older kmods which
don't yet include the new keys fallback to the previous logic.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17758
2025-10-21 11:02:42 -07:00
Rob Norris 660077ffee contrib/initramfs/scripts/zfs: shellcheck fixup
I got a newer shellcheck, and it pointed out that read without a target
variable is not POSIXly. The var was removed in c3ef9f7528, so I put it
back, and now shellcheck complains about an unused var. That's actually
correct, but necessary, so I've added a suppression for that, probably
better.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17626
2025-10-21 11:02:42 -07:00
Brian Behlendorf e5ca94f21e Fix 'zpool add' safety check corner cases
Three cases were discovered where 'zpool add' would fail to
warn when adding vdevs to a pool with a mismatched replication
level.  These are:

  1. When a pool contains mixed file and disk vdevs.
  2. When a pool contains an active dRAID distributed spare
  3. When a pool contains an active hot spare

The lack of warnings are caused by get_replication() assessing
the current pool configuration an inconsistent and disabling
the mismatched replication check for the new pool configuration
after 'zpool add'.  This change updates get_replication() to
be slightly more tolerant in the non-fatal case.

The zpool_add_010_pos.ksh test case was split in to separate
tests: zpool_add_warn_create.ksh, pool_add_warn_degraded.ksh,
and zpool_add_warn_removal.  These test were extended to
include coverage for dRAID pools and the three scenarios
described above.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17780
2025-10-21 11:02:35 -07:00
Rob Norris 365926a37c Linux 6.17: d_set_d_op() is no longer available
We only have extremely narrow uses, so move it all into a single
function that does only what we need, with and without d_set_d_op().

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17621
2025-10-21 10:36:35 -07:00
Ameer Hamza 4b24cbba80 CI: Fix FreeBSD 15.0 by staying on ALPHA4 due to broken ALPHA5 image
FreeBSD 15.0-ALPHA5 image fails to boot on cloud VMs due to missing
/boot/efi mount point, causing the system to drop to single user mode
where SSH cannot start. Work around this by staying on ALPHA4 and
setting IGNORE_OSVERSION=yes to bypass pkg's kernel version mismatch
prompt during bootstrap. This allows CI to proceed with ALPHA4 until we
have a stable FreeBSD 15.0 image.

Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17846
2025-10-21 10:35:58 -07:00
Tino Reichardt 007f325e1b CI: Switch FreeBSD 15 to 15.0-ALPHA4 and add FreeBSD 16
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17815
2025-10-21 10:35:58 -07:00
Shreshth3 03c956e806 docs: fix a few small typos (#17804)
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-10-21 10:35:58 -07:00
Tony Hutter 2a331e411d CI: Add ZTS -O option, log Setup Testing Machines step
Add a -O option to zfs-test.sh to dump debug information on test
timeout.  The debug info includes:

- 30 lines from 'top'
- /proc/<PID>/stack output of process with highest CPU usage
- Last lines strace-ing process with highest CPU usage
- /proc/sysrq-trigger kernel stack traces

All debug information gets dumped to /dev/kmsg (Linux only).

In addition, print out the VM console lines from the "Setup Testing
Machines" step.  We have often see VMs timeout at this step and don't
know why.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17753
2025-10-21 10:35:58 -07:00
Brian Behlendorf 39b9b62a96 CI: Switch FreeBSD 15 to 15.0-ALPHA3
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17795
2025-10-21 10:35:58 -07:00
Brian Behlendorf 395f099323 CI: Remove Buildbot references
The Buildbot CI infrastructure has been fully replaced by GitHub
Actions.  Remove any lingering references from the repository.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17794
2025-10-21 10:35:58 -07:00
Brian Behlendorf 78b21d8d19 CI: update perf and bpftools with the kernel packages
When updating a Fedora instance to an experimental kernel make sure
to include the matching versioned perf and bpftool packages.  This
helps ensure there are no unexpected conflicts which would prevent
the new packages from being installed.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17791
2025-10-21 10:35:58 -07:00
Alexander Motin a83eefee1e CI: Switch FreeBSD 15 to 15.0-ALPHA2
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17749
2025-10-21 10:35:58 -07:00
Tony Hutter cb66796f87 CI: Increase setup timeout to 20min, add timestamps
- Increase qemu-1-setup.sh timeout to 20min since it sometimes
  fails to complete after 15min.

- Timestamp all qemu-1-setup.sh lines to look for hangs.

- Add a 'watchdog' process to print out the top running process every
  30sec to help with debugging.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17714
2025-10-21 10:35:58 -07:00
Shengqi Chen f4a85d6f2c ci: fix syntax issues in zfs-qemu.yml
Otherwise it might become `if [ == "" ]` which is ill-formed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #17695
2025-10-21 10:35:58 -07:00
Shengqi Chen 0c9302b042 ci: use real head sha instead of GITHUB_SHA when generating CI type
Because GitHub creates a merge commit on top of real head, so the check
on HEAD will fail regardlessly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #17695
2025-10-21 10:35:58 -07:00
Tony Hutter 4f33cd2350 CI: Increase 'Setup QEMU' timeout to 15 minutes
We've seen Fedora 42 still setting up after 10 min.  Change the timeout
to 15 min.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17697
2025-10-21 10:35:58 -07:00
Tony Hutter 34f96a15c7 Tag zfs-2.3.4
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2025-08-20 09:29:36 -07:00
Tino Reichardt fdb5078d82 CI: Add Debian 13 to the FULL_OS runner list
This commit adds Debian 13 alias Trixie to the checked operating
systems. The image needs to be run with UEFI support.

Current Debian version overview:
- Debian 11 (Bullseye) -> "oldoldstable"
- Debian 12 (Bookworm) -> "oldstable"
- Debian 13 (Trixie) -> new "stable"

The CI will be run on Debian 12 and Debian 13 now.
Debian 11 is kept, but won't be used automatically.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17648
2025-08-20 09:29:36 -07:00
Shengqi Chen 8cca55f18b Debian rules: install scripts/objtool-wrapper.in into dkms tree
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #17633
Closes #17646
2025-08-20 09:20:45 -07:00
Attila Fülöp ef1ee9421d objtool-wrapper: Update Debian packaging
6cf17f65 (#17456) introduced a change to `configure.ac` which
breaks the patching done in the Debian packages DKMS source
installation phase. This results in a failed module build.

Adapt the awk script doing the patching to handle the added
`AC_CONFIG_FILE` entry.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-by: Shengqi Chen <harry-chen@outlook.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #17633
Closes #17646
2025-08-20 09:12:26 -07:00
shodanshok 435006d81d add uncompressed_size to arc_summary
Add uncompressed ARC size to statistics reported by arc_summary.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #17556
2025-08-19 10:30:04 -07:00
rmacklem 725886d67a FreeBSD: Add support for _PC_HAS_HIDDENSYSTEM
In FreeBSD there is now a pathconf name _PC_HAS_HIDDENSYSTEM.
This patch adds support for it to OpenZFS.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #17518
2025-08-19 10:30:04 -07:00
Meriel Luna Mittelbach bce049389d Add templated zfs-mount@.service
Runs `zfs mount -R <dataset>` at boot, after `zfs mount -a`.
Intended to replace `mountpoint=legacy` in certain mount setups.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Meriel Luna Mittelbach <lunarlambda@gmail.com>
Closes #17483
2025-08-19 10:30:04 -07:00
Mark Johnston c3d74a0d6f FreeBSD: Ensure that z_pflags is initialized for new znodes
The field is subsequently accessed in zfs_mknode(), in
zfs_inherit_projid().  The Linux implementation of zfs_create_fs() has
this initialization already; there is no counterpart to
zfs_create_share_dir() that I can see.

Reported-by: KMSAN
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #17486
2025-08-19 10:30:04 -07:00
Tony Hutter 9cf069b366 CI: Add optional patch level, fix hostname on F42
In the past there have been times when we need to generate new RPMs
for an existing ZFS release.  Typically this happens when a new RHEL
version comes out and the kernel symbols no longer match.  To get
users to auto-update we just bump the patch number.  For example, we
had to create zfs-2.1.13-1 for EL8.8 and zfs-2.1.13-2 for EL8.9.

This commit adds an optional patch level text box to the github
package builder runner.

In addition, this commit also uses `hostnamectl` instead of `hostname`
for F42+ compatibility, if available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17638
2025-08-18 17:06:55 -07:00
Richard Yao 3f87c9c276 Add CodeQL mismatched dsl_dataset_hold/_rele pairs check
This check is currently limited to checking mismatches that occur in the
same stack frame. It does not detect across stack frames.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard@ryao.dev>
Closes #17352
2025-08-18 17:06:55 -07:00
Patrick Fasano 9d14ce4db7 Add conflict/replacement with older SONAME libzfs and libzpool packages
In e8f0aa143e, the SONAMEs and package
names for libzfs and libzpool were bumped. The `contrib/debian/control`
file did not declare a conflict/replacement with the old package name.
This can cause dpkg to leave a system in an inconsistent state if the
old package is not manually uninstalled first.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Shengqi Chen <harry-chen@outlook.com>
Signed-off-by: Patrick Fasano <patrick@patrickfasano.com>
Closes #17586
2025-08-18 16:09:44 -07:00
Rob Norris 3b64a9619f FreeBSD: zfs_putpages: don't undirty pages until after write completes
In syncing mode, zfs_putpages() would put the entire range of pages onto
the ZIL, then return VM_PAGER_OK for each page to the kernel. However,
an associated zil_commit() or txg sync had not happened at this point,
so the write may not actually be on disk.

So, we rework that case to use a ZIL commit callback, and do the
post-write work of undirtying the page and signaling completion there.
We return VM_PAGER_PEND to the kernel instead so it knows that we will
take care of it.

The original version of this (238eab7dc1) copied the Linux model and did
the cleanup in a ZIL callback for both async and sync. This was a
mistake, as FreeBSD does not have a separate "busy for writeback" flag
like Linux which keeps the page usable. The full sbusy flag locks the
entire page out until the itx callback fires, which for async is after
txg sync, which could be literal seconds in the future.

For the async case, the data is already on the DMU and the in-memory
ZIL, which is sufficient for async writeback, so the old method of
logging it without a callback, undirtying the page and returning is more
than sufficient and reclaims that lost performance.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17533
2025-08-12 22:41:17 -04:00
Mark Johnston a072611eef Revert "FreeBSD: zfs_putpages: don't undirty pages until after write completes"
This causes async putpages to leave the pages sbusied for a long time,
which hurts concurrency.  Revert for now until we have a better
approach.

This reverts commit 238eab7dc1.

Reported by:    Ihor Antonov <ngor@hugpoint.tech>
Discussed with: Rob Norris <rob.norris@klarasystems.com>

References: freebsd/freebsd-src@738a9a7
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Ported-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17533
2025-08-12 22:41:17 -04:00
Brian Behlendorf 0fe10361ba Allow vmem_alloc backed multilists
Systems with a large number of CPU cores (192+) may trigger the large
allocation warning in multilist_create() on Linux.  Silence the warning
by converting the allocation to vmem_alloc().

On Linux this results in a call to kvalloc() which will alloc vmem
for large allocations and kmem for small allocations.

On FreeBSD both vmem_alloc and kmem_alloc internally use the same
allocator so there is no functional change.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17616
2025-08-12 17:24:30 -07:00
Brian Behlendorf 3e78905ffb Silence zstd large allocation warning
Allow zstd_mempool_init() to allocate using vmem_alloc() instead
of kmem_alloc() to silence the large allocation warning on Linux
during module load when the system has a large number of CPUs.

It's not at all clear to me that scaling the allocation size with
the number of CPUs is beneficial and that should be evaluated.
But for the moment this should resolve the warning without
introducing any unexpected side effects.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17620
Closes #11557
2025-08-12 17:24:26 -07:00
Colin Percival 46de04d2e9 FreeBSD 15.0 is now "PRERELEASE"
Chase URL change from the FreeBSD project.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colin Percival <cperciva@tarsnap.com>
Closes #17617
2025-08-12 17:24:22 -07:00
achill 41ca2296cd Linux 6.16 compat: META
Update the META file to reflect compatibility with the 6.16
kernel.

Tested with 6.16.0-0-stable of Alpine Linux edge, see
<https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/87929>.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Achill Gilgenast <achill@achill.org>
Closes #17578
2025-08-12 17:24:19 -07:00
René Wirnata 9651668457 zed: prettify slack notification message
This converts the body of a ZED slack notification from
plain text to code block style to help with readability.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: René Wirnata <rene.wirnata@pandascience.net>
Closes #17610
2025-08-12 17:24:15 -07:00
Rob Norris a49c957299 linux/zvol_os: fix crash with blk-mq on Linux 4.19
03987f71e3 (#16069) added a workaround to get the blk-mq hardware
context for older kernels that don't cache it in the struct request.
However, this workaround appears to be incomplete.

In 4.19, the rq data context is optional. If its not initialised, then
the cached rq->cpu will be -1, and so using it to index into mq_map
causes a crash.

Given that the upstream 4.19 is now in extended LTS and rarely seen,
RHEL8 4.18+ has long carried "modern" blk-mq support, and the cached
hardware context has been available since 5.1, I'm not going to huge
lengths to get queue selection correct for the very few people that are
likely to feel it. To that end, we simply call raw_smp_processor_id() to
get a valid CPU id and use that instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17597
2025-08-12 17:24:11 -07:00
Todd Zullinger d1d706350e rpm: don't list /sbin/zgenhostid twice in %files
The location of zgenhostid was changed in 0ae733c7a (Install zgenhostid
to sbindir, 2021-01-21).  We include all files within sbindir two lines
earlier, which causes rpmbuild to report:

    File listed twice: /sbin/zgenhostid

Drop the redundant entry from the %files section.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Closes #17601
2025-08-12 17:24:08 -07:00
Attila Fülöp 11f844175e config: Avoid void main() in toolchain-simd.m4
Be standard-compliant by using `int main()`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13303
Closes #17590
2025-08-12 17:23:57 -07:00
Attila Fülöp 57b614e025 SIMD: Don't require definition of HAVE_XSAVE
Currently we fail the compilation via the #error directive if
`HAVE_XSAVE` isn't defined. This breaks i586 builds since we check
the toolchains SIMD support only on i686 and onward.

Remove the requirement to fix the build on i586.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13303
Closes #17590
2025-08-12 17:23:51 -07:00
Rob Norris 0c7d6e20e6 Linux: zfs_putpage: document (and fix!) confusing sync/commit modes
The structure of zfs_putpage() and its callers is tricky to follow.
There's a lot more we could do to improve it, but at least now we have
some description of one of the trickier bits.

Writing this exposed a very subtle bug: most async pages pushed out
through zpl_putpages() would go to the ZIL with commit=false, which can
yield a less-efficient write policy. So this commit updates that too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17584
2025-08-12 17:23:46 -07:00
Rob Norris b9c45fe68c Linux: zfs_putpage: complete async page writeback immediately
For async page writeback, we do not need to wait for the page to be on
disk before returning to the caller; it's enough that the data from the
dirty page be on the DMU and in the in-memory ZIL, just like any other
write.

So, if this is not a syncing write, don't add a callback to the itx, and
instead just unlock the page immediately.

(This is effectively the same concept used for FreeBSD in d323fbf49c).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17584
Closes #14290
2025-08-12 17:23:43 -07:00
Rob Norris f72226a75c Linux: sync: remove async/sync accounting
All this machinery is there to try to understand when there an async
writeback waiting to complete because the intent log callbacks are still
outstanding, and force them with a timely zil_commit(). The next commit
fixes this properly, so there's no need for all this extra housekeeping.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17584
2025-08-12 17:23:39 -07:00
Rob Norris 97fe86837c ZTS: mmap_ftruncate test to confirm async writeback behaviour
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17584
2025-08-12 17:23:35 -07:00
Rob Norris df5e02d253 CI: match and trim out internal timestamp for test prefix
Adjust the regexes to match the test line with timestamps, then remove
them for the summary. The internal timestamp is still in the full logs.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17045
2025-08-12 17:23:28 -07:00
Rob Norris 245adb6a4f ZTS: include microsecond timestamps on all output
When reviewing test output after a failure, it's often quite difficult
to work out the order and timing of events, and to correlate test suite
output with kernel logs.

This adds timestamps to ZTS output to help with this, in three places:

- all of the standard log_XXX functions ultimately end up in _printline,
  which now prefixes output with a timestamp. An escape hatch
  environment variable is provided for user_cmd, which often calls the
  logging functions while also depending on the captured output.

- the test runner logging function log() also now prefixes its output
  with a timestamp.

- on failure, when capturing the kernel log in zfs_dmesg.ksh, the "iso"
  time format is requested.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17045
2025-08-12 17:23:07 -07:00
Brian Behlendorf 82a0868ce4 CI: Remove Debian backports
The latest Debian 11 image includes bullseye-backports as a default
repository in the /etc/apt/sources.list.  However, this repository
has gone end of life which effectively breaks the default install.

We shouldn't need anything in backports so lets unconditionally
remove backports on all Debian builders to resolve the issue.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17569
2025-08-12 17:22:45 -07:00
Coleman Kane e7e0bb3b61 linux: Fix out-of-src builds
The linux kernel modules haven't been building successfully when the
build occurs in a separate directory than the source code, which is a
common build pattern in Linux. Was not able to determine the root cause,
but the %.o targets in subdirectories are no longer being matched by the
pattern targets in the Linux Kbuild system. This change fixes the issue
by dynamically creating the missing ones inside our Kbuild.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #17517
2025-08-12 17:22:40 -07:00
Paul Dagnelie 6af1f61ad4 Fix zdb pool/ with -k
When examining the root dataset with zdb -k, we get into a mismatched
state. main() knows we are not examining the whole pool, but it strips
off the trailing slash. import_checkpointed_state() then thinks we are
examining the whole pool, and does not update the target path
appropriately. The fix is to directly inform import_checkpointed_state
that we are examining a filesystem, and not the whole pool.

Sponsored-by: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17536
2025-08-12 17:21:47 -07:00
Carl George 8c4f625c12 CI: Add CentOS Stream 9/10 to the FULL_OS runner list
Testing on CentOS Stream provides several months advance notice of
changes coming to the RHEL kernel.  This should help OpenZFS be
proactive instead of reactive to new RHEL minor versions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Carl George <carlwgeorge@gmail.com>
ZFS-CI-Type: full
Closes #16904
Closes #17526
2025-08-12 17:20:16 -07:00
Tino Reichardt 7882e85a9b Delete unused .cirrus.yml
The Cirrus_CI was planned for testing FreeBSD, but never really used I
think. Currently it's not needed anymore, so remove it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17155
Closes #17535
2025-08-12 17:19:43 -07:00
Tino Reichardt 6b38d0f7ff ZTS: Fix FreeBSD 15.0 ksh errors
The package ksh93 is replaced by ksh now.
This works for FreeBSD 13 and 14 also.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17523
2025-08-12 17:19:32 -07:00
Alexander Motin 80b6457fcd CI: Switch from FreeBSD 13.4 to 13.5
FreeBSD 13.4 is EOL since June 30, 2025.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Closes #17519
2025-08-12 17:19:07 -07:00
Brian Behlendorf 2518f4b124 Revert "Fix incorrect expected error in ztest"
This reverts commit 2076011e0c.  The
comment which explains EINVAL should be expected for this case was
wrong, not the code.  The kernel will return ENOTSUP when attaching
a distributed spare to the wrong top-level dRAID vdev.  See the
check for this in spa_vdev_attach().

Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17503
2025-08-12 17:18:40 -07:00
Igor Ostapenko 90d2c4407a ztest: Fix false positive of ENOSPC handling
Before running a pass zs_enospc_count is checked to free up some space
by destroying a random dataset. But the space freed may still be not
re-usable during the TXG_DEFER window breaking the next dataset creation
in ztest_generic_run().
    
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com>
Closes #17506
2025-08-12 17:18:34 -07:00
Brian Behlendorf f7698f47e8 CI: run ztest on compressed zpool
When running ztest under the CI a common failure mode is for the
underlying filesystem to run out of available free space.  Since
the storage associated with a GitHub-hosted running is fixed, we
instead create a pool and use a compressed ZFS dataset to store
the ztest vdev files.  This significantly increases the available
capacity since the data written by ztest is highly compressible.
A compression ratio of over 40:1 is conservatively achieved using
the default lz4 compression.  Autotrimming is enabled to ensure
freed blocks are discarded from the backing cipool vdev file.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17501
2025-08-12 17:17:49 -07:00
Martin Rüegg 6c1130a730 pyzfs: Adapt python lib directory evaluation from ax_python_devel.m4
71216b91d2 introduced a regression
on debian/ubuntu systems during build.

The reason being, that building the RPM for pyzfs was using
a different library path than building the library itself.
This is now harmonized.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch>
Closes #16155
Closes #17480
2025-08-12 17:17:24 -07:00
Martin Rüegg 74b539d3dc pyzfs: Update ax_python_devel.m4 to serial 37
Fixes an obvious typo, where a variable was missing the required
leading dollar sign ($)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch>
Closes #17480
2025-08-12 17:17:17 -07:00
Chunwei Chen 024e60b927 Missing tests in make pkg
```
Warning: TestGroup '/var/tmp/tests/functional/ctime' not added to this
run. Auxiliary script '/var/tmp/tests/functional/ctime/setup' failed
verification.
```

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #17491
2025-08-12 17:17:09 -07:00
Olivier Certner 5289f6f961 spa: ZIO_TASKQ_ISSUE: Use symbolic priority
This allows to change the meaning of priority differences in FreeBSD
without requiring code changes in ZFS.

This upstreams commit fd141584cf89d7d2 from FreeBSD src.

Sponsored-by: The FreeBSD Foundation
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Olivier Certner <olce@FreeBSD.org>
Closes #17489
2025-08-12 17:16:00 -07:00
Paul Dagnelie 094305c937 Fix TestGroup warning due to missing tags
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17473
2025-08-12 15:59:59 -07:00
Tino Reichardt a826f7a993 ZTS: Use FreeBSD cloudinit images
FreeBSD provides CI-IMAGES since some time. These images are
based on nuageinit, which does not support fqdn and sudo for
example. So we need currently some workarounds to get it
working.

The FreeBSD images will be more compatible with cloud-init in
some near future. Then we can remove the workaround things.

These versions are used for testing:
- freebsd13-4r (RELEASE)
- freebsd14-3s (STABLE)
- freebsd15-0c (CURRENT)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17462
2025-08-12 15:58:46 -07:00
Attila Fülöp 86bf73c1eb objtool wrapper: use absolute path to call the wrapper
Older kernel versions run make outside of the build directory. This
works since all paths are absolute. Relative paths will fail in such
a scenario.

Use an absolute path to the objtool wrapper as well, since the
relative path breaks the build on older kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #17541
2025-08-07 13:10:58 -04:00
Attila Fülöp 1d293b377a Linux build: handle CONFIG_OBJTOOL_WERROR=y
Linux 5.16 by default fails the build on objtool warnings. We have
known and understood objtool warnings we can't fix without
involving Linux maintainers.

To work around this we introduce an objtool wrapper script which
removes the `--Werror` flag.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #17456
2025-08-07 13:10:33 -04:00
Alexander Motin 22eb2bdce3 Make TX abort after assign safer
It is not right, but there are few examples when TX is aborted
after being assigned in case of error.  To handle it better on
production systems add extra cleanup steps.

While here, replace couple dmu_tx_abort() in simple cases.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17438
2025-08-07 12:41:40 -04:00
Alexander Motin 809b553940 Introduce zfs rewrite subcommand (#17246)
This allows to rewrite content of specified file(s) as-is without
modifications, but at a different location, compression, checksum,
dedup, copies and other parameter values.  It is faster than read
plus write, since it does not require data copying to user-space.
It is also faster for sync=always datasets, since without data
modification it does not require ZIL writing.  Also since it is
protected by normal range range locks, it can be done under any
other load.  Also it does not affect file's modification time or
other properties.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
2025-08-07 12:34:28 -04:00
Rob Norris abb6211e7a Linux 6.16: remove writepage and readahead_page
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17443
2025-08-07 12:29:45 -04:00
khoang98 c405a7a35c Skip dbuf_evict_one() from dbuf_evict_notify() for reclaim thread
Avoid calling dbuf_evict_one() from memory reclaim contexts (e.g. Linux
kswapd, FreeBSD pagedaemon). This prevents deadlock caused by reclaim
threads waiting for the dbuf hash lock in the call sequence:
dbuf_evict_one -> dbuf_destroy -> arc_buf_destroy

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Kaitlin Hoang <kthoang@amazon.com>
Closes #17561
2025-08-07 12:15:14 -04:00
shodanshok 4808641e71 enforce arc_dnode_limit
Linux kernel shrinker in the context of null/root memcg does not scan
dentry and inode caches added by a task running in non-root memcg. For
ZFS this means that dnode cache routinely overflows, evicting valuable
meta/data and putting additional memory pressure on the system.

This patch restores zfs_prune_aliases as fallback when the kernel
shrinker does nothing, enabling zfs to actually free dnodes. Moreover,
it (indirectly) calls arc_evict when dnode_size > dnode_limit.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #17487
Closes #17542
2025-08-07 12:11:34 -04:00
Alexander Motin 30fa92bff3 Increase meta-dnode redundancy in "some" mode
Loss of one indirect block of the meta dnode likely means loss of
the whole dataset.  It is worse than one file that the man page
promises, and in my opinion is not much better than "none" mode.

This change restores redundancy of the meta-dnode indirect blocks,
while same time still corrects expectations in the man page.

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17339
2025-08-05 13:15:44 -04:00
Paul Dagnelie fd5a27c9db Ensure that gang_copies is always at least as large as copies
As discussed in the comments of PR #17004, you can theoretically run
into a case where a gang child has more copies than the gang header,
which can lead to some odd accounting behavior (and even trip a
VERIFY). While the accounting code could be changed to handle this, it
fundamentally doesn't seem to make a lot of sense to allow this to
happen. If the data is supposed to have a certain level of reliability,
that isn't actually achieved unless the gang_copies property is set to
match it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17484
2025-08-05 13:14:45 -04:00
Rob Norris 3ad3f439bb zts: add spdx license tags to gang_blocks tests (#17160)
Missed in #17073, probably because that PR was branched before #17001
was landed and never rebased.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-08-05 13:14:11 -04:00
Paul Dagnelie a46ce73ca8 Make ganging redundancy respect redundant_metadata property (#17073)
The redundant_metadata setting in ZFS allows users to trade resilience
for performance and space savings. This applies to all data and metadata
blocks in zfs, with one exception: gang blocks. Gang blocks currently
just take the copies property of the IO being ganged and, if it's 1,
sets it to 2. This means that we always make at least two copies of a
gang header, which is good for resilience. However, if the users care
more about performance than resilience, their gang blocks will be even
more of a penalty than usual.

We add logic to calculate the number of gang headers copies directly,
and store it as a separate IO property. This is stored in the IO
properties and not calculated when we decide to gang because by that
point we may not have easy access to the relevant information about what
kind of block is being stored. We also check the redundant_metadata
property when doing so, and use that to decide whether to store an extra
copy of the gang headers, compared to the underlying blocks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-08-05 13:10:40 -04:00
Ameer Hamza 90790955a6 SPDX: Add missing CDDL-1.0 license
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
2025-08-05 12:51:35 -04:00
Igor Ostapenko 95abbc71c3 range_tree: Provide more debug details upon unexpected add/remove
Sponsored-by: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com>
Closes #17581
2025-08-05 12:34:54 -04:00
Tino Reichardt fc658b9935 Faster checksum benchmark on system boot
While booting, only the needed 256KiB benchmarks are done now.

The delay for checking all checksums occurs when requested via:
- Linux: cat /proc/spl/kstat/zfs/chksum_bench
- FreeBSD: sysctl kstat.zfs.misc.chksum_bench

Reported by: Lahiru Gunathilake <gunathilakebllg@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Co-authored-by: Colin Percival <cperciva@tarsnap.com>
Closes #17563
Closes #17560
2025-08-05 12:34:13 -04:00
Paul Dagnelie 271b9797c5 Don't use wrong weight when passivating group
When we're passivating a metaslab group we start by passivating the 
metaslabs that have been activated for each of the allocators.  To do 
that, we need to provide a weight. However, currently this erroneously 
always uses a segment-based weight, even if segment-based weighting is 
disabled.

Use the normal weight function, which will decide which type of weight 
to use.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17566
2025-08-05 12:33:52 -04:00
Brian Behlendorf 582e7847f6 Default to zfs_bclone_wait_dirty=1
Update the default FICLONE and FICLONERANGE ioctl behavior to wait
on dirty blocks.  While this does remove some control from the
application, in practice ZFS is better positioned to the optimial
thing and immediately force a TXG sync.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17455
2025-08-05 12:33:30 -04:00
Andriy Tkachuk 6d378564b4 zdb: fix checksum calculation for decompressed blocks
Currently, when reading compressed blocks with -R and decompressing
them with :d option and specifying lsize, which is normally bigger
than psize for compressed blocks, the checksum is calculated on
decompressed data. But it makes no sense since zfs always calculates
checksum on physical, i.e. compressed data. So reading the same block
produces different checksum results depending on how we read it,
whether we decompress it or not, which, again, makes no sense.

Fix: use psize instead of lsize when calculating the checksum so that
it is always calculated on the physical block size, no matter was it
compressed or not.

Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17547
2025-08-05 12:33:00 -04:00
Ameer Hamza 0c928f7a37 ZED: Fix device type detection and pool iteration logic
During hotplug REMOVED events, devid matching fails for partition-based
spares because devid information is not stored in pool config for
partitioned devices. However, when devid is populated by the hotplug
event, the original code skipped the search logic entirely, skipping
vdev_guid matching and resulting in wrong device type detection that
caused spares to be incorrectly identified as l2arc devices.
Additionally, fix zfs_agent_iter_pool() to use the return value from
zfs_agent_iter_vdev() instead of relying on search parameters, which
was previously ignored. Also add pool_guid optimization to enable
targeted pool searching when pool_guid is available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17545
2025-08-05 12:32:28 -04:00
Chunwei Chen c79d5e4f33 Define sops->free_inode() to prevent use-after-free during lookup
On Linux, when doing path lookup with LOOKUP_RCU, dentry and inode can
be dereferenced without refcounts and locks. For this reason, dentry and
inode must only be freed after RCU grace period.

However, zfs currently frees inode in zfs_inode_destroy synchronously
and we can't use GPL-only call_rcu() in zfs directly. Fortunately, on
Linux 5.2 and after, if we define sops->free_inode(), the kernel will do
call_rcu() for us.

This issue may be triggered more easily with init_on_free=1 boot
parameter:

BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:selinux_inode_permission+0x10e/0x1c0
Call Trace:
 ? show_trace_log_lvl+0x1be/0x2d9
 ? show_trace_log_lvl+0x1be/0x2d9
 ? show_trace_log_lvl+0x1be/0x2d9
 ? security_inode_permission+0x37/0x60
 ? __die_body.cold+0x8/0xd
 ? no_context+0x113/0x220
 ? exc_page_fault+0x6d/0x130
 ? asm_exc_page_fault+0x1e/0x30
 ? selinux_inode_permission+0x10e/0x1c0
 security_inode_permission+0x37/0x60
 link_path_walk.part.0.constprop.0+0xb5/0x360
 ? path_init+0x27d/0x3c0
 path_lookupat+0x3e/0x1a0
 filename_lookup+0xc0/0x1d0
 ? __check_object_size.part.0+0x123/0x150
 ? strncpy_from_user+0x4e/0x130
 ? getname_flags.part.0+0x4b/0x1c0
 vfs_statx+0x72/0x120
 ? ioctl_has_perm.constprop.0.isra.0+0xbd/0x120
 __do_sys_newlstat+0x39/0x70
 ? __x64_sys_ioctl+0x8d/0xd0
 do_syscall_64+0x30/0x40
 entry_SYSCALL_64_after_hwframe+0x62/0xc7

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Closes #17546
2025-08-05 12:30:23 -04:00
Alexander Motin 347d68048a ZIL: Force writing of open LWB on suspend
Under parallel workloads ZIL may delay writes of open LWBs that
are not full enough.  On suspend we do not expect anything new to
appear since zil_get_commit_list() will not let it pass, only
returning TXG number to wait for.  But I suspect that waiting for
the TXG commit without having the last LWB issued may not wait for
its completion, resulting in panic described in #17509.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17521
2025-08-05 12:28:41 -04:00
Paul Dagnelie acf3871ef8 Correct weight recalculation of space-based metaslabs
Currently, after a failed allocation, the metaslab code recalculates the
weight for a metaslab. However, for space-based metaslabs, it uses the
maximum free segment size instead of the normal weighting
algorithm. This is presumably because the normal metaslab weight is
(roughly) intended to estimate the size of the largest free segment, but
it doesn't do that reliably at most fragmentation levels. This means
that recalculated metaslabs are forced to a weight that isn't really
using the same units as the rest of them, resulting in undesirable
behaviors. We switch this to use the normal space-weighting function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Wasabi Technology, Inc.
Sponsored-by: Klara, Inc.
Closes #17531
2025-08-05 12:28:34 -04:00
Ameer Hamza 21d5f25724 Validate mountpoint on path-based unmount using statx
Use statx to verify that path-based unmounts proceed only if the
mountpoint reported by statx matches the MNTTAB entry reported by
libzfs, aborting the operation if they differ. Align
`zfs umount /path` behavior with `zfs umount dataset`.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17481
2025-08-05 12:27:25 -04:00
Paul Dagnelie 7e945a5b3f Fix other nonrot bugs
There are still a variety of bugs involving the vdev_nonrot property
that will cause problems if you try to run the test suite with
segment-based weighting disabled, and with other things in the weighting
code. Parents' nonrot property need to be updated when children are
added. When vdevs are expanded and more metaslabs are added, the weights
have to be recalculated (since the number of metaslabs is an input to
the lba bias function). When opening, faulted or unopenable children
should not be considered for whether a vdev is nonrot or not (since the
nonrot property is determined during a successful open, this can cause
false negatives). And draid spares need to have the nonrot property set
correctly.

Sponsored-by: Eshtek, creators of HexOS
Sponsored-by: Klara, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17469
2025-08-05 12:25:26 -04:00
Alexander Motin 85ce6b8ab2 Polish db_rwlock scope
dbuf_verify(): Don't need the lock, since we only compare pointers.

dbuf_findbp(): Don't need the lock, since aside of unneeded assert
we only produce the pointer, but don't de-reference it.

dnode_next_offset_level(): When working on top level indirection
should lock dnode buffer's db_rwlock, since it is our parent.  If
dnode has no buffer, then it is meta-dnode or one of quotas and we
should lock the dataset's ds_bp_rwlock instead.

Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17441
2025-08-05 12:23:32 -04:00
Mariusz Zaborski 954894ee53 scrub: generate scrub_finish event
The `scn_min_txg` can now be used not only with resilver. Instead
of checking `scn_min_txg` to determine whether it’s a resilver or
a scrub, simply check which function is defined. Thanks to this
change, a scrub_finish event is generated when performing a scrub
from the saved txg.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Closes #17432
2025-08-05 12:21:46 -04:00
Alexander Motin a4e775d2ca Some arc_release() cleanup
- Don't drop L2ARC header if we have more buffers in this header.
Since we leave them the header, leave them the L2ARC header also.
Honestly we are not required to drop it even if there are no other
buffers, but then we'd need to allocate it a separate header, which
we might drop soon if the old block is really deleted.  Multiple
buffers in a header likely mean active snapshots or dedup, so we
know that the block in L2ARC will remain valid.  It might be rare,
but why not?
 - Remove some impossible assertions and conditions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17126
2025-08-05 12:16:27 -04:00
Paul Dagnelie 661310ff5c FDT dedup log sync -- remove incremental
This PR condenses the FDT dedup log syncing into a single sync
pass. This reduces the overhead of modifying indirect blocks for the
dedup table multiple times per txg. In addition, changes were made to
the formula for how much to sync per txg. We now also consider the
backlog we have to clear, to prevent it from growing too large, or
remaining large on an idle system.

Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Authored-by: Don Brady <don.brady@klarasystems.com>
Authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17038
2025-08-05 12:15:21 -04:00
Alexander Motin f9d59b579e ZIL: Relax parallel write ZIOs processing
ZIL introduced dependencies between its write ZIOs to permit flush
defer, when we flush vdev caches only once all the write ZIOs has
completed.  But it was recently spotted that it serializes not only
ZIO completions handling, but also their ready stage.  It means ZIO
pipeline can't calculate checksums for the following ZIOs until all
the previous are checksumed, even though it is not required.  On a
systems where memory throughput of a single CPU core is limited,
it creates single-core CPU bottleneck, which is difficult to see
due to ZIO pipeline design with many taskqueue threads.

While it would be great to bypass the ready stage waits, it would
require changes to ZIO code, and I haven't found a clean way to do
it.  But I've noticed that we don't need any dependency between
the write ZIOs if the previous one has some waiters, which means
it won't defer any flushes and work as a barrier for the earlier
ones.

Bypassing it won't help large single-thread writes, since all the
write ZIOs except the last in that case won't have waiters, and
so will be dependent.  But in that case the ZIO processing might
not be a bottleneck, since there will be only one thread populating
the write buffers, that will likely be the bottleneck.

But bypassing the ZIO dependency on multi-threaded write workloads
really allows them to scale beyond the checksuming throughput of
one CPU core.

My tests with writing 12 files on a same dataset on a pool with
4 striped NVMes as SLOGs from 12 threads with 1MB blocks on a
system with Xeon Silver 4114 CPU show total throughput increase
from 4.3GB/s to 8.5GB/s, increasing the SLOGs busy from ~30% to
~70%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17458
2025-08-05 12:14:18 -04:00
173 changed files with 3988 additions and 1208 deletions
-21
View File
@@ -1,21 +0,0 @@
env:
CIRRUS_CLONE_DEPTH: 1
ARCH: amd64
build_task:
matrix:
freebsd_instance:
image_family: freebsd-13-5
freebsd_instance:
image_family: freebsd-14-2
freebsd_instance:
image_family: freebsd-15-0-snap
prepare_script:
- pkg install -y autoconf automake libtool gettext-runtime gmake ksh93 py311-packaging py311-cffi py311-sysctl
configure_script:
- env MAKE=gmake ./autogen.sh
- env MAKE=gmake ./configure --with-config="user" --with-python=3.11
build_script:
- gmake -j `sysctl -n kern.smp.cpus`
install_script:
- gmake install
+1 -1
View File
@@ -14,7 +14,7 @@ Please check our issue tracker before opening a new feature request.
Filling out the following template will help other contributors better understand your proposed feature.
-->
### Describe the feature would like to see added to OpenZFS
### Describe the feature you would like to see added to OpenZFS
<!--
Provide a clear and concise description of the feature.
-5
View File
@@ -2,11 +2,6 @@
<!--- Provide a general summary of your changes in the Title above -->
<!---
Documentation on ZFS Buildbot options can be found at
https://openzfs.github.io/openzfs-docs/Developer%20Resources/Buildbot%20Options.html
-->
### Motivation and Context
<!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here. -->
+1
View File
@@ -2,3 +2,4 @@ name: "Custom CodeQL Analysis"
queries:
- uses: ./.github/codeql/custom-queries/cpp/deprecatedFunctionUsage.ql
- uses: ./.github/codeql/custom-queries/cpp/dslDatasetHoldReleMismatch.ql
@@ -0,0 +1,34 @@
/**
* @name Detect mismatched dsl_dataset_hold/_rele pairs
* @description Flags instances of issue #12014 where
* - a dataset held with dsl_dataset_hold_obj() ends up in dsl_dataset_rele_flags(), or
* - a dataset held with dsl_dataset_hold_obj_flags() ends up in dsl_dataset_rele().
* @kind problem
* @severity error
* @tags correctness
* @id cpp/dslDatasetHoldReleMismatch
*/
import cpp
from Variable ds, Call holdCall, Call releCall, string message
where
ds.getType().toString() = "dsl_dataset_t *" and
holdCall.getASuccessor*() = releCall and
(
(holdCall.getTarget().getName() = "dsl_dataset_hold_obj_flags" and
holdCall.getArgument(4).(AddressOfExpr).getOperand().(VariableAccess).getTarget() = ds and
releCall.getTarget().getName() = "dsl_dataset_rele" and
releCall.getArgument(0).(VariableAccess).getTarget() = ds and
message = "Held with dsl_dataset_hold_obj_flags but released with dsl_dataset_rele")
or
(holdCall.getTarget().getName() = "dsl_dataset_hold_obj" and
holdCall.getArgument(3).(AddressOfExpr).getOperand().(VariableAccess).getTarget() = ds and
releCall.getTarget().getName() = "dsl_dataset_rele_flags" and
releCall.getArgument(0).(VariableAccess).getTarget() = ds and
message = "Held with dsl_dataset_hold_obj but released with dsl_dataset_rele_flags")
)
select releCall,
"Mismatched release: held with $@ but released with " + releCall.getTarget().getName() + " for dataset $@",
holdCall, holdCall.getTarget().getName(),
ds, ds.toString()
@@ -7,7 +7,7 @@ Prints "quick" if (explicity required by user):
- the *last* commit message contains 'ZFS-CI-Type: quick'
or if (heuristics):
- the files changed are not in the list of specified directories, and
- all commit messages do not contain 'ZFS-CI-Type: full'
- all commit messages do not contain 'ZFS-CI-Type: (full|linux|freebsd)'
Otherwise prints "full".
"""
@@ -65,12 +65,12 @@ if __name__ == '__main__':
# check last (HEAD) commit message
last_commit_message_raw = subprocess.run([
'git', 'show', '-s', '--format=%B', 'HEAD'
'git', 'show', '-s', '--format=%B', head
], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
for line in last_commit_message_raw.stdout.decode().splitlines():
if line.strip().lower() == 'zfs-ci-type: quick':
output_type('quick', f'explicitly requested by HEAD commit {head}')
output_type('quick', f'requested by HEAD commit {head}')
# check all commit messages
all_commit_message_raw = subprocess.run([
@@ -83,8 +83,12 @@ if __name__ == '__main__':
for line in all_commit_message:
if line.startswith('ZFS-CI-Commit:'):
commit_ref = line.lstrip('ZFS-CI-Commit:').rstrip()
if line.strip().lower() == 'zfs-ci-type: freebsd':
output_type('freebsd', f'requested by commit {commit_ref}')
if line.strip().lower() == 'zfs-ci-type: linux':
output_type('linux', f'requested by commit {commit_ref}')
if line.strip().lower() == 'zfs-ci-type: full':
output_type('full', f'explicitly requested by commit {commit_ref}')
output_type('full', f'requested by commit {commit_ref}')
# check changed files
changed_files_raw = subprocess.run([
+10
View File
@@ -6,6 +6,13 @@
set -eu
# We've been seeing this script take over 15min to run. This may or
# may not be normal. Just to get a little more insight, print out
# a message to stdout with the top running process, and do this every
# 30 seconds. We can delete this watchdog later once we get a better
# handle on what the timeout value should be.
(while [ 1 ] ; do sleep 30 && echo "[watchdog: $(ps -eo cmd --sort=-pcpu | head -n 2 | tail -n 1)}')]"; done) &
# install needed packages
export DEBIAN_FRONTEND="noninteractive"
sudo apt-get -y update
@@ -65,3 +72,6 @@ sudo zpool create -f -o ashift=12 zpool $SSD1 $SSD2 -O relatime=off \
for i in /sys/block/s*/queue/scheduler; do
echo "none" | sudo tee $i
done
# Kill off our watchdog
kill $(jobs -p)
+121 -61
View File
@@ -12,10 +12,10 @@ OS="$1"
# OS variant (virt-install --os-variant list)
OSv=$OS
# compressed with .zst extension
REPO="https://github.com/mcmilk/openzfs-freebsd-images"
FREEBSD="$REPO/releases/download/v2025-04-13"
URLzs=""
# FreeBSD urls's
FREEBSD_REL="https://download.freebsd.org/releases/CI-IMAGES"
FREEBSD_SNAP="https://download.freebsd.org/snapshots/CI-IMAGES"
URLxz=""
# Ubuntu mirrors
UBMIRROR="https://cloud-images.ubuntu.com"
@@ -25,6 +25,10 @@ UBMIRROR="https://cloud-images.ubuntu.com"
# default nic model for vm's
NIC="virtio"
# additional options for virt-install
OPTS[0]=""
OPTS[1]=""
case "$OS" in
almalinux8)
OSNAME="AlmaLinux 8"
@@ -43,16 +47,15 @@ case "$OS" in
OSNAME="Archlinux"
URL="https://geo.mirror.pkgbuild.com/images/latest/Arch-Linux-x86_64-cloudimg.qcow2"
;;
centos-stream10)
OSNAME="CentOS Stream 10"
# TODO: #16903 Overwrite OSv to stream9 for virt-install until it's added to osinfo
OSv="centos-stream9"
URL="https://cloud.centos.org/centos/10-stream/x86_64/images/CentOS-Stream-GenericCloud-10-latest.x86_64.qcow2"
;;
centos-stream9)
OSNAME="CentOS Stream 9"
URL="https://cloud.centos.org/centos/9-stream/x86_64/images/CentOS-Stream-GenericCloud-9-latest.x86_64.qcow2"
;;
centos-stream10)
OSNAME="CentOS Stream 10"
OSv="centos-stream9"
URL="https://cloud.centos.org/centos/10-stream/x86_64/images/CentOS-Stream-GenericCloud-10-latest.x86_64.qcow2"
;;
debian11)
OSNAME="Debian 11"
URL="https://cloud.debian.org/images/cloud/bullseye/latest/debian-11-generic-amd64.qcow2"
@@ -61,6 +64,14 @@ case "$OS" in
OSNAME="Debian 12"
URL="https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-generic-amd64.qcow2"
;;
debian13)
OSNAME="Debian 13"
# TODO: Overwrite OSv to debian13 for virt-install until it's added to osinfo
OSv="debian12"
URL="https://cloud.debian.org/images/cloud/trixie/latest/debian-13-generic-amd64.qcow2"
OPTS[0]="--boot"
OPTS[1]="uefi=on"
;;
fedora41)
OSNAME="Fedora 41"
OSv="fedora-unknown"
@@ -71,50 +82,61 @@ case "$OS" in
OSv="fedora-unknown"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/42/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-42-1.1.x86_64.qcow2"
;;
freebsd13-4r)
OSNAME="FreeBSD 13.4-RELEASE"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.4-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
NIC="rtl8139"
fedora43)
OSNAME="Fedora 43"
OSv="fedora-unknown"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/43/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-43-1.6.x86_64.qcow2"
;;
freebsd13-5r)
OSNAME="FreeBSD 13.5-RELEASE"
FreeBSD="13.5-RELEASE"
OSNAME="FreeBSD $FreeBSD"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.5-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
URLxz="$FREEBSD_REL/$FreeBSD/amd64/Latest/FreeBSD-$FreeBSD-amd64-BASIC-CI.raw.xz"
KSRC="$FREEBSD_REL/../amd64/$FreeBSD/src.txz"
NIC="rtl8139"
;;
freebsd14-1r)
OSNAME="FreeBSD 14.1-RELEASE"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.1-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
;;
freebsd14-2r)
OSNAME="FreeBSD 14.2-RELEASE"
FreeBSD="14.2-RELEASE"
OSNAME="FreeBSD $FreeBSD"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.2-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
URLxz="$FREEBSD_REL/$FreeBSD/amd64/Latest/FreeBSD-$FreeBSD-amd64-BASIC-CI.raw.xz"
KSRC="$FREEBSD_REL/../amd64/$FreeBSD/src.txz"
;;
freebsd14-3r)
FreeBSD="14.3-RELEASE"
OSNAME="FreeBSD $FreeBSD"
OSv="freebsd14.0"
URLxz="$FREEBSD_REL/$FreeBSD/amd64/Latest/FreeBSD-$FreeBSD-amd64-BASIC-CI.raw.xz"
KSRC="$FREEBSD_REL/../amd64/$FreeBSD/src.txz"
;;
freebsd13-5s)
OSNAME="FreeBSD 13.5-STABLE"
FreeBSD="13.5-STABLE"
OSNAME="FreeBSD $FreeBSD"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.5-STABLE.qcow2.zst"
BASH="/usr/local/bin/bash"
URLxz="$FREEBSD_SNAP/$FreeBSD/amd64/Latest/FreeBSD-$FreeBSD-amd64-BASIC-CI.raw.xz"
KSRC="$FREEBSD_SNAP/../amd64/$FreeBSD/src.txz"
NIC="rtl8139"
;;
freebsd14-2s)
OSNAME="FreeBSD 14.2-STABLE"
freebsd14-3s)
FreeBSD="14.3-STABLE"
OSNAME="FreeBSD $FreeBSD"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.2-STABLE.qcow2.zst"
BASH="/usr/local/bin/bash"
URLxz="$FREEBSD_SNAP/$FreeBSD/amd64/Latest/FreeBSD-$FreeBSD-amd64-BASIC-CI-ufs.raw.xz"
KSRC="$FREEBSD_SNAP/../amd64/$FreeBSD/src.txz"
;;
freebsd15-0c)
OSNAME="FreeBSD 15.0-CURRENT"
freebsd15-0s)
FreeBSD="15.0-STABLE"
OSNAME="FreeBSD $FreeBSD"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-15.0-CURRENT.qcow2.zst"
BASH="/usr/local/bin/bash"
URLxz="$FREEBSD_SNAP/$FreeBSD/amd64/Latest/FreeBSD-$FreeBSD-amd64-BASIC-CI-ufs.raw.xz"
KSRC="$FREEBSD_SNAP/../amd64/$FreeBSD/src.txz"
;;
freebsd16-0c)
FreeBSD="16.0-CURRENT"
OSNAME="FreeBSD $FreeBSD"
OSv="freebsd14.0"
URLxz="$FREEBSD_SNAP/$FreeBSD/amd64/Latest/FreeBSD-$FreeBSD-amd64-BASIC-CI-ufs.raw.xz"
KSRC="$FREEBSD_SNAP/../amd64/$FreeBSD/src.txz"
;;
tumbleweed)
OSNAME="openSUSE Tumbleweed"
@@ -168,31 +190,37 @@ echo "CPU=\"$CPU\"" >> $ENV
sudo mkdir -p "/mnt/tests"
sudo chown -R $(whoami) /mnt/tests
DISK="/dev/zvol/zpool/openzfs"
sudo zfs create -ps -b 64k -V 80g zpool/openzfs
while true; do test -b $DISK && break; sleep 1; done
# we are downloading via axel, curl and wget are mostly slower and
# require more return value checking
IMG="/mnt/tests/cloudimg.qcow2"
if [ ! -z "$URLzs" ]; then
echo "Loading image $URLzs ..."
time axel -q -o "$IMG.zst" "$URLzs"
zstd -q -d --rm "$IMG.zst"
IMG="/mnt/tests/cloud-image"
if [ ! -z "$URLxz" ]; then
echo "Loading $URLxz ..."
time axel -q -o "$IMG" "$URLxz"
echo "Loading $KSRC ..."
time axel -q -o ~/src.txz $KSRC
else
echo "Loading image $URL ..."
echo "Loading $URL ..."
time axel -q -o "$IMG" "$URL"
fi
DISK="/dev/zvol/zpool/openzfs"
FORMAT="raw"
sudo zfs create -ps -b 64k -V 80g zpool/openzfs
while true; do test -b $DISK && break; sleep 1; done
echo "Importing VM image to zvol..."
sudo qemu-img dd -f qcow2 -O raw if=$IMG of=$DISK bs=4M
if [ ! -z "$URLxz" ]; then
xzcat -T0 $IMG | sudo dd of=$DISK bs=4M
else
sudo qemu-img dd -f qcow2 -O raw if=$IMG of=$DISK bs=4M
fi
rm -f $IMG
PUBKEY=$(cat ~/.ssh/id_ed25519.pub)
cat <<EOF > /tmp/user-data
if [ ${OS:0:7} != "freebsd" ]; then
cat <<EOF > /tmp/user-data
#cloud-config
fqdn: $OS
hostname: $OS
users:
- name: root
@@ -208,6 +236,19 @@ growpart:
devices: ['/']
ignore_growroot_disabled: false
EOF
else
cat <<EOF > /tmp/user-data
#cloud-config
hostname: $OS
# minimized config without sudo for nuageinit of FreeBSD
growpart:
mode: auto
devices: ['/']
ignore_growroot_disabled: false
EOF
fi
sudo virsh net-update default add ip-dhcp-host \
"<host mac='52:54:00:83:79:00' ip='192.168.122.10'/>" --live --config
@@ -223,15 +264,8 @@ sudo virt-install \
--graphics none \
--network bridge=virbr0,model=$NIC,mac='52:54:00:83:79:00' \
--cloud-init user-data=/tmp/user-data \
--disk $DISK,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--import --noautoconsole >/dev/null
# enable KSM on Linux
if [ ${OS:0:7} != "freebsd" ]; then
sudo virsh dommemstat --domain "openzfs" --period 5
sudo virsh node-memory-tune 100 50 1
echo 1 | sudo tee /sys/kernel/mm/ksm/run > /dev/null
fi
--disk $DISK,bus=virtio,cache=none,format=raw,driver.discard=unmap \
--import --noautoconsole ${OPTS[0]} ${OPTS[1]} >/dev/null
# Give the VMs hostnames so we don't have to refer to them with
# hardcoded IP addresses.
@@ -252,3 +286,29 @@ StrictHostKeyChecking no
# small timeout, used in while loops later
ConnectTimeout 1
EOF
if [ ${OS:0:7} != "freebsd" ]; then
# enable KSM on Linux
sudo virsh dommemstat --domain "openzfs" --period 5
sudo virsh node-memory-tune 100 50 1
echo 1 | sudo tee /sys/kernel/mm/ksm/run > /dev/null
else
# on FreeBSD we need some more init stuff, because of nuageinit
BASH="/usr/local/bin/bash"
while pidof /usr/bin/qemu-system-x86_64 >/dev/null; do
ssh 2>/dev/null root@vm0 "uname -a" && break
done
ssh root@vm0 "env IGNORE_OSVERSION=yes pkg install -y bash ca_root_nss git qemu-guest-agent python3 py311-cloud-init"
ssh root@vm0 "chsh -s $BASH root"
ssh root@vm0 'sysrc qemu_guest_agent_enable="YES"'
ssh root@vm0 'sysrc cloudinit_enable="YES"'
ssh root@vm0 "pw add user zfs -w no -s $BASH"
ssh root@vm0 'mkdir -p ~zfs/.ssh'
ssh root@vm0 'echo "zfs ALL=(ALL:ALL) NOPASSWD: ALL" >> /usr/local/etc/sudoers'
ssh root@vm0 'echo "PubkeyAuthentication yes" >> /etc/ssh/sshd_config'
scp ~/.ssh/id_ed25519.pub "root@vm0:~zfs/.ssh/authorized_keys"
ssh root@vm0 'chown -R zfs ~zfs'
ssh root@vm0 'service sshd restart'
scp ~/src.txz "root@vm0:/tmp/src.txz"
ssh root@vm0 'tar -C / -zxf /tmp/src.txz'
fi
+9 -7
View File
@@ -20,7 +20,7 @@ function archlinux() {
sudo pacman -Sy --noconfirm base-devel bc cpio cryptsetup dhclient dkms \
fakeroot fio gdb inetutils jq less linux linux-headers lsscsi nfs-utils \
parted pax perf python-packaging python-setuptools qemu-guest-agent ksh \
samba sysstat rng-tools rsync wget xxhash
samba strace sysstat rng-tools rsync wget xxhash
echo "##[endgroup]"
}
@@ -28,6 +28,7 @@ function debian() {
export DEBIAN_FRONTEND="noninteractive"
echo "##[group]Running apt-get update+upgrade"
sudo sed -i '/[[:alpha:]]-backports/d' /etc/apt/sources.list
sudo apt-get update -y
sudo apt-get upgrade -y
echo "##[endgroup]"
@@ -40,9 +41,10 @@ function debian() {
libelf-dev libffi-dev libmount-dev libpam0g-dev libselinux-dev libssl-dev \
libtool libtool-bin libudev-dev libunwind-dev linux-headers-$(uname -r) \
lsscsi nfs-kernel-server pamtester parted python3 python3-all-dev \
python3-cffi python3-dev python3-distlib python3-packaging \
python3-cffi python3-dev python3-distlib python3-packaging libtirpc-dev \
python3-setuptools python3-sphinx qemu-guest-agent rng-tools rpm2cpio \
rsync samba sysstat uuid-dev watchdog wget xfslibs-dev xxhash zlib1g-dev
rsync samba strace sysstat uuid-dev watchdog wget xfslibs-dev xxhash \
zlib1g-dev
echo "##[endgroup]"
}
@@ -51,7 +53,7 @@ function freebsd() {
echo "##[group]Install Development Tools"
sudo pkg install -y autoconf automake autotools base64 checkbashisms fio \
gdb gettext gettext-runtime git gmake gsed jq ksh93 lcov libtool lscpu \
gdb gettext gettext-runtime git gmake gsed jq ksh lcov libtool lscpu \
pkgconf python python3 pamtester pamtester qemu-guest-agent rsync xxhash
sudo pkg install -xy \
'^samba4[[:digit:]]+$' \
@@ -86,8 +88,8 @@ function rhel() {
libuuid-devel lsscsi mdadm nfs-utils openssl-devel pam-devel pamtester \
parted perf python3 python3-cffi python3-devel python3-packaging \
kernel-devel python3-setuptools qemu-guest-agent rng-tools rpcgen \
rpm-build rsync samba sysstat systemd watchdog wget xfsprogs-devel xxhash \
zlib-devel
rpm-build rsync samba strace sysstat systemd watchdog wget xfsprogs-devel \
xxhash zlib-devel
echo "##[endgroup]"
}
@@ -103,7 +105,7 @@ function install_fedora_experimental_kernel {
our_version="$1"
sudo dnf -y copr enable @kernel-vanilla/stable
sudo dnf -y copr enable @kernel-vanilla/mainline
all="$(sudo dnf list --showduplicates kernel-*)"
all="$(sudo dnf list --showduplicates kernel-* python3-perf* perf* bpftool*)"
echo "Available versions:"
echo "$all"
+20 -3
View File
@@ -5,12 +5,13 @@
#
# Usage:
#
# qemu-4-build-vm.sh OS [--enable-debug][--dkms][--poweroff]
# [--release][--repo][--tarball]
# qemu-4-build-vm.sh OS [--enable-debug][--dkms][--patch-level NUM]
# [--poweroff][--release][--repo][--tarball]
#
# OS: OS name like 'fedora41'
# --enable-debug: Build RPMs with '--enable-debug' (for testing)
# --dkms: Build DKMS RPMs as well
# --patch-level NUM: Use a custom patch level number for packages.
# --poweroff: Power-off the VM after building
# --release Build zfs-release*.rpm as well
# --repo After building everything, copy RPMs into /tmp/repo
@@ -21,6 +22,7 @@
ENABLE_DEBUG=""
DKMS=""
PATCH_LEVEL=""
POWEROFF=""
RELEASE=""
REPO=""
@@ -35,6 +37,11 @@ while [[ $# -gt 0 ]]; do
DKMS=1
shift
;;
--patch-level)
PATCH_LEVEL=$2
shift
shift
;;
--poweroff)
POWEROFF=1
shift
@@ -215,6 +222,10 @@ function rpm_build_and_install() {
run ./autogen.sh
echo "##[endgroup]"
if [ -n "$PATCH_LEVEL" ] ; then
sed -i -E 's/(Release:\s+)1/\1'$PATCH_LEVEL'/g' META
fi
echo "##[group]Configure"
run ./configure --enable-debuginfo $extra
echo "##[endgroup]"
@@ -328,7 +339,13 @@ fi
# almalinux9.5
# fedora42
source /etc/os-release
sudo hostname "$ID$VERSION_ID"
if which hostnamectl &> /dev/null ; then
# Fedora 42+ use hostnamectl
sudo hostnamectl set-hostname "$ID$VERSION_ID"
sudo hostnamectl set-hostname --pretty "$ID$VERSION_ID"
else
sudo hostname "$ID$VERSION_ID"
fi
# save some sysinfo
uname -a > /var/tmp/uname.txt
+30 -9
View File
@@ -12,16 +12,26 @@ source /var/tmp/env.txt
# wait for poweroff to succeed
PID=$(pidof /usr/bin/qemu-system-x86_64)
tail --pid=$PID -f /dev/null
sudo virsh undefine openzfs
sudo virsh undefine --nvram openzfs
# cpu pinning
CPUSET=("0,1" "2,3")
# additional options for virt-install
OPTS[0]=""
OPTS[1]=""
case "$OS" in
freebsd*)
# FreeBSD needs only 6GiB
RAM=6
;;
debian13)
RAM=8
# Boot Debian 13 with uefi=on and secureboot=off (ZFS Kernel Module not signed)
OPTS[0]="--boot"
OPTS[1]="firmware=efi,firmware.feature0.name=secure-boot,firmware.feature0.enabled=no"
;;
*)
# Linux needs more memory, but can be optimized to share it via KSM
RAM=8
@@ -79,7 +89,7 @@ EOF
--network bridge=virbr0,model=$NIC,mac="52:54:00:83:79:0$i" \
--disk $DISK-system,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--disk $DISK-tests,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--import --noautoconsole >/dev/null
--import --noautoconsole ${OPTS[0]} ${OPTS[1]}
done
# generate some memory stats
@@ -98,19 +108,30 @@ echo '*/5 * * * * /root/cronjob.sh' > crontab.txt
sudo crontab crontab.txt
rm crontab.txt
# check if the machines are okay
echo "Waiting for vm's to come up... (${VMs}x CPU=$CPU RAM=$RAM)"
for ((i=1; i<=VMs; i++)); do
.github/workflows/scripts/qemu-wait-for-vm.sh vm$i
done
echo "All $VMs VMs are up now."
# Save the VM's serial output (ttyS0) to /var/tmp/console.txt
# - ttyS0 on the VM corresponds to a local /dev/pty/N entry
# - use 'virsh ttyconsole' to lookup the /dev/pty/N entry
for ((i=1; i<=VMs; i++)); do
mkdir -p $RESPATH/vm$i
read "pty" <<< $(sudo virsh ttyconsole vm$i)
# Create the file so we can tail it, even if there's no output.
touch $RESPATH/vm$i/console.txt
sudo nohup bash -c "cat $pty > $RESPATH/vm$i/console.txt" &
# Write all VM boot lines to the console to aid in debugging failed boots.
# The boot lines from all the VMs will be munged together, so prepend each
# line with the vm hostname (like 'vm1:').
(while IFS=$'\n' read -r line; do echo "vm$i: $line" ; done < <(sudo tail -f $RESPATH/vm$i/console.txt)) &
done
echo "Console logging for ${VMs}x $OS started."
# check if the machines are okay
echo "Waiting for vm's to come up... (${VMs}x CPU=$CPU RAM=$RAM)"
for ((i=1; i<=VMs; i++)); do
.github/workflows/scripts/qemu-wait-for-vm.sh vm$i
done
echo "All $VMs VMs are up now."
+5 -3
View File
@@ -21,11 +21,13 @@ function prefix() {
S=$((DIFF-(M*60)))
CTR=$(cat /tmp/ctr)
echo $LINE| grep -q "^Test[: ]" && CTR=$((CTR+1)) && echo $CTR > /tmp/ctr
echo $LINE| grep -q '^\[.*] Test[: ]' && CTR=$((CTR+1)) && echo $CTR > /tmp/ctr
BASE="$HOME/work/zfs/zfs"
COLOR="$BASE/scripts/zfs-tests-color.sh"
CLINE=$(echo $LINE| grep "^Test[ :]" | sed -e 's|/usr/local|/usr|g' \
CLINE=$(echo $LINE| grep '^\[.*] Test[: ]' \
| sed -e 's|^\[.*] Test|Test|g' \
| sed -e 's|/usr/local|/usr|g' \
| sed -e 's| /usr/share/zfs/zfs-tests/tests/| |g' | $COLOR)
if [ -z "$CLINE" ]; then
printf "vm${ID}: %s\n" "$LINE"
@@ -109,7 +111,7 @@ fi
sudo dmesg -c > dmesg-prerun.txt
mount > mount.txt
df -h > df-prerun.txt
$TDIR/zfs-tests.sh -vK -s 3GB -T $TAGS
$TDIR/zfs-tests.sh -vKO -s 3GB -T $TAGS
RV=$?
df -h > df-postrun.txt
echo $RV > tests-exitcode.txt
+13 -2
View File
@@ -32,6 +32,11 @@ on:
options:
- "Build RPMs"
- "Test repo"
patch_level:
type: string
required: false
default: ""
description: "(optional) patch level number"
repo_url:
type: string
required: false
@@ -47,7 +52,7 @@ jobs:
strategy:
fail-fast: false
matrix:
os: ['almalinux8', 'almalinux9', 'almalinux10', 'fedora41', 'fedora42']
os: ['almalinux8', 'almalinux9', 'almalinux10', 'fedora41', 'fedora42', 'fedora43']
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
@@ -78,7 +83,13 @@ jobs:
mkdir -p /tmp/repo
ssh zfs@vm0 '$HOME/zfs/.github/workflows/scripts/qemu-test-repo-vm.sh' ${{ github.event.inputs.repo_url }}
else
.github/workflows/scripts/qemu-4-build.sh --repo --release --dkms --tarball ${{ matrix.os }}
EXTRA=""
if [ -n "${{ github.event.inputs.patch_level }}" ] ; then
EXTRA="--patch-level ${{ github.event.inputs.patch_level }}"
fi
.github/workflows/scripts/qemu-4-build.sh $EXTRA \
--repo --release --dkms --tarball ${{ matrix.os }}
fi
- name: Prepare artifacts
+35 -40
View File
@@ -5,16 +5,6 @@ on:
pull_request:
workflow_dispatch:
inputs:
include_stream9:
type: boolean
required: false
default: false
description: 'Test on CentOS 9 stream'
include_stream10:
type: boolean
required: false
default: false
description: 'Test on CentOS 10 stream'
fedora_kernel_ver:
type: string
required: false
@@ -39,41 +29,42 @@ jobs:
- name: Generate OS config and CI type
id: os
run: |
FULL_OS='["almalinux8", "almalinux9", "almalinux10", "debian11", "debian12", "fedora41", "fedora42", "freebsd13-4r", "freebsd14-2s", "freebsd15-0c", "ubuntu22", "ubuntu24"]'
QUICK_OS='["almalinux8", "almalinux9", "almalinux10", "debian12", "fedora42", "freebsd14-2r", "ubuntu24"]'
ci_type="default"
# determine CI type when running on PR
ci_type="full"
if ${{ github.event_name == 'pull_request' }}; then
head=${{ github.event.pull_request.head.sha }}
base=${{ github.event.pull_request.base.sha }}
ci_type=$(python3 .github/workflows/scripts/generate-ci-type.py $head $base)
fi
if [ "$ci_type" == "quick" ]; then
os_selection="$QUICK_OS"
else
os_selection="$FULL_OS"
fi
if [ ${{ github.event.inputs.fedora_kernel_ver }} != "" ] ; then
# They specified a custom kernel version for Fedora. Use only
# Fedora runners.
case "$ci_type" in
quick)
os_selection='["almalinux8", "almalinux9", "almalinux10", "debian12", "fedora42", "freebsd15-0s", "ubuntu24"]'
;;
linux)
os_selection='["almalinux8", "almalinux9", "almalinux10", "centos-stream9", "centos-stream10", "debian11", "debian12", "debian13", "fedora41", "fedora42", "fedora43", "ubuntu22", "ubuntu24"]'
;;
freebsd)
os_selection='["freebsd13-5r", "freebsd14-2r", "freebsd14-3r", "freebsd13-5s", "freebsd14-3s", "freebsd15-0s", "freebsd16-0c"]'
;;
*)
# default list
os_selection='["almalinux8", "almalinux9", "almalinux10", "centos-stream9", "centos-stream10", "debian12", "debian13", "fedora42", "fedora43", "freebsd14-3r", "freebsd15-0s", "freebsd16-0c", "ubuntu22", "ubuntu24"]'
;;
esac
if ${{ github.event.inputs.fedora_kernel_ver != '' }}; then
# They specified a custom kernel version for Fedora.
# Use only Fedora runners.
os_json=$(echo ${os_selection} | jq -c '[.[] | select(startswith("fedora"))]')
else
# Normal case
os_json=$(echo ${os_selection} | jq -c)
fi
# Add optional runners
if [ "${{ github.event.inputs.include_stream9 }}" == 'true' ]; then
os_json=$(echo $os_json | jq -c '. += ["centos-stream9"]')
fi
if [ "${{ github.event.inputs.include_stream10 }}" == 'true' ]; then
os_json=$(echo $os_json | jq -c '. += ["centos-stream10"]')
fi
echo $os_json
echo "os=$os_json" >> $GITHUB_OUTPUT
echo "ci_type=$ci_type" >> $GITHUB_OUTPUT
echo "os=$os_json" | tee -a $GITHUB_OUTPUT
echo "ci_type=$ci_type" | tee -a $GITHUB_OUTPUT
qemu-vm:
name: qemu-x86
@@ -81,13 +72,13 @@ jobs:
strategy:
fail-fast: false
matrix:
# rhl: almalinux8, almalinux9, centos-stream9, fedora41
# debian: debian11, debian12, ubuntu22, ubuntu24
# rhl: almalinux8, almalinux9, centos-streamX, fedora4x
# debian: debian12, debian13, ubuntu22, ubuntu24
# misc: archlinux, tumbleweed
# FreeBSD variants of 2024-12:
# FreeBSD Release: freebsd13-4r, freebsd14-2r
# FreeBSD Stable: freebsd13-4s, freebsd14-2s
# FreeBSD Current: freebsd15-0c
# FreeBSD variants of november 2025:
# FreeBSD Release: freebsd13-5r, freebsd14-2r, freebsd14-3r
# FreeBSD Stable: freebsd13-5s, freebsd14-3s, freebsd15-0s
# FreeBSD Current: freebsd16-0c
os: ${{ fromJson(needs.test-config.outputs.test_os) }}
runs-on: ubuntu-24.04
steps:
@@ -96,8 +87,12 @@ jobs:
ref: ${{ github.event.pull_request.head.sha }}
- name: Setup QEMU
timeout-minutes: 10
run: .github/workflows/scripts/qemu-1-setup.sh
timeout-minutes: 20
run: |
# Add a timestamp to each line to debug timeouts
while IFS=$'\n' read -r line; do
echo "$(date +'%H:%M:%S') $line"
done < <(.github/workflows/scripts/qemu-1-setup.sh)
- name: Start build machine
timeout-minutes: 10
+12 -12
View File
@@ -12,7 +12,8 @@ jobs:
zloop:
runs-on: ubuntu-24.04
env:
TEST_DIR: /var/tmp/zloop
WORK_DIR: /mnt/zloop
CORE_DIR: /mnt/zloop/cores
steps:
- uses: actions/checkout@v4
with:
@@ -40,38 +41,37 @@ jobs:
sudo modprobe zfs
- name: Tests
run: |
sudo mkdir -p $TEST_DIR
# run for 10 minutes or at most 6 iterations for a maximum runner
# time of 60 minutes.
sudo /usr/share/zfs/zloop.sh -t 600 -I 6 -l -m 1 -- -T 120 -P 60
sudo truncate -s 256G /mnt/vdev
sudo zpool create cipool -m $WORK_DIR -O compression=on -o autotrim=on /mnt/vdev
sudo /usr/share/zfs/zloop.sh -t 600 -I 6 -l -m 1 -c $CORE_DIR -f $WORK_DIR -- -T 120 -P 60
- name: Prepare artifacts
if: failure()
run: |
sudo chmod +r -R $TEST_DIR/
sudo chmod +r -R $WORK_DIR/
- name: Ztest log
if: failure()
run: |
grep -B10 -A1000 'ASSERT' $TEST_DIR/*/ztest.out || tail -n 1000 $TEST_DIR/*/ztest.out
grep -B10 -A1000 'ASSERT' $CORE_DIR/*/ztest.out || tail -n 1000 $CORE_DIR/*/ztest.out
- name: Gdb log
if: failure()
run: |
sed -n '/Backtraces (full)/q;p' $TEST_DIR/*/ztest.gdb
sed -n '/Backtraces (full)/q;p' $CORE_DIR/*/ztest.gdb
- name: Zdb log
if: failure()
run: |
cat $TEST_DIR/*/ztest.zdb
cat $CORE_DIR/*/ztest.zdb
- uses: actions/upload-artifact@v4
if: failure()
with:
name: Logs
path: |
/var/tmp/zloop/*/
!/var/tmp/zloop/*/vdev/
/mnt/zloop/*/
!/mnt/zloop/cores/*/vdev/
if-no-files-found: ignore
- uses: actions/upload-artifact@v4
if: failure()
with:
name: Pool files
path: |
/var/tmp/zloop/*/vdev/
/mnt/zloop/cores/*/vdev/
if-no-files-found: ignore
+2 -2
View File
@@ -1,10 +1,10 @@
Meta: 1
Name: zfs
Branch: 1.0
Version: 2.3.3
Version: 2.3.5
Release: 1
Release-Tags: relext
License: CDDL
Author: OpenZFS
Linux-Maximum: 6.15
Linux-Maximum: 6.17
Linux-Minimum: 4.18
+3
View File
@@ -559,6 +559,7 @@ def section_arc(kstats_dict):
print()
compressed_size = arc_stats['compressed_size']
uncompressed_size = arc_stats['uncompressed_size']
overhead_size = arc_stats['overhead_size']
bonus_size = arc_stats['bonus_size']
dnode_size = arc_stats['dnode_size']
@@ -671,6 +672,8 @@ def section_arc(kstats_dict):
print()
print('ARC misc:')
prt_i2('Uncompressed size:', f_perc(uncompressed_size, compressed_size),
f_bytes(uncompressed_size))
prt_i1('Memory throttles:', arc_stats['memory_throttle_count'])
prt_i1('Memory direct reclaims:', arc_stats['memory_direct_count'])
prt_i1('Memory indirect reclaims:', arc_stats['memory_indirect_count'])
+29 -22
View File
@@ -381,7 +381,7 @@ verify_livelist_allocs(metaslab_verify_t *mv, uint64_t txg,
sublivelist_verify_block_t svb = {{{0}}};
DVA_SET_VDEV(&svb.svb_dva, mv->mv_vdid);
DVA_SET_OFFSET(&svb.svb_dva, offset);
DVA_SET_ASIZE(&svb.svb_dva, size);
DVA_SET_ASIZE(&svb.svb_dva, 0);
zfs_btree_index_t where;
uint64_t end_offset = offset + size;
@@ -619,8 +619,9 @@ livelist_metaslab_validate(spa_t *spa)
metaslab_calculate_range_tree_type(vd, m,
&start, &shift);
metaslab_verify_t mv;
mv.mv_allocated = zfs_range_tree_create(NULL,
type, NULL, start, shift);
mv.mv_allocated = zfs_range_tree_create_flags(
NULL, type, NULL, start, shift,
0, "livelist_metaslab_validate:mv_allocated");
mv.mv_vdid = vd->vdev_id;
mv.mv_msid = m->ms_id;
mv.mv_start = m->ms_start;
@@ -2545,12 +2546,14 @@ snprintf_blkptr_compact(char *blkbuf, size_t buflen, const blkptr_t *bp,
blkbuf[0] = '\0';
for (i = 0; i < ndvas; i++)
for (i = 0; i < ndvas; i++) {
(void) snprintf(blkbuf + strlen(blkbuf),
buflen - strlen(blkbuf), "%llu:%llx:%llx ",
buflen - strlen(blkbuf), "%llu:%llx:%llx%s ",
(u_longlong_t)DVA_GET_VDEV(&dva[i]),
(u_longlong_t)DVA_GET_OFFSET(&dva[i]),
(u_longlong_t)DVA_GET_ASIZE(&dva[i]));
(u_longlong_t)DVA_GET_ASIZE(&dva[i]),
(DVA_GET_GANG(&dva[i]) ? "G" : ""));
}
if (BP_IS_HOLE(bp)) {
(void) snprintf(blkbuf + strlen(blkbuf),
@@ -6320,8 +6323,9 @@ zdb_claim_removing(spa_t *spa, zdb_cb_t *zcb)
ASSERT0(zfs_range_tree_space(svr->svr_allocd_segs));
zfs_range_tree_t *allocs = zfs_range_tree_create(NULL, ZFS_RANGE_SEG64,
NULL, 0, 0);
zfs_range_tree_t *allocs = zfs_range_tree_create_flags(
NULL, ZFS_RANGE_SEG64, NULL, 0, 0,
0, "zdb_claim_removing:allocs");
for (uint64_t msi = 0; msi < vd->vdev_ms_count; msi++) {
metaslab_t *msp = vd->vdev_ms[msi];
@@ -7704,7 +7708,8 @@ zdb_set_skip_mmp(char *target)
* applies to the new_path parameter if allocated.
*/
static char *
import_checkpointed_state(char *target, nvlist_t *cfg, char **new_path)
import_checkpointed_state(char *target, nvlist_t *cfg, boolean_t target_is_spa,
char **new_path)
{
int error = 0;
char *poolname, *bogus_name = NULL;
@@ -7712,11 +7717,11 @@ import_checkpointed_state(char *target, nvlist_t *cfg, char **new_path)
/* If the target is not a pool, the extract the pool name */
char *path_start = strchr(target, '/');
if (path_start != NULL) {
if (target_is_spa || path_start == NULL) {
poolname = target;
} else {
size_t poolname_len = path_start - target;
poolname = strndup(target, poolname_len);
} else {
poolname = target;
}
if (cfg == NULL) {
@@ -7747,10 +7752,11 @@ import_checkpointed_state(char *target, nvlist_t *cfg, char **new_path)
"with error %d\n", bogus_name, error);
}
if (new_path != NULL && path_start != NULL) {
if (asprintf(new_path, "%s%s", bogus_name, path_start) == -1) {
if (new_path != NULL && !target_is_spa) {
if (asprintf(new_path, "%s%s", bogus_name,
path_start != NULL ? path_start : "") == -1) {
free(bogus_name);
if (path_start != NULL)
if (!target_is_spa && path_start != NULL)
free(poolname);
return (NULL);
}
@@ -7979,7 +7985,7 @@ verify_checkpoint_blocks(spa_t *spa)
* name) so we can do verification on it against the current state
* of the pool.
*/
checkpoint_pool = import_checkpointed_state(spa->spa_name, NULL,
checkpoint_pool = import_checkpointed_state(spa->spa_name, NULL, B_TRUE,
NULL);
ASSERT(strcmp(spa->spa_name, checkpoint_pool) != 0);
@@ -8449,8 +8455,9 @@ dump_zpool(spa_t *spa)
if (dump_opt['d'] || dump_opt['i']) {
spa_feature_t f;
mos_refd_objs = zfs_range_tree_create(NULL, ZFS_RANGE_SEG64,
NULL, 0, 0);
mos_refd_objs = zfs_range_tree_create_flags(
NULL, ZFS_RANGE_SEG64, NULL, 0, 0,
0, "dump_zpool:mos_refd_objs");
dump_objset(dp->dp_meta_objset);
if (dump_opt['d'] >= 3) {
@@ -8981,7 +8988,7 @@ zdb_read_block(char *thing, spa_t *spa)
DVA_SET_VDEV(&dva[0], vd->vdev_id);
DVA_SET_OFFSET(&dva[0], offset);
DVA_SET_GANG(&dva[0], !!(flags & ZDB_FLAG_GBH));
DVA_SET_GANG(&dva[0], 0);
DVA_SET_ASIZE(&dva[0], vdev_psize_to_asize(vd, psize));
BP_SET_BIRTH(bp, TXG_INITIAL, TXG_INITIAL);
@@ -8996,7 +9003,7 @@ zdb_read_block(char *thing, spa_t *spa)
BP_SET_BYTEORDER(bp, ZFS_HOST_BYTEORDER);
spa_config_enter(spa, SCL_STATE, FTAG, RW_READER);
zio = zio_root(spa, NULL, NULL, 0);
zio = zio_root(spa, NULL, NULL, ZIO_FLAG_CANFAIL);
if (vd == vd->vdev_top) {
/*
@@ -9118,7 +9125,7 @@ zdb_read_block(char *thing, spa_t *spa)
ck_zio->io_offset =
DVA_GET_OFFSET(&bp->blk_dva[0]);
ck_zio->io_bp = bp;
zio_checksum_compute(ck_zio, ck, pabd, lsize);
zio_checksum_compute(ck_zio, ck, pabd, psize);
printf(
"%12s\t"
"cksum=%016llx:%016llx:%016llx:%016llx\n",
@@ -9695,7 +9702,7 @@ main(int argc, char **argv)
char *checkpoint_target = NULL;
if (dump_opt['k']) {
checkpoint_pool = import_checkpointed_state(target, cfg,
&checkpoint_target);
target_is_spa, &checkpoint_target);
if (checkpoint_target != NULL)
target = checkpoint_target;
+36 -31
View File
@@ -134,11 +134,13 @@ zfs_agent_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *arg)
* of blkid cache and L2ARC VDEV does not contain pool guid in its
* blkid, so this is a special case for L2ARC VDEV.
*/
else if (gsp->gs_vdev_guid != 0 && gsp->gs_devid == NULL &&
else if (gsp->gs_vdev_guid != 0 &&
nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_GUID, &vdev_guid) == 0 &&
gsp->gs_vdev_guid == vdev_guid) {
(void) nvlist_lookup_string(nvl, ZPOOL_CONFIG_DEVID,
&gsp->gs_devid);
if (gsp->gs_devid == NULL) {
(void) nvlist_lookup_string(nvl, ZPOOL_CONFIG_DEVID,
&gsp->gs_devid);
}
(void) nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_EXPANSION_TIME,
&gsp->gs_vdev_expandtime);
return (B_TRUE);
@@ -156,22 +158,28 @@ zfs_agent_iter_pool(zpool_handle_t *zhp, void *arg)
/*
* For each vdev in this pool, look for a match by devid
*/
if ((config = zpool_get_config(zhp, NULL)) != NULL) {
if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE,
&nvl) == 0) {
(void) zfs_agent_iter_vdev(zhp, nvl, gsp);
}
}
/*
* if a match was found then grab the pool guid
*/
if (gsp->gs_vdev_guid && gsp->gs_devid) {
(void) nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_GUID,
&gsp->gs_pool_guid);
}
boolean_t found = B_FALSE;
uint64_t pool_guid;
/* Get pool configuration and extract pool GUID */
if ((config = zpool_get_config(zhp, NULL)) == NULL ||
nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_GUID,
&pool_guid) != 0)
goto out;
/* Skip this pool if we're looking for a specific pool */
if (gsp->gs_pool_guid != 0 && pool_guid != gsp->gs_pool_guid)
goto out;
if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE, &nvl) == 0)
found = zfs_agent_iter_vdev(zhp, nvl, gsp);
if (found && gsp->gs_pool_guid == 0)
gsp->gs_pool_guid = pool_guid;
out:
zpool_close(zhp);
return (gsp->gs_devid != NULL && gsp->gs_vdev_guid != 0);
return (found);
}
void
@@ -233,20 +241,17 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
* For multipath, spare and l2arc devices ZFS_EV_VDEV_GUID or
* ZFS_EV_POOL_GUID may be missing so find them.
*/
if (devid == NULL || pool_guid == 0 || vdev_guid == 0) {
if (devid == NULL)
search.gs_vdev_guid = vdev_guid;
else
search.gs_devid = devid;
zpool_iter(g_zfs_hdl, zfs_agent_iter_pool, &search);
if (devid == NULL)
devid = search.gs_devid;
if (pool_guid == 0)
pool_guid = search.gs_pool_guid;
if (vdev_guid == 0)
vdev_guid = search.gs_vdev_guid;
devtype = search.gs_vdev_type;
}
search.gs_devid = devid;
search.gs_vdev_guid = vdev_guid;
search.gs_pool_guid = pool_guid;
zpool_iter(g_zfs_hdl, zfs_agent_iter_pool, &search);
if (devid == NULL)
devid = search.gs_devid;
if (pool_guid == 0)
pool_guid = search.gs_pool_guid;
if (vdev_guid == 0)
vdev_guid = search.gs_vdev_guid;
devtype = search.gs_vdev_type;
/*
* We want to avoid reporting "remove" events coming from
+2 -1
View File
@@ -441,8 +441,9 @@ zed_notify_slack_webhook()
"${pathname}")"
# Construct the JSON message for posting.
# shellcheck disable=SC2016
#
msg_json="$(printf '{"text": "*%s*\\n%s"}' "${subject}" "${msg_body}" )"
msg_json="$(printf '{"text": "*%s*\\n```%s```"}' "${subject}" "${msg_body}" )"
# Send the POST request and check for errors.
#
+240 -11
View File
@@ -37,6 +37,7 @@
#include <assert.h>
#include <ctype.h>
#include <sys/debug.h>
#include <dirent.h>
#include <errno.h>
#include <getopt.h>
#include <libgen.h>
@@ -121,6 +122,7 @@ static int zfs_do_change_key(int argc, char **argv);
static int zfs_do_project(int argc, char **argv);
static int zfs_do_version(int argc, char **argv);
static int zfs_do_redact(int argc, char **argv);
static int zfs_do_rewrite(int argc, char **argv);
static int zfs_do_wait(int argc, char **argv);
#ifdef __FreeBSD__
@@ -193,6 +195,7 @@ typedef enum {
HELP_CHANGE_KEY,
HELP_VERSION,
HELP_REDACT,
HELP_REWRITE,
HELP_JAIL,
HELP_UNJAIL,
HELP_WAIT,
@@ -227,7 +230,7 @@ static zfs_command_t command_table[] = {
{ "promote", zfs_do_promote, HELP_PROMOTE },
{ "rename", zfs_do_rename, HELP_RENAME },
{ "bookmark", zfs_do_bookmark, HELP_BOOKMARK },
{ "program", zfs_do_channel_program, HELP_CHANNEL_PROGRAM },
{ "diff", zfs_do_diff, HELP_DIFF },
{ NULL },
{ "list", zfs_do_list, HELP_LIST },
{ NULL },
@@ -249,27 +252,31 @@ static zfs_command_t command_table[] = {
{ NULL },
{ "send", zfs_do_send, HELP_SEND },
{ "receive", zfs_do_receive, HELP_RECEIVE },
{ "redact", zfs_do_redact, HELP_REDACT },
{ NULL },
{ "allow", zfs_do_allow, HELP_ALLOW },
{ NULL },
{ "unallow", zfs_do_unallow, HELP_UNALLOW },
{ NULL },
{ "hold", zfs_do_hold, HELP_HOLD },
{ "holds", zfs_do_holds, HELP_HOLDS },
{ "release", zfs_do_release, HELP_RELEASE },
{ "diff", zfs_do_diff, HELP_DIFF },
{ NULL },
{ "load-key", zfs_do_load_key, HELP_LOAD_KEY },
{ "unload-key", zfs_do_unload_key, HELP_UNLOAD_KEY },
{ "change-key", zfs_do_change_key, HELP_CHANGE_KEY },
{ "redact", zfs_do_redact, HELP_REDACT },
{ NULL },
{ "program", zfs_do_channel_program, HELP_CHANNEL_PROGRAM },
{ "rewrite", zfs_do_rewrite, HELP_REWRITE },
{ "wait", zfs_do_wait, HELP_WAIT },
#ifdef __FreeBSD__
{ NULL },
{ "jail", zfs_do_jail, HELP_JAIL },
{ "unjail", zfs_do_unjail, HELP_UNJAIL },
#endif
#ifdef __linux__
{ NULL },
{ "zone", zfs_do_zone, HELP_ZONE },
{ "unzone", zfs_do_unzone, HELP_UNZONE },
#endif
@@ -432,6 +439,9 @@ get_usage(zfs_help_t idx)
case HELP_REDACT:
return (gettext("\tredact <snapshot> <bookmark> "
"<redaction_snapshot> ...\n"));
case HELP_REWRITE:
return (gettext("\trewrite [-rvx] [-o <offset>] [-l <length>] "
"<directory|file ...>\n"));
case HELP_JAIL:
return (gettext("\tjail <jailid|jailname> <filesystem>\n"));
case HELP_UNJAIL:
@@ -920,19 +930,15 @@ usage:
}
/*
* Return a default volblocksize for the pool which always uses more than
* half of the data sectors. This primarily applies to dRAID which always
* writes full stripe widths.
* Calculate the minimum allocation size based on the top-level vdevs.
*/
static uint64_t
default_volblocksize(zpool_handle_t *zhp, nvlist_t *props)
calculate_volblocksize(nvlist_t *config)
{
uint64_t volblocksize, asize = SPA_MINBLOCKSIZE;
uint64_t asize = SPA_MINBLOCKSIZE;
nvlist_t *tree, **vdevs;
uint_t nvdevs;
nvlist_t *config = zpool_get_config(zhp, NULL);
if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE, &tree) != 0 ||
nvlist_lookup_nvlist_array(tree, ZPOOL_CONFIG_CHILDREN,
&vdevs, &nvdevs) != 0) {
@@ -963,6 +969,24 @@ default_volblocksize(zpool_handle_t *zhp, nvlist_t *props)
}
}
return (asize);
}
/*
* Return a default volblocksize for the pool which always uses more than
* half of the data sectors. This primarily applies to dRAID which always
* writes full stripe widths.
*/
static uint64_t
default_volblocksize(zpool_handle_t *zhp, nvlist_t *props)
{
uint64_t volblocksize, asize = SPA_MINBLOCKSIZE;
nvlist_t *config = zpool_get_config(zhp, NULL);
if (nvlist_lookup_uint64(config, ZPOOL_CONFIG_MAX_ALLOC, &asize) != 0)
asize = calculate_volblocksize(config);
/*
* Calculate the target volblocksize such that more than half
* of the asize is used. The following table is for 4k sectors.
@@ -7716,6 +7740,7 @@ unshare_unmount_path(int op, char *path, int flags, boolean_t is_manual)
struct extmnttab entry;
const char *cmdname = (op == OP_SHARE) ? "unshare" : "unmount";
ino_t path_inode;
char *zfs_mntpnt, *entry_mntpnt;
/*
* Search for the given (major,minor) pair in the mount table.
@@ -7757,6 +7782,24 @@ unshare_unmount_path(int op, char *path, int flags, boolean_t is_manual)
goto out;
}
/*
* If the filesystem is mounted, check that the mountpoint matches
* the one in the mnttab entry w.r.t. provided path. If it doesn't,
* then we should not proceed further.
*/
entry_mntpnt = strdup(entry.mnt_mountp);
if (zfs_is_mounted(zhp, &zfs_mntpnt)) {
if (strcmp(zfs_mntpnt, entry_mntpnt) != 0) {
(void) fprintf(stderr, gettext("cannot %s '%s': "
"not an original mountpoint\n"), cmdname, path);
free(zfs_mntpnt);
free(entry_mntpnt);
goto out;
}
free(zfs_mntpnt);
}
free(entry_mntpnt);
if (op == OP_SHARE) {
char nfs_mnt_prop[ZFS_MAXPROPLEN];
char smbshare_prop[ZFS_MAXPROPLEN];
@@ -9013,6 +9056,192 @@ zfs_do_project(int argc, char **argv)
return (ret);
}
static int
zfs_rewrite_file(const char *path, boolean_t verbose, zfs_rewrite_args_t *args)
{
int fd, ret = 0;
fd = open(path, O_WRONLY);
if (fd < 0) {
ret = errno;
(void) fprintf(stderr, gettext("failed to open %s: %s\n"),
path, strerror(errno));
return (ret);
}
if (ioctl(fd, ZFS_IOC_REWRITE, args) < 0) {
ret = errno;
(void) fprintf(stderr, gettext("failed to rewrite %s: %s\n"),
path, strerror(errno));
} else if (verbose) {
printf("%s\n", path);
}
close(fd);
return (ret);
}
static int
zfs_rewrite_dir(const char *path, boolean_t verbose, boolean_t xdev, dev_t dev,
zfs_rewrite_args_t *args, nvlist_t *dirs)
{
struct dirent *ent;
DIR *dir;
int ret = 0, err;
dir = opendir(path);
if (dir == NULL) {
if (errno == ENOENT)
return (0);
ret = errno;
(void) fprintf(stderr, gettext("failed to opendir %s: %s\n"),
path, strerror(errno));
return (ret);
}
size_t plen = strlen(path) + 1;
while ((ent = readdir(dir)) != NULL) {
char *fullname;
struct stat st;
if (ent->d_type != DT_REG && ent->d_type != DT_DIR)
continue;
if (strcmp(ent->d_name, ".") == 0 ||
strcmp(ent->d_name, "..") == 0)
continue;
if (plen + strlen(ent->d_name) >= PATH_MAX) {
(void) fprintf(stderr, gettext("path too long %s/%s\n"),
path, ent->d_name);
ret = ENAMETOOLONG;
continue;
}
if (asprintf(&fullname, "%s/%s", path, ent->d_name) == -1) {
(void) fprintf(stderr,
gettext("failed to allocate memory\n"));
ret = ENOMEM;
continue;
}
if (xdev) {
if (lstat(fullname, &st) < 0) {
ret = errno;
(void) fprintf(stderr,
gettext("failed to stat %s: %s\n"),
fullname, strerror(errno));
free(fullname);
continue;
}
if (st.st_dev != dev) {
free(fullname);
continue;
}
}
if (ent->d_type == DT_REG) {
err = zfs_rewrite_file(fullname, verbose, args);
if (err)
ret = err;
} else { /* DT_DIR */
fnvlist_add_uint64(dirs, fullname, dev);
}
free(fullname);
}
closedir(dir);
return (ret);
}
static int
zfs_rewrite_path(const char *path, boolean_t verbose, boolean_t recurse,
boolean_t xdev, zfs_rewrite_args_t *args, nvlist_t *dirs)
{
struct stat st;
int ret = 0;
if (lstat(path, &st) < 0) {
ret = errno;
(void) fprintf(stderr, gettext("failed to stat %s: %s\n"),
path, strerror(errno));
return (ret);
}
if (S_ISREG(st.st_mode)) {
ret = zfs_rewrite_file(path, verbose, args);
} else if (S_ISDIR(st.st_mode) && recurse) {
ret = zfs_rewrite_dir(path, verbose, xdev, st.st_dev, args,
dirs);
}
return (ret);
}
static int
zfs_do_rewrite(int argc, char **argv)
{
int ret = 0, err, c;
boolean_t recurse = B_FALSE, verbose = B_FALSE, xdev = B_FALSE;
if (argc < 2)
usage(B_FALSE);
zfs_rewrite_args_t args;
memset(&args, 0, sizeof (args));
while ((c = getopt(argc, argv, "l:o:rvx")) != -1) {
switch (c) {
case 'l':
args.len = strtoll(optarg, NULL, 0);
break;
case 'o':
args.off = strtoll(optarg, NULL, 0);
break;
case 'r':
recurse = B_TRUE;
break;
case 'v':
verbose = B_TRUE;
break;
case 'x':
xdev = B_TRUE;
break;
default:
(void) fprintf(stderr, gettext("invalid option '%c'\n"),
optopt);
usage(B_FALSE);
}
}
argv += optind;
argc -= optind;
if (argc == 0) {
(void) fprintf(stderr,
gettext("missing file or directory target(s)\n"));
usage(B_FALSE);
}
nvlist_t *dirs = fnvlist_alloc();
for (int i = 0; i < argc; i++) {
err = zfs_rewrite_path(argv[i], verbose, recurse, xdev, &args,
dirs);
if (err)
ret = err;
}
nvpair_t *dir;
while ((dir = nvlist_next_nvpair(dirs, NULL)) != NULL) {
err = zfs_rewrite_dir(nvpair_name(dir), verbose, xdev,
fnvpair_value_uint64(dir), &args, dirs);
if (err)
ret = err;
fnvlist_remove_nvpair(dirs, dir);
}
fnvlist_free(dirs);
return (ret);
}
static int
zfs_do_wait(int argc, char **argv)
{
+28 -8
View File
@@ -145,11 +145,11 @@ zfs_project_handle_one(const char *name, zfs_project_control_t *zpc)
switch (zpc->zpc_op) {
case ZFS_PROJECT_OP_LIST:
(void) printf("%5u %c %s\n", fsx.fsx_projid,
(fsx.fsx_xflags & ZFS_PROJINHERIT_FL) ? 'P' : '-', name);
(fsx.fsx_xflags & FS_XFLAG_PROJINHERIT) ? 'P' : '-', name);
goto out;
case ZFS_PROJECT_OP_CHECK:
if (fsx.fsx_projid == zpc->zpc_expected_projid &&
fsx.fsx_xflags & ZFS_PROJINHERIT_FL)
fsx.fsx_xflags & FS_XFLAG_PROJINHERIT)
goto out;
if (!zpc->zpc_newline) {
@@ -164,29 +164,30 @@ zfs_project_handle_one(const char *name, zfs_project_control_t *zpc)
"(%u/%u)\n", name, fsx.fsx_projid,
(uint32_t)zpc->zpc_expected_projid);
if (!(fsx.fsx_xflags & ZFS_PROJINHERIT_FL))
if (!(fsx.fsx_xflags & FS_XFLAG_PROJINHERIT))
(void) printf("%s - project inherit flag is not set\n",
name);
goto out;
case ZFS_PROJECT_OP_CLEAR:
if (!(fsx.fsx_xflags & ZFS_PROJINHERIT_FL) &&
if (!(fsx.fsx_xflags & FS_XFLAG_PROJINHERIT) &&
(zpc->zpc_keep_projid ||
fsx.fsx_projid == ZFS_DEFAULT_PROJID))
goto out;
fsx.fsx_xflags &= ~ZFS_PROJINHERIT_FL;
fsx.fsx_xflags &= ~FS_XFLAG_PROJINHERIT;
if (!zpc->zpc_keep_projid)
fsx.fsx_projid = ZFS_DEFAULT_PROJID;
break;
case ZFS_PROJECT_OP_SET:
if (fsx.fsx_projid == zpc->zpc_expected_projid &&
(!zpc->zpc_set_flag || fsx.fsx_xflags & ZFS_PROJINHERIT_FL))
(!zpc->zpc_set_flag ||
fsx.fsx_xflags & FS_XFLAG_PROJINHERIT))
goto out;
fsx.fsx_projid = zpc->zpc_expected_projid;
if (zpc->zpc_set_flag)
fsx.fsx_xflags |= ZFS_PROJINHERIT_FL;
fsx.fsx_xflags |= FS_XFLAG_PROJINHERIT;
break;
default:
ASSERT(0);
@@ -194,11 +195,30 @@ zfs_project_handle_one(const char *name, zfs_project_control_t *zpc)
}
ret = ioctl(fd, ZFS_IOC_FSSETXATTR, &fsx);
if (ret)
if (ret) {
(void) fprintf(stderr,
gettext("failed to set xattr for %s: %s\n"),
name, strerror(errno));
if (errno == ENOTSUP) {
char *kver = zfs_version_kernel();
/*
* Special case: a module/userspace version mismatch can
* return ENOTSUP due to us fixing the XFLAGs bits in
* #17884. In that case give a hint to the user that
* they should take action to make the versions match.
*/
if (strcmp(kver, ZFS_META_ALIAS) != 0) {
fprintf(stderr,
gettext("Warning: The zfs module version "
"(%s) and userspace\nversion (%s) do not "
"match up. This may be the\ncause of the "
"\"Operation not supported\" error.\n"),
kver, ZFS_META_ALIAS);
}
}
}
out:
close(fd);
return (ret);
+4
View File
@@ -363,10 +363,12 @@ feature_incr_sync(void *arg, dmu_tx_t *tx)
zfeature_info_t *feature = arg;
uint64_t refcount;
mutex_enter(&spa->spa_feat_stats_lock);
VERIFY0(feature_get_refcount_from_disk(spa, feature, &refcount));
feature_sync(spa, feature, refcount + 1, tx);
spa_history_log_internal(spa, "zhack feature incr", tx,
"name=%s", feature->fi_guid);
mutex_exit(&spa->spa_feat_stats_lock);
}
static void
@@ -376,10 +378,12 @@ feature_decr_sync(void *arg, dmu_tx_t *tx)
zfeature_info_t *feature = arg;
uint64_t refcount;
mutex_enter(&spa->spa_feat_stats_lock);
VERIFY0(feature_get_refcount_from_disk(spa, feature, &refcount));
feature_sync(spa, feature, refcount - 1, tx);
spa_history_log_internal(spa, "zhack feature decr", tx,
"name=%s", feature->fi_guid);
mutex_exit(&spa->spa_feat_stats_lock);
}
static void
+17 -11
View File
@@ -610,22 +610,28 @@ get_replication(nvlist_t *nvroot, boolean_t fatal)
ZPOOL_CONFIG_PATH, &path) == 0);
/*
* If we have a raidz/mirror that combines disks
* with files, report it as an error.
* Skip active spares they should never cause
* the pool to be evaluated as inconsistent.
*/
if (!dontreport && type != NULL &&
if (is_spare(NULL, path))
continue;
/*
* If we have a raidz/mirror that combines disks
* with files, only report it as an error when
* fatal is set to ensure all the replication
* checks aren't skipped in check_replication().
*/
if (fatal && !dontreport && type != NULL &&
strcmp(type, childtype) != 0) {
if (ret != NULL)
free(ret);
ret = NULL;
if (fatal)
vdev_error(gettext(
"mismatched replication "
"level: %s contains both "
"files and devices\n"),
rep.zprl_type);
else
return (NULL);
vdev_error(gettext(
"mismatched replication "
"level: %s contains both "
"files and devices\n"),
rep.zprl_type);
dontreport = B_TRUE;
}
+25 -18
View File
@@ -3881,7 +3881,7 @@ ztest_vdev_attach_detach(ztest_ds_t *zd, uint64_t id)
* If newvd is too small, it should fail with EOVERFLOW.
*
* If newvd is a distributed spare and it's being attached to a
* dRAID which is not its parent it should fail with EINVAL.
* dRAID which is not its parent it should fail with ENOTSUP.
*/
if (pvd->vdev_ops != &vdev_mirror_ops &&
pvd->vdev_ops != &vdev_root_ops && (!replacing ||
@@ -3900,7 +3900,7 @@ ztest_vdev_attach_detach(ztest_ds_t *zd, uint64_t id)
else if (ashift > oldvd->vdev_top->vdev_ashift)
expected_error = EDOM;
else if (newvd_is_dspare && pvd != vdev_draid_spare_get_parent(newvd))
expected_error = EINVAL;
expected_error = ENOTSUP;
else
expected_error = 0;
@@ -7812,6 +7812,9 @@ ztest_dataset_open(int d)
ztest_dataset_name(name, ztest_opts.zo_pool, d);
if (ztest_opts.zo_verbose >= 6)
(void) printf("Opening %s\n", name);
(void) pthread_rwlock_rdlock(&ztest_name_lock);
error = ztest_dataset_create(name);
@@ -8307,41 +8310,44 @@ static void
ztest_generic_run(ztest_shared_t *zs, spa_t *spa)
{
kthread_t **run_threads;
int t;
int i, ndatasets;
run_threads = umem_zalloc(ztest_opts.zo_threads * sizeof (kthread_t *),
UMEM_NOFAIL);
/*
* Actual number of datasets to be used.
*/
ndatasets = MIN(ztest_opts.zo_datasets, ztest_opts.zo_threads);
/*
* Prepare the datasets first.
*/
for (i = 0; i < ndatasets; i++)
VERIFY0(ztest_dataset_open(i));
/*
* Kick off all the tests that run in parallel.
*/
for (t = 0; t < ztest_opts.zo_threads; t++) {
if (t < ztest_opts.zo_datasets && ztest_dataset_open(t) != 0) {
umem_free(run_threads, ztest_opts.zo_threads *
sizeof (kthread_t *));
return;
}
run_threads[t] = thread_create(NULL, 0, ztest_thread,
(void *)(uintptr_t)t, 0, NULL, TS_RUN | TS_JOINABLE,
for (i = 0; i < ztest_opts.zo_threads; i++) {
run_threads[i] = thread_create(NULL, 0, ztest_thread,
(void *)(uintptr_t)i, 0, NULL, TS_RUN | TS_JOINABLE,
defclsyspri);
}
/*
* Wait for all of the tests to complete.
*/
for (t = 0; t < ztest_opts.zo_threads; t++)
VERIFY0(thread_join(run_threads[t]));
for (i = 0; i < ztest_opts.zo_threads; i++)
VERIFY0(thread_join(run_threads[i]));
/*
* Close all datasets. This must be done after all the threads
* are joined so we can be sure none of the datasets are in-use
* by any of the threads.
*/
for (t = 0; t < ztest_opts.zo_threads; t++) {
if (t < ztest_opts.zo_datasets)
ztest_dataset_close(t);
}
for (i = 0; i < ndatasets; i++)
ztest_dataset_close(i);
txg_wait_synced(spa_get_dsl(spa), 0);
@@ -8464,6 +8470,7 @@ ztest_run(ztest_shared_t *zs)
int d = ztest_random(ztest_opts.zo_datasets);
ztest_dataset_destroy(d);
txg_wait_synced(spa_get_dsl(spa), 0);
}
zs->zs_enospc_count = 0;
+2 -2
View File
@@ -72,7 +72,7 @@
# modified version of the Autoconf Macro, you may extend this special
# exception to the GPL to apply to your modified version as well.
#serial 36
#serial 37
AU_ALIAS([AC_PYTHON_DEVEL], [AX_PYTHON_DEVEL])
AC_DEFUN([AX_PYTHON_DEVEL],[
@@ -316,7 +316,7 @@ EOD`
PYTHON_LIBS="-L$ac_python_libdir -lpython$ac_python_version"
fi
if test -z "PYTHON_LIBS"; then
if test -z "$PYTHON_LIBS"; then
AC_MSG_WARN([
Cannot determine location of your Python DSO. Please check it was installed with
dynamic libraries enabled, or try setting PYTHON_LIBS by hand.
+7 -3
View File
@@ -24,6 +24,9 @@ dnl #
dnl # 2.6.38 API change
dnl # Added d_set_d_op() helper function.
dnl #
dnl # 6.17 API change
dnl # d_set_d_op() removed. No direct replacement.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_D_SET_D_OP], [
ZFS_LINUX_TEST_SRC([d_set_d_op], [
#include <linux/dcache.h>
@@ -34,11 +37,12 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_D_SET_D_OP], [
AC_DEFUN([ZFS_AC_KERNEL_D_SET_D_OP], [
AC_MSG_CHECKING([whether d_set_d_op() is available])
ZFS_LINUX_TEST_RESULT_SYMBOL([d_set_d_op],
[d_set_d_op], [fs/dcache.c], [
ZFS_LINUX_TEST_RESULT([d_set_d_op], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_D_SET_D_OP, 1,
[Define if d_set_d_op() is available])
], [
ZFS_LINUX_TEST_ERROR([d_set_d_op])
AC_MSG_RESULT(no)
])
])
+24
View File
@@ -0,0 +1,24 @@
dnl #
dnl # Linux 5.2 API change
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_SOPS_FREE_INODE], [
ZFS_LINUX_TEST_SRC([super_operations_free_inode], [
#include <linux/fs.h>
static void free_inode(struct inode *) { }
static struct super_operations sops __attribute__ ((unused)) = {
.free_inode = free_inode,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_SOPS_FREE_INODE], [
AC_MSG_CHECKING([whether sops->free_inode() exists])
ZFS_LINUX_TEST_RESULT([super_operations_free_inode], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SOPS_FREE_INODE, 1, [sops->free_inode() exists])
],[
AC_MSG_RESULT(no)
])
])
+17
View File
@@ -49,6 +49,15 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_OBJTOOL], [
#error "STACK_FRAME_NON_STANDARD is not defined."
#endif
])
dnl # 6.15 made CONFIG_OBJTOOL_WERROR=y the default. We need to handle
dnl # this or our build will fail.
ZFS_LINUX_TEST_SRC([config_objtool_werror], [
#if !defined(CONFIG_OBJTOOL_WERROR)
#error "CONFIG_OBJTOOL_WERROR is not defined."
#endif
])
])
AC_DEFUN([ZFS_AC_KERNEL_OBJTOOL], [
@@ -84,6 +93,14 @@ AC_DEFUN([ZFS_AC_KERNEL_OBJTOOL], [
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether CONFIG_OBJTOOL_WERROR is defined])
ZFS_LINUX_TEST_RESULT([config_objtool_werror],[
AC_MSG_RESULT(yes)
CONFIG_OBJTOOL_WERROR_DEFINED=yes
],[
AC_MSG_RESULT(no)
])
],[
AC_MSG_RESULT(no)
])
+23
View File
@@ -0,0 +1,23 @@
dnl #
dnl # Linux 6.16 removed readahead_page
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_PAGEMAP_READAHEAD_PAGE], [
ZFS_LINUX_TEST_SRC([pagemap_has_readahead_page], [
#include <linux/pagemap.h>
], [
struct page *p __attribute__ ((unused)) = NULL;
struct readahead_control *ractl __attribute__ ((unused)) = NULL;
p = readahead_page(ractl);
])
])
AC_DEFUN([ZFS_AC_KERNEL_PAGEMAP_READAHEAD_PAGE], [
AC_MSG_CHECKING([whether readahead_page() exists])
ZFS_LINUX_TEST_RESULT([pagemap_has_readahead_page], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_PAGEMAP_READAHEAD_PAGE, 1,
[readahead_page() exists])
],[
AC_MSG_RESULT([no])
])
])
+24
View File
@@ -0,0 +1,24 @@
dnl #
dnl # Linux 6.16 removes address_space_operations ->writepage
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_WRITEPAGE], [
ZFS_LINUX_TEST_SRC([vfs_has_writepage], [
#include <linux/fs.h>
static const struct address_space_operations
aops __attribute__ ((unused)) = {
.writepage = NULL,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_WRITEPAGE], [
AC_MSG_CHECKING([whether aops->writepage exists])
ZFS_LINUX_TEST_RESULT([vfs_has_writepage], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_VFS_WRITEPAGE, 1,
[address_space_operations->writepage exists])
],[
AC_MSG_RESULT([no])
])
])
+6
View File
@@ -82,6 +82,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_VFS_MIGRATEPAGE
ZFS_AC_KERNEL_SRC_VFS_FSYNC_2ARGS
ZFS_AC_KERNEL_SRC_VFS_READPAGES
ZFS_AC_KERNEL_SRC_VFS_WRITEPAGE
ZFS_AC_KERNEL_SRC_VFS_SET_PAGE_DIRTY_NOBUFFERS
ZFS_AC_KERNEL_SRC_VFS_IOV_ITER
ZFS_AC_KERNEL_SRC_VFS_GENERIC_COPY_FILE_RANGE
@@ -111,6 +112,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_STANDALONE_LINUX_STDARG
ZFS_AC_KERNEL_SRC_STRLCPY
ZFS_AC_KERNEL_SRC_PAGEMAP_FOLIO_WAIT_BIT
ZFS_AC_KERNEL_SRC_PAGEMAP_READAHEAD_PAGE
ZFS_AC_KERNEL_SRC_ADD_DISK
ZFS_AC_KERNEL_SRC_KTHREAD
ZFS_AC_KERNEL_SRC_ZERO_PAGE
@@ -132,6 +134,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_PIN_USER_PAGES
ZFS_AC_KERNEL_SRC_TIMER
ZFS_AC_KERNEL_SRC_SUPER_BLOCK_S_WB_ERR
ZFS_AC_KERNEL_SRC_SOPS_FREE_INODE
case "$host_cpu" in
powerpc*)
ZFS_AC_KERNEL_SRC_CPU_HAS_FEATURE
@@ -197,6 +200,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_VFS_MIGRATEPAGE
ZFS_AC_KERNEL_VFS_FSYNC_2ARGS
ZFS_AC_KERNEL_VFS_READPAGES
ZFS_AC_KERNEL_VFS_WRITEPAGE
ZFS_AC_KERNEL_VFS_SET_PAGE_DIRTY_NOBUFFERS
ZFS_AC_KERNEL_VFS_IOV_ITER
ZFS_AC_KERNEL_VFS_GENERIC_COPY_FILE_RANGE
@@ -226,6 +230,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_STANDALONE_LINUX_STDARG
ZFS_AC_KERNEL_STRLCPY
ZFS_AC_KERNEL_PAGEMAP_FOLIO_WAIT_BIT
ZFS_AC_KERNEL_PAGEMAP_READAHEAD_PAGE
ZFS_AC_KERNEL_ADD_DISK
ZFS_AC_KERNEL_KTHREAD
ZFS_AC_KERNEL_ZERO_PAGE
@@ -248,6 +253,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_PIN_USER_PAGES
ZFS_AC_KERNEL_TIMER
ZFS_AC_KERNEL_SUPER_BLOCK_S_WB_ERR
ZFS_AC_KERNEL_SOPS_FREE_INODE
case "$host_cpu" in
powerpc*)
ZFS_AC_KERNEL_CPU_HAS_FEATURE
+46 -23
View File
@@ -38,9 +38,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE], [
AC_MSG_CHECKING([whether host toolchain supports SSE])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
int main()
{
__asm__ __volatile__("xorps %xmm0, %xmm1");
return (0);
}
]])], [
AC_DEFINE([HAVE_SSE], 1, [Define if host toolchain supports SSE])
@@ -57,9 +58,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE2], [
AC_MSG_CHECKING([whether host toolchain supports SSE2])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
int main()
{
__asm__ __volatile__("pxor %xmm0, %xmm1");
return (0);
}
]])], [
AC_DEFINE([HAVE_SSE2], 1, [Define if host toolchain supports SSE2])
@@ -76,10 +78,11 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE3], [
AC_MSG_CHECKING([whether host toolchain supports SSE3])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
int main()
{
char v[16];
__asm__ __volatile__("lddqu %0,%%xmm0" :: "m"(v[0]));
return (0);
}
]])], [
AC_DEFINE([HAVE_SSE3], 1, [Define if host toolchain supports SSE3])
@@ -96,9 +99,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSSE3], [
AC_MSG_CHECKING([whether host toolchain supports SSSE3])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
int main()
{
__asm__ __volatile__("pshufb %xmm0,%xmm1");
return (0);
}
]])], [
AC_DEFINE([HAVE_SSSE3], 1, [Define if host toolchain supports SSSE3])
@@ -115,9 +119,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE4_1], [
AC_MSG_CHECKING([whether host toolchain supports SSE4.1])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
int main()
{
__asm__ __volatile__("pmaxsb %xmm0,%xmm1");
return (0);
}
]])], [
AC_DEFINE([HAVE_SSE4_1], 1, [Define if host toolchain supports SSE4.1])
@@ -134,9 +139,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE4_2], [
AC_MSG_CHECKING([whether host toolchain supports SSE4.2])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
int main()
{
__asm__ __volatile__("pcmpgtq %xmm0, %xmm1");
return (0);
}
]])], [
AC_DEFINE([HAVE_SSE4_2], 1, [Define if host toolchain supports SSE4.2])
@@ -153,10 +159,11 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX], [
AC_MSG_CHECKING([whether host toolchain supports AVX])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
int main()
{
char v[32];
__asm__ __volatile__("vmovdqa %0,%%ymm0" :: "m"(v[0]));
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -174,9 +181,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX2], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
__asm__ __volatile__("vpshufb %ymm0,%ymm1,%ymm2");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -194,9 +202,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512F], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
__asm__ __volatile__("vpandd %zmm0,%zmm1,%zmm2");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -214,9 +223,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512CD], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
__asm__ __volatile__("vplzcntd %zmm0,%zmm1");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -234,9 +244,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512DQ], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
__asm__ __volatile__("vandpd %zmm0,%zmm1,%zmm2");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -254,9 +265,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512BW], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
__asm__ __volatile__("vpshufb %zmm0,%zmm1,%zmm2");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -274,9 +286,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512IFMA], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
__asm__ __volatile__("vpmadd52luq %zmm0,%zmm1,%zmm2");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -294,9 +307,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512VBMI], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
__asm__ __volatile__("vpermb %zmm0,%zmm1,%zmm2");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -314,9 +328,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512PF], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
__asm__ __volatile__("vgatherpf0dps (%rsi,%zmm0,4){%k1}");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -334,9 +349,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512ER], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
__asm__ __volatile__("vexp2pd %zmm0,%zmm1");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -354,9 +370,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512VL], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
__asm__ __volatile__("vpabsq %zmm0,%zmm1");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -374,9 +391,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AES], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
__asm__ __volatile__("aesenc %xmm0, %xmm1");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -394,9 +412,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_PCLMULQDQ], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
__asm__ __volatile__("pclmulqdq %0, %%xmm0, %%xmm1" :: "i"(0));
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -414,9 +433,10 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_MOVBE], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
__asm__ __volatile__("movbe 0(%eax), %eax");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -434,10 +454,11 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_XSAVE], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
char b[4096] __attribute__ ((aligned (64)));
__asm__ __volatile__("xsave %[b]\n" : : [b] "m" (*b) : "memory");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -455,10 +476,11 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_XSAVEOPT], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
char b[4096] __attribute__ ((aligned (64)));
__asm__ __volatile__("xsaveopt %[b]\n" : : [b] "m" (*b) : "memory");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
@@ -476,10 +498,11 @@ AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_XSAVES], [
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
int main()
{
char b[4096] __attribute__ ((aligned (64)));
__asm__ __volatile__("xsaves %[b]\n" : : [b] "m" (*b) : "memory");
return (0);
}
]])], [
AC_MSG_RESULT([yes])
+34
View File
@@ -0,0 +1,34 @@
dnl #
dnl # Check for statx() function and STATX_MNT_ID availability
dnl #
AC_DEFUN([ZFS_AC_CONFIG_USER_STATX], [
AC_CHECK_HEADERS([linux/stat.h],
[have_stat_headers=yes],
[have_stat_headers=no])
AS_IF([test "x$have_stat_headers" = "xyes"], [
AC_CHECK_FUNC([statx], [
AC_DEFINE([HAVE_STATX], [1], [statx() is available])
dnl Check for STATX_MNT_ID availability
AC_MSG_CHECKING([for STATX_MNT_ID])
AC_COMPILE_IFELSE([
AC_LANG_PROGRAM([[
#include <linux/stat.h>
]], [[
struct statx stx;
int mask = STATX_MNT_ID;
(void)mask;
(void)stx.stx_mnt_id;
]])
], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_STATX_MNT_ID], [1], [STATX_MNT_ID is available])
], [
AC_MSG_RESULT([no])
])
])
], [
AC_MSG_WARN([linux/stat.h not found; skipping statx support])
])
]) dnl end AC_DEFUN
+1
View File
@@ -17,6 +17,7 @@ AC_DEFUN([ZFS_AC_CONFIG_USER], [
ZFS_AC_CONFIG_USER_LIBUDEV
ZFS_AC_CONFIG_USER_LIBUUID
ZFS_AC_CONFIG_USER_LIBBLKID
ZFS_AC_CONFIG_USER_STATX
])
ZFS_AC_CONFIG_USER_LIBTIRPC
ZFS_AC_CONFIG_USER_LIBCRYPTO
+40
View File
@@ -205,6 +205,46 @@ AC_DEFUN([ZFS_AC_DEBUG_INVARIANTS], [
AC_MSG_RESULT([$enable_invariants])
])
dnl # Disabled by default. If enabled allows a configured "turn objtools
dnl # warnings into errors" (CONFIG_OBJTOOL_WERROR) behavior to take effect.
dnl # If disabled, objtool warnings are never turned into errors. It can't
dnl # be enabled if the kernel wasn't compiled with CONFIG_OBJTOOL_WERROR=y.
dnl #
AC_DEFUN([ZFS_AC_OBJTOOL_WERROR], [
AC_MSG_CHECKING([whether objtool error on warning behavior is enabled])
AC_ARG_ENABLE([objtool-werror],
[AS_HELP_STRING([--enable-objtool-werror],
[Enable objtool's error on warning behaviour if present @<:@default=no@:>@])],
[enable_objtool_werror=$enableval],
[enable_objtool_werror=no])
AC_MSG_RESULT([$enable_objtool_werror])
AS_IF([test x$CONFIG_OBJTOOL_WERROR_DEFINED = xyes],[
AS_IF([test x$enable_objtool_werror = xyes],[
AC_MSG_NOTICE([enable-objtool-werror defined, keeping -Werror ])
],[
AC_MSG_NOTICE([enable-objtool-werror undefined, disabling -Werror ])
OBJTOOL_DISABLE_WERROR=y
abs_objtool_binary=$kernelsrc/tools/objtool/objtool
AS_IF([test -x $abs_objtool_binary],[],[
AC_MSG_ERROR([*** objtool binary $abs_objtool_binary not found])
])
dnl # The path to the wrapper is defined in modules/Makefile.in.
])
],[
dnl # We can't enable --Werror if it's not there.
AS_IF([test x$enable_objtool_werror = xyes],[
AC_MSG_ERROR([
*** Cannot enable objtool-werror,
*** a kernel built with CONFIG_OBJTOOL_WERROR=y is required.
])
],[])
])
AC_SUBST(OBJTOOL_DISABLE_WERROR)
AC_SUBST(abs_objtool_binary)
])
AC_DEFUN([ZFS_AC_CONFIG_ALWAYS], [
AX_COUNT_CPUS([])
AC_SUBST(CPU_COUNT)
+2
View File
@@ -65,6 +65,7 @@ ZFS_AC_DEBUGINFO
ZFS_AC_DEBUG_KMEM
ZFS_AC_DEBUG_KMEM_TRACKING
ZFS_AC_DEBUG_INVARIANTS
ZFS_AC_OBJTOOL_WERROR
AC_CONFIG_FILES([
contrib/debian/rules
@@ -86,6 +87,7 @@ AC_CONFIG_FILES([
zfs.release
])
AC_CONFIG_FILES([scripts/objtool-wrapper], [chmod +x scripts/objtool-wrapper])
AC_OUTPUT
+4 -4
View File
@@ -100,8 +100,8 @@ Depends: ${misc:Depends}, ${shlibs:Depends}
# The libcurl4 is loaded through dlopen("libcurl.so.4").
# https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=988521
Recommends: libcurl4
Breaks: libzfs2, libzfs4, libzfs4linux, libzfs6linux
Replaces: libzfs2, libzfs4, libzfs4linux, libzfs6linux
Breaks: libzfs2, libzfs4, libzfs4linux, libzfs6linux, openzfs-libzfs4
Replaces: libzfs2, libzfs4, libzfs4linux, libzfs6linux, openzfs-libzfs4
Conflicts: libzfs6linux
Description: OpenZFS filesystem library for Linux - general support
OpenZFS is a storage platform that encompasses the functionality of
@@ -128,8 +128,8 @@ Package: openzfs-libzpool6
Section: contrib/libs
Architecture: linux-any
Depends: ${misc:Depends}, ${shlibs:Depends}
Breaks: libzpool2, libzpool5, libzpool5linux, libzpool6linux
Replaces: libzpool2, libzpool5, libzpool5linux, libzpool6linux
Breaks: libzpool2, libzpool5, libzpool6linux
Replaces: libzpool2, libzpool5, libzpool6linux
Conflicts: libzpool6linux
Description: OpenZFS pool library for Linux
OpenZFS is a storage platform that encompasses the functionality of
+2
View File
@@ -8,6 +8,7 @@ lib/systemd/system/zfs-import-scan.service
lib/systemd/system/zfs-import.target
lib/systemd/system/zfs-load-key.service
lib/systemd/system/zfs-mount.service
lib/systemd/system/zfs-mount@.service
lib/systemd/system/zfs-scrub-monthly@.timer
lib/systemd/system/zfs-scrub-weekly@.timer
lib/systemd/system/zfs-scrub@.service
@@ -73,6 +74,7 @@ usr/share/man/man8/zfs-recv.8
usr/share/man/man8/zfs-redact.8
usr/share/man/man8/zfs-release.8
usr/share/man/man8/zfs-rename.8
usr/share/man/man8/zfs-rewrite.8
usr/share/man/man8/zfs-rollback.8
usr/share/man/man8/zfs-send.8
usr/share/man/man8/zfs-set.8
+3 -3
View File
@@ -93,7 +93,7 @@ override_dh_auto_install:
@# Install the DKMS source.
@# We only want the files needed to build the modules
install -D -t '$(CURDIR)/debian/tmp/usr/src/$(NAME)-$(DEB_VERSION_UPSTREAM)/scripts' \
'$(CURDIR)/scripts/dkms.postbuild'
'$(CURDIR)/scripts/dkms.postbuild' '$(CURDIR)/scripts/objtool-wrapper.in'
$(foreach file,$(DKMSFILES),mv '$(CURDIR)/$(NAME)-$(DEB_VERSION_UPSTREAM)/$(file)' '$(CURDIR)/debian/tmp/usr/src/$(NAME)-$(DEB_VERSION_UPSTREAM)' || exit 1;)
@# Only ever build Linux modules
@@ -108,8 +108,8 @@ override_dh_auto_install:
@# - zfs.release$
@# * Takes care of spaces and tabs
@# * Remove reference to ZFS_AC_PACKAGE
awk '/^AC_CONFIG_FILES\(\[/,/^\]\)/ {\
if ($$0 !~ /^(AC_CONFIG_FILES\(\[([ \t]+)?$$|\]\)([ \t]+)?$$|([ \t]+)?(include\/(Makefile|sys|os\/(Makefile|linux))|module\/|Makefile([ \t]+)?$$|zfs\.release([ \t]+)?$$))/) \
awk '/^AC_CONFIG_FILES\(\[/,/\]\)/ {\
if ($$0 !~ /^(AC_CONFIG_FILES\(\[([ \t]+)?$$|\]\)([ \t]+)?$$|([ \t]+)?(include\/(Makefile|sys|os\/(Makefile|linux))|module\/|Makefile([ \t]+)?$$|zfs\.release([ \t]+)?$$))|scripts\/objtool-wrapper.*\]\)$$/) \
{next} } {print}' \
'$(CURDIR)/$(NAME)-$(DEB_VERSION_UPSTREAM)/configure.ac' | sed '/ZFS_AC_PACKAGE/d' > '$(CURDIR)/debian/tmp/usr/src/$(NAME)-$(DEB_VERSION_UPSTREAM)/configure.ac'
@# Set "SUBDIRS = module include" for CONFIG_KERNEL and remove SUBDIRS for all other configs.
+2 -1
View File
@@ -979,7 +979,8 @@ mountroot()
touch /run/zfs_unlock_complete
if [ -e /run/zfs_unlock_complete_notify ]; then
read -r < /run/zfs_unlock_complete_notify
# shellcheck disable=SC2034
read -r zfs_unlock_complete_notify < /run/zfs_unlock_complete_notify
fi
# ------------
+1 -1
View File
@@ -8,7 +8,7 @@ This contrib contains community compatibility patches to get Intel QAT working o
These patches are based on the following Intel QAT version:
[1.7.l.4.10.0-00014](https://01.org/sites/default/files/downloads/qat1.7.l.4.10.0-00014.tar.gz)
When using QAT with above kernels versions, the following patches needs to be applied using:
When using QAT with the above kernel versions, the following patches need to be applied using:
patch -p1 < _$PATCH_
_Where $PATCH refers to the path of the patch in question_
@@ -4223,7 +4223,7 @@ class _TempPool(object):
self.getRoot().reset()
return
# On the Buildbot builders this may fail with "pool is busy"
# On the CI builders this may fail with "pool is busy"
# Retry 5 times before raising an error
retry = 0
while True:
+1
View File
@@ -56,6 +56,7 @@ systemdunit_DATA = \
%D%/systemd/system/zfs-import-scan.service \
%D%/systemd/system/zfs-import.target \
%D%/systemd/system/zfs-mount.service \
%D%/systemd/system/zfs-mount@.service \
%D%/systemd/system/zfs-scrub-monthly@.timer \
%D%/systemd/system/zfs-scrub-weekly@.timer \
%D%/systemd/system/zfs-scrub@.service \
+1 -1
View File
@@ -1,5 +1,5 @@
DESCRIPTION
These script were written with the primary intention of being portable and
These scripts were written with the primary intention of being portable and
usable on as many systems as possible.
This is, in practice, usually not possible. But the intention is there.
+26
View File
@@ -0,0 +1,26 @@
[Unit]
Description=Mount ZFS filesystem %I
Documentation=man:zfs(8)
DefaultDependencies=no
After=systemd-udev-settle.service
After=zfs-import.target
After=zfs-mount.service
After=systemd-remount-fs.service
Before=local-fs.target
ConditionPathIsDirectory=/sys/module/zfs
# This merely tells the service manager
# that unmounting everything undoes the
# effect of this service. No extra logic
# is ran as a result of these settings.
Conflicts=umount.target
Before=umount.target
[Service]
Type=oneshot
RemainAfterExit=yes
EnvironmentFile=-@initconfdir@/zfs
ExecStart=@sbindir@/zfs mount -R %I
[Install]
WantedBy=zfs.target
+5
View File
@@ -56,4 +56,9 @@ struct opensolaris_utsname {
#define task_io_account_read(n)
#define task_io_account_write(n)
/*
* Check if the current thread is a memory reclaim thread.
*/
extern int current_is_reclaim_thread(void);
#endif /* _OPENSOLARIS_SYS_MISC_H_ */
+3 -1
View File
@@ -45,7 +45,9 @@
#ifdef _KERNEL
#define CPU curcpu
#define minclsyspri PRIBIO
#define defclsyspri minclsyspri
#define defclsyspri minclsyspri
/* Write issue taskq priority. */
#define wtqclsyspri ((PVM + PRIBIO) / 2)
#define maxclsyspri PVM
#define max_ncpus (mp_maxid + 1)
#define boot_max_ncpus (mp_maxid + 1)
+1
View File
@@ -8,6 +8,7 @@ kernel_linux_HEADERS = \
%D%/kernel/linux/mm_compat.h \
%D%/kernel/linux/mod_compat.h \
%D%/kernel/linux/page_compat.h \
%D%/kernel/linux/pagemap_compat.h \
%D%/kernel/linux/simd.h \
%D%/kernel/linux/simd_aarch64.h \
%D%/kernel/linux/simd_arm.h \
@@ -542,24 +542,6 @@ blk_generic_alloc_queue(make_request_fn make_request, int node_id)
}
#endif /* !HAVE_SUBMIT_BIO_IN_BLOCK_DEVICE_OPERATIONS */
/*
* All the io_*() helper functions below can operate on a bio, or a rq, but
* not both. The older submit_bio() codepath will pass a bio, and the
* newer blk-mq codepath will pass a rq.
*/
static inline int
io_data_dir(struct bio *bio, struct request *rq)
{
if (rq != NULL) {
if (op_is_write(req_op(rq))) {
return (WRITE);
} else {
return (READ);
}
}
return (bio_data_dir(bio));
}
static inline int
io_is_flush(struct bio *bio, struct request *rq)
{
@@ -60,32 +60,6 @@
} while (0)
#endif
/*
* 2.6.30 API change,
* The const keyword was added to the 'struct dentry_operations' in
* the dentry structure. To handle this we define an appropriate
* dentry_operations_t typedef which can be used.
*/
typedef const struct dentry_operations dentry_operations_t;
/*
* 2.6.38 API addition,
* Added d_clear_d_op() helper function which clears some flags and the
* registered dentry->d_op table. This is required because d_set_d_op()
* issues a warning when the dentry operations table is already set.
* For the .zfs control directory to work properly we must be able to
* override the default operations table and register custom .d_automount
* and .d_revalidate callbacks.
*/
static inline void
d_clear_d_op(struct dentry *dentry)
{
dentry->d_op = NULL;
dentry->d_flags &= ~(
DCACHE_OP_HASH | DCACHE_OP_COMPARE |
DCACHE_OP_REVALIDATE | DCACHE_OP_DELETE);
}
/*
* Walk and invalidate all dentry aliases of an inode
* unless it's a mountpoint
@@ -0,0 +1,36 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright (c) 2025, Rob Norris <robn@despairlabs.com>
*/
#ifndef _ZFS_PAGEMAP_COMPAT_H
#define _ZFS_PAGEMAP_COMPAT_H
#include <linux/pagemap.h>
#ifndef HAVE_PAGEMAP_READAHEAD_PAGE
#define readahead_page(ractl) (&(__readahead_folio(ractl)->page))
#endif
#endif
+10 -11
View File
@@ -139,15 +139,6 @@
*/
#if defined(HAVE_KERNEL_FPU_INTERNAL)
/*
* For kernels not exporting *kfpu_{begin,end} we have to use inline assembly
* with the XSAVE{,OPT,S} instructions, so we need the toolchain to support at
* least XSAVE.
*/
#if !defined(HAVE_XSAVE)
#error "Toolchain needs to support the XSAVE assembler instruction"
#endif
#ifndef XFEATURE_MASK_XTILE
/*
* For kernels where this doesn't exist yet, we still don't want to break
@@ -335,9 +326,13 @@ kfpu_begin(void)
return;
}
#endif
#if defined(HAVE_XSAVE)
if (static_cpu_has(X86_FEATURE_XSAVE)) {
kfpu_do_xsave("xsave", state, ~XFEATURE_MASK_XTILE);
} else if (static_cpu_has(X86_FEATURE_FXSR)) {
return;
}
#endif
if (static_cpu_has(X86_FEATURE_FXSR)) {
kfpu_save_fxsr(state);
} else {
kfpu_save_fsave(state);
@@ -390,9 +385,13 @@ kfpu_end(void)
goto out;
}
#endif
#if defined(HAVE_XSAVE)
if (static_cpu_has(X86_FEATURE_XSAVE)) {
kfpu_do_xrstor("xrstor", state, ~XFEATURE_MASK_XTILE);
} else if (static_cpu_has(X86_FEATURE_FXSR)) {
goto out;
}
#endif
if (static_cpu_has(X86_FEATURE_FXSR)) {
kfpu_restore_fxsr(state);
} else {
kfpu_restore_fsave(state);
+6
View File
@@ -24,7 +24,13 @@
#define _OS_LINUX_SPL_MISC_H
#include <linux/kobject.h>
#include <linux/swap.h>
extern void spl_signal_kobj_evt(struct block_device *bdev);
/*
* Check if the current thread is a memory reclaim thread.
*/
extern int current_is_reclaim_thread(void);
#endif
+3 -1
View File
@@ -92,8 +92,10 @@
* Treat shim tasks as SCHED_NORMAL tasks
*/
#define minclsyspri (MAX_PRIO-1)
#define maxclsyspri (MAX_RT_PRIO)
#define defclsyspri (DEFAULT_PRIO)
/* Write issue taskq priority. */
#define wtqclsyspri (MAX_RT_PRIO + 1)
#define maxclsyspri (MAX_RT_PRIO)
#ifndef NICE_TO_PRIO
#define NICE_TO_PRIO(nice) (MAX_RT_PRIO + (nice) + 20)
+1 -6
View File
@@ -59,8 +59,6 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
__field(uint64_t, z_size)
__field(uint64_t, z_pflags)
__field(uint32_t, z_sync_cnt)
__field(uint32_t, z_sync_writes_cnt)
__field(uint32_t, z_async_writes_cnt)
__field(mode_t, z_mode)
__field(boolean_t, z_is_sa)
__field(boolean_t, z_is_ctldir)
@@ -92,8 +90,6 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
__entry->z_size = zn->z_size;
__entry->z_pflags = zn->z_pflags;
__entry->z_sync_cnt = zn->z_sync_cnt;
__entry->z_sync_writes_cnt = zn->z_sync_writes_cnt;
__entry->z_async_writes_cnt = zn->z_async_writes_cnt;
__entry->z_mode = zn->z_mode;
__entry->z_is_sa = zn->z_is_sa;
__entry->z_is_ctldir = zn->z_is_ctldir;
@@ -117,7 +113,7 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
TP_printk("zn { id %llu unlinked %u atime_dirty %u "
"zn_prefetch %u blksz %u seq %u "
"mapcnt %llu size %llu pflags %llu "
"sync_cnt %u sync_writes_cnt %u async_writes_cnt %u "
"sync_cnt %u "
"mode 0x%x is_sa %d is_ctldir %d "
"inode { uid %u gid %u ino %lu nlink %u size %lli "
"blkbits %u bytes %u mode 0x%x generation %x } } "
@@ -126,7 +122,6 @@ DECLARE_EVENT_CLASS(zfs_ace_class,
__entry->z_zn_prefetch, __entry->z_blksz,
__entry->z_seq, __entry->z_mapcnt, __entry->z_size,
__entry->z_pflags, __entry->z_sync_cnt,
__entry->z_sync_writes_cnt, __entry->z_async_writes_cnt,
__entry->z_mode, __entry->z_is_sa, __entry->z_is_ctldir,
__entry->i_uid, __entry->i_gid, __entry->i_ino, __entry->i_nlink,
__entry->i_size, __entry->i_blkbits,
@@ -157,6 +157,7 @@ struct znode;
extern int zfs_sync(struct super_block *, int, cred_t *);
extern int zfs_inode_alloc(struct super_block *, struct inode **ip);
extern void zfs_inode_free(struct inode *);
extern void zfs_inode_destroy(struct inode *);
extern void zfs_mark_inode_dirty(struct inode *);
extern boolean_t zfs_relatime_need_update(const struct inode *);
+1 -1
View File
@@ -954,7 +954,7 @@ typedef struct arc_sums {
wmsum_t arcstat_data_size;
wmsum_t arcstat_metadata_size;
wmsum_t arcstat_dbuf_size;
wmsum_t arcstat_dnode_size;
aggsum_t arcstat_dnode_size;
wmsum_t arcstat_bonus_size;
wmsum_t arcstat_l2_hits;
wmsum_t arcstat_l2_misses;
+1 -1
View File
@@ -65,7 +65,7 @@ _Static_assert(BRT_RANGESIZE / SPA_MINBLOCKSIZE <= UINT16_MAX,
*/
#define BRT_BLOCKSIZE (32 * 1024)
#define BRT_RANGESIZE_TO_NBLOCKS(size) \
(((size) - 1) / BRT_BLOCKSIZE / sizeof (uint16_t) + 1)
(((size) - 1) / (BRT_BLOCKSIZE / sizeof (uint16_t)) + 1)
#define BRT_LITTLE_ENDIAN 0
#define BRT_BIG_ENDIAN 1
+1
View File
@@ -174,6 +174,7 @@ typedef struct dbuf_dirty_record {
arc_buf_t *dr_data;
override_states_t dr_override_state;
uint8_t dr_copies;
uint8_t dr_gang_copies;
boolean_t dr_nopwrite;
boolean_t dr_brtwrite;
boolean_t dr_diowrite;
+2 -5
View File
@@ -286,14 +286,11 @@ typedef struct {
ddt_log_t *ddt_log_active; /* pointers into ddt_log */
ddt_log_t *ddt_log_flushing; /* swapped when flush starts */
hrtime_t ddt_flush_start; /* log flush start this txg */
uint32_t ddt_flush_pass; /* log flush pass this txg */
int32_t ddt_flush_count; /* entries flushed this txg */
int32_t ddt_flush_min; /* min rem entries to flush */
int32_t ddt_log_ingest_rate; /* rolling log ingest rate */
int32_t ddt_log_flush_rate; /* rolling log flush rate */
int32_t ddt_log_flush_time_rate; /* avg time spent flushing */
uint32_t ddt_log_flush_pressure; /* pressure to apply for cap */
uint32_t ddt_log_flush_prev_backlog; /* prev backlog size */
uint64_t ddt_flush_force_txg; /* flush hard before this txg */
+2 -2
View File
@@ -144,9 +144,9 @@ typedef enum dmu_object_byteswap {
#define DMU_OT_IS_DDT(ot) \
((ot) == DMU_OT_DDT_ZAP)
#define DMU_OT_IS_CRITICAL(ot) \
#define DMU_OT_IS_CRITICAL(ot, level) \
(DMU_OT_IS_METADATA(ot) && \
(ot) != DMU_OT_DNODE && \
((ot) != DMU_OT_DNODE || (level) > 0) && \
(ot) != DMU_OT_DIRECTORY_CONTENTS && \
(ot) != DMU_OT_SA)
+11
View File
@@ -740,6 +740,8 @@ typedef struct zpool_load_policy {
#define ZPOOL_CONFIG_METASLAB_SHIFT "metaslab_shift"
#define ZPOOL_CONFIG_ASHIFT "ashift"
#define ZPOOL_CONFIG_ASIZE "asize"
#define ZPOOL_CONFIG_MIN_ALLOC "min_alloc"
#define ZPOOL_CONFIG_MAX_ALLOC "max_alloc"
#define ZPOOL_CONFIG_DTL "DTL"
#define ZPOOL_CONFIG_SCAN_STATS "scan_stats" /* not stored on disk */
#define ZPOOL_CONFIG_REMOVAL_STATS "removal_stats" /* not stored on disk */
@@ -1614,6 +1616,15 @@ typedef enum zfs_ioc {
#endif
typedef struct zfs_rewrite_args {
uint64_t off;
uint64_t len;
uint64_t flags;
uint64_t arg;
} zfs_rewrite_args_t;
#define ZFS_IOC_REWRITE _IOW(0x83, 3, zfs_rewrite_args_t)
/*
* ZFS-specific error codes used for returning descriptive errors
* to the userland through zfs ioctls.
+2
View File
@@ -568,6 +568,8 @@ typedef struct metaslab_unflushed_phys {
uint64_t msp_unflushed_txg;
} metaslab_unflushed_phys_t;
char *metaslab_rt_name(metaslab_group_t *, metaslab_t *, const char *);
#ifdef __cplusplus
}
#endif
+9
View File
@@ -49,6 +49,9 @@ typedef enum zfs_range_seg_type {
ZFS_RANGE_SEG_NUM_TYPES,
} zfs_range_seg_type_t;
#define ZFS_RT_NAME(rt) (((rt)->rt_name != NULL) ? (rt)->rt_name : "")
#define ZFS_RT_F_DYN_NAME (1ULL << 0) /* if rt_name must be freed */
/*
* Note: the range_tree may not be accessed concurrently; consumers
* must provide external locking if required.
@@ -68,6 +71,9 @@ typedef struct zfs_range_tree {
void *rt_arg;
uint64_t rt_gap; /* allowable inter-segment gap */
uint64_t rt_flags;
const char *rt_name; /* details for debugging */
/*
* The rt_histogram maintains a histogram of ranges. Each bucket,
* rt_histogram[i], contains the number of ranges whose size is:
@@ -281,6 +287,9 @@ zfs_range_tree_t *zfs_range_tree_create_gap(const zfs_range_tree_ops_t *ops,
uint64_t gap);
zfs_range_tree_t *zfs_range_tree_create(const zfs_range_tree_ops_t *ops,
zfs_range_seg_type_t type, void *arg, uint64_t start, uint64_t shift);
zfs_range_tree_t *zfs_range_tree_create_flags(const zfs_range_tree_ops_t *ops,
zfs_range_seg_type_t type, void *arg, uint64_t start, uint64_t shift,
uint64_t flags, const char *name);
void zfs_range_tree_destroy(zfs_range_tree_t *rt);
boolean_t zfs_range_tree_contains(zfs_range_tree_t *rt, uint64_t start,
uint64_t size);
+1
View File
@@ -1055,6 +1055,7 @@ extern pool_state_t spa_state(spa_t *spa);
extern spa_load_state_t spa_load_state(spa_t *spa);
extern uint64_t spa_freeze_txg(spa_t *spa);
extern uint64_t spa_get_worst_case_asize(spa_t *spa, uint64_t lsize);
extern void spa_get_min_alloc_range(spa_t *spa, uint64_t *min, uint64_t *max);
extern uint64_t spa_get_dspace(spa_t *spa);
extern uint64_t spa_get_checkpoint_space(spa_t *spa);
extern uint64_t spa_get_slop_space(spa_t *spa);
+1
View File
@@ -267,6 +267,7 @@ struct spa {
uint64_t spa_min_ashift; /* of vdevs in normal class */
uint64_t spa_max_ashift; /* of vdevs in normal class */
uint64_t spa_min_alloc; /* of vdevs in normal class */
uint64_t spa_max_alloc; /* of vdevs in normal class */
uint64_t spa_gcd_alloc; /* of vdevs in normal class */
uint64_t spa_config_guid; /* config pool guid */
uint64_t spa_load_guid; /* spa_load initialized guid */
+1
View File
@@ -173,6 +173,7 @@ extern void vdev_queue_change_io_priority(zio_t *zio, zio_priority_t priority);
extern uint32_t vdev_queue_length(vdev_t *vd);
extern uint64_t vdev_queue_last_offset(vdev_t *vd);
extern uint64_t vdev_queue_class_length(vdev_t *vq, zio_priority_t p);
extern boolean_t vdev_queue_pool_busy(spa_t *spa);
extern void vdev_config_dirty(vdev_t *vd);
extern void vdev_config_clean(vdev_t *vd);
+1
View File
@@ -651,6 +651,7 @@ uint64_t vdev_best_ashift(uint64_t logical, uint64_t a, uint64_t b);
int param_get_raidz_impl(char *buf, zfs_kernel_param_t *kp);
#endif
int param_set_raidz_impl(ZFS_MODULE_PARAM_ARGS);
char *vdev_rt_name(vdev_t *vd, const char *name);
/*
* Vdev ashift optimization tunables
+8 -1
View File
@@ -236,6 +236,11 @@ typedef pthread_t kthread_t;
#define thread_join(t) pthread_join((pthread_t)(t), NULL)
#define newproc(f, a, cid, pri, ctp, pid) (ENOSYS)
/*
* Check if the current thread is a memory reclaim thread.
* Always returns false in userspace (no memory reclaim thread).
*/
#define current_is_reclaim_thread() (0)
/* in libzpool, p0 exists only to have its address taken */
typedef struct proc {
@@ -623,8 +628,10 @@ extern void delay(clock_t ticks);
* Process priorities as defined by setpriority(2) and getpriority(2).
*/
#define minclsyspri 19
#define maxclsyspri -20
#define defclsyspri 0
/* Write issue taskq priority. */
#define wtqclsyspri -19
#define maxclsyspri -20
#define CPU_SEQID ((uintptr_t)pthread_self() & (max_ncpus - 1))
#define CPU_SEQID_UNSTABLE CPU_SEQID
+1
View File
@@ -60,6 +60,7 @@ extern int zfs_dbgmsg_enable;
#define ZFS_DEBUG_METASLAB_ALLOC (1 << 13)
#define ZFS_DEBUG_BRT (1 << 14)
#define ZFS_DEBUG_RAIDZ_RECONSTRUCT (1 << 15)
#define ZFS_DEBUG_DDT (1 << 16)
extern void __set_error(const char *file, const char *func, int line, int err);
extern void __zfs_dbgmsg(char *buf);
+4 -6
View File
@@ -35,18 +35,16 @@
#include <sys/vfs.h>
#ifdef FS_PROJINHERIT_FL
#define ZFS_PROJINHERIT_FL FS_PROJINHERIT_FL
#else
#define ZFS_PROJINHERIT_FL 0x20000000
#endif
#ifdef FS_IOC_FSGETXATTR
typedef struct fsxattr zfsxattr_t;
#define ZFS_IOC_FSGETXATTR FS_IOC_FSGETXATTR
#define ZFS_IOC_FSSETXATTR FS_IOC_FSSETXATTR
#else
/* Non-Linux OS */
#define FS_PROJINHERIT_FL 0x20000000
#define FS_XFLAG_PROJINHERIT FS_PROJINHERIT_FL
struct zfsxattr {
uint32_t fsx_xflags; /* xflags field value (get/set) */
uint32_t fsx_extsize; /* extsize field value (get/set) */
+1
View File
@@ -40,6 +40,7 @@ extern int zfs_clone_range(znode_t *, uint64_t *, znode_t *, uint64_t *,
uint64_t *, cred_t *);
extern int zfs_clone_range_replay(znode_t *, uint64_t, uint64_t, uint64_t,
const blkptr_t *, size_t);
extern int zfs_rewrite(znode_t *, uint64_t, uint64_t, uint64_t, uint64_t);
extern int zfs_getsecattr(znode_t *, vsecattr_t *, int, cred_t *);
extern int zfs_setsecattr(znode_t *, vsecattr_t *, int, cred_t *);
-2
View File
@@ -201,8 +201,6 @@ typedef struct znode {
uint64_t z_size; /* file size (cached) */
uint64_t z_pflags; /* pflags (cached) */
uint32_t z_sync_cnt; /* synchronous open count */
uint32_t z_sync_writes_cnt; /* synchronous write count */
uint32_t z_async_writes_cnt; /* asynchronous write count */
mode_t z_mode; /* mode (cached) */
kmutex_t z_acl_lock; /* acl data lock */
zfs_acl_t *z_acl_cached; /* cached acl */
+2 -1
View File
@@ -350,6 +350,7 @@ typedef struct zio_prop {
uint8_t zp_complevel;
uint8_t zp_level;
uint8_t zp_copies;
uint8_t zp_gang_copies;
dmu_object_type_t zp_type;
boolean_t zp_dedup;
boolean_t zp_dedup_verify;
@@ -575,7 +576,7 @@ extern zio_t *zio_rewrite(zio_t *pio, spa_t *spa, uint64_t txg, blkptr_t *bp,
zio_priority_t priority, zio_flag_t flags, zbookmark_phys_t *zb);
extern void zio_write_override(zio_t *zio, blkptr_t *bp, int copies,
boolean_t nopwrite, boolean_t brtwrite);
int gang_copies, boolean_t nopwrite, boolean_t brtwrite);
extern void zio_free(spa_t *spa, uint64_t txg, const blkptr_t *bp);
+1
View File
@@ -76,6 +76,7 @@ libspl_sys_HEADERS += \
%D%/os/linux/sys/param.h \
%D%/os/linux/sys/stat.h \
%D%/os/linux/sys/sysmacros.h \
%D%/os/linux/sys/vfs.h \
%D%/os/linux/sys/zfs_context_os.h
libspl_ia32_HEADERS = \
+5
View File
@@ -31,6 +31,11 @@
#include <sys/mount.h> /* for BLKGETSIZE64 */
#ifdef HAVE_STATX
#include <fcntl.h>
#include <linux/stat.h>
#endif
/*
* Emulate Solaris' behavior of returning the block device size in fstat64().
*/
+33
View File
@@ -0,0 +1,33 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License, Version 1.0 only
* (the "License"). You may not use this file except in compliance
* with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/* Copyright 2025 by Lawrence Livermore National Security, LLC. */
/* This is the Linux userspace version of include/os/linux/spl/sys/vfs.h */
#ifndef _LIBSPL_SYS_VFS_H
#define _LIBSPL_SYS_VFS_H
#include <linux/fs.h>
#include <sys/statfs.h>
#endif
+30 -6
View File
@@ -85,13 +85,21 @@ _sol_getmntent(FILE *fp, struct mnttab *mgetp)
}
static int
getextmntent_impl(FILE *fp, struct extmnttab *mp)
getextmntent_impl(FILE *fp, struct extmnttab *mp, uint64_t *mnt_id)
{
int ret;
struct stat64 st;
*mnt_id = 0;
ret = _sol_getmntent(fp, (struct mnttab *)mp);
if (ret == 0) {
#ifdef HAVE_STATX_MNT_ID
struct statx stx;
if (statx(AT_FDCWD, mp->mnt_mountp,
AT_STATX_SYNC_AS_STAT | AT_SYMLINK_NOFOLLOW,
STATX_MNT_ID, &stx) == 0 && (stx.stx_mask & STATX_MNT_ID))
*mnt_id = stx.stx_mnt_id;
#endif
if (stat64(mp->mnt_mountp, &st) != 0) {
mp->mnt_major = 0;
mp->mnt_minor = 0;
@@ -110,6 +118,12 @@ getextmntent(const char *path, struct extmnttab *entry, struct stat64 *statbuf)
struct stat64 st;
FILE *fp;
int match;
boolean_t have_mnt_id = B_FALSE;
uint64_t target_mnt_id = 0;
uint64_t entry_mnt_id;
#ifdef HAVE_STATX_MNT_ID
struct statx stx;
#endif
if (strlen(path) >= MAXPATHLEN) {
(void) fprintf(stderr, "invalid object; pathname too long\n");
@@ -128,6 +142,13 @@ getextmntent(const char *path, struct extmnttab *entry, struct stat64 *statbuf)
return (-1);
}
#ifdef HAVE_STATX_MNT_ID
if (statx(AT_FDCWD, path, AT_STATX_SYNC_AS_STAT | AT_SYMLINK_NOFOLLOW,
STATX_MNT_ID, &stx) == 0 && (stx.stx_mask & STATX_MNT_ID)) {
have_mnt_id = B_TRUE;
target_mnt_id = stx.stx_mnt_id;
}
#endif
if ((fp = fopen(MNTTAB, "re")) == NULL) {
(void) fprintf(stderr, "cannot open %s\n", MNTTAB);
@@ -139,12 +160,15 @@ getextmntent(const char *path, struct extmnttab *entry, struct stat64 *statbuf)
*/
match = 0;
while (getextmntent_impl(fp, entry) == 0) {
if (makedev(entry->mnt_major, entry->mnt_minor) ==
statbuf->st_dev) {
match = 1;
break;
while (getextmntent_impl(fp, entry, &entry_mnt_id) == 0) {
if (have_mnt_id) {
match = (entry_mnt_id == target_mnt_id);
} else {
match = makedev(entry->mnt_major, entry->mnt_minor) ==
statbuf->st_dev;
}
if (match)
break;
}
(void) fclose(fp);
+1
View File
@@ -50,6 +50,7 @@ dist_man_MANS = \
%D%/man8/zfs-redact.8 \
%D%/man8/zfs-release.8 \
%D%/man8/zfs-rename.8 \
%D%/man8/zfs-rewrite.8 \
%D%/man8/zfs-rollback.8 \
%D%/man8/zfs-send.8 \
%D%/man8/zfs-set.8 \
+58 -38
View File
@@ -1057,27 +1057,6 @@ milliseconds until the operation completes.
.It Sy zfs_dedup_prefetch Ns = Ns Sy 0 Ns | Ns 1 Pq int
Enable prefetching dedup-ed blocks which are going to be freed.
.
.It Sy zfs_dedup_log_flush_passes_max Ns = Ns Sy 8 Ns Pq uint
Maximum number of dedup log flush passes (iterations) each transaction.
.Pp
At the start of each transaction, OpenZFS will estimate how many entries it
needs to flush out to keep up with the change rate, taking the amount and time
taken to flush on previous txgs into account (see
.Sy zfs_dedup_log_flush_flow_rate_txgs ) .
It will spread this amount into a number of passes.
At each pass, it will use the amount already flushed and the total time taken
by flushing and by other IO to recompute how much it should do for the remainder
of the txg.
.Pp
Reducing the max number of passes will make flushing more aggressive, flushing
out more entries on each pass.
This can be faster, but also more likely to compete with other IO.
Increasing the max number of passes will put fewer entries onto each pass,
keeping the overhead of dedup changes to a minimum but possibly causing a large
number of changes to be dumped on the last pass, which can blow out the txg
sync time beyond
.Sy zfs_txg_timeout .
.
.It Sy zfs_dedup_log_flush_min_time_ms Ns = Ns Sy 1000 Ns Pq uint
Minimum time to spend on dedup log flush each transaction.
.Pp
@@ -1087,22 +1066,58 @@ up to
This occurs even if doing so would delay the transaction, that is, other IO
completes under this time.
.
.It Sy zfs_dedup_log_flush_entries_min Ns = Ns Sy 1000 Ns Pq uint
.It Sy zfs_dedup_log_flush_entries_min Ns = Ns Sy 100 Ns Pq uint
Flush at least this many entries each transaction.
.Pp
OpenZFS will estimate how many entries it needs to flush each transaction to
keep up with the ingest rate (see
.Sy zfs_dedup_log_flush_flow_rate_txgs ) .
This sets the minimum for that estimate.
Raising it can force OpenZFS to flush more aggressively, keeping the log small
and so reducing pool import times, but can make it less able to back off if
log flushing would compete with other IO too much.
OpenZFS will flush a fraction of the log every TXG, to keep the size
proportional to the ingest rate (see
.Sy zfs_dedup_log_flush_txgs ) .
This sets the minimum for that estimate, which prevents the backlog from
completely draining if the ingest rate falls.
Raising it can force OpenZFS to flush more aggressively, reducing the backlog
to zero more quickly, but can make it less able to back off if log
flushing would compete with other IO too much.
.
.It Sy zfs_dedup_log_flush_entries_max Ns = Ns Sy UINT_MAX Ns Pq uint
Flush at most this many entries each transaction.
.Pp
Mostly used for debugging purposes.
.It Sy zfs_dedup_log_flush_txgs Ns = Ns Sy 100 Ns Pq uint
Target number of TXGs to process the whole dedup log.
.Pp
Every TXG, OpenZFS will process the inverse of this number times the size
of the DDT backlog.
This will keep the backlog at a size roughly equal to the ingest rate
times this value.
This offers a balance between a more efficient DDT log, with better
aggregation, and shorter import times, which increase as the size of the
DDT log increases.
Increasing this value will result in a more efficient DDT log, but longer
import times.
.It Sy zfs_dedup_log_cap Ns = Ns Sy UINT_MAX Ns Pq uint
Soft cap for the size of the current dedup log.
.Pp
If the log is larger than this size, we increase the aggressiveness of
the flushing to try to bring it back down to the soft cap.
Setting it will reduce import times, but will reduce the efficiency of
the DDT log, increasing the expected number of IOs required to flush the same
amount of data.
.It Sy zfs_dedup_log_hard_cap Ns = Ns Sy 0 Ns | Ns 1 Pq uint
Whether to treat the log cap as a firm cap or not.
.Pp
When set to 0 (the default), the
.Sy zfs_dedup_log_cap
will increase the maximum number of log entries we flush in a given txg.
This will bring the backlog size down towards the cap, but not at the expense
of making TXG syncs take longer.
If this is set to 1, the cap acts more like a hard cap than a soft cap; it will
also increase the minimum number of log entries we flush per TXG.
Enabling it will reduce worst-case import times, at the cost of increased TXG
sync times.
.It Sy zfs_dedup_log_flush_flow_rate_txgs Ns = Ns Sy 10 Ns Pq uint
Number of transactions to use to compute the flow rate.
.Pp
OpenZFS will estimate how many entries it needs to flush each transaction by
monitoring the number of entries changed (ingest rate), number of entries
OpenZFS will estimate number of entries changed (ingest rate), number of entries
flushed (flush rate) and time spent flushing (flush time rate) and combining
these into an overall "flow rate".
It will use an exponential weighted moving average over some number of recent
@@ -1369,14 +1384,15 @@ If this setting is 0, then even if feature@block_cloning is enabled,
using functions and system calls that attempt to clone blocks will act as
though the feature is disabled.
.
.It Sy zfs_bclone_wait_dirty Ns = Ns Sy 0 Ns | Ns 1 Pq int
When set to 1 the FICLONE and FICLONERANGE ioctls wait for dirty data to be
written to disk.
This allows the clone operation to reliably succeed when a file is
.It Sy zfs_bclone_wait_dirty Ns = Ns Sy 1 Ns | Ns 0 Pq int
When set to 1 the FICLONE and FICLONERANGE ioctls will wait for any dirty
data to be written to disk before proceeding.
This ensures that the clone operation reliably succeeds, even if a file is
modified and then immediately cloned.
For small files this may be slower than making a copy of the file.
Therefore, this setting defaults to 0 which causes a clone operation to
immediately fail when encountering a dirty block.
Note that for small files this may be slower than simply copying the file.
When set to 0 the clone operation will immediately fail if it encounters
any dirty blocks.
By default waiting is enabled.
.
.It Sy zfs_blake3_impl Ns = Ns Sy fastest Pq string
Select a BLAKE3 implementation.
@@ -1638,6 +1654,10 @@ _
2048 ZFS_DEBUG_TRIM Verify TRIM ranges are always within the allocatable range tree.
4096 ZFS_DEBUG_LOG_SPACEMAP Verify that the log summary is consistent with the spacemap log
and enable \fBzfs_dbgmsgs\fP for metaslab loading and flushing.
8192 ZFS_DEBUG_METASLAB_ALLOC Enable debugging messages when allocations fail.
16384 ZFS_DEBUG_BRT Enable BRT-related debugging messages.
32768 ZFS_DEBUG_RAIDZ_RECONSTRUCT Enabled debugging messages for raidz reconstruction.
65536 ZFS_DEBUG_DDT Enable DDT-related debugging messages.
.TE
.Sy \& * No Requires debug build .
.
+2 -1
View File
@@ -1596,7 +1596,8 @@ When set to
ZFS stores an extra copy of only critical metadata.
This can improve file create performance since less metadata
needs to be written.
If a single on-disk block is corrupt, at worst a single user file can be lost.
If a single on-disk block is corrupt, multiple user files or directories
can be lost.
.Pp
When set to
.Sy none ,
+76
View File
@@ -0,0 +1,76 @@
.\" SPDX-License-Identifier: CDDL-1.0
.\"
.\" CDDL HEADER START
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or https://opensource.org/licenses/CDDL-1.0.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" CDDL HEADER END
.\"
.\" Copyright (c) 2025 iXsystems, Inc.
.\"
.Dd May 6, 2025
.Dt ZFS-REWRITE 8
.Os
.
.Sh NAME
.Nm zfs-rewrite
.Nd rewrite specified files without modification
.Sh SYNOPSIS
.Nm zfs
.Cm rewrite
.Oo Fl rvx Ns Oc
.Op Fl l Ar length
.Op Fl o Ar offset
.Ar file Ns | Ns Ar directory Ns …
.
.Sh DESCRIPTION
Rewrite blocks of specified
.Ar file
as is without modification at a new location and possibly with new
properties, such as checksum, compression, dedup, copies, etc,
as if they were atomically read and written back.
.Bl -tag -width "-r"
.It Fl l Ar length
Rewrite at most this number of bytes.
.It Fl o Ar offset
Start at this offset in bytes.
.It Fl r
Recurse into directories.
.It Fl v
Print names of all successfully rewritten files.
.It Fl x
Don't cross file system mount points when recursing.
.El
.Sh NOTES
Rewrite of cloned blocks and blocks that are part of any snapshots,
same as some property changes may increase pool space usage.
Holes that were never written or were previously zero-compressed are
not rewritten and will remain holes even if compression is disabled.
.Pp
Rewritten blocks will be seen as modified in next snapshot and as such
included into the incremental
.Nm zfs Cm send
stream.
.Pp
If a
.Fl l
or
.Fl o
value request a rewrite to regions past the end of the file, then those
regions are silently ignored, and no error is reported.
.
.Sh SEE ALSO
.Xr zfsprops 7
+7 -1
View File
@@ -37,7 +37,7 @@
.\" Copyright 2018 Nexenta Systems, Inc.
.\" Copyright 2019 Joyent, Inc.
.\"
.Dd May 12, 2022
.Dd April 18, 2025
.Dt ZFS 8
.Os
.
@@ -299,6 +299,12 @@ Execute ZFS administrative operations
programmatically via a Lua script-language channel program.
.El
.
.Ss Data rewrite
.Bl -tag -width ""
.It Xr zfs-rewrite 8
Rewrite specified files without modification.
.El
.
.Ss Jails
.Bl -tag -width ""
.It Xr zfs-jail 8
+32 -2
View File
@@ -293,10 +293,9 @@ ZSTD_UPSTREAM_OBJS := \
zfs-objs += $(addprefix zstd/,$(ZSTD_OBJS) $(ZSTD_UPSTREAM_OBJS))
# Disable aarch64 neon SIMD instructions for kernel mode
$(addprefix $(obj)/zstd/,$(ZSTD_OBJS) $(ZSTD_UPSTREAM_OBJS)) : ccflags-y += -I$(zstd_include) $(ZFS_ZSTD_FLAGS)
$(addprefix $(obj)/zstd/,$(ZSTD_OBJS) $(ZSTD_UPSTREAM_OBJS)) : asflags-y += -I$(zstd_include)
$(addprefix $(obj)/zstd/,$(ZSTD_UPSTREAM_OBJS)) : ccflags-y += -include $(zstd_include)/aarch64_compat.h -include $(zstd_include)/zstd_compat_wrapper.h -Wp,-w
$(addprefix $(obj)/zstd/,$(ZSTD_UPSTREAM_OBJS)) : ccflags-y += -include $(zstd_include)/zstd_compat_wrapper.h -Wp,-w
$(obj)/zstd/zfs_zstd.o : ccflags-y += -include $(zstd_include)/zstd_compat_wrapper.h
@@ -494,3 +493,34 @@ UBSAN_SANITIZE_zfs/sa.o := n
ifeq ($(CONFIG_ALTIVEC),y)
$(obj)/zfs/vdev_raidz_math_powerpc_altivec.o : c_flags += -maltivec
endif
# The following recipes attempt to fix out of src-tree builds, where $(src) != $(obj), so that the
# subdir %.c/%.S -> %.o targets will work as expected. The in-kernel pattern targets do not seem to
# be working on subdirs since about ~6.10
zobjdirs = $(dir $(zfs-objs)) $(dir $(spl-objs)) \
$(dir $(zfs-$(CONFIG_X86))) $(dir $(zfs-$(CONFIG_UML_X86))) $(dir $(zfs-$(CONFIG_ARM64))) \
$(dir $(zfs-$(CONFIG_PPC64))) $(dir $(zfs-$(CONFIG_PPC)))
z_cdirs = $(sort $(filter-out lua/setjmp/ $(addprefix icp/asm-aarch64/, aes/ blake3/ modes/ sha2/) \
$(addprefix icp/asm-x86_64/, aes/ blake3/ modes/ sha2/) \
$(addprefix icp/asm-ppc/, aes/ blake3/ modes/ sha2/) \
$(addprefix icp/asm-ppc64/, aes/ blake3/ modes/ sha2/), $(zobjdirs)))
z_sdirs = $(sort $(filter lua/setjmp/ $(addprefix icp/asm-aarch64/, aes/ blake3/ modes/ sha2/) \
$(addprefix icp/asm-x86_64/, aes/ blake3/ modes/ sha2/) \
$(addprefix icp/asm-ppc/, aes/ blake3/ modes/ sha2/) \
$(addprefix icp/asm-ppc64/, aes/ blake3/ modes/ sha2/), $(zobjdirs)))
define ZKMOD_C_O_MAKE_TARGET
$1%.o: $(src)/$1%.c FORCE
$$(call if_changed_rule,cc_o_c)
$$(call cmd,force_checksrc)
endef
define ZKMOD_S_O_MAKE_TARGET
$1%.o: $(src)/$1%.S FORCE
$$(call if_changed_rule,as_o_S)
$$(call cmd,force_checksrc)
endef
$(foreach target,$(z_cdirs), $(eval $(call ZKMOD_C_O_MAKE_TARGET,$(target))))
$(foreach target,$(z_sdirs), $(eval $(call ZKMOD_S_O_MAKE_TARGET,$(target))))
-24
View File
@@ -521,30 +521,6 @@ CFLAGS.zstd_ldm.c= -U__BMI__ -fno-tree-vectorize ${NO_WBITWISE_INSTEAD_OF_LOGICA
CFLAGS.zstd_opt.c= -U__BMI__ -fno-tree-vectorize ${NO_WBITWISE_INSTEAD_OF_LOGICAL}
.if ${MACHINE_ARCH} == "aarch64"
__ZFS_ZSTD_AARCH64_FLAGS= -include ${SRCDIR}/zstd/include/aarch64_compat.h
CFLAGS.zstd.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.entropy_common.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.error_private.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.fse_compress.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.fse_decompress.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.hist.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.huf_compress.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.huf_decompress.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.pool.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.xxhash.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_common.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_compress.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_compress_literals.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_compress_sequences.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_compress_superblock.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_ddict.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_decompress.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_decompress_block.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_double_fast.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_fast.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_lazy.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_ldm.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
CFLAGS.zstd_opt.c+= ${__ZFS_ZSTD_AARCH64_FLAGS}
sha256-armv8.o: sha256-armv8.S
${CC} -c ${CFLAGS:N-mgeneral-regs-only} ${WERROR} ${.IMPSRC} \
+1
View File
@@ -57,6 +57,7 @@ modules-Linux:
$(if @KERNEL_LD@,LD=@KERNEL_LD@) $(if @KERNEL_LLVM@,LLVM=@KERNEL_LLVM@) \
$(if @KERNEL_CROSS_COMPILE@,CROSS_COMPILE=@KERNEL_CROSS_COMPILE@) \
$(if @KERNEL_ARCH@,ARCH=@KERNEL_ARCH@) \
$(if @OBJTOOL_DISABLE_WERROR@,objtool=@abs_top_builddir@/scripts/objtool-wrapper) \
M="$$PWD" @KERNEL_MAKE@ CONFIG_ZFS=m modules
modules-FreeBSD:
+9
View File
@@ -101,6 +101,15 @@ spl_panic(const char *file, const char *func, int line, const char *fmt, ...)
va_end(ap);
}
/*
* Check if the current thread is a memory reclaim thread.
* Returns true if curproc is pageproc (FreeBSD's page daemon).
*/
int
current_is_reclaim_thread(void)
{
return (curproc == pageproc);
}
SYSINIT(opensolaris_utsname_init, SI_SUB_TUNABLES, SI_ORDER_ANY,
opensolaris_utsname_init, NULL);
+1 -1
View File
@@ -1175,7 +1175,7 @@ zfs_aclset_common(znode_t *zp, zfs_acl_t *aclp, cred_t *cr, dmu_tx_t *tx)
int count = 0;
zfs_acl_phys_t acl_phys;
if (zp->z_zfsvfs->z_replay == B_FALSE) {
if (ZTOV(zp) != NULL && zp->z_zfsvfs->z_replay == B_FALSE) {
ASSERT_VOP_IN_SEQC(ZTOV(zp));
}
+55 -9
View File
@@ -306,6 +306,18 @@ zfs_ioctl(vnode_t *vp, ulong_t com, intptr_t data, int flag, cred_t *cred,
*(offset_t *)data = off;
return (0);
}
case ZFS_IOC_REWRITE: {
zfs_rewrite_args_t *args = (zfs_rewrite_args_t *)data;
if ((flag & FWRITE) == 0)
return (SET_ERROR(EBADF));
error = vn_lock(vp, LK_SHARED);
if (error)
return (error);
error = zfs_rewrite(VTOZ(vp), args->off, args->len,
args->flags, args->arg);
VOP_UNLOCK(vp);
return (error);
}
}
return (SET_ERROR(ENOTTY));
}
@@ -4228,17 +4240,46 @@ zfs_putpages(struct vnode *vp, vm_page_t *ma, size_t len, int flags,
err = sa_bulk_update(zp->z_sa_hdl, bulk, count, tx);
ASSERT0(err);
putpage_commit_arg_t *pca = kmem_alloc(
offsetof(putpage_commit_arg_t, pca_pages[ncount]),
KM_SLEEP);
pca->pca_npages = ncount;
memcpy(pca->pca_pages, ma, sizeof (vm_page_t) * ncount);
if (commit) {
/*
* Caller requested that we commit immediately. We set
* a callback on the log entry, to be called once its
* on disk after the call to zil_commit() below. The
* pages will be undirtied and unbusied there.
*/
putpage_commit_arg_t *pca = kmem_alloc(
offsetof(putpage_commit_arg_t, pca_pages[ncount]),
KM_SLEEP);
pca->pca_npages = ncount;
memcpy(pca->pca_pages, ma, sizeof (vm_page_t) * ncount);
zfs_log_write(zfsvfs->z_log, tx, TX_WRITE, zp,
off, len, commit, B_FALSE, zfs_putpage_commit_cb, pca);
zfs_log_write(zfsvfs->z_log, tx, TX_WRITE, zp, off, len,
B_TRUE, B_FALSE, zfs_putpage_commit_cb, pca);
for (i = 0; i < ncount; i++)
rtvals[i] = zfs_vm_pagerret_pend;
for (i = 0; i < ncount; i++)
rtvals[i] = zfs_vm_pagerret_pend;
} else {
/*
* Caller just wants the page written back somewhere,
* but doesn't need it committed yet. We've already
* written it back to the DMU, so we just need to put
* it on the async log, then undirty the page and
* return.
*
* We cannot use a callback here, because it would keep
* the page busy (locked) until it is eventually
* written down at txg sync.
*/
zfs_log_write(zfsvfs->z_log, tx, TX_WRITE, zp, off, len,
B_FALSE, B_FALSE, NULL, NULL);
zfs_vmobject_wlock(object);
for (i = 0; i < ncount; i++) {
rtvals[i] = zfs_vm_pagerret_ok;
vm_page_undirty(ma[i]);
}
zfs_vmobject_wunlock(object);
}
VM_CNT_INC(v_vnodeout);
VM_CNT_ADD(v_vnodepgsout, ncount);
@@ -5201,6 +5242,11 @@ zfs_freebsd_pathconf(struct vop_pathconf_args *ap)
return (0);
}
return (EINVAL);
#ifdef _PC_HAS_HIDDENSYSTEM
case _PC_HAS_HIDDENSYSTEM:
*ap->a_retval = 1;
return (0);
#endif
default:
return (vop_stdpathconf(ap));
}
+7 -8
View File
@@ -150,8 +150,6 @@ zfs_znode_cache_constructor(void *buf, void *arg, int kmflags)
zp->z_xattr_cached = NULL;
zp->z_xattr_parent = 0;
zp->z_vnode = NULL;
zp->z_sync_writes_cnt = 0;
zp->z_async_writes_cnt = 0;
return (0);
}
@@ -172,9 +170,6 @@ zfs_znode_cache_destructor(void *buf, void *arg)
ASSERT3P(zp->z_acl_cached, ==, NULL);
ASSERT3P(zp->z_xattr_cached, ==, NULL);
ASSERT0(atomic_load_32(&zp->z_sync_writes_cnt));
ASSERT0(atomic_load_32(&zp->z_async_writes_cnt));
}
@@ -293,6 +288,7 @@ zfs_create_share_dir(zfsvfs_t *zfsvfs, dmu_tx_t *tx)
sharezp->z_atime_dirty = 0;
sharezp->z_zfsvfs = zfsvfs;
sharezp->z_is_sa = zfsvfs->z_use_sa;
sharezp->z_pflags = 0;
VERIFY0(zfs_acl_ids_create(sharezp, IS_ROOT_NODE, &vattr,
kcred, NULL, &acl_ids, NULL));
@@ -455,8 +451,6 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
zp->z_blksz = blksz;
zp->z_seq = 0x7A4653;
zp->z_sync_cnt = 0;
zp->z_sync_writes_cnt = 0;
zp->z_async_writes_cnt = 0;
atomic_store_ptr(&zp->z_cached_symlink, NULL);
zfs_znode_sa_init(zfsvfs, zp, db, obj_type, hdl);
@@ -800,6 +794,10 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, dmu_tx_t *tx, cred_t *cr,
(*zpp)->z_mode = mode;
(*zpp)->z_dnodesize = dnodesize;
vnode_t *vp = ZTOV(*zpp);
if (!(flag & IS_ROOT_NODE))
vn_seqc_write_begin(vp);
if (vap->va_mask & AT_XVATTR)
zfs_xvattr_set(*zpp, (xvattr_t *)vap, tx);
@@ -808,7 +806,7 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, dmu_tx_t *tx, cred_t *cr,
VERIFY0(zfs_aclset_common(*zpp, acl_ids->z_aclp, cr, tx));
}
if (!(flag & IS_ROOT_NODE)) {
vnode_t *vp = ZTOV(*zpp);
vn_seqc_write_end(vp);
vp->v_vflag |= VV_FORCEINSMQ;
int err = insmntque(vp, zfsvfs->z_vfs);
vp->v_vflag &= ~VV_FORCEINSMQ;
@@ -1729,6 +1727,7 @@ zfs_create_fs(objset_t *os, cred_t *cr, nvlist_t *zplprops, dmu_tx_t *tx)
rootzp->z_unlinked = 0;
rootzp->z_atime_dirty = 0;
rootzp->z_is_sa = USE_SA(version, os);
rootzp->z_pflags = 0;
zfsvfs->z_os = os;
zfsvfs->z_parent = zfsvfs;
+12
View File
@@ -28,6 +28,7 @@
#include <sys/kmem.h>
#include <sys/tsd.h>
#include <sys/string.h>
#include <sys/misc.h>
/*
* Thread interfaces
@@ -197,3 +198,14 @@ issig(void)
}
EXPORT_SYMBOL(issig);
/*
* Check if the current thread is a memory reclaim thread.
* Returns true if current thread is kswapd.
*/
int
current_is_reclaim_thread(void)
{
return (current_is_kswapd());
}
EXPORT_SYMBOL(current_is_reclaim_thread);
-2
View File
@@ -511,8 +511,6 @@ zfsctl_inode_alloc(zfsvfs_t *zfsvfs, uint64_t id,
zp->z_pflags = 0;
zp->z_mode = 0;
zp->z_sync_cnt = 0;
zp->z_sync_writes_cnt = 0;
zp->z_async_writes_cnt = 0;
ip->i_generation = 0;
ip->i_ino = id;
ip->i_mode = (S_IFDIR | S_IRWXUGO);
+65
View File
@@ -1176,6 +1176,63 @@ zfs_root(zfsvfs_t *zfsvfs, struct inode **ipp)
return (error);
}
/*
* Dentry and inode caches referenced by a task in non-root memcg are
* not going to be scanned by the kernel-provided shrinker. So, if
* kernel prunes nothing, fall back to this manual walk to free dnodes.
* To avoid scanning the same znodes multiple times they are always rotated
* to the end of the z_all_znodes list. New znodes are inserted at the
* end of the list so we're always scanning the oldest znodes first.
*/
static int
zfs_prune_aliases(zfsvfs_t *zfsvfs, unsigned long nr_to_scan)
{
znode_t **zp_array, *zp;
int max_array = MIN(nr_to_scan, PAGE_SIZE * 8 / sizeof (znode_t *));
int objects = 0;
int i = 0, j = 0;
zp_array = vmem_zalloc(max_array * sizeof (znode_t *), KM_SLEEP);
mutex_enter(&zfsvfs->z_znodes_lock);
while ((zp = list_head(&zfsvfs->z_all_znodes)) != NULL) {
if ((i++ > nr_to_scan) || (j >= max_array))
break;
ASSERT(list_link_active(&zp->z_link_node));
list_remove(&zfsvfs->z_all_znodes, zp);
list_insert_tail(&zfsvfs->z_all_znodes, zp);
/* Skip active znodes and .zfs entries */
if (MUTEX_HELD(&zp->z_lock) || zp->z_is_ctldir)
continue;
if (igrab(ZTOI(zp)) == NULL)
continue;
zp_array[j] = zp;
j++;
}
mutex_exit(&zfsvfs->z_znodes_lock);
for (i = 0; i < j; i++) {
zp = zp_array[i];
ASSERT3P(zp, !=, NULL);
d_prune_aliases(ZTOI(zp));
if (atomic_read(&ZTOI(zp)->i_count) == 1)
objects++;
zrele(zp);
}
vmem_free(zp_array, max_array * sizeof (znode_t *));
return (objects);
}
/*
* The ARC has requested that the filesystem drop entries from the dentry
* and inode caches. This can occur when the ARC needs to free meta data
@@ -1227,6 +1284,14 @@ zfs_prune(struct super_block *sb, unsigned long nr_to_scan, int *objects)
*objects = (*shrinker->scan_objects)(shrinker, &sc);
#endif
/*
* Fall back to zfs_prune_aliases if kernel's shrinker did nothing
* due to dentry and inode caches being referenced by a task running
* in non-root memcg.
*/
if (*objects == 0)
*objects = zfs_prune_aliases(zfsvfs, nr_to_scan);
zfs_exit(zfsvfs, FTAG);
dprintf_ds(zfsvfs->z_os->os_dsl_dataset,
+51 -48
View File
@@ -25,6 +25,7 @@
* Copyright (c) 2012, 2018 by Delphix. All rights reserved.
* Copyright (c) 2015 by Chunwei Chen. All rights reserved.
* Copyright 2017 Nexenta Systems, Inc.
* Copyright (c) 2025, Klara, Inc.
*/
/* Portions Copyright 2007 Jeremy Teo */
@@ -3682,7 +3683,7 @@ top:
}
static void
zfs_putpage_sync_commit_cb(void *arg)
zfs_putpage_commit_cb(void *arg)
{
struct page *pp = arg;
@@ -3690,17 +3691,6 @@ zfs_putpage_sync_commit_cb(void *arg)
end_page_writeback(pp);
}
static void
zfs_putpage_async_commit_cb(void *arg)
{
struct page *pp = arg;
znode_t *zp = ITOZ(pp->mapping->host);
ClearPageError(pp);
end_page_writeback(pp);
atomic_dec_32(&zp->z_async_writes_cnt);
}
/*
* Push a page out to disk, once the page is on stable storage the
* registered commit callback will be run as notification of completion.
@@ -3818,15 +3808,6 @@ zfs_putpage(struct inode *ip, struct page *pp, struct writeback_control *wbc,
zfs_rangelock_exit(lr);
if (wbc->sync_mode != WB_SYNC_NONE) {
/*
* Speed up any non-sync page writebacks since
* they may take several seconds to complete.
* Refer to the comment in zpl_fsync() for details.
*/
if (atomic_load_32(&zp->z_async_writes_cnt) > 0) {
zil_commit(zfsvfs->z_log, zp->z_id);
}
if (PageWriteback(pp))
#ifdef HAVE_PAGEMAP_FOLIO_WAIT_BIT
folio_wait_bit(page_folio(pp), PG_writeback);
@@ -3852,8 +3833,6 @@ zfs_putpage(struct inode *ip, struct page *pp, struct writeback_control *wbc,
* was in fact not skipped and should not be counted as if it were.
*/
wbc->pages_skipped--;
if (!for_sync)
atomic_inc_32(&zp->z_async_writes_cnt);
set_page_writeback(pp);
unlock_page(pp);
@@ -3872,8 +3851,6 @@ zfs_putpage(struct inode *ip, struct page *pp, struct writeback_control *wbc,
#endif
ClearPageError(pp);
end_page_writeback(pp);
if (!for_sync)
atomic_dec_32(&zp->z_async_writes_cnt);
zfs_rangelock_exit(lr);
zfs_exit(zfsvfs, FTAG);
return (err);
@@ -3899,35 +3876,61 @@ zfs_putpage(struct inode *ip, struct page *pp, struct writeback_control *wbc,
err = sa_bulk_update(zp->z_sa_hdl, bulk, cnt, tx);
boolean_t commit = B_FALSE;
if (wbc->sync_mode != WB_SYNC_NONE) {
/*
* Note that this is rarely called under writepages(), because
* writepages() normally handles the entire commit for
* performance reasons.
*/
commit = B_TRUE;
} else if (!for_sync && atomic_load_32(&zp->z_sync_writes_cnt) > 0) {
/*
* If the caller does not intend to wait synchronously
* for this page writeback to complete and there are active
* synchronous calls on this file, do a commit so that
* the latter don't accidentally end up waiting for
* our writeback to complete. Refer to the comment in
* zpl_fsync() (when HAVE_FSYNC_RANGE is defined) for details.
*/
commit = B_TRUE;
}
/*
* A note about for_sync vs wbc->sync_mode.
*
* for_sync indicates that this is a syncing writeback, that is, kernel
* caller expects the data to be durably stored before being notified.
* Often, but not always, the call was triggered by a userspace syncing
* op (eg fsync(), msync(MS_SYNC)). For our purposes, for_sync==TRUE
* means that that page should remain "locked" (in the writeback state)
* until it is definitely on disk (ie zil_commit() or spa_sync()).
* Otherwise, we can unlock and return as soon as it is on the
* in-memory ZIL.
*
* wbc->sync_mode has similar meaning. wbc is passed from the kernel to
* zpl_writepages()/zpl_writepage(); wbc->sync_mode==WB_SYNC_NONE
* indicates this a regular async writeback (eg a cache eviction) and
* so does not need a durability guarantee, while WB_SYNC_ALL indicates
* a syncing op that must be waited on (by convention, we test for
* !WB_SYNC_NONE rather than WB_SYNC_ALL, to prefer durability over
* performance should there ever be a new mode that we have not yet
* added support for).
*
* So, why a separate for_sync field? This is because zpl_writepages()
* calls zfs_putpage() multiple times for a single "logical" operation.
* It wants all the individual pages to be for_sync==TRUE ie only
* unlocked once durably stored, but it only wants one call to
* zil_commit() at the very end, once all the pages are synced. So,
* it repurposes sync_mode slightly to indicate who issue and wait for
* the IO: for NONE, the caller to zfs_putpage() will do it, while for
* ALL, zfs_putpage should do it.
*
* Summary:
* for_sync: 0=unlock immediately; 1 unlock once on disk
* sync_mode: NONE=caller will commit; ALL=we will commit
*/
boolean_t need_commit = (wbc->sync_mode != WB_SYNC_NONE);
zfs_log_write(zfsvfs->z_log, tx, TX_WRITE, zp, pgoff, pglen, commit,
B_FALSE, for_sync ? zfs_putpage_sync_commit_cb :
zfs_putpage_async_commit_cb, pp);
/*
* We use for_sync as the "commit" arg to zfs_log_write() (arg 7)
* because it is a policy flag that indicates "someone will call
* zil_commit() soon". for_sync=TRUE means exactly that; the only
* question is whether it will be us, or zpl_writepages().
*/
zfs_log_write(zfsvfs->z_log, tx, TX_WRITE, zp, pgoff, pglen, for_sync,
B_FALSE, for_sync ? zfs_putpage_commit_cb : NULL, pp);
if (!for_sync) {
ClearPageError(pp);
end_page_writeback(pp);
}
dmu_tx_commit(tx);
zfs_rangelock_exit(lr);
if (commit)
if (need_commit)
zil_commit(zfsvfs->z_log, zp->z_id);
dataset_kstats_update_write_kstats(&zfsvfs->z_kstat, pglen);
+15 -9
View File
@@ -126,8 +126,6 @@ zfs_znode_cache_constructor(void *buf, void *arg, int kmflags)
zp->z_acl_cached = NULL;
zp->z_xattr_cached = NULL;
zp->z_xattr_parent = 0;
zp->z_sync_writes_cnt = 0;
zp->z_async_writes_cnt = 0;
return (0);
}
@@ -149,9 +147,6 @@ zfs_znode_cache_destructor(void *buf, void *arg)
ASSERT3P(zp->z_dirlocks, ==, NULL);
ASSERT3P(zp->z_acl_cached, ==, NULL);
ASSERT3P(zp->z_xattr_cached, ==, NULL);
ASSERT0(atomic_load_32(&zp->z_sync_writes_cnt));
ASSERT0(atomic_load_32(&zp->z_async_writes_cnt));
}
static int
@@ -371,6 +366,12 @@ zfs_inode_alloc(struct super_block *sb, struct inode **ip)
return (0);
}
void
zfs_inode_free(struct inode *ip)
{
kmem_cache_free(znode_cache, ITOZ(ip));
}
/*
* Called in multiple places when an inode should be destroyed.
*/
@@ -395,8 +396,15 @@ zfs_inode_destroy(struct inode *ip)
nvlist_free(zp->z_xattr_cached);
zp->z_xattr_cached = NULL;
}
kmem_cache_free(znode_cache, zp);
#ifndef HAVE_SOPS_FREE_INODE
/*
* inode needs to be freed in RCU callback. If we have
* super_operations->free_inode, Linux kernel will do call_rcu
* for us. But if we don't have it, since call_rcu is GPL-only
* symbol, we can only free synchronously and accept the risk.
*/
zfs_inode_free(ip);
#endif
}
static void
@@ -535,8 +543,6 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
zp->z_blksz = blksz;
zp->z_seq = 0x7A4653;
zp->z_sync_cnt = 0;
zp->z_sync_writes_cnt = 0;
zp->z_async_writes_cnt = 0;
zfs_znode_sa_init(zfsvfs, zp, db, obj_type, hdl);
+48 -7
View File
@@ -202,7 +202,7 @@ zpl_snapdir_revalidate(struct dentry *dentry, unsigned int flags)
return (!!dentry->d_inode);
}
static dentry_operations_t zpl_dops_snapdirs = {
static const struct dentry_operations zpl_dops_snapdirs = {
/*
* Auto mounting of snapshots is only supported for 2.6.37 and
* newer kernels. Prior to this kernel the ops->follow_link()
@@ -215,6 +215,51 @@ static dentry_operations_t zpl_dops_snapdirs = {
.d_revalidate = zpl_snapdir_revalidate,
};
/*
* For the .zfs control directory to work properly we must be able to override
* the default operations table and register custom .d_automount and
* .d_revalidate callbacks.
*/
static void
set_snapdir_dentry_ops(struct dentry *dentry, unsigned int extraflags) {
static const unsigned int op_flags =
DCACHE_OP_HASH | DCACHE_OP_COMPARE |
DCACHE_OP_REVALIDATE | DCACHE_OP_DELETE |
DCACHE_OP_PRUNE | DCACHE_OP_WEAK_REVALIDATE | DCACHE_OP_REAL;
#ifdef HAVE_D_SET_D_OP
/*
* d_set_d_op() will set the DCACHE_OP_ flags according to what it
* finds in the passed dentry_operations, so we don't have to.
*
* We clear the flags and the old op table before calling d_set_d_op()
* because issues a warning when the dentry operations table is already
* set.
*/
dentry->d_op = NULL;
dentry->d_flags &= ~op_flags;
d_set_d_op(dentry, &zpl_dops_snapdirs);
dentry->d_flags |= extraflags;
#else
/*
* Since 6.17 there's no exported way to modify dentry ops, so we have
* to reach in and do it ourselves. This should be safe for our very
* narrow use case, which is to create or splice in an entry to give
* access to a snapshot.
*
* We need to set the op flags directly. We hardcode
* DCACHE_OP_REVALIDATE because that's the only operation we have; if
* we ever extend zpl_dops_snapdirs we will need to update the op flags
* to match.
*/
spin_lock(&dentry->d_lock);
dentry->d_op = &zpl_dops_snapdirs;
dentry->d_flags &= ~op_flags;
dentry->d_flags |= DCACHE_OP_REVALIDATE | extraflags;
spin_unlock(&dentry->d_lock);
#endif
}
static struct dentry *
zpl_snapdir_lookup(struct inode *dip, struct dentry *dentry,
unsigned int flags)
@@ -236,10 +281,7 @@ zpl_snapdir_lookup(struct inode *dip, struct dentry *dentry,
return (ERR_PTR(error));
ASSERT(error == 0 || ip == NULL);
d_clear_d_op(dentry);
d_set_d_op(dentry, &zpl_dops_snapdirs);
dentry->d_flags |= DCACHE_NEED_AUTOMOUNT;
set_snapdir_dentry_ops(dentry, DCACHE_NEED_AUTOMOUNT);
return (d_splice_alias(ip, dentry));
}
@@ -373,8 +415,7 @@ zpl_snapdir_mkdir(struct inode *dip, struct dentry *dentry, umode_t mode)
error = -zfsctl_snapdir_mkdir(dip, dname(dentry), vap, &ip, cr, 0);
if (error == 0) {
d_clear_d_op(dentry);
d_set_d_op(dentry, &zpl_dops_snapdirs);
set_snapdir_dentry_ops(dentry, 0);
d_instantiate(dentry, ip);
}
+105 -61
View File
@@ -36,10 +36,7 @@
#include <sys/zfs_vfsops.h>
#include <sys/zfs_vnops.h>
#include <sys/zfs_project.h>
#if defined(HAVE_VFS_SET_PAGE_DIRTY_NOBUFFERS) || \
defined(HAVE_VFS_FILEMAP_DIRTY_FOLIO)
#include <linux/pagemap.h>
#endif
#include <linux/pagemap_compat.h>
#include <linux/fadvise.h>
#ifdef HAVE_VFS_FILEMAP_DIRTY_FOLIO
#include <linux/writeback.h>
@@ -114,52 +111,11 @@ zpl_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
{
struct inode *inode = filp->f_mapping->host;
znode_t *zp = ITOZ(inode);
zfsvfs_t *zfsvfs = ITOZSB(inode);
cred_t *cr = CRED();
int error;
fstrans_cookie_t cookie;
/*
* The variables z_sync_writes_cnt and z_async_writes_cnt work in
* tandem so that sync writes can detect if there are any non-sync
* writes going on and vice-versa. The "vice-versa" part to this logic
* is located in zfs_putpage() where non-sync writes check if there are
* any ongoing sync writes. If any sync and non-sync writes overlap,
* we do a commit to complete the non-sync writes since the latter can
* potentially take several seconds to complete and thus block sync
* writes in the upcoming call to filemap_write_and_wait_range().
*/
atomic_inc_32(&zp->z_sync_writes_cnt);
/*
* If the following check does not detect an overlapping non-sync write
* (say because it's just about to start), then it is guaranteed that
* the non-sync write will detect this sync write. This is because we
* always increment z_sync_writes_cnt / z_async_writes_cnt before doing
* the check on z_async_writes_cnt / z_sync_writes_cnt here and in
* zfs_putpage() respectively.
*/
if (atomic_load_32(&zp->z_async_writes_cnt) > 0) {
if ((error = zpl_enter(zfsvfs, FTAG)) != 0) {
atomic_dec_32(&zp->z_sync_writes_cnt);
return (error);
}
zil_commit(zfsvfs->z_log, zp->z_id);
zpl_exit(zfsvfs, FTAG);
}
error = filemap_write_and_wait_range(inode->i_mapping, start, end);
/*
* The sync write is not complete yet but we decrement
* z_sync_writes_cnt since zfs_fsync() increments and decrements
* it internally. If a non-sync write starts just after the decrement
* operation but before we call zfs_fsync(), it may not detect this
* overlapping sync write but it does not matter since we have already
* gone past filemap_write_and_wait_range() and we won't block due to
* the non-sync write.
*/
atomic_dec_32(&zp->z_sync_writes_cnt);
if (error)
return (error);
@@ -555,6 +511,7 @@ zpl_writepages(struct address_space *mapping, struct writeback_control *wbc)
return (result);
}
#ifdef HAVE_VFS_WRITEPAGE
/*
* Write out dirty pages to the ARC, this function is only required to
* support mmap(2). Mapped pages may be dirtied by memory operations
@@ -571,6 +528,7 @@ zpl_writepage(struct page *pp, struct writeback_control *wbc)
return (zpl_putpage(pp, wbc, &for_sync));
}
#endif
/*
* The flag combination which matches the behavior of zfs_space() is
@@ -726,28 +684,44 @@ zpl_fadvise(struct file *filp, loff_t offset, loff_t len, int advice)
return (error);
}
#define ZFS_FL_USER_VISIBLE (FS_FL_USER_VISIBLE | ZFS_PROJINHERIT_FL)
#define ZFS_FL_USER_MODIFIABLE (FS_FL_USER_MODIFIABLE | ZFS_PROJINHERIT_FL)
#define ZFS_FL_USER_VISIBLE (FS_FL_USER_VISIBLE | FS_PROJINHERIT_FL)
#define ZFS_FL_USER_MODIFIABLE (FS_FL_USER_MODIFIABLE | FS_PROJINHERIT_FL)
static struct {
uint64_t zfs_flag;
uint32_t fs_flag;
uint32_t xflag;
} flags_lookup[] = {
{ZFS_IMMUTABLE, FS_IMMUTABLE_FL, FS_XFLAG_IMMUTABLE},
{ZFS_APPENDONLY, FS_APPEND_FL, FS_XFLAG_APPEND},
{ZFS_NODUMP, FS_NODUMP_FL, FS_XFLAG_NODUMP},
{ZFS_PROJINHERIT, FS_PROJINHERIT_FL, FS_XFLAG_PROJINHERIT}
};
static uint32_t
__zpl_ioctl_getflags(struct inode *ip)
{
uint64_t zfs_flags = ITOZ(ip)->z_pflags;
uint32_t ioctl_flags = 0;
for (int i = 0; i < ARRAY_SIZE(flags_lookup); i++)
if (zfs_flags & flags_lookup[i].zfs_flag)
ioctl_flags |= flags_lookup[i].fs_flag;
if (zfs_flags & ZFS_IMMUTABLE)
ioctl_flags |= FS_IMMUTABLE_FL;
return (ioctl_flags);
}
if (zfs_flags & ZFS_APPENDONLY)
ioctl_flags |= FS_APPEND_FL;
static uint32_t
__zpl_ioctl_getxflags(struct inode *ip)
{
uint64_t zfs_flags = ITOZ(ip)->z_pflags;
uint32_t ioctl_flags = 0;
if (zfs_flags & ZFS_NODUMP)
ioctl_flags |= FS_NODUMP_FL;
for (int i = 0; i < ARRAY_SIZE(flags_lookup); i++)
if (zfs_flags & flags_lookup[i].zfs_flag)
ioctl_flags |= flags_lookup[i].xflag;
if (zfs_flags & ZFS_PROJINHERIT)
ioctl_flags |= ZFS_PROJINHERIT_FL;
return (ioctl_flags & ZFS_FL_USER_VISIBLE);
return (ioctl_flags);
}
/*
@@ -761,6 +735,7 @@ zpl_ioctl_getflags(struct file *filp, void __user *arg)
int err;
flags = __zpl_ioctl_getflags(file_inode(filp));
flags = flags & ZFS_FL_USER_VISIBLE;
err = copy_to_user(arg, &flags, sizeof (flags));
return (err);
@@ -784,7 +759,7 @@ __zpl_ioctl_setflags(struct inode *ip, uint32_t ioctl_flags, xvattr_t *xva)
xoptattr_t *xoap;
if (ioctl_flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | FS_NODUMP_FL |
ZFS_PROJINHERIT_FL))
FS_PROJINHERIT_FL))
return (-EOPNOTSUPP);
if (ioctl_flags & ~ZFS_FL_USER_MODIFIABLE)
@@ -815,7 +790,51 @@ __zpl_ioctl_setflags(struct inode *ip, uint32_t ioctl_flags, xvattr_t *xva)
xoap->xoa_appendonly);
FLAG_CHANGE(FS_NODUMP_FL, ZFS_NODUMP, XAT_NODUMP,
xoap->xoa_nodump);
FLAG_CHANGE(ZFS_PROJINHERIT_FL, ZFS_PROJINHERIT, XAT_PROJINHERIT,
FLAG_CHANGE(FS_PROJINHERIT_FL, ZFS_PROJINHERIT, XAT_PROJINHERIT,
xoap->xoa_projinherit);
#undef FLAG_CHANGE
return (0);
}
static int
__zpl_ioctl_setxflags(struct inode *ip, uint32_t ioctl_flags, xvattr_t *xva)
{
uint64_t zfs_flags = ITOZ(ip)->z_pflags;
xoptattr_t *xoap;
if (ioctl_flags & ~(FS_XFLAG_IMMUTABLE | FS_XFLAG_APPEND |
FS_XFLAG_NODUMP | FS_XFLAG_PROJINHERIT))
return (-EOPNOTSUPP);
if ((fchange(ioctl_flags, zfs_flags, FS_XFLAG_IMMUTABLE,
ZFS_IMMUTABLE) ||
fchange(ioctl_flags, zfs_flags, FS_XFLAG_APPEND, ZFS_APPENDONLY)) &&
!capable(CAP_LINUX_IMMUTABLE))
return (-EPERM);
if (!zpl_inode_owner_or_capable(zfs_init_idmap, ip))
return (-EACCES);
xva_init(xva);
xoap = xva_getxoptattr(xva);
#define FLAG_CHANGE(iflag, zflag, xflag, xfield) do { \
if (((ioctl_flags & (iflag)) && !(zfs_flags & (zflag))) || \
((zfs_flags & (zflag)) && !(ioctl_flags & (iflag)))) { \
XVA_SET_REQ(xva, (xflag)); \
(xfield) = ((ioctl_flags & (iflag)) != 0); \
} \
} while (0)
FLAG_CHANGE(FS_XFLAG_IMMUTABLE, ZFS_IMMUTABLE, XAT_IMMUTABLE,
xoap->xoa_immutable);
FLAG_CHANGE(FS_XFLAG_APPEND, ZFS_APPENDONLY, XAT_APPENDONLY,
xoap->xoa_appendonly);
FLAG_CHANGE(FS_XFLAG_NODUMP, ZFS_NODUMP, XAT_NODUMP,
xoap->xoa_nodump);
FLAG_CHANGE(FS_XFLAG_PROJINHERIT, ZFS_PROJINHERIT, XAT_PROJINHERIT,
xoap->xoa_projinherit);
#undef FLAG_CHANGE
@@ -856,7 +875,7 @@ zpl_ioctl_getxattr(struct file *filp, void __user *arg)
struct inode *ip = file_inode(filp);
int err;
fsx.fsx_xflags = __zpl_ioctl_getflags(ip);
fsx.fsx_xflags = __zpl_ioctl_getxflags(ip);
fsx.fsx_projid = ITOZ(ip)->z_projid;
err = copy_to_user(arg, &fsx, sizeof (fsx));
@@ -880,7 +899,7 @@ zpl_ioctl_setxattr(struct file *filp, void __user *arg)
if (!zpl_is_valid_projid(fsx.fsx_projid))
return (-EINVAL);
err = __zpl_ioctl_setflags(ip, fsx.fsx_xflags, &xva);
err = __zpl_ioctl_setxflags(ip, fsx.fsx_xflags, &xva);
if (err)
return (err);
@@ -985,6 +1004,27 @@ zpl_ioctl_setdosflags(struct file *filp, void __user *arg)
return (err);
}
static int
zpl_ioctl_rewrite(struct file *filp, void __user *arg)
{
struct inode *ip = file_inode(filp);
zfs_rewrite_args_t args;
fstrans_cookie_t cookie;
int err;
if (copy_from_user(&args, arg, sizeof (args)))
return (-EFAULT);
if (unlikely(!(filp->f_mode & FMODE_WRITE)))
return (-EBADF);
cookie = spl_fstrans_mark();
err = -zfs_rewrite(ITOZ(ip), args.off, args.len, args.flags, args.arg);
spl_fstrans_unmark(cookie);
return (err);
}
static long
zpl_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
@@ -1003,6 +1043,8 @@ zpl_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return (zpl_ioctl_getdosflags(filp, (void *)arg));
case ZFS_IOC_SETDOSFLAGS:
return (zpl_ioctl_setdosflags(filp, (void *)arg));
case ZFS_IOC_REWRITE:
return (zpl_ioctl_rewrite(filp, (void *)arg));
default:
return (-ENOTTY);
}
@@ -1040,7 +1082,9 @@ const struct address_space_operations zpl_address_space_operations = {
#else
.readpage = zpl_readpage,
#endif
#ifdef HAVE_VFS_WRITEPAGE
.writepage = zpl_writepage,
#endif
.writepages = zpl_writepages,
.direct_IO = zpl_direct_IO,
#ifdef HAVE_VFS_SET_PAGE_DIRTY_NOBUFFERS
+12
View File
@@ -45,6 +45,15 @@ zpl_inode_alloc(struct super_block *sb)
return (ip);
}
#ifdef HAVE_SOPS_FREE_INODE
static void
zpl_inode_free(struct inode *ip)
{
ASSERT(atomic_read(&ip->i_count) == 0);
zfs_inode_free(ip);
}
#endif
static void
zpl_inode_destroy(struct inode *ip)
{
@@ -455,6 +464,9 @@ zpl_prune_sb(uint64_t nr_to_scan, void *arg)
const struct super_operations zpl_super_operations = {
.alloc_inode = zpl_inode_alloc,
#ifdef HAVE_SOPS_FREE_INODE
.free_inode = zpl_inode_free,
#endif
.destroy_inode = zpl_inode_destroy,
.dirty_inode = zpl_dirty_inode,
.write_inode = NULL,

Some files were not shown because too many files have changed in this diff Show More