Compare commits

...

1060 Commits

Author SHA1 Message Date
Tony Hutter
b44a3ecf4a
zpool: Change zpool offline spares policy
The zpool offline man page says that you cannot use 'zpool offline'
on spares.  However, testing found that you could in fact force fault
(zpool offline -f) spares.

Change the policy to:
1. You can never force-fault or offline dRAID spares.
2. You can only force-fault or offline traditional spares if they're
   active.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18282
2026-03-25 11:08:55 -07:00
Alek P
931deb290c
Prevent range tree corruption race by updating dnode_sync()
Switch to incremental range tree processing in dnode_sync() to avoid
unsafe lock dropping during zfs_range_tree_walk(). This also ensures
the free ranges remain visible to dnode_block_freed() throughout the
sync process, preventing potential stale data reads.

This patch:
 - Keeps the range tree attached during processing for visibility.
 - Processes segments one-by-one by restarting from the tree head.
 - Uses zfs_range_tree_clear() to safely handle ranges that may have
   been modified while the lock was dropped.
 - adds ASSERT()s to document that we don't expect dn_free_ranges
   modification outside of sync context.

Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Issue #18186
Closes #18235
2026-03-23 18:34:19 -07:00
Christos Longros
08d7a62673
ci: update FreeBSD CI images from 14.3 to 14.4
Update FreeBSD CI targets from 14.3 to 14.4 in both the QEMU
start script and the workflow configuration.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18362
2026-03-23 10:34:10 -07:00
Rob Norris
ef47c3a0be
pyzfs: update license tags/classifiers
The standard for package license metadata[1] is a SPDX identifier in the
the `license` and that's all. So, updating that, remove the deprecated
license classifier, and adding a tag at the top of the file for
spdxcheck to find.

1. https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license

Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18356
2026-03-23 09:56:25 -07:00
Christos Longros
a45929021a
zpool-list.8: clarify that only imported pools are listed (#18352)
The man page stated "all pools in the system are listed" which is
misleading, as only imported pools are shown. Clarify this and
add a cross-reference to zpool-import(8) for discovering pools
available for import.

Signed-off-by: Christos Longros <chris.longros@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2026-03-23 09:21:08 -07:00
John Cabaj
8518e3e809
Linux 7.0: autoconf: Remove copy-from-user-inatomic API checks (#18348) (#18354)
This function was removed in c6442bd3b6: "Removing old code outside
of 4.18 kernsls", but fails at present on PowerPC builds due to the
recent inclusion of 6bc9c0a90522: "powerpc: fix KUAP warning in VMX
usercopy path" in the upstream kernel, which introduces a use of
cpu_feature_keys[], which is a GPL-only symbol. Removing the API
check as it doesn't appear necessary.

Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
2026-03-23 09:19:41 -07:00
Tony Hutter
3f3cadc525
CI: Add ARM builder
Do a ZFS build inside of an ARM runner.  This only does a simple
build, it does not run the test suite.  The build runs on the
runner itself rather than in a VM, since nesting is not supported on
Github ARM runners.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18343
2026-03-19 12:22:32 -07:00
Ameer Hamza
4627c57f27
CI: Support repository variable override for ZTS OS selection
Allow restricting ZTS OS targets by setting the vars.ZTS_OS_OVERRIDE
repository variable (e.g. '["debian13"]') to reduce shared runner
contention when running the full OS matrix is unnecessary. When unset,
the existing ci_type-based OS selection is used unchanged.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18342
2026-03-19 12:21:45 -07:00
Rob Norris
3ee08abd2f linux/super: flatten zpl_fill_super into zpl_get_tree
Target of opportunity; with no other callers, there's no need for it to
be a static function.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18339
2026-03-18 12:00:38 -07:00
Rob Norris
96a0b20201 linux/super: flatten zpl_mount_impl into zpl_get_tree
Target of opportunity; with no other callers, there's no need for it to
be a static function.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18339
2026-03-18 12:00:33 -07:00
Rob Norris
5ebb9ff914 linux/super: flatten mount/remount into get_tree/reconfigure
With the old API gone, there's no need to massage new-style calls into
its shape and call another function; we can just make those handlers
work directly.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18339
2026-03-18 12:00:29 -07:00
Rob Norris
f828aeb62a linux/super: remove support for old mount API
Removing the HAVE_FS_CONTEXT gates and anything that would be used if it
wasn't set.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18339
2026-03-18 12:00:24 -07:00
Rob Norris
188888ac37 config: refuse to build without fs_context
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18339
2026-03-18 11:59:57 -07:00
Christos Longros
f259a47c74
zpool-iostat.8: clarify first report shows per-second averages
The first report printed by zpool iostat shows average per-second
values since boot, not cumulative totals. Clarify this in the man
page to avoid user confusion.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18337
2026-03-18 10:48:04 -07:00
Rob Norris
d8c08a1cea
Linux 7.0: also set setlease handler on directories (#18331)
It turns out the kernel can also take directory leases, most notably in
the NFS server. Without a setlease handler on the directory file ops,
attempts to open a directory over NFS can fail with EINVAL.

Adding a directory setlease handler was missed in 168023b603. This fixes
that, allowing directories to be properly accessed over NFS.

Sponsored-by: TrueNAS
Reported-by: Satadru Pramanik <satadru@gmail.com>

Signed-off-by: Rob Norris <rob.norris@truenas.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2026-03-17 15:28:30 -07:00
Ameer Hamza
4655bdd8ab
ZTS: Fix L2ARC test reliability
Disable depth cap (L2ARC_EXT_HEADROOM_PCT=0) in DWPD and multidev tests
that rely on predictable marker advancement during fill and measurement.

Rework multidev_throughput to verify sustained throughput across three
consecutive windows instead of asserting an absolute rate. Use larger
cache devices (8GB) to avoid frequent global marker resets
(smallest_capacity/8), fill ARC before attaching caches to provide a
stable evictable buffer pool, and lower write_max to 8MB/s to avoid
exhausting data within the measurement period.

Use destroy_pool (log_must_busy) instead of log_must zpool destroy
to handle transient EBUSY during teardown.

Remove l2arc_multidev_throughput_pos from the expected-fail list in
zts-report.py.in.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18321
2026-03-16 14:58:25 -07:00
Christos Longros
f80338facc
zarcsummary: add man page
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18330
2026-03-16 11:08:05 -07:00
Christos Longros
f9d26b88b7
zilstat: add AUTHORS section to man page
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18329
2026-03-16 11:07:27 -07:00
Brian Behlendorf
488f04e276
ZTS: Add back redundancy_draid_spare3 exception
Observed again in the CI.  Put the maybe exception back in place
and reference a newly created issue for this sporadic failure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18320
2026-03-16 10:39:01 -07:00
Aditya Gollamudi
b481a8bbbf
Make zpool status dedup table support raw bytes -p output
Check if -p flag is enabled, and if so print dedup table with raw
bytes. Restructure the logic in zutil_pool to check if -p flag is
enabled before printing either the bytes or raw numbers.

Calls to print the data for DDT now all use
zfs_nicenum_format(). Increased DDT histogram column buffers
to 32 bytes to prevent truncation when -p is enabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Adi Gollamudi <adigollamudi@gmail.com>
Closes #11626
Closes #17926
2026-03-13 09:53:56 -07:00
Graham Perrin
4d78b14c14
README: add FreeBSD 14.4-RELEASE alongside 15.0
14.4 was announced yesterday.  Also link to part of
FreeBSD's security page.

https://www.freebsd.org/security/#sup
presents a fairly comprehensive table of supported releases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Graham Perrin <grahamperrin@gmail.com>
Closes #18304
2026-03-12 19:02:45 -07:00
Brian Behlendorf
5b2923c2e0
ZTS: redundancy_draid_spare{1,3} exceptions
Update the redundancy_draid_spare1 exception to reference an issue
which describes the failure.

Remove the exception for the redundancy_draid_spare3 test.  I have
not observed it in local testing.  If it reproduces in the CI we
can create a new issue for it and put back the exception.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18308
2026-03-12 18:37:27 -07:00
Alexander Motin
3583fa38e8 ZVOL: Restrict cloning with different properties
While technically its not a problem to clone between ZVOLs with
different properties, it might create expectation of new properties
being applied during data move, while actually it won't happen.
For copies and checksum it may mean incorrect safety expectations.
For dedup, compression and special_small_blocks -- performance and
space usage.

This is a replica of #18180 from FS.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18315
2026-03-12 18:30:58 -07:00
Alexander Motin
15e37e0919 ZVOL: Add encryption key check for block cloning
Somehow during block cloning porting from file systems was missed
the check for identical encryption keys.  As result, blocks cloned
between unrelated ZVOLs produced authentication errors on later
reads.  Having same or different encryption root does not matter.

This patch copies dmu_objset_crypto_key_equal() call from FS side.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18315
2026-03-12 18:30:26 -07:00
Sean Eric Fagan
06b0abfe62
Fix the send --exclude option to work with encryption
When using --exclude, filtering needs to take place in two places:
in zfs_main.c via the callback previously added to support the
options, and in libzfs_sendrecv.c because it generates the nvlist
during a first pass, and that results in it complaining if the
excluded dataset is not available for sending. (eg, excluding an
encrypted dataset so you don't have to use --raw wouldn't work,
because the first pass would look at the dataset and decide you
couldn't use it.) Add send --exclude tests, including one that tests
excluding an encrypted hierarchy.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sean Eric Fagan <sef@kithrup.ie>
Closes #18278
2026-03-12 15:10:28 -07:00
Alan Somers
753f1e1e21
zstream: add a drop_record subcommand
It can be used to drop extraneous records in a send stream caused by a
corrupt dataset, as in issue #18239.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Sponsored by:	ConnectWise
Closes #18275
2026-03-12 15:08:58 -07:00
siv0
7f65e04abd
libzfs: scrub: only include start and end nv pairs if needed for scrub
This patch addresses running `zpool scrub <pool>` with ZFS 2.4 userspace
while the loaded kernel module is still 2.3, failing with:
```
cannot scrub <pool>: the loaded zfs module does not support an option
for this operation. A reboot may be required to enable this option.
```

Checking for the source of the message via `strace` showed the scrub
ioctl failing and setting errno to ZFS_ERR_IOC_ARG_UNAVAIL[0]. With
that and the comments in `module/zfs/zfs_ioctl.c`[1] commit: 894edd084
seemed like a likely cause for the backward incompatibility.

The corresponding kernelspace code in `module/zfs/zfs_ioctl.c` defaults
to a setting of 0 if either parameter is not set, so not providing the
nvpairs in case both are 0 should not make a semantic difference.

Tested by:
* loading zfs.ko in version 2.3.6
* running `zpool scrub testpool` with zpool from master (error occurs)
* running `zpool scrub testpool` with this patch applied (scrub is
  started)

This should help users who are still stuck on an older kernel module,
while their distribution ships newer ZFS userspace.

This was observed in the Proxmox community forum:
https://forum.proxmox.com/threads/.180467/

[0] d35951b18d/include/sys/fs/zfs.h (L1762)
[1] d35951b18d/module/zfs/zfs_ioctl.c (L7799)
Fixes: 894edd084 ("Add TXG timestamp database")

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Co-authored-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #18314
2026-03-12 15:06:23 -07:00
Sean Eric Fagan
f109c7bb98
Add the --file-layout (-f) option to zdb(8)
Displays the physical raidz block layout for a given file.
This leverages the internal vdev_raidz_map_alloc() function to find
the map of how the block data is laid out across the child disks.

The column entry for each row looks like:
+------------+
|  D2     43 |
|     6020da |
+------------+
representing here the logical data column 2 that is 43 sectors high
starting at sector 0x6020da.

With -H, the output is a list of disks, LBAs, and block counts,
given in 512 byte block values.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: sean.fagan@klarasystems.com
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Sean Fagan <sean.fagan@klarasystems.com>
Closes #18264
2026-03-12 14:41:23 -07:00
Rob Norris
2b930f63f8
config: fix STATX_MNT_ID detection
statx(2) requires _GNU_SOURCE to be defined in order for sys/stat.h to
produce a definition for struct statx and the STATX_* defines. We get
that at compile time because we pass -D_GNU_SOURCE through to
everything, but in the configure check we aren't setting _GNU_SOURCE, so
we don't find STATX_MNT_ID, and so don't set HAVE_STATX_MNT_ID.

(This was fine before ccf5a8a6fc, because linux/stat.h does not require
_GNU_SOURCE).

Simple fix: in the check, define _GNU_SOURCE before including
sys/stat.h.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18312
2026-03-12 09:58:54 -07:00
Rob Norris
cff853cec5
contrib/debian: add zilstat.1 manpage to installation list
Missed in 65165df129, which caused Debian packaging to fail.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18313
2026-03-12 09:17:32 -07:00
Christos Longros
d35951b18d
zpool clear: remove undocumented rewind flags
Remove the -F, -n, and -X flags from zpool clear.  These flags were
inherited from OpenSolaris but are not applicable in this context.
Unlike zpool import, where the pool is not yet loaded and a specific
TXG can be selected, zpool clear operates on an already imported pool
whose in-memory state is ahead of what is on disk.  Rewinding
transactions would require force-exporting the pool first.

The rewind policy passed to zpool_clear() is now always
ZPOOL_NO_REWIND.

Tested on FreeBSD 16.0-CURRENT (amd64).  Verified that -F, -n, and
-X are properly rejected as invalid options and that the usage output
reflects the change.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #13825
Closes #18300
2026-03-11 15:15:45 -07:00
Christos Longros
65165df129
zilstat: add man page
The zilstat command has no man page. Add zilstat.1 documenting all
options and field definitions based on the source in cmd/zilstat.in.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18303
2026-03-11 15:11:41 -07:00
Andriy Tkachuk
b403040c4c
draid: fix data corruption after disk clear
Currently, when there there are several faulted disks with attached
dRAID spares, and one of those disks is cleared from errors (zpool
clear), followed by its spare being detached, the data in all the
remaining spares that were attached while the cleared disk was in
FAULTED state might get corrupted (which can be seen by running scrub).
In some cases, when too many disks get cleared at a time, this can
result in data corruption/loss.

dRAID spare is a virtual device whose blocks are distributed among
other disks. Those disks can be also in FAULTED state with attached
spares on their own. When a disk gets sequentially resilvered (rebuilt),
the changes made by that resilvering won't get captured in the DTL
(Dirty Time Log) of other FAULTED disks with the attached spares to
which the data is written during the resilvering (as it would normally
be done for the changes made by the user if a new file is written or
some existing one is deleted). It is because sequential resilvering
works on the block level, without touching or looking into metadata,
so it doesn't know anything about the old BPs or transactions groups
that it is resilvering. So later on, when that disk gets cleared
from errors and healing resilvering is trying to sync all the data
from its spare onto it, all the changes made on its spare during the
resilvering of other disks will be missed because they won't be
captured in its DTL. That's why other dRAID spares may get corrupted.

Here's another way to explain it that might be helpful. Imagine a
scenario:

1. d1 fails and gets resilvered to some spare s1 - OK.
2. d2 fails and gets sequentially resilvered on draid spare s2. Now,
   in some slices, s2 would map to d1, which is failed. But d1 has s1
   spare attached, so the data from that resilvering goes to s1, but
   not recorded in d1's DTL.
3. Now, d1 gets cleared and its s1 gets detached. All the changes
   done by the user (writes or deletions) have their txgs captured
   in d1's DTL, so they will be resilvered by the healing resilver
   from its spare (s1) - that part works fine. But the data which
   was written during resilvering of d2 and went to s1 - that one
   will be missed from d1's DTL and won't get resilvered to it. So
   here we are:
4. s2 under d2 is corrupted in the slices which map to d1, because
   d1 doesn't have that data resilvered from s1.

Now, if there are more failed disks with draid spares attached which
were sequentially resilvered while d1 was failed, d3+s3, d4+s4 and
so on - all their spares will be corrupted. Because, in some slices,
each of them will map to d1 which will miss their data.

Solution: add all known txgs starting from TXG_INITIAL to DTLs of
non-writable devices during sequential resilvering so when healing
resilver starts on disk clear, it would be able to check and heal
blocks from all txgs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
Closes #18286
Closes #18294
2026-03-11 14:54:20 -07:00
Rob Norris
62fa8bcb3c abi: updates for mnttab cleanup
Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
a59e712d25 libspl/mnttab: remove struct extmnttab
The two additional fields are never used by calling code, and we can
replace their sole internal use with an extra stack param.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
f64f12079c libspl/mnttab: remove getmntany()
Only used for when the mount cache was disabled, but since its always
enabled now, we don't need it.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
143f410e99 libspl/mnttab: make mnttab source filenames consistent
FreeBSD's getextmntent.c is only separate because it has a different
license to mnttab.c, otherwise it would go there too.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
c0ea89db9f libzfs/mnttab: shorten names, reorg a bit
We can't change the public interface, but internally we don't need so
much redundant naming.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
f43cb1fef6 libzfs/mnttab: lift node alloc/free
Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
0ecf5e3f62 libzfs/mnttab: always enable the cache
There's no real reason not to enable it always; the `zfs` command always
enables it anyway, and right now there's multiple places that do mount
work that don't go through the cache anyway. Having it always be on lets
us remove a bunch of the fallback code.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
b5637fba1c libzfs/mnttab: use SPL mutexes
More consistent, less typing, and we can check ownership.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
02224bca40 libzfs/mnttab: lift mnttab cache into separate file
Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Alek P
ae7fcd5f92
fix libzfs diff mem leak in an error path
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #18301
2026-03-10 12:39:49 -07:00
Ameer Hamza
5b93d1a218 L2ARC: Fix prev_hdr use-after-free in l2arc_write_sublist
prev_hdr is dereferenced after the sublist lock is dropped for write I/O
but nothing prevents it from being freed during that window. Eliminate
prev_hdr entirely and simplify persistent marker repositioning logic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 11:00:23 -07:00
Ameer Hamza
be5d36919a man: Update L2ARC documentation for depth cap and write budget fairness
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 11:00:18 -07:00
Ameer Hamza
b27a87f399 L2ARC: Write budget fairness for metadata monopolization
Under heavy metadata load, metadata passes can monopolize the write
budget every cycle while data passes get nothing written. Track
consecutive monopolized cycles per device in l2ad_meta_cycles. After
l2arc_meta_cycles (default 2) consecutive cycles where metadata fills
the write budget, skip metadata for one cycle to let data run.  Reset
the counter when nothing is written.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 11:00:14 -07:00
Ameer Hamza
62ca8f721b L2ARC: Scan-based depth cap for persistent markers
With persistent markers and inclusive scanning, the marker traverses the
entire ARC state across many feed cycles, writing buffers far from the
tail that may no longer be relevant.

Track cumulative bytes scanned per pass in l2arc_ext_scanned. When scans
reach l2arc_ext_headroom_pct (default 25%) of the ARC state size, reset
the pass markers to the tail via lazy reset flags. This keeps markers
focused on the tail zone where buffers soon to be evicted have the most
value for L2ARC.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 11:00:08 -07:00
Ameer Hamza
15fc3d64c8 L2ARC: Lazy sublist reset flags for persistent markers
Replace direct marker-to-tail manipulation with per-sublist boolean
flags consumed lazily by feed threads.  Each scanning thread resets its
own marker when it sees the flag, rather than having another thread
manipulate the marker directly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 11:00:01 -07:00
Ameer Hamza
22fdaf0b1f L2ARC: Even sublist headroom distribution with round-robin selection
The dynamic headroom redistribution formula gave later sublists
progressively larger scanning budgets, and random sublist selection
caused uneven coverage across sublists. For depth cap to work
effectively, each sublist should be equally and fairly treated.
Use equal per-sublist headroom (headroom / num_sublists) for even
distribution and deterministic round-robin selection for fair
coverage across cycles.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 10:59:41 -07:00
Rob Norris
0b0971f82f README: describe specific kernels/distros we target
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18295
2026-03-10 09:55:18 -07:00
Rob Norris
97e080c496 config: remove minimum kernel version check
The autoconf checks are more than enough to decide whether or not we can
work with this kernel or not.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18295
2026-03-10 09:55:01 -07:00
Ivan Shapovalov
8531621aba zfs_main: create, clone, rename: accept -pp for non-mountable parents
Teach `zfs {create,clone,rename}` to accept a doubled `-p` flag (`-pp`)
to create non-existing ancestor datasets with `canmount=off`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #17000
2026-03-09 14:50:18 -07:00
Ivan Shapovalov
2f3f1ab1ba libzfs: teach zfs_create_ancestors() to accept properties
This will be used to support creating non-mountable ancestors in zfs(8).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #17000
2026-03-09 14:49:52 -07:00
Ameer Hamza
1eace59060
libzfs: use mount_setattr for selective remount including legacy mounts
When a namespace property is changed via zfs set, libzfs remounts the
filesystem to propagate the new VFS mount flags. The current approach
uses mount(2) with MS_REMOUNT, which reads all namespace properties
from ZFS and applies them together. This has two problems:

1. Linux VFS resets unspecified per-mount flags on remount. If an
   administrator sets a temporary flag (e.g. mount -o remount,noatime),
   a subsequent zfs set on any namespace property clobbers it.

2. Two concurrent zfs set operations on different namespace properties
   can overwrite each other's mount flags.

Additionally, legacy datasets (mountpoint=legacy) were never remounted
on namespace property changes since zfs_is_mountable() returns false
for them.

Add zfs_mount_setattr() which uses mount_setattr(2) to selectively
update only the mount flags that correspond to the changed property.
For legacy datasets, /proc/mounts is iterated to update all
mountpoints. On kernels without mount_setattr (ENOSYS), non-legacy
datasets fall back to a full remount; legacy mounts are skipped to
avoid clobbering temporary flags.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18257
2026-03-09 11:06:22 -07:00
Alexander Ziaee
d45c8d6489
FreeBSD: Improve dmesg kernel message prefix
Provide intuitive log search keywords and increased system consistency.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Ziaee <ziaee@FreeBSD.org>
Closes #18290
2026-03-09 10:17:23 -07:00
Christos Longros
304de7f19b
libzfs: handle EDOM error in zpool_create
When creating a pool with devices that have incompatible block sizes,
the kernel returns EDOM. However, zpool_create() did not handle this
errno, falling through to zpool_standard_error() which produced a
confusing message about invalid property values.

Add a case EDOM handler in zpool_create() to return EZFS_BADDEV with
a descriptive auxiliary message, consistent with the existing EDOM
handler in zpool_vdev_add().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18268
2026-03-08 12:59:10 -07:00
Andrew Walker
c5905b2cb7
Implement lzc_send_progress
This commit adds an implementation of lzc_send_progress, which
existed in the libzfs_core header, but not in ABI and lacked
an actual implementation. The libzfs_send_progress function
is altered so that it wraps around the lzc operation. This
fills a functional gap in libzfs core.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Andrew Walker <andrew.walker@truenas.com>
Closes #18288
2026-03-06 11:05:58 -08:00
Juhyung Park
c58b8b7dc2
Fix check for .cfi_negate_ra_state on aarch64
Checking for LD_VERSION in unreliable as not all distros define it on
the compiler's preprocessor.

Explicitly check it via autoconf.

This fixes support for Ubuntu 18.04 on arm64.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Closes #18262
2026-03-06 11:04:37 -08:00
Rob Norris
e73ada771d
libzpool: lift zfs_file ops out to separate source file
So its easier to remove and replace on non-Unix platforms.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18281
2026-03-05 18:07:46 -08:00
Garth Snyder
d979457760
zstream: consolidate shared code
zstream currently contains three identical copies of dump_record(),
which appear to all be drawn from libzfs_sendrecv.c. The original
is marked internal.

This PR adds zstream_util.[hc] and puts the shared code there along with
a couple of other items in common.

No functional changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Garth Snyder <garth@garthsnyder.com>
Closes #18284
2026-03-05 15:33:03 -08:00
Idefix2020
5dad9459d5
Add --no-preserve-encryption flag
* Add an option to send datasets with params or replicate
without preserving encryption
* Add a test case for the new functionality

Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Chris Jacobs <idefix2020dev@gmail.com>
Closes #18240
2026-03-05 15:08:17 -08:00
Rob Norris
c329530e6b Add simd_config.h and HAVE_SIMD() selector
We need to select which SIMD variable to check based on the compilation
target: HAVE_KERNEL_xxx for the Linux kernel, HAVE_TOOLCHAIN_xxx for
other platforms.

This adds a HAVE_SIMD() macro returns the right result depending on the
definedness or value of the variable for this target.

The macro is in simd_config.h, which is forcibly included in every
compiler call (like zfs_config.h), to ensure that it can be used
directly without further includes.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:01:42 -08:00
Rob Norris
35f74f84e6 Convert all HAVE_<name> SIMD gates to HAVE_SIMD(<name>)
The original names no longer exist, and the new ones will need to be
selectable based on the current compilation target.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:01:37 -08:00
Rob Norris
92a6ab405f config: also do SIMD checks on the kernel toolchain
The kernel may be built with a different compiler, and also includes
objtool, which may fail on unknwon instructions sequences. So, we want
to run the checks a second time for that toolchain too.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:01:32 -08:00
Rob Norris
c183268019 config: generate SIMD checks from table
No need to repeat all that boilerplate each time!

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:01:26 -08:00
Rob Norris
23bd583830 config: remove checks for unused SIMD gates
Specifically, we don't have any code gated on:

    HAVE_SSE
    HAVE_SSE3
    HAVE_SSE4_2
    HAVE_AVX512CD
    HAVE_AVX512DQ
    HAVE_AVX512IFMA
    HAVE_AVX512VBMI
    HAVE_AVX512PF
    HAVE_AVX512ER

So we can remove them and the checks that probe and generate them.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:01:20 -08:00
Rob Norris
e4b8d6a56f linux/simd_x86: remove obsolete kernel feature gates
Most of the X86_FEATURE_* defines we use were introduced in kernels much
older than those we support, so there's no need to check for them.

For the history, these are the ones being removed, and the kernel
versions/commits where they were introduced:

    <4.6  torvalds/linux@cd4d09ec6f (refactor/consolidation commit)
        OSXSAVE
        BMI1
        BMI2
        AES
        PCLMULQDQ
        MOVBE
        SHA_NI
        AVX512F
        AVX512CD
        AVX512ER
        AVX512PF

    4.6   torvalds/linux@d050049442
        AVX512BW
        AVX512DQ
        AVX512VL

    4.10  torvalds/linux@a8d9df5a50
        AVX512IFMA
        AVX512VBMI

    4.15  torvalds/linux@c128dbfa0f
        VAES
        VPCLMULQDQ

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:00:43 -08:00
Alexander Motin
1e1d64d665
Fix log vdev removal issues
When we clear the log, we should clear all the fields, not only
zh_log.  Otherwise remaining ZIL_REPLAY_NEEDED will prevent the
vdev removal.  Handle it also from the other side, when zh_log
is already cleared, while zh_flags is not.

spa_vdev_remove_log() asserts that allocated space on removed log
device is zero.  While it should be so in perfect world, it might
be not if space leaked at any point.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18277
2026-03-04 09:12:14 -05:00
Brian Behlendorf
f6205fdf64
ZTS: Adjust mmp_on_uberblocks threshold
Decrease the number of required uberblock blocks write slightly due
to observed variation when running in the CI.  This should help
avoid future false positives.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18280
2026-03-03 13:11:51 -08:00
Brian Behlendorf
75659a4e50
ZTS: Add additional exceptions
The following tests have been observed to occasionally fail when
running under the CI.  Updated our exceptions list to track them.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18274
2026-03-03 11:18:46 -08:00
Rob Norris
1e2c94a043
More consistent use of TREE_* macros in AVL comparators
Where is it appropriate and obvious, use TREE_CMP(), TREE_ISIGN() and
TREE_PCMP() instead or direct comparisons. It can make the code a lot
smaller, less error prone, and easier to read.

Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18259
2026-03-03 09:08:23 -08:00
Brian Behlendorf
0f90a797dd
Fix vdev_rebuild_range() tx commit
The spa_sync thread waits on ->spa_txg_zio and will set ZIO_WAIT_DONE
before running the sync tasks.  The dmu_tx_commit() call must be done
after we add the child zio to the ->spa_txg_zio parent otherwise its
possible the child is added after txg_sync has waited.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18276
2026-03-03 09:05:34 -08:00
Ryan Moeller
ac0fd40c8c Add zpool properties for allocation class space
The existing zpool properties accounting pool space (size, allocated,
fragmentation, expandsize, free, capacity) are based on the normal
metaslab class or are cumulative properties of several classes combined.

Add properties reporting the space accounting metrics for each metaslab
class individually.

Also introduce pool-wide AVAIL, USABLE, and USED properties reporting
values corresponding to FREE, SIZE, and ALLOC deflated for raidz.

Update ZTS to recognize the new properties and validate reported values.

While in zpool_get_parsable.cfg, add "fragmentation" to the list of
parsable properties.

Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Ryan Moeller <ryan.moeller@klarasystems.com>
Cloes #18238
2026-03-02 15:50:23 -08:00
Ryan Moeller
6ba3f915d0 zcommon: Fix description of vdev capacity format
Capacity is reported as a percentage not a size.

Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Ryan Moeller <ryan.moeller@klarasystems.com>
Closes #18238
2026-03-02 15:49:23 -08:00
Akash B
f8e5af53e9
Fix redundant declaration of dsl_pool_t
Remove redundant dsl_pool variable and duplicate spa_get_dsl()
call in vdev_rebuild_thread.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #18263
2026-02-27 10:39:52 -08:00
Andriy Tkachuk
f8457fbdc4
Fix deadlock on dmu_tx_assign() from vdev_rebuild()
vdev_rebuild() is always called with spa_config_lock held in
RW_WRITER mode. However, when it tries to call dmu_tx_assign()
the latter may hang on dmu_tx_wait() waiting for available txg.
But that available txg may not happen because txg_sync takes
spa_config_lock in order to process the current txg. So we have
a deadlock case here:

 - dmu_tx_assign() waits for txg holding spa_config_lock;
 - txg_sync waits for spa_config_lock not progressing with txg.

Here are the stacks:

    __schedule+0x24e/0x590
    schedule+0x69/0x110
    cv_wait_common+0xf8/0x130 [spl]
    __cv_wait+0x15/0x20 [spl]
    dmu_tx_wait+0x8e/0x1e0 [zfs]
    dmu_tx_assign+0x49/0x80 [zfs]
    vdev_rebuild_initiate+0x39/0xc0 [zfs]
    vdev_rebuild+0x84/0x90 [zfs]
    spa_vdev_attach+0x305/0x680 [zfs]
    zfs_ioc_vdev_attach+0xc7/0xe0 [zfs]

    cv_wait_common+0xf8/0x130 [spl]
    __cv_wait+0x15/0x20 [spl]
    spa_config_enter+0xf9/0x120 [zfs]
    spa_sync+0x6d/0x5b0 [zfs]
    txg_sync_thread+0x266/0x2f0 [zfs]

The solution is to pass txg returned by spa_vdev_enter(spa)
at the top of spa_vdev_attach() to vdev_rebuild() and call
dmu_tx_create_assigned(txg) which doesn't wait for txg.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
Closes #18210
Closes #18258
2026-02-26 11:18:02 -08:00
Rob Norris
f3d4c79496
zpl_super: prefer "new" mount API when available
This API has been available since kernel 5.2, and having it available
(almost) everywhere should give us a lot more flexibility for mount
management in the future.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18260
2026-02-25 13:17:33 -08:00
Rob Norris
09c27a14a3 icp: add SHA512 implementation using Intel SHA512 extensions
Generated from crypto/sha/asm/sha512-x86_64.pl in
openssl/openssl@241d4826f8.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18233
2026-02-25 12:48:30 -08:00
Rob Norris
3547a358fd simd: detect and surface support for Intel SHA512 extensions
Recent Intel CPUs (starting with Arrow Lake and Lunar Lake) include new
vectorised SHA512 instructions. Detect them and make them available to
the rest of the system.

Note the internal name "sha512ext". This is to disambiguate from other
uses of "sha512".

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18233
2026-02-25 12:47:48 -08:00
clefru
6495dafd58
range_tree: use zfs_panic_recover() for partial-overlap remove
zfs_range_tree_remove_impl() used a bare panic() when a segment to be
removed was not completely overlapped by an existing tree entry.  Every
other consistency check in range_tree.c uses zfs_panic_recover(), which
respects the zfs_recover tunable and allows pools with on-disk
corruption to be imported and recovered.  This one call was
inconsistent, making the partial-overlap case unrecoverable regardless
of zfs_recover.

Replace panic() with zfs_panic_recover() so that operators can set
zfs_recover=1 to import a corrupted pool and reclaim data, consistent
with all other range tree error paths.

Related-to: https://github.com/openzfs/zfs/issues/13483
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #18255
2026-02-25 11:26:10 -08:00
Tony Hutter
4da3f059a3
CI: Remove deprecated Fedora 41
Fedora 41 was deprecated on Dec 15 2025.  Remove it from CI tests.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18261
2026-02-25 11:20:23 -08:00
Alexander Motin
991fc56fae
Introduce dedupused/dedupsaved pool properties
Currently there is only a dedup ratio reported via pool properties.
If dedup is enabled only for some datasets, it is impossible to say
how much space the ratio actually covers.  Fix this by introducing
dedupused/dedupsaved pool properties, similar to earlier added
block cloning ones.  Combined with work to expose allocation classes
stats, it should give user-space enough visibility to correlate
`zpool list` and `zfs list` space numbers.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ryan Moeller <ryan.moeller@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18245
2026-02-25 09:41:38 -05:00
Mateusz Piotrowski
3408332d71
zhack: Fix importing large allocation profiles on small pools (#18256)
This patch fixes a segmentation fault in zhack metaslab leak which might
be triggered by feeding zhack with a fragmentation profile that's
exported from a pool larger than the target pool.

Fixes: 8f15d2e4d5
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>

Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
2026-02-24 10:24:22 -08:00
Rob Norris
0f608aa6ca Linux 7.0: add shims for the fs_context-based mount API
The traditional mount API has been removed, so detect when its not
available and instead use a small adapter to allow our existing mount
functions to keep working.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18216
2026-02-23 09:45:12 -08:00
Rob Norris
d34fd6cff3 Linux 7.0: posix_acl_to_xattr() now allocates memory
Kernel devs noted that almost all callers to posix_acl_to_xattr() would
check the ACL value size and allocate a buffer before make the call. To
reduce the repetition, they've changed it to allocate this buffer
internally and return it.

Unfortunately that's not true for us; most of our calls are from
xattr_handler->get() to convert a stored ACL to an xattr, and that call
provides a buffer. For now we have no other option, so this commit
detects the new version and wraps to copy the value back into the
provided buffer and then free it.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18216
2026-02-23 09:44:48 -08:00
Rob Norris
204de946eb Linux 7.0: blk_queue_nonrot() renamed to blk_queue_rot()
It does exactly the same thing, just inverts the return. Detect its
presence or absence and call the right one.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18216
2026-02-23 09:44:20 -08:00
Attila Fülöp
7744f04962
SIMD: libspl: test the correct CPUID bit for AVX512VL
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #18254
2026-02-23 09:42:25 -08:00
Christos Longros
6a717f31e6
Improve misleading error messages for ZPOOL_STATUS_CORRUPT_POOL
When devices are missing or claimed by another subsystem (e.g.
mdadm, LVM), zpool import reports "The pool metadata is corrupted"
and suggests destroying the pool. This is misleading because the
metadata is not necessarily corrupted -- it may simply be incomplete
due to inaccessible devices.

Update the status, action, and recovery messages to acknowledge
that missing devices can trigger this status, and suggest checking
device availability before resorting to pool destruction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Longros <chris.longros@gmail.com>
Closes #18251
Closes #8236
2026-02-23 09:41:24 -08:00
Louis Leseur
bbf0106c6b
build: get objtool from $kernelbuild
On systems where `$kernelsrc` is different than `$kernelbuild`, the
objtool binary will be located in `$kernelbuild` as it's the result of
running `make prepare` during kernel build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Louis Leseur <louis.leseur@gmail.com>
Closes #18248
Closes #18249
2026-02-23 09:39:51 -08:00
MigeljanImeri
4975430cf5
Add vdev property to disable vdev scheduler
Added vdev property to disable the vdev scheduler.
The intention behind this property is to improve IOPS
performance when using o_direct.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: MigeljanImeri <ImeriMigel@gmail.com>
Closes #17358
2026-02-23 09:34:33 -08:00
Tony Hutter
d2f5cb3a50
Move range_tree, btree, highbit64 to common code
Break out the range_tree, btree, and highbit64/lowbit64 code from kernel
space into shared kernel and userspace code.  This is needed for the
updated `zpool status -vv` error byte range reporting that will be
coming in a future commit.  That commit needs the range_tree code in
kernel and userspace.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18133
2026-02-22 11:43:51 -08:00
Rob Norris
168023b603
Linux 7.0: explicitly set setlease handler to kernel implementation
The upcoming 7.0 kernel will no longer fall back to generic_setlease(),
instead returning EINVAL if .setlease is NULL. So, we set it explicitly.

To ensure that we catch any future kernel change, adds a sanity test for
F_SETLEASE and F_GETLEASE too. Since this is a Linux-specific test,
also a small adjustment to the test runner to allow OS-specific helper
programs.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18215
2026-02-22 11:39:06 -08:00
Rob Norris
d11c661544 zdb: handle key load/derive failures a bit more gracefully
There's no real need to outright crash if key loading fails; we can
just unwind nicely.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18230
2026-02-20 13:37:43 -08:00
Rob Norris
9f874ad092 zdb: don't try to load key for unencrypted dataset
Previously using -K/--key on an unencrypted dataset would trip a VERIFY,
because the dataset has nowhere to load the key into.

Now, just ignore it. This makes zdb much easier to drive when there's a
mix of encrypt and non-encrypted datasets, as the key can provided for
all of them (at least, assuming the same encryption root, which is a
common enough case).

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18230
2026-02-20 13:37:11 -08:00
Rob Norris
b021cb60aa ZTS: make get_same_blocks() fail harder if zdb fails
Because it's called in $(...), it will swallow all errors, so we have to
work harder to recognise falure and echo a string that can't ever match
what the test is expecting.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18230
2026-02-20 13:36:49 -08:00
Rob Norris
aeb9fb3828 sha2_test: do correctness checks for all implementations
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18232
2026-02-19 15:16:36 -08:00
Rob Norris
b291d9aa22 get_cpu_freq: handle CPUs with variable frequency
If a CPU has variable frequency, then lscpu will list separate "CPU min
freq" and "CPU max freq" values. In this case, take the maximum.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18232
2026-02-19 15:16:18 -08:00
Alexander Motin
d06a1d9ac3
Fix available space accounting for special/dedup (#18222)
Currently, spa_dspace (base to calculate dataset AVAIL) only includes
the normal allocation class capacity, but dd_used_bytes tracks space
allocated across all classes.  Since we don't want to report free
space of other classes as available (we can't promise new allocations
will be able to use it), report only allocated space, similar to how
we report space saved by dedup and block cloning.

Since we need deflated space here, make allocation classes track
deflated allocated space also.  While here, make mc_deferred also
deflated, matching its use contexts.  Also while there, use
atomic_load() to read the allocation class stats.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18190
Closes #18222
2026-02-19 10:36:35 -08:00
Tony Hutter
640a217faf
CI: Test & fix Linux ZFS built-in build
ZFS can be built directly into the Linux kernel.  Add a test build
of this to the CI to verify it works.  The test build is only enabled
on Fedora runners (since they run the newest kernels) and is done in
parallel with ZTS.  The test build is done on vm2, since it typically
finishes ~15min before vm1 and thus has time to spare.

In addition:

- Update 'copy-builtin' to check that $1 is a directory
- Fix some VERIFYs that were causing the built-in build to fail

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18234
2026-02-19 10:15:41 -08:00
Attila Fülöp
c8a72a27e5
ICP: AES-GCM assembly: remove unused Gmul functions
In the AES-GCM assembly files we are defining Gmul functions we
don't use anywhere.

Just remove the dead code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #18226
2026-02-19 10:10:02 -08:00
Alexander Motin
370570890f
Remove parent ZIO from dbuf_prefetch()
I am not sure why it was added there 10 years ago, but it seems not
needed now.  According to my tests removing it improves sequential
read performance with recordsize=4K by 5-10% by reducing the CPU
overhead in prefetcher.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18214
2026-02-18 18:12:13 -08:00
Attila Fülöp
d489677280
ICP: AES-GCM VAES-AVX2: fix typos and document source files
Require AVX2 compiler support and document source files for
`aesni-gcm-avx2-vaes.S`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #18225
2026-02-17 16:51:32 -08:00
Jessica Clarke
bfb276e55c
freebsd: Fix TIMESPEC_OVERFLOW for PowerPC
Once upon a time, 32-bit PowerPC did indeed have a 32-bit time_t, but
FreeBSD 12.0 switched to a 64-bit time_t for PowerPC as an ABI break,
which predates the addition of FreeBSD support to OpenZFS. Moreover,
64-bit PowerPC has existed since FreeBSD 9.0, where __powerpc__ is also
defined (alongside __powerpc64__ to disambiguate), which has always had
a 64-bit time_t. This code has therefore always been wrong for all
PowerPC variants. Fix this by limiting the 32-bit case to just i386,
which is the only architecture in FreeBSD to have a 32-bit time_t and
not have broken ABI, due to its special legacy compatibility status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
Closes #18217
Closes #18218
2026-02-17 16:46:02 -08:00
Attila Fülöp
bee53d8c10
Linux 6.19 compat: in-tree build: fix duplicate GCM assembly functions
Linux 6.19 added an AES-GCM VAES-AVX2 assembly implementation. It's
basically a translation from the BoringSSL perlasm syntax to macro
assembly. We're using the same source but the perlasm generated flat
assembly which shares some global function names with the former.
When  building in-tree this results in the linker failing due to the
duplicate symbols.

To avoid the error we prepend `icp_` via a macro to our function
names.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Moch <mail@alexmoch.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #18204
Closes #18224
2026-02-17 13:09:41 -08:00
Alexander Motin
0f9564e85b
Simplify dnode_level_is_l2cacheable()
We should not dereference through dn_handle->dnh_dnode once we
already have a dnode pointer.  The result will be the same.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18212
2026-02-16 10:34:22 -05:00
Alexander Motin
ba970eb202
Cleanup allocation class selection
- For multilevel gang blocks it seemed possible to fallback from
normal to special class, since they don't have proper object type,
and DMU_OT_NONE is a "metadata".  They should never fallback.
 - Fix possible inversion with zfs_user_indirect_is_special = 0,
when indirects written to normal vdev, while small data to special.
Make small indirect blocks also follow special_small_blocks there.
 - With special_small_blocks now applying to both files and ZVOLs,
make it apply to all non-metadata without extra checks, since there
are no other non-metadata types.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18208
2026-02-16 10:33:21 -05:00
Mariusz Zaborski
cdf89f413c
Flush RRD only when TXGs contain data
This change modifies the behavior of spa_sync_time_logger when
flushing the RRD database.

Previously, once the sync interval elapsed, a flush would always
be generated. On solid-state devices, especially when the pool was
otherwise idle, this caused disks to wake up solely to write RRD
data. Since RRD is best-effort telemetry, this behavior is
unnecessary and wasteful.

With this change, spa_sync_time_logger delays flushing until a TXG
that already contains data is being synced. The RRD update is
appended to that TXG instead of forcing the creation of
a new write-only TXG.

During pool export, flushing is forced regardless of whether
the TXG contains user data. At that stage, data durability takes
precedence and a write must be issued.

Sponsored by: [Wasabi Technology, Inc.; Klara, Inc.]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Closes #18082
Closes #18138
2026-02-11 11:35:45 -08:00
Marc Sladek
cc184fe98b
Fix send:raw permission for send -w -I
When performing an incremental raw send with intermediates (-w -I),
the standard 'send' permission was incorrectly required instead of
allowing 'send:raw'. This was due to a strict boolean comparison on
the 'rawok' flag in zfs_secpolicy_send() with non-boolean value.

This change normalizes the 'rawok' variable to be strictly 0/1 and
updates the test suite to properly verify delegated raw send behavior.

Introduced-by: https://github.com/openzfs/zfs/pull/17543
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Marc Sladek <marc@sladek.dev>
Closes #18198
Closes #18193
2026-02-11 10:30:26 -08:00
Tony Hutter
3463d40779
ZTS: Fix zed_synchronous_zedlet
Wait for scrub_finish (as the comments in the code suggest) rather
than trim_finish in zed_synchronous_zedlet.ksh.  This seems to
workaround the ZTS failures in #18192.  Also, fix some typos.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18192
Closes #18196
2026-02-11 10:05:14 -08:00
Tony Hutter
fdd70565cb
Linux 6.19 compat: META
Update the META file to reflect compatibility with the 6.19
kernel.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18197
2026-02-11 09:37:02 -08:00
Christos Longros
040ba7a7ca
libzfs: improve error message for zpool create with ENXIO
When zpool create fails because a vdev cannot be opened (ENXIO),
the error falls through to zpool_standard_error() which reports
the generic 'one or more devices is currently unavailable'. This
is misleading when the real cause is a block size mismatch or
other device open failure.

Add an explicit ENXIO case in zpool_create()'s error handling to
provide a more descriptive message.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18184
Closes #11087
2026-02-10 13:19:44 -08:00
Tony Hutter
e601a1fb77
CI: Test build Lustre against ZFS
The Lustre filessytem calls a number of exported ZFS functions.  Do a
test build on the Almalinux runners to make sure we're not breaking
Lustre.  We do the Lustre build in parallel with the normal ZTS test
for efficiency, since ZTS isn't very CPU intensive. The full Lustre
build takes around 15min when run on its own.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18161
2026-02-10 09:54:17 -08:00
Alexander Motin
aa29455dd7
Restrict cloning with different properties
While technically its not a problem to clone between datasets with
different properties, it might create expectation of new properties
being applied during data move, while actually it won't happen.
For copies and checksum it may mean incorrect safety expectations.
For dedup, compression and special_small_blocks -- performance and
space usage. New zfs_bclone_strict_properties tunable controls it.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18180
2026-02-10 09:53:24 -08:00
rmacklem
1412bdc6c2
zfs_vnops_os.c: Move a vput() to after zfs_setattr_dir()
Without this patch, the following crash can occur when
a file system is configured with "xattr=dir".

VNASSERT failed: locked not true at
 /posix-acl/freebsd-rdma/sys/kern/vfs_subr.c:5786 (assert_vop_locked)
    hold count flags ()
    flags ()
    lock type zfs: UNLOCKED
panic: zfs_dirent_lookup: vnode is not locked but should be
cpuid = 3
time = 1770520763
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b
vpanic() at vpanic+0x136/frame 0xfffffe00914c8270
panic() at panic+0x43/frame 0xfffffe00914c82d0
assert_vop_locked() at assert_vop_locked+0x78
zfs_dirent_lookup() at zfs_dirent_lookup+0x41
zfs_setattr_dir() at zfs_setattr_dir+0x123
zfs_setattr() at zfs_setattr+0x1389
zfs_freebsd_setattr() at zfs_freebsd_setattr+0x56b
VOP_SETATTR_APV() at VOP_SETATTR_APV+0x5d
setfown() at setfown+0xb1
kern_fchownat() at kern_fchownat+0x192

This patch fixes the problem by moving the vput() call for
attrzp to after the zfs_setattr_dir() call that takes it as
an argument.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes: #18188
2026-02-10 09:29:37 -05:00
Tim Hatch
64bae56b00
Include missing newline in 'man' error
Because the `strerror` result doesn't include a newline, we need to add
one.  Observed on a minimal system that doesn't have `man` installed,
which behaves like this before the fix:

```
[root@upper tim]# zpool help import
couldn't run man program: No such file or directory[root@upper tim]#
```

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Hatch <tim@timhatch.com>
Closes #18183
2026-02-09 10:19:08 -08:00
Alexander Motin
2646bd5585
Allow rewrite skip cloned and snapshotted blocks
Rewrite of cloned and snapshotted blocks can allocate additional
space, that may be undesired.  In some cases it may have sense
to still rewrite snapshotted blocks, expecting the snapshots to
rotate with time, freeing space.  In other cases rewrite of cloned
blocks may be acceptable, despite persistent space usage increase.
For this reason add them as separate flags to `zfs rewrite`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18179
2026-02-09 10:17:56 -08:00
Rob Norris
15fbf534c6
AUTHORS: add names of recent new contributors
"Welcome to my house! Enter freely. Go safely, and leave something of
the happiness you bring!"

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18189
2026-02-09 10:11:09 -08:00
Brian Behlendorf
ae488e496f ZTS: update the relevant mmp test cases
- mmp_concurrent_import: added test case to verify that concurrent
  import correctness.  The pool may only be imported once.

- mmp_exported_import: an activity check is now required for pools
  which were cleanly exported if the system and pool hostids don't
  match.

- mmp_inactive_import: an activity check is now required for any
  pool which wasn't cleanly exported, even if the system and pool
  hostids match.

- mmp_on_uberblocks: updated expected uberblocks to take in to account
  the value MMP_INTERVAL_DEFAULT is set too.

- mmp_reset_interval: reduce the number of iterations from 10 to 3.
  This is sufficient to verify functionality and significantly speeds
  up the test.

- mmp_on_uberblocks: adjust the thresholds and increase the runtime
  to avoid false positives observed in CI.

- Update tests to use 'zhack action idle' instead of ztest to improve
  the reliability of the tests.

- Add additional log_note messages to test cases which have multiple
  verification steps to make it clear which portion of a test failed
  when reviewing the logs.

- Replace default_setup/cleanup_noexit calls with 'zpool create' and
  'zpool destroy' calls to avoid additional unnecessary dataset
  creation work.

- Update activity/noactivity check helper functions to use the
  ZFS_LOAD_INFO_DEBUG information now available from 'zpool import'
  to determine if this activity check ran and why.  This is more
  reliable in the CI than measuring the runtime.

- Removed all mmp tests from the zts-report.py exceptions list.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-09 09:36:18 -08:00
Brian Behlendorf
d4c0e52188 zhack: add "action idle" subcommand
In order to reliably test the multihost protection we need two (or more)
systems attempting to import the pool at the same time.  Historically, we've
used ztest running in userspace to simulate an active pool and attempted to
import the pool with the kernel modules.  This works but ztest is a bit
unwieldy for this and if it crashes for unrelated reasons it can result
in false positives.

All we really need is the pool imported in userspace so the MMP thread is
active and writing out uberblocks.  We can extend zhack which already knows
how to import the pool read/write and add an option to leave the pool open
and idle.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-09 09:36:14 -08:00
Brian Behlendorf
731ff0a5ac zhack: add -G option to dump debug buffer
Add a -G option to zhack to dump the internal debug buffer on exit.
We were able to use the same code from zdb for this which was nice.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-09 09:36:10 -08:00
Brian Behlendorf
20176224ee mmp: claim sequence id before final import
As part of SPA_LOAD_IMPORT add an additional activity check to
detect simultaneous imports from different hosts.  This check is
only required when the timing is such that there's no activity
for the the read-only tryimport check to detect.  This extra
safety chceck operates as follows:

1. Repeats the following MMP check 10 times:
  a. Write out an MMP uberblock with the best txg and a random
     sequence id to all primary pool vdevs.
  b. Verify a minimum number of good writes such that even if
     the pool appears degraded on the remote host it will see
     at least one of the updated MMP uberblocks.
  c. Wait for the MMP interval this leaves a window for other
     racing hosts to make similar modifications which can be
     detected.
  d. Call vdev_uberblock_load() to determine the best uberblock
     to use, this should be the MMP uberblock just written.
  e. Verify the txg and random sequeunce number match the MMP
     uberblock written in 1a.

2. Restore the original MMP uberblocks.  This allows the check
   to be performed again if the pool fails to import for an
   unrelated reason.

This change also includes some refactoring and minor improvements.

- Never try loading earlier txgs during import when the import
  fails with EREMOTEIO or EINTER.  These errors don't indicate
  the txg is damaged but instead that its either in use on a
  remote host or the import was interactively cancelled.  No
  rewind is also performed for EBADD which can result from a
  stale trusted config when doing a verbatim import.

- Refactor the code for consistent logging of the multihost
  activity check using spa_load_note() and console messages
  indicating when the activity check was trigger and the result.

- Added MMP_*_MASK and MMP_SEQ_CLEAR() macros to allow easier
  modification of the sequence number in an uberblock.

- Added ZFS_LOAD_INFO_DEBUG environment variable which can be
  set to log to dump to stdout the spa_load_info nvlist returned
  during import.  This is used by the updated mmp test cases
  to determine if an activity check was run and its result.

- Standardize the mmp messages similarly to make it easier to
  find all the relevent mmp lines in the debug log.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-09 09:36:01 -08:00
Brian Behlendorf
2f048ced4d mmp: add spa_load_name() for tryimport
Tryimport adds a unique prefix to the pool name to avoid name
collisions.  This makes it awkward to log user-friendly info
during a tryimport.  Add a spa_load_name() function which can
be used to report the unmodified pool name.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-09 09:35:03 -08:00
Brian Behlendorf
62a1bf7d19 mmp: move "Starting import" log message
Move the "Starting import" log message in to the import block so
it's matched with the "Fiinshed importing" debug message.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-09 09:34:57 -08:00
Brian Behlendorf
a9564b1787 mmp: further restrict mmp exported pool check
For a cleanly exported pools there exists a small window where
both systems may determine it's safe to import the pool and skip
the activity check.  Only allow the check to be skipped when the
last imported hostid matches the systems hostid and the pool was
cleanly exported.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-09 09:32:58 -08:00
Austin Wise
4f180e095a
Fix activating large_microzap on receive
This ensures that the in-memory state of the feature is recorded and
that `dsl_dataset_activate_feature` is not called when the feature
is already active.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Austin Wise <AustinWise@gmail.com>
Closes #18143
Closes #18144
2026-02-05 15:48:03 -08:00
George Shammas
20f94ef24a
pyzfs: remove unimplemented libzfs_core functions from pyzfs
As per #9008, pyzfs implements and documents several functions that
would be very useful, but then try to call c functions in libzfs_core.
These functions do not exist in libzfs_core, and in the ~7 years of
ticket creation still do not exist in libzfs_core.

It seems unlikely that these functions will get implemented, though 2
years ago, ~5 years after that ticket lzc_get_props was implemented in
23a489a411 which enabled get properties in
pyzfs. Sadly the first thing the  pyzfs function for lzc_get_props does
is call _list, which cals lzc_list, which is not implmented. And the
functions to set or inherit properties are still missing.

Having these functions in pyzfs are misleading, footguns, and time
wasters when evaluating pyzfs.

Removing these functions from pyzfs means that _if_ these functions are
added in libzfs_core, then pyzfs will also need to re-implement these
functions. It's a shame, because these py functions have good
documentation and tests. Funny enough the tests are auto skipped if it
detects that the functions don't exist in libzfs_core.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: George Shammas <george@shamm.as>
Closes #9008
Closes #18162
2026-02-05 15:34:55 -08:00
Alexander Motin
21bbe7cb67
Improve caching for dbuf prefetches
To avoid read errors with transaction open dmu_tx_check_ioerr()
is used to read everything required in advance.  But there seems
to be a chance for the buffer to evicted from dbuf cache in
between, which result in immediate eviction from ARC, which may
require additional disk read later in a place where error handling
is problematic.

To partially workaround this introduce a new flag DMU_IS_PREFETCH,
relayed to ARC as ARC_FLAG_PREFETCH | ARC_FLAG_PRESCIENT_PREFETCH,
making ARC delay eviction by at least several seconds, or till the
actual read inside the transaction, that will promote it to demand
access.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18160
2026-02-04 10:12:32 -08:00
Ameer Hamza
00d69b0f72 arc: remove unused l2df_size and l2df_type from l2arc_data_free_t
These fields became unused when ABD was introduced in a6255b7fc.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:07:26 -08:00
Ameer Hamza
6f17052743 cache_012_pos: disable compression to ensure L2ARC wrap
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:07:22 -08:00
Ameer Hamza
13552d754f ZTS: Add L2ARC DWPD and parallel writes tests
Add four new functional tests to validate L2ARC DWPD rate limiting and
parallel write features:

- l2arc_dwpd_ratelimit_pos: Verifies DWPD rate limiting with different
  values (0, 100, 1000, 10000) and ordering
- l2arc_dwpd_reimport_pos: Verifies DWPD rate limiting persists after
  pool export/import
- l2arc_multidev_scaling_pos: Verifies parallel write scaling ratio
  (dual devices achieve ~2× single device throughput)
- l2arc_multidev_throughput_pos: Verifies absolute parallel write
  throughput scales with device count (~32MB/s per device)

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:07:16 -08:00
Ameer Hamza
48d3f7fac9 man: Update L2ARC tunables for DWPD and parallel writes
Add l2arc_dwpd_limit, remove l2arc_write_boost, update related tunables.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:07:11 -08:00
Ameer Hamza
d1f290f1ea L2ARC: Implement DWPD-based rate limiting with adaptive feed intervals
Add DWPD (Drive Writes Per Day) rate limiting to control L2ARC write
speeds and protect SSD endurance. Write rate is constrained by the
minimum of l2arc_write_max and DWPD-calculated budget. Devices
accumulate unused write budget over 24-hour periods with automatic reset
and carry-over. Writes occur in controlled bursts (max 50MB) with
adaptive intervals to achieve target rates. Applies after initial device
fill.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:07:07 -08:00
Ameer Hamza
b525525b44 L2ARC: Implement per-device feed threads for parallel writes
Transform L2ARC from single global feed thread to per-device threads,
enabling parallel writes to multiple L2ARC devices. Each device runs
its own feed thread independently, improving multi-device throughput.
Previously, a single thread served all devices sequentially; now each
device writes concurrently. Threads are created during device addition
and torn down on removal.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:07:02 -08:00
Ameer Hamza
825dc41ad4 L2ARC: Preserve L2HDR in arc_release() for in-flight writes
When arc_release() is called on a header with a single buffer and
L2_WRITING set, the L2HDR must be preserved for ABD cleanup (similar
to the arc_hdr_destroy() case). If we destroy the L2HDR here, later
arc_write() will allocate a new ABD and call arc_hdr_free_abd(),
which needs b_l2hdr.b_dev to properly defer ABD cleanup, causing
VERIFY(HDR_HAS_L2HDR(hdr)) to fail.

Allocate a new header for the buffer in the single_buf_l2writing
case (single buffer + L2_WRITING), leaving the original header with
L2HDR intact. The original header becomes an "orphan" (no buffers, no
b_pabd) but retains device association for ABD cleanup when
l2arc_write_done() completes.

The shared buffer case (HDR_SHARED_DATA) is excluded because L2ARC
makes its own transformed copy via l2arc_apply_transforms(), so the
original ABD is not used by the L2 write. The header can be safely
reused without allocating a new one.

For proper evictable space accounting, arc_buf_remove() must be
called before remove_reference() in the single_buf_l2writing path.
This ensures arc_evictable_space_increment() (during remove_reference)
and arc_evictable_space_decrement() (during destruction) see the
same state (b_buf=NULL), preventing accounting leaks that cause
module unload to hang with non-zero esize.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:06:57 -08:00
Ameer Hamza
b8610c3d93 L2ARC: Reorder header destruction for in-flight L2 writes
With multiple L2ARC devices, headers can be destroyed asynchronously
(e.g., during zpool sync) while L2_WRITING is set. The original code
destroyed L2HDR before L1HDR, causing ABDs to lose their device
association (b_l2hdr.b_dev) when arc_hdr_free_abd() is called.

This caused ABDs to be added to the global free-on-write list without
device information. When any L2ARC device completed its write and
attempted to free these orphaned ABDs, it would panic on
ASSERT(!list_link_active(&abd->abd_gang_link)) because the ABD was
still part of another device's vdev_queue I/O aggregation gang.

Fix by extending l2ad_mtx lock scope to cover L1HDR destruction and
reordering to destroy L1HDR before L2HDR when L2_WRITING is set. This
ensures arc_hdr_free_abd() can access b_l2hdr.b_dev to properly tag
ABDs with their device for deferred cleanup.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:06:51 -08:00
Ameer Hamza
2f41b9d865 L2ARC: Implement persistent markers with consistent tail scanning
This commit introduces per-sublist persistent markers that eliminate
redundant tail scanning between L2ARC iterations, providing significant
CPU efficiency improvements. Markers are pre-allocated during device
initialization and properly cleaned up during device removal.

The implementation uses conditional behavior based on device capacity:
small devices (capacity < arc_c) retain original HEAD/TAIL scanning
based on ARC warmup state, while large devices (capacity >= arc_c)
use the persistent marker approach for optimal CPU efficiency.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:06:47 -08:00
Ameer Hamza
3523b5f3f9 L2ARC: Implement even-depth multi-sublist scanning
The introduction of ARC multilists made L2ARC writing quite random,
depending on whether it found something to write in a randomly selected
sublist. This created inconsistent write patterns and poor utilization
of available sublists leading to uneven cache population.

This commit replaces random selection with systematic scanning across
all sublists within each burst. Fair headroom distribution ensures
even-depth traversal across all sublists until the target write size
is reached. Round-robin processing with random starting points eliminates
sequential bias while maintaining predictable write behavior.

The systematic approach provides consistent L2ARC filling patterns
and better utilization of available ARC data across all sublists.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:05:53 -08:00
Dennis Værum
07ae463d1a
Added support for multiple homes in pam_zfs_key module (#18084)
This implemented support for having multiple datasets unlocked and
mounted when a session is opened.
Example: `homes=rpool/home,tank/users`

Extra unit tests have been added

A man page documents have been added `man 8 pam_zfs_key`. A few
references to the new man page have also been added in other documents.

Signed-off-by: Dennis Vestergaard Værum <github@varum.dk>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2026-02-03 16:09:10 -08:00
Erik Larsson
7e33476a7c
Fix build for Linux 6.18 with PowerPC/RISC-V kernels. (#18145)
The macro 'flush_dcache_page(...)' modifies the page flags, but in Linux
6.18 the type of the page flags changed from 'unsigned long' to the
struct type 'memdesc_flags_t' with a single member 'f' which is the page
flags field.

Signed-off-by: Erik Larsson <catacombae@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2026-02-02 14:16:10 -08:00
John Cabaj
13601e2d24
Linux 6.19: handle --werror with CONFIG_OBJTOOL_WERROR=y
Linux upstream commit 56754f0f46f6: "objtool: Rename
--Werror to --werror" did just that, so we should check for
either "--Werror" or "--werror", else the build will fail

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Closes #18152
2026-02-02 10:19:18 -08:00
Tony Hutter
da9e8ff0df
CI: Fix qemu-1-setup failure, remove debug stuff
- For whatever reason, the runner will now startup with either two 75GB
  disks or one 150GB disk.  Previously the runner was always booting
  with two 75GB, but about a quarter of the time it now starts up
  with a single 150GB disk.  This caused qemu-1-setup.sh to fail
  since it expected the two 75GB disks.  This commit updates
  qemu-1-setup.sh to work with either disk config.

- Remove the watchdog from qemu-1-setup.sh.  It didn't turn out to be
  useful.

- Remove the timestamps that zfs-qemu.yml added to the qemu-1-setup.sh
  output.  The timestamps were redundant, since you can already
  download timestamped logs from the Github web interface.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18166
2026-01-31 12:40:55 -08:00
Brooks Davis
b364720524
nvpair: chase FreeBSD xdrproc_t definition
As of FreeBSD 16, xdrproc_t will take exactly two arguments in both
kernel and userspace in line with the Linux kernel.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Alan Somers <asomers@freebsd.org>
Signed-off-by:	Brooks Davis <brooks@capabilitieslimited.co.uk>
Closes #18154
2026-01-28 21:41:33 -05:00
Mariusz Zaborski
a157ef62a1
Make sure we can still write data to txg
The final txgs are used only to clear out any remaining deferred
frees, and we cannot write new data to them. Make sure we do not
try to do so.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Closes #18139
2026-01-26 21:33:21 -05:00
Alexander Motin
35b2d39709
Lock db_mtx around arc_release() in couple places
* Lock db_mtx around arc_release() in dbuf_release_bp()

While this function is called only in sync context, the same buffer
can be touched by dbuf_hold_impl() in open context, creating races.
All other accesses to arc_release() are already protected by db_mtx,
so just take it here too.

Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>

* Lock db_mtx in sa_byteswap()

While SA code seems protected by sa_lock, there is a back door of
dmu_objset_userquota_get_ids(), that may hold and access the dbuf
without sa_lock, relying only on db_mtx. Taking db_mtx here should
protect both the arc_release() and the data for db_buf.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18146
2026-01-26 21:32:16 -05:00
Alek P
cd895f0e57
remove thread unsafe debug code causing FreeBSD double free panic
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #18140
2026-01-21 10:00:34 -08:00
Alexander Moch
28291536bc Zstd: Document update policy
Add the Zstd update policy to the subtree README.

Also update the documented location of zstd-in.c to match upstream
changes, and normalize naming from 'ZSTD' to 'Zstd'.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes #18089
2026-01-20 13:41:24 -08:00
Alexander Moch
2d5a9b6a4c Zstd: Restore SPDX license identifiers
When updating Zstandard to version 1.5.7 the SPDX license identifiers
were lost. This commit restores them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes #18089
2026-01-20 13:41:18 -08:00
Alexander Moch
e7f9734bc7 Zstd: Fix ASan poisoning for pooled Zstd contexts
The Zstd context mempool can reuse buffers that were previously poisoned
under AddressSanitizer, leading to false-positive use-after-poison reports
during zloop and other stress tests.

Explicitly unpoison memory when handing buffers out to Zstd and poison the
user-visible region again when buffers are returned to the pool. This makes
the allocator ASan-correct while preserving existing pooling behavior.

Also fix non-standard void * pointer arithmetic in zstd_free() and remove an
early return in zstd_dctx_alloc() so kmem_type/kmem_size are always set on
pool hits.

This only affects ASan bookkeeping in user space, does not change runtime
behavior in non-ASan configurations, and does not affect on-disk formats.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes #18089
2026-01-20 13:41:12 -08:00
Alexander Moch
a2ac9cd606 Zstd: Integrate v1.5.7 into the ZFS build system
This commit builds on the previous zstd library update and adds the
necessary ZFS integration and build system changes required to make
zstd 1.5.7 compile and function correctly.

Changes:
- Add zstd_preSplit.c (new in 1.5.7) to all build systems.
- Enable x86_64 assembly in userspace (huf_decompress_amd64.S).
- Disable assembly in kernel for RETHUNK/IBT compatibility.
- Disable intrinsics in kernel for EL10 x86_64-v3 baseline.
- Disable tracing in kernel builds for AArch64 compatibility.
- Fix ZSTD_isError symbol renaming with __asm__ directive.
- Rename abs64 to ZSTD_abs64 (FreeBSD kernel conflict).
- Fix bitstream.h attributes (MEM_STATIC -> FORCE_INLINE_TEMPLATE).
- Remove xxhash.c from BSD build (now header-only).
- Update symbol names in zstd_compat_wrapper.h.
- Ignore checkstyle for zstd-in.c.

Kernel assembly disabled for security mitigation compatibility. User
space retains full performance.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes #18089
2026-01-20 13:41:06 -08:00
Alexander Moch
bbcddb127a Zstd: Update bundled library to v1.5.7 without further adjustments
This commit only replaces the bundled source and does not include any
ZFS integration changes. Because the build depends on integration
adjustments, it will fail until the accompanying integration commit is
applied.

Upstream release: https://github.com/facebook/zstd/releases/tag/v1.5.7

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes #18089
2026-01-20 13:40:37 -08:00
Mark Johnston
54b141fab5
FreeBSD: Remove references to DEBUG_VFS_LOCKS
This option is removed upstream in favour of plain INVARIANTS.

VNASSERT is always defined so I see no reason to use it conditionally.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #18136
2026-01-19 08:55:17 -08:00
Martin Matuška
8605bdfdda
FreeBSD: unbreak compilation on i386
tests/zfs-tests/cmd/mmap_seek.c: use correct printf specifier
module/zfs/vdev.c: vdev_clear(): correctly cast argument to
atomic_add_64().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #18096
2026-01-14 17:02:41 -08:00
Alan Somers
3fffe4e707
Fix --enable-invariants on FreeBSD
The make symbols were never getting forwarded to the correct make
subprocess.  As far as I can tell, this has never worked.  Either that,
or something has changed in the behavior of make.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #18131
2026-01-14 14:54:12 -08:00
shuppy
09e4e01e93
Fix history logging for zpool create -t
`zpool create` is supposed to log the command to the new pool’s history,
as a special record that never gets evicted from the ring buffer. but
when you create a pool with `zpool create -t`, no such record is ever
logged (#18102). that bug may be the cause of issues like #16408.

`zpool create -t` (83e9986f6e) and `zpool
import -t` (26b42f3f9d) are both designed
to override the on-disk zpool property `name` with an in-core
“temporary” name, but they work somewhat differently under the hood.

importing with a temporary name sets `spa->spa_import_flags |=
ZFS_IMPORT_TEMP_NAME` in ZFS_IOC_POOL_IMPORT, which tells
spa_write_cachefile() and spa_config_generate() to use the
ZPOOL_CONFIG_POOL_NAME in `spa->spa_config` instead of `spa->spa_name`.

creating with a temporary name permanently(!) sets the internal zpool
property `tname` (ZPOOL_PROP_TNAME) in the `zc->zc_nvlist_src` of
ZFS_IOC_POOL_CREATE, which tells zfs_ioc_pool_create()
(4ceb8dd6fd) and spa_create() to use that
name instead of `zc->zc_name`, then sets `spa->spa_import_flags |=
ZFS_IMPORT_TEMP_NAME` like an import.

but zfsdev_ioctl_common() fails to check for `tname` when saving the
pool name to `zfs_allow_log_key`, so when we call ZFS_IOC_LOG_HISTORY,
we call spa_open() on the wrong pool name and get ENOENT, so the logging
silently fails.

this patch fixes #18102 by checking for `tname` in zfsdev_ioctl_common()
like we do in zfs_ioc_pool_create().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: delan azabani <dazabani@igalia.com>
Closes #18118  
Closes #18102
2026-01-14 14:51:51 -08:00
Alexander Motin
765929cb4e
DDT: Add locking for table ZAP destruction
Similar to BRT, DDT ZAP can be destroyed by sync context when it
becomes empty.  Respectively similar to BRT introduce RW-lock to
protect open context methods from the destruction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18115
2026-01-13 15:07:15 -08:00
Rob Norris
1051c3d211 spdxcheck: enforce SPDX license tags on build system files
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18077
2026-01-08 15:08:32 -08:00
Rob Norris
85391ee931 build: add SPDX license tags to build system files
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18077
2026-01-08 15:08:03 -08:00
Andrew Walker
aca58dbb65
Add fh_to_parent export definition
This commit adds support for converting a file handle to its
parent dentry. This is called in exportfs_decode_fh_raw()
when subtree checking is enabled in NFS. Defining this and
handling the expanded filehandles allows the knfsd to succeed
in handling the file handle where it might otherwise fail
with ESTALE when trying to open by filehandle.

A side effect of this change is that name_to_handle_at(2)
and open_by_handle_at(2) now support AT_HANDLE_CONNECTABLE.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Andrew Walker <andrew.walker@truenas.com>
Closes #18099
2026-01-08 15:06:12 -08:00
Rob Norris
f2b4ed3fe5 spl: remove a _KERNEL check
This code is only compiled for the Linux kernel module, so that define
is always set.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18117
2026-01-08 10:33:44 -08:00
Rob Norris
02a631139f spl: unexport kstat_proc_entry functions
These are used to implement the kstat and procfs_list interfaces, and
aren't used from outside. There's no need to export them.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18117
2026-01-08 10:33:37 -08:00
Rob Norris
662f33f323 spl: lift 64-bit math compat out to separate file
It's a lot of rarely-compiled code, so move it to the side to make other
code easier to read.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18117
2026-01-08 10:33:32 -08:00
Rob Norris
2ca6e880da spl: remove old atomic lock
Long ago, SPL atomics were implemented as a global spinlock over
conventional operations. In 5e9b5d832b (2009-10) they was converted to
proper atomics, with the spinlock retained as a fallback.

The switch to compile with the fallback was later removed in a91258913f
(2018-05), but the code it enabled wasn't. So lets do that.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18117
2026-01-08 10:33:14 -08:00
Dimitry Andric
2f1f25217f
icp: emit .note.GNU-stack section for all ELF targets
On FreeBSD, linking the zfs kernel module with binutils ld 2.44 shows
the following warning:

    ld: warning: aesni-gcm-avx2-vaes.o: missing .note.GNU-stack section
    implies executable stack
    ld: NOTE: This behaviour is deprecated and will be removed in a
    future version of the linker

Some of the `.S` files under `module/icp/asm-x86_64/modes` check whether
to emit the `.note.GNU-stack` section using:

    #if defined(__linux__) && defined(__ELF__)

We could add `&& defined(__FreeBSD__)` to the test, but since all other
`.S` files in the OpenZFS tree use:

    #ifdef __ELF__

it would seem more logical to use that instead. Any recent ELF platform
should support these note sections by now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #18119
2026-01-08 09:21:12 -08:00
Austin Wise
794f1587db
When receiving a stream with the large block flag, activate feature
ZFS send streams include a feature flag DMU_BACKUP_FEATURE_LARGE_BLOCKS
to indicate the presence of large blocks in the dataset. On the sending
side, this flag is included if the `-L` flag is passed to `zfs send`
and the feature is active in the dataset. On the receive side, the
stream is refused if the feature is active in the destination dataset
but the stream does not include the feature flag.

The problem is the feature is only activated when a large block is
born. If a large block has been born in the destination, but never
the source, the send can't work. This can arise when sending streams
back and forth between two datasets.

This commit fixes the problem by always activating the large blocks
feature when receiving a stream with the large block feature flag.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Austin Wise <AustinWise@gmail.com>
Closes #18105
2026-01-07 16:47:12 -08:00
Jitendra Patidar
2301755dfb
Fix zfs_open() to skip zil_async_to_sync() for the snapshot
Fix zfs_open() to skip zil_async_to_sync() for the snapshot, as it won't
have any transactions. zfsvfs->z_log is NULL for the snapshot.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #18091
2026-01-06 10:58:56 -08:00
Wolfgang Hoschek
c77f17b750
Add snapshots_changed_nsecs dataset property
Add a read-only dataset property, snapshots_changed_nsecs, which 
exposes the nanosecond resolution version of snapshots_changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Wolfgang Hoschek <wolfgang.hoschek@mac.com>
Closes #17998
Closes #18031
2026-01-06 09:36:20 -08:00
shuppy
6eef5cdc94
ZTS: add regression test for #17180
In #17180, we fixed an interesting bug that i believe i hit in one of my
pools, but as far as i can tell, there was no test for it.

this patch adds a regression test for #17180, minimised from my attempts
to reproduce the bug in a way that resembled the history of my pool.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: delan azabani <dazabani@igalia.com>
Closes #18109
2026-01-06 09:33:03 -08:00
Dimitry Andric
2dbd6af5e4
Rename several printf attributes declarations to __printf__
For kernel builds on FreeBSD, we redefine `__printf__` to
`__freebsd_kprintf__`, to support FreeBSD kernel printf(9) extensions
with clang.

In OpenZFS various printf related functions are declared with
`__attribute__((format(printf, X, Y)))`, so these won't work with the
above redefinition. With clang 21 and higher, this leads to errors
similar to:

    sys/contrib/openzfs/module/zfs/spa_misc.c:414:38: error: passing
    'printf' format string where 'freebsd_kprintf' format string is
    expected [-Werror,-Wformat]
      414 |         (void) vsnprintf(buf, sizeof (buf), fmt, adx);
          |                                             ^

Since attribute names can always be spelled with leading and trailing
double underscores, rename these instances.

Note that in the FreeBSD base system we usually use `__printflike` from
`<sys/cdefs.h>`, but that does not apply to OpenZFS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #18095
2026-01-05 14:15:22 -08:00
Andrew Walker
312bdab0f5
Add handling for STATX_CHANGE_COOKIE
This commit adds handling for the STATX_CHANGE_COOKIE so that
we can properly surface the ZFS znode sequence to NFS clients via
knfsd.

If knfsd does not have STATX_CHANGE_COOKIE in statx result then
it will synthesize the NFS change_info4 structure and related
change4id values algorithmically based on the ctime value of the
file. Since internally ZFS is using ktime_get_coarse_real_ts64()
for the timestamp calculation here it introduces the possiblity
that the change will not increment the change4id of directories
/ files causing a failure in the client to invalidate its attr
cache (among other things). See RFC 8881 Section 10.8 for
discussion of how clients may implement name and directory
caching.

Notable in this commit is that we are not initializing the
inode->i_version to the znode->z_seq number. The reason for this
is that we're intentionally not setting `SB_I_VERSION`. This
indicates that the filesystem manages its own i_version and
so it is not populated in the generic_fillattr.

The following compares tight loop of setattr over NFSv4
protocol while traching nfsd4_change_attribute.

Before change:
inode, change_attribute
4723, 7590032215978780890
4723, 7590032215978780890
4723, 7590032215978780890
4723, 7590032215982780865
4723, 7590032215982780865

After change:
inode, change_attribute
7602, 7590032992517123951
7602, 7590032992517123952
7602, 7590032992517123953
7602, 7590032992517123954
7602, 7590032992517123955

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Andrew Walker <andrew.walker@truenas.com>
Closes #18097
2026-01-05 14:06:28 -08:00
Rob Norris
a1319bf654
kmem: don't add __GFP_RECLAIMABLE for KM_VMEM allocations
vmalloc()'d memory is not movable/reclaimable, so __GFP_RECLAIMABLE is
not a valid flag, and since 6.19 the kernel warns if you use it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18107
2026-01-05 13:35:13 -08:00
Ivan Shapovalov
dbb3f247ed
cmd/zfs: clone: accept -u to not mount newly created datasets
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18080
2026-01-05 12:21:56 -05:00
Alexander Moch
b9b84445ea
CI: Add Alpine Linux 3.23 runner to the pipeline (#18087)
Add an Alpine Linux 3.23 runner to the CI chain to run OpenZFS builds
and tests against musl libc.

Currently, zfs_send_sparse is killed after 10 minutes on Alpine, causing
cascading EBUSY failures in the test suite. With zfs_send_sparse
disabled, the ZFS test suite reaches a pass rate of 94.62%.

This commit introduces the required Alpine-specific setup and a small
set of shell and cloud-init compatibility fixes that also apply to
existing Linux runners.

The Alpine runner is not enabled by default and is not executed for new
pull requests.

Sponsored-by: ERNW Research GmbH - https://ernw-research.de/

Signed-off-by: Alexander Moch <amoch@ernw.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2025-12-30 09:29:48 -08:00
Alexander Moch
e72f3054e3
cmd/ztest: avoid PATH_MAX stack allocation in ztest_get_zdb_bin() (#18085)
Calling realpath(path, buf) can trigger fortified header wrappers that
allocate a PATH_MAX-sized temporary buffer on the stack, exceeding the
4 KiB frame limit on some systems. Use the heap-allocating
realpath(path, NULL) form instead.

Sponsored-by: ERNW Research GmbH - https://ernw-research.de/

Signed-off-by: Alexander Moch <amoch@ernw.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-12-29 11:16:34 -08:00
Rob Norris
f041375b52 kmem: don't add __GFP_COMP for KM_VMEM allocations
It hasn't been necessary since Linux 3.13
(torvalds/linux@a57a49887e), and since 6.19 the kernel warns if you
use it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18053
2025-12-23 12:54:34 -08:00
Rob Norris
f95e306266 kmem: don't pass __GFP_HIGHMEM to __vmalloc
Since Linux 4.12 (torvalds/linux@19809c2da2) __GFP_HIGHMEM has been
automatically added to calls to __vmalloc() internally, so we don't need
it anymore. This is good, because since 6.19 the kernel warns if you use
__GFP_HIGHMEM.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18053
2025-12-23 12:54:11 -08:00
Rob Norris
3c8665cb5d Linux 6.19: replace i_state access with inode_state_read_once()
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18053
2025-12-23 12:53:32 -08:00
Ivan Shapovalov
3c4193333b zed.d, contrib: fix shellcheck errors in scripts
Not sure why this was not caught by CI; perhaps my shellcheck is new
enough to catch more things.

Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
2025-12-23 11:12:21 -08:00
Ivan Shapovalov
e28d980d68 man: cosmetic: fix typos; use consistent spelling for "non-existing"
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
2025-12-23 11:12:21 -08:00
Ivan Shapovalov
1e7280cece zfs_main: cosmetic: add missing flag to the comment for create
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
2025-12-23 11:12:21 -08:00
Ivan Shapovalov
9880ac3080 zvol: cosmetic: fix up volthreading property short name
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
2025-12-23 11:12:21 -08:00
Rob Norris
654e7628d6 u8_textprep: move into module/zfs
Now that it's built into the main zfs module in all cases, there's no
reason to put it in its own dir.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18071
2025-12-22 14:58:36 -08:00
Rob Norris
309006a0c6 libunicode: merge into libzpool
It's a single source file that is not used anywhere else, so there's no
reason to keep it separate.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18071
2025-12-22 14:58:20 -08:00
Tony Hutter
648a9a2938
CI: Test 2.4.x in qemu-test-repo-vm.sh, quick mode
The qemu-test-repo-vm.sh script tests installs ZFS from different
repos.  Have it test from the new 2.4.x repos as well.

Also add a checkbox to run in "lookup mode".  This just does a
quick lookup to see what version is installed in each repo.  It does
not do a test install and module load.  It only takes 3min to run vs
over an hour for the full version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18070
2025-12-19 19:57:19 -08:00
Rob Norris
0d44b58d7f
libshare: fold into libzfs and reorg headers a little
libzfs is the only user of libshare, and only internally, so there's no
particular reason to build it separately, nor to export its symbols. So,
pull it into libzfs proper, remove its "public" header, and hide its
symbols.

The bare minimum "public" API is just to count and enumerate the
supported share types. These are moved to libzfs.h with the other share
API.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18072
2025-12-19 19:52:33 -08:00
Alexander Motin
962e68865e
Use reduced precision for scan times
Scan time limits do not need precision beyond 1ms.  Switching
scn_sync_start_time and spa_sync_starttime from gethrtime() to
getlrtime() saves ~3% of CPU time during resilver scan stage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18061
2025-12-18 10:22:11 -08:00
Alexander Motin
a83bb15fcd
Reduce minimal scrub/resilver times
With higher throughput and lower latency of modern devices ZFS can
happily live with pretty short (fractions of a second) TXGs.  But
the two decade old multi-second minimal time limits can almost stop
payload writes by extending TXGs beyond dirty data limits of ARC
ability to amortize it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18060
2025-12-18 10:21:45 -08:00
Allan Jude
1d43387dd8
zdb: Add -O option for -r to specify object-id
"zdb -r -O pool/dataset obj-id destination" will copy
the file with object-id obj-id to the named destination;
without -O it'll still be interpreted as a pathname.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Sean Eric Fagan <sean.fagan@klarasystems.com>
Closes #16307
2025-12-18 09:25:09 -08:00
Mark Maybee
7ff329ac2e
Fix rangelock test for growing block size
If the file already has more than one block, then the current
block size cannot change. But if the file block size is less
than the maximum block size supported by the file system, and
there are multiple blocks in the file, the current code will
almost always extend the rangelock to its maximum size.
This means that all writes become serialized and even reads
are slowed as they will more often contend with writes. This
commit adjusts the test so that we will not lock the entire
range if there is more than one block in the file already.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Mark Maybee <mark.maybee@perforce.com>
Closes #18046
Closes #18064
2025-12-18 09:23:38 -08:00
Alexander Motin
051a8c7494
Bypass snprintf() in quota checks if no quotas set
This improves synthetic 1 byte write speed by ~2.5%.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18063
2025-12-17 21:59:47 -05:00
Alexander Motin
0550abd4b8
RAIDZ: Remove some excessive logging
There were some per I/O logging into dbgmsg in RAIDZ code, that
increased CPU load and wiped useful content out of dbgmsg, for
example during routine disk replacement process.  I don't think
we need it to be that verbose.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18059
2025-12-17 14:00:01 -08:00
Turbo Fredriksson
0ba3403323 Change shellcheck and checkbashism triggers.
Newer versions of `shellcheck` and `checkbashism` finds more than
previous, so fix those.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
6c6a469bea Replace bashisms in ZFS shell function stub.
The `type` command is an optional feature in POSIX, so shouldn't be
used.

Instead, use `command -v`, which commit
  e865e7809e
did, but it missed this file.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
1842d6b3cb Make lines stay within 80 char limit.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
ead77e952e Add some comments to clarify the mounting of filesystems.
There's no real documenation (which should probably be written!),
so instead document the code the best we can on what's going and
with the mounting of file systems to make future updates easier.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
01cb64510d Standardise if/then/else and for/do/done lines.
More code standard changes, where if/then is on different lines.
To have it on the same, or on different lines, can be argued, but
we need to pick one, and try not to mix how to do things.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
29819a0177 Add missing initrd config variables.
The `ZFS_INITRD_ADDITIONAL_DATASETS` variable is used in the initrd
script to boot additional OS file systems besides the root file system.
But it wasn't included as an example in the config files.

The `ZFS_POOL_EXCEPTIONS` *was* included in the example defaults file,
but it was not exported, so not available in the initrd.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
4af8e28a59 Remove unnecessary sourcing of variables.
The file `/etc/default/zfs` is already sourced by the `/etc/zfs/zfs-functions`,
so no need to source it again.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
94975ff79b Fix issue with finding degraded pool(s).
When a pool is degraded, or needs special action, the `zpool import`
(without pool to import) line will report:
```
  pool: rpool
    id: 01234567890123456789
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
   [..]
```
If the import with the pool name fails, it is supposed to try importing
using the pool ID.

However, the script is also getting the `action` line (and probably `scrub:`
if/when that's available):
  pool; The pool can be imported using its name or numeric identifier.;config:;
which causes issues on consequent import attempts.

Cleanup the information by rewriting the `sed` command line.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
33dd57e1b4 Prefix all variables that are local with underscore.
This just to make them easier to see.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
d3b447de4e Shell script good practices changes.
It's considered good practice to:
1) Wrap the variable name in `{}`.
   As in `${variable}` instead of `$variable`.
2) Put variables in `"`.

Also some minor error message tuning.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
61ab032ae0 Fix potential global variable overwrite.
In a previous commit (e865e7809e), the
`local` keyword was removed in functions because of bashism.

Removing bashisms is correct, however this could cause variable overwrites,
since several functions use the same variable name.

So this commit make function variables unique in the (now) global name
space.

The problem from the original bug report (see #17963) could not be duplicated,
but it is still sane to make sure that variables stay unique.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Tony Hutter
32faecb0c2
CI: Use Ubuntu mirrors instead of azure (#18057)
Use the official Ubuntu apt mirrors instead of
azure.archive.ubuntu.com, since that mirror can be slow:

    https://github.com/actions/runner-images/issues/7048

This can help speed up the 'Setup QEMU' stage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18057
2025-12-16 09:15:18 -08:00
Alan Somers
a69a90b49e
Remove the obsolete FreeBSD 14.2-RELEASE from CI
Sponsored by:	ConnectWise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #18013
2025-12-15 15:13:04 -08:00
Tony Hutter
842fb1c135
CI: Change timeout values
The 'Setup QEMU' CI step updates and installs all packages necessary to
startup QEMU.  Typically the step takes a little over a minute, but
we've seen cases where it can take legitimately take more than 45min
minutes.  Change the timeout to 60 minutes.

In addition, change the 'Install dependencies' timeout to 60min since
we've also seen timeouts there.

Lastly, remove all timeouts from the zfs-qemu-packages workflow.
We do this so that we can always build packages from a branch, even if
the time it takes to do a CI step changes over time.  It's ok to
eliminate the timeouts from the zfs-qemu-packages completely since that
workflow is only run manually.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18056
2025-12-15 14:58:01 -08:00
Alexander Motin
22e89aca88
DDT: Fix compressed entry buffer size
The first byte of the entry after compression is used for algorithm
and byte order flag.  We should decrement when calling compression/
decompression algorithm.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18055
2025-12-15 14:52:44 -08:00
Alexander Motin
3b1ff816bd
DDT: Add/use zap_lookup_length_uint64_by_dnode()
Unlike other ZAP consumers due to compression DDT does not know
how big entry it is reading from ZAP.  Due to this it called
zap_length_uint64_by_dnode() and zap_lookup_uint64_by_dnode(),
each of which does full ZAP entry lookup.

Introduction of the combined ZAP method dramatically reduces the
CPU overhead and locks contention at DBUF layer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18048
2025-12-15 14:38:34 -08:00
Alexander Motin
ff5414406f
DDT: Switch to using ZAP _by_dnode() interfaces
As was previously done for BRT, avoid holding/releasing DDT ZAP
dnodes for every access.  Instead hold the dnodes during all their
life time, never releasing.

While at this, add _by_dnode() interfaces for zap_length_uint64()
and zap_count(), actively used by DDT code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18047
2025-12-15 09:49:14 -08:00
Alexander Motin
46d6f1fe56
DDT: Move logs searches out of the lock
Postponing entry removal from the DDT log in case of hit till later
single-threaded sync stage allows to make ddl_tree stable during
multi-threaded ZIO processing stage.  It allows to drop the DDT lock
before the search instead of after, reducing the contention a lot.

Actually ddt_log_update_entry() was already handling the case of
entry present in the active log, so we only need to remove it from
flushing log, if the entry happen to be there.

My tests with parallel 4KB block writes show throughput increase
from 480MB/s (122K blocks/s) to 827MB/s (212K blocks/s), even
though still limited by the global DDT lock contention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18044
2025-12-15 09:17:04 -08:00
Alexander Motin
3d76ba2737
Improve async destroy processing timing
Previous code effectively enforced that all async free ZIOs were
_issued_ within the TXG timeout.  But they could take forever to
complete, especially if the required metadata were not in ARC.

This patch introduces periodic waits every 2000 ZIOs, which should
give at least somewhat reasonable TXG timings even for single HDD
pools with empty ARC.  And makes them complete within half of the
TXG timeout, since we might still need time to sync DDT and BRT.

While there, change zfs_max_async_dedup_frees semantics to include
also clone and gang blocks, which are similar.  Bump the default
value from set long ago to be more forgiving to block cloning
(still not having logs and benefiting from large TXGs), now that
we have better working time limits.  The limit now is a possible
amount of dirty data produced by BRT updates.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18043
2025-12-11 18:46:08 -08:00
Alexander Motin
f72fd378c8 Defer async destroys on pool import
We've observed a number of cases when pool import stuck for many
minutes due to large async destroy trying to load DDT or BRT from
HDD pool.  While proper destroy dosage is a separate problem,
lets give import process a chance to complete before that at all.
It may be not enough if there is a lot of ZIL to replay, but that
is harder to cover, since those are in separate syscalls.

Code investigation shown that we already have this mechanism used
for scrub/resilver, so this patch converts SCAN_IMPORT_WAIT_TXGS
into a tunable and applies it to async destroys also.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18033
2025-12-11 18:44:46 -08:00
Alexander Motin
d976587a35 ZTS: Fix zvol_misc_fua SLOG writes check
Instead of comparing number of SLOG writes to number of normal
writes we should just make sure SLOG got the required number of
writes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18033
2025-12-11 18:43:59 -08:00
Alexander Motin
20f09eae42
ZIO: ZIO_STAGE_DDT_WRITE is a blocking stage
ddt_lookup() in zio_ddt_write() might require synchronous DDT ZAP
read.  Running it from interrupt taskq might lead to deadlock.
Inclusion of ZIO_STAGE_DDT_WRITE into ZIO_BLOCKING_STAGES should
hopefully fix that, even though I am not sure how I got there.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17981
2025-12-10 19:51:53 -05:00
Alexander Motin
d393166c54
ARC: Increase parallel eviction batching
Before parallel eviction implementation zfs_arc_evict_batch_limit
caused loop exits after evicting 10 headers.  The cost of it is not
big and well motivated.  Now though taskq task exit after the same
10 headers is much more expensive.  To cover the context switch
overhead of taskq introduce another level of batching, controlled
by zfs_arc_evict_batches_limit tunable, used only for parallel
eviction.

My tests including 36 parallel reads with 4KB recordsize that shown
1.4GB/s (~460K blocks/s) before with heavy arc_evict_lock contention,
now show 6.5GB/s (~1.6M blocks/s) without arc_evict_lock contention.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17970
2025-12-10 13:03:01 -08:00
Rob Norris
9fdb854109
Linux: work around use of GPL-only symbol kasan_flag_enabled
We may not be able to avoid our code referencing the symbol, but we can
ensure that a symbol of that name is available to the linker during
build, and so not require linking the GPL-exported version.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18009
Closes #18040
2025-12-10 10:04:57 -08:00
Chunwei Chen
0c194352b5
Fix ddtprune causing space leak
In zio_ddt_free, if a pruned dde is still in ddt, it would do nothing
and cause space leak.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #17982
Closes #17983
2025-12-10 10:02:14 -08:00
Alexander Moch
ff47dd35e2
Ensure 64-bit off_t is used in user space instead of loff_t
Use 64-bit POSIX off_t in user space instead of the Linux kernel type
loff_t. This is enforced at configure time via AC_SYS_LARGEFILE and
AC_CHECK_SIZEOF([off_t]). loff_t remains in shared headers where they
mirror Linux VFS interfaces, and on FreeBSD we typedef loff_t to off_t
in those headers since libc does not provide it.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes #18020
2025-12-10 09:45:39 -08:00
Ameer Hamza
48842c0a41
ZTS: Add test for snapshot automount race
Add snapshot_019_pos to verify parallel snapshot automount operations
don't cause AVL tree panic. Regression test for commit 4ce030e025.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18035
2025-12-10 09:16:45 -08:00
Brian Behlendorf
a3238a745e
Linux 6.18 compat: META (#18039)
Update the META file to reflect compatibility with the 6.18
kernel.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-12-10 09:04:24 -08:00
Tony Hutter
5d40e0ed70
CI: Fix Ubuntu 22.01 rsend failures
For whatever reason, the single `log_note` in the `directory_diff`
function causes the function to stop executing on Ubuntu 22.  This
causes most of the rsend tests to fail.  Remove the line since it's only
informational.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18032
2025-12-09 20:04:51 -08:00
Alex
104da9657a
Fix a declaration position of the nth_page.
Compilation time bug introduced by 87df5e4 commit.
Fix for the compilation error(Linux kernel 6.18.0):
"zfs/module/os/linux/zfs/abd_os.c:920:32: error: implicit declaration
of function ‘nth_page’; did you mean ‘pte_page’?
[-Werror=implicit-function-declaration]".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: agiUnderground <alex.dev.cv@gmail.com>
Closes #18034
2025-12-09 15:45:51 -08:00
Alexander Motin
a62c62120e
ARC: Pre-convert zfs_arc_min_prefetch_ms
There is no need to do MSEC_TO_TICK() for each evicted ARC header.
We can do it when tunables are set, since we already have separate
internal variables for those.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17965
2025-12-09 12:07:10 -08:00
Alexander Motin
09492e0f21
Reduce dataset buffers re-dirtying
For each block written or freed ZFS dirties ds_dbuf of the dataset.
While dbuf_dirty() has a fast path for already dirty dbufs, it still
require taking the lock and doing some things visible in profiler.

Investigation shown ds_dbuf dirtying by dsl_dataset_block_born()
and some of dsl_dataset_block_kill() are just not needed, since
by the time they are called in sync context the ds_dbuf is already
dirtied by dsl_dataset_sync().

Tests show this reducing large file deletion time by ~3% by saving
CPU time of single-threaded part of the sync thread.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18028
2025-12-09 09:18:09 -08:00
Brian Behlendorf
574d5f3313 CI: exclude signed-off-by/reviewed-by from 72 char limit
Allow an author or reviewer's name and email address to exceed
the 72 character limit enforced by the commitcheck target.

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18030
2025-12-09 09:12:32 -08:00
bspengler-oss
060bc8b70d Fix HIGHMEM/kmap API violation in zfs_uiomove_bvec_impl()
Fix another instance where ZFS assumes multiple pages can be
mapped at once via zfs_kmap_local(), resulting in crashes and
potential memory corruption on HIGHMEM-enabled (typically 32-bit)
systems.

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bspengler-oss <94915855+bspengler-oss@users.noreply.github.com>
Closes #15668
Closes #18030
2025-12-09 09:12:24 -08:00
bspengler-oss
2cab0554c0 Preserve LIFO ordering of kmap ops in abd_raidz_gen_iterate()
ZFS typically preserves proper LIFO ordering regarding map/unmap
operations that wrap the Linux kernel's kmap interfaces that
require such ordering, but one instance in abd_raidz_gen_iterate()
did not.

Similar issues have been fixed in the Linux kernel in the past,
see for instance CVE-2025-39899 for userfaultfd.

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bspengler-oss <94915855+bspengler-oss@users.noreply.github.com>
Closes #15668
Closes #18030
2025-12-09 09:12:16 -08:00
bspengler-oss
87df5e4872 Fix interaction of abd_iter_map()/abd_iter_unmap() with HIGHMEM
HIGHMEM kmap interfaces operate on only a single page at a time
yet ZFS hadn't accounted for this, resulting in crashes and
potential memory corruption on HIGHMEM (typically 32-bit) systems.
This was caught by PaX's KERNSEAL feature as it makes use of
HIGHMEM functionality on x64.

On typical 64-bit systems, this issue wouldn't have been observed,
as the map interfaces simply fall back to returning an address in
lowmem where the contiguous pages can be accessed directly.

Joint work with the PaX Team, tested by Mark van Dijk

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bspengler-oss <94915855+bspengler-oss@users.noreply.github.com>
Closes #15668
Closes #18030
2025-12-09 09:10:32 -08:00
Ameer Hamza
4ce030e025
Fix snapshot automount race causing duplicate mounts and AVL tree panic
Multiple threads racing to automount the same snapshot can both spawn
mount helper processes that successfully complete, causing both parent
threads to attempt AVL tree registration and triggering a VERIFY()
panic in avl_add(). This occurs because the fsconfig/fsmount API lacks
the serialization provided by traditional mount() via lock_mount().

The fix adds a per-entry mutex (se_mtx) to zfs_snapentry_t that
serializes mount and unmount operations on the same snapshot. The first
mount thread creates a pending entry with se_spa=NULL and holds se_mtx
during the helper execution. Concurrent mounts find the pending entry
and return success without spawning duplicate helpers. Unmount waits on
se_mtx if a mount is pending, ensuring proper serialization. This allows
different snapshots to mount in parallel while preventing the AVL panic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17943
2025-12-08 13:49:11 -08:00
Mark Johnston
86b064469d
FreeBSD: Fix a potential null dereference in zfs_freebsd_fsync()
In general it's possible for a vnode to not have an associated VM
object.  This happens in particular with named pipes, which have
some distinct VOPs, defined in zfs_fifoops.  Thus, this chunk of
zfs_freebsd_fsync() needs to check for the FIFO case, like other
vm_object_mightbedirty() callers do.

(Note that vn_flush_cached_data() calls are predicated on
zn_has_cached_data() returning true, and it checks for a NULL v_object
pointer already.)

Fixes: ef4058fcdc
Reported-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: Sean Eric Fagan <sef@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #18015
2025-12-08 13:46:30 -08:00
Alan Somers
89f729dcca
During CI, use nproc instead of sysctl -n hw.ncpu
The latter may give the wrong result if cpusets are in use.

Sponsored by:	ConnectWise
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #18012
2025-12-04 16:57:15 -08:00
Brian Behlendorf
dfb0875200
ZTS: Add slow_vdev_degraded_sit_out retry
While not common the draid3 vdev type has been observed to
not always sit out a vdev when run in the CI.  To prevent
continued false positives allow the test to be retried up
to three times before considering it a failure.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18003
2025-12-04 09:10:22 -08:00
Alexander Moch
05e2747bf2
Provide loff_t via <fcntl.h> on musl-based Linux systems
Musl exposes loff_t only as a macro in <fcntl.h> when _GNU_SOURCE is
defined. Including <fcntl.h> ensures the type is available, and a
fallback typedef is provided when no macro is defined. This fixes build
failures on musl systems.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes #18002
2025-12-02 12:14:09 -08:00
Alexander Motin
ffaea08319
FreeBSD: Remove HAVE_INLINE_FLSL use
These macros are deprecated in FreeBSD kernel for several years,
and unneeded for much longer.  Instead, similar to Linux, let
kernel let compiler do the right things.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18004
2025-12-02 12:13:16 -08:00
Ameer Hamza
88d012a1d6
Fix snapshot automount expiry cancellation deadlock
A deadlock occurs when snapshot expiry tasks are cancelled while holding
locks. The snapshot expiry task (snapentry_expire) spawns an umount
process and waits for it to complete. Concurrently, ARC memory pressure
triggers arc_prune which calls zfs_exit_fs(), attempting to cancel the
expiry task while holding locks. The umount process spawned by the
expiry task blocks trying to acquire locks held by arc_prune, which is
blocked waiting for the expiry task to complete. This creates a circular
dependency: expiry task waits for umount, umount waits for arc_prune,
arc_prune waits for expiry task.

Fix by adding non-blocking cancellation support to taskq_cancel_id().
The zfs_exit_fs() path calls zfsctl_snapshot_unmount_delay() to
reschedule the unmount, which needs to cancel any existing expiry task.
It now uses non-blocking cancellation to avoid waiting while holding
locks, breaking the deadlock by returning immediately when the task is
already running.

The per-entry se_taskqid_lock has been removed, with all taskqid
operations now protected by the global zfs_snapshot_lock held as
WRITER. Additionally, an se_in_umount flag prevents recursive waits when
zfsctl_destroy() is called during unmount. The taskqid is now only
cleared by the caller on successful cancellation; running tasks clear
their own taskqid upon completion.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17941
2025-12-01 14:43:42 -08:00
Alexander Motin
4754ac8529 raidz_test: Restore rand_data protection
It feels dirty to modify protection of a memory allocated via libc,
but at least we should try to restore it before freeing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-01 14:34:52 -08:00
Alexander Motin
338d432b42 raidz_test: Fix ZIO ABDs initialization
- When filling ABDs of several segments, consider offset.
 - "Corrupt" ABDs with actually different data to fail something.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-01 14:34:48 -08:00
Alexander Motin
95b2eb50f2 raidz_test: Set io_offset reasonably
- io_offset of 1 makes no sense.  Set default to 0.
 - Initialize io_offset in all cases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-01 14:34:43 -08:00
Alexander Motin
3647fa3902 ZFS: Enable more logs for raidz_001_neg
The output is not so big here, so lets collect something useful.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-01 14:34:19 -08:00
Alexander Motin
928eccc5bc
DDT: Reduce global DDT lock scope during writes
Before this change DDT lock was taken 4 times per written block,
and as effectively a pool-wide lock it can be highly congested.
This change introduces a new per-entry dde_io_lock, protecting some
fields during I/O ready and done stages, so that we don't need the
global lock there.

According to my write tests on 64-thread system with 4KB blocks this
significantly reduce the global lock contention, reducing CPU usage
from 100% to expected ~80%, and increasing write throughput by 10%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17960
2025-12-01 10:44:10 -08:00
Alexander Motin
a5b665df39
DDT: Switch to using wmsums for lookup stats
ddt_lookup() is a very busy code under a highly congested global
lock.  Anything we can save here is very important.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17980
2025-12-01 10:36:31 -08:00
Alexander Motin
48f33c1ef2
DDT: Make children writes inherit allocator
Even though unlike gang children it is not so critical for dedup
children to inherit parent's allocator, there is still no reason
for them to have allocation policy different from normal writes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17961
2025-12-01 10:30:27 -08:00
Tony Hutter
9a453b2050
CI: zfs-test-packages: Add in new repos
Test install from our new repos: zfs-latest, zfs-legacy,
zfs-2.3, zfs-2.2, from the zfs-test-packages workflow.
This on-demand workflow is use to verify that the zfs RPMs
in the repos are correct.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17956
2025-12-01 10:24:33 -08:00
Rob Norris
bfd137d92b config/kmap_atomic: initialise test data
6.18 changes kmap_atomic() to take a const pointer. This is no problem
for the places we use it, but Clang fails the test due to a warning
about being unable to guarantee that uninitialised data will definitely
not change. Easily solved by forcibly initialising it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17954
2025-12-01 10:19:46 -08:00
Rob Norris
b7e00c7397 zvol_id: make array length properly known at compile time
Using strlen() in an static array declaration is a GCC extension. Clang
calls it "gnu-folding-constant" and warns about it, which breaks the
build. If it were widespread we could just turn off the warning, but
since there's only one case, lets just change the array to an explicit
size.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17954
2025-12-01 10:19:40 -08:00
Rob Norris
c631f5e6c2 Linux: bump -std to gnu11
Linux switched from -std=gnu89 to -std=gnu11 in 5.18
(torvalds/linux@e8c07082a8). We've always overridden that with gnu99
because we use some newer features.

More recent kernels are using C11 features in headers that we include.
GCC generally doesn't seem to care, but more recent versions of Clang
seem to be enforcing our gnu99 override more strictly, which breaks the
build in some configurations.

Just bumping our "override" to match the kernel seems to be the easiest
workaround. It's an effective no-op since 5.18, while still allowing us
to build on older kernels.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17954
2025-12-01 10:19:11 -08:00
Alexx Saver
39303febac
chksum: run 256K benchmark on demand, preserve chksum_stat_data
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexx Saver <lzsaver.eth@ethermail.io>
Co-authored-by: Adam Moss <c@yotes.com>
Closes #17945
Closes #17946
2025-12-01 10:14:52 -08:00
Alexander Motin
7f7d4934cb
FreeBSD: Fix uninitialized variable error
On FreeBSD errno is defined as (* __error()), which means compiler
can't say whether two consecutive reads will return the same.
And without this knowledge the reported error is formally right.

Caching of the errno in local variable fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17975
2025-11-25 05:16:35 -05:00
Rob Norris
e37937f42d
ztest: fix broken random call
Bad copypasta in 4d451bae8a, leading to random stuff being blasted all
over stack, destroying the program.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Sean Eric Fagan <sean.fagan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17957
2025-11-24 12:43:15 -05:00
Shreshth3
1f3444f2bb
zpool: fix special vdev -v -o conflict
Right now, running `zpool list` with -v and -o passed
does not work properly for special vdevs. This commit
fixes that problem.

See the discussion on #17839.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Closes #17932
2025-11-19 08:30:20 -08:00
Ameer Hamza
36e4f18883
Fix taskq NULL pointer dereference on timer race
Remove unsafe timer_pending() check in taskq_cancel_id() that created a
race where:
- Timer expires and timer_pending() returns FALSE
- task_done() frees task with tqent_func = NULL
- Timer callback executes and queues freed task
- Worker thread crashes executing NULL function

Always call timer_delete_sync() unconditionally to ensure timer callback
completes before task is freed.

Reliably reproducible by injecting mdelay(10) after setting CANCEL flag
to widen the race window, combined with frequent task cancellations
(e.g., snapshot automount expiry).

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17942
2025-11-19 08:21:10 -08:00
Rob Norris
71609a9264 zfs: replace tpool with taskq
They're basically the same thing; lets just carry one.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17948
2025-11-19 08:16:51 -08:00
Rob Norris
be7d8eaf54 taskq: initialize tsd on first use
Doing it this way means that callers don't have to call
system_taskq_init() and also get the system and system_delay taskqs that
they possibly don't even want.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17948
2025-11-19 08:16:11 -08:00
Brian Behlendorf
06c73cffab
CI: Add smatch static analysis workflow
Smatch is an actively maintained kernel-aware static analyzer
for C with a low false positive rate.  Since the code checker
can be run relatively quickly against the entire OpenZFS code
base (15 min) it makes sense to add it as a GitHub Actions
workflow.  Today smatch reports a significant numbers warnings
so the workflow is configured to always pass as long as the
analysis was run.  The results are available for reference.
Long term it would ideal to resolve all of the errors/warnings
at which point the workflow can be updated to fail when new
problems are detected.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Toomas Soome <tsoome@me.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17935
2025-11-17 12:33:40 -08:00
Rob Norris
74b50a71a0 libuutil: remove packaging
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17934
2025-11-17 06:23:17 -08:00
Rob Norris
adb316f411 libuutil: remove the whole thing
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17934
2025-11-17 06:23:05 -08:00
Rob Norris
871fa61d26 zfs: replace uu_list with sys/list
Lets just use the list implementation we use everywhere else.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17934
2025-11-17 06:22:48 -08:00
Rob Norris
b593748287 zfs: replace uu_avl with sys/avl
Lets just use the AVL implementation we use everywhere else.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17934
2025-11-17 06:21:26 -08:00
Toomas Soome
e63d026b91
cmd/zpool cstyle issues
add missing headers.
usage() is no-return, so anything after call to it is unreachable code.
use (void) cast where we do ignore return value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #17885
2025-11-14 15:58:50 -08:00
Brian Behlendorf
6015edb374 lib: update ABI meta following libspl changes
In theory they should not have resulted in a change. In practice, the
way visibility is set up currently means that many of our convenience
libraries will "leak through" into the available symbols in our public
libraries.

In this commit, we're seeing all the new symbols in libspl through
libuutil, libzfs and libzfs_core. Importantly, none have been removed,
so consumers of these libraries will not notice.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17861
2025-11-12 10:25:14 -08:00
Rob Norris
23d17f3587 libspl/random: add switch to force pseudo-random numbers for all calls
ztest wants to force all kernel random calls to use the pseudo-random
generator (/dev/urandom), to avoid depleting the system entropy pool
just for testing.

Up until the previous commit, it did this by switching the path that the
libzpool (now libspl) random API would use to get random data from; that
is, it took advantage of an implementation detail.

Now that that hole is closed to it, we need another method. This commit
introduces that; a simple API call to enable/disable "force pseudo"
mode.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:04:30 -08:00
Rob Norris
4d451bae8a libspl: hide global data objects
Currently libspl is a static archive that is linked into multiple shared
objects, which then re-export its symbols. We intend to fix this soon.

For the moment though, most programs shipped with OpenZFS depend on two
or more of these shared objects, and see the same symbols twice. For
functions this is not a problem, as they do not have any mutable state
and so the linker can simply select the first one and use that for all.

For global data objects however, each shared object will have direct
(non-relocatable) references to its own instance of the symbol, such
that changes on one will not necessarily be seen by the other. While
this shouldn't be a problem in practice as these reexported interfaces
are not supposed to be used, they are technically undefined behaviour in
C (C17 6.9.2) and are reported by ASAN as a violation of C++'s "One
Definition Rule".

To fix this, we hide these globals inside their compilation units, and
add access functions and macros as appropriate to preserve the existing
API (though not ABI).

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:04:22 -08:00
Rob Norris
e282e98e79 libzpool: add zfs_impl.c, remove from libicp
This isn't used by libicp directly, but is by some clients, and relies
on headers specific to the zfs module, which makes using it difficult
otherwise.

Also switch the checksum tests over to use libzpool, so they can get
access to it. That's not exactly what we want in the long term, but the
icp and zfs modules have a complicated relationship so this will do for
now.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:04:15 -08:00
Brian Behlendorf
677d6ed730 zfs_context: remove duplicate includes
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:04:03 -08:00
Brian Behlendorf
a49158c064 icp: remove global icp includes
Only include the required icp headers.  There's no need to
include sys/zfs_context.h and pull in all of the zfs headers.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:03:51 -08:00
Brian Behlendorf
913bdbf4d1 libzpool: remove global libzpool includes
Only include the zfs headers where they're currently required to
compile.  Unfortunately, including zfs_ioctl.h in user space pulls
in a bunch of internal zfs headers as a side effect.  We'll need
to move these structures in to a new shared header to avoid this.
We should not need to add the LIBZPOOL_CPPFLAGS when building the
zed, zinject, zpool, libzfs, ior libzfs_core.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:03:15 -08:00
Rob Norris
99d7453b43 libzpool: add BE_POSIX_VENDOR for userspace bootenv
This is mostly a placeholder; it's not actually clear if a boot
environment makes any sense for userspace. Still, "posix" is the likely
future name of libzpool as a port, and this define is mandatory, so lets
roll with it for now.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:03:07 -08:00
Brian Behlendorf
801d9b4f96 debug: move all of the debug bits out of the spl
Pull all of the internal debug infrastructure up in to the zfs
code to clean up the layering.  Remove all the dodgy usage of
SET_ERROR and DTRACE_PROBE from the spl.  Luckily it was
lightly used in the spl layer so we're not losing much.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:51 -08:00
Rob Norris
eceb5b32e9 libspl: move loff_t declaration from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:46 -08:00
Rob Norris
5305d0f8b9 zfs_context: move empty __init/__exit macros to sys/debug.h
These are kind-of compiler attribute placeholders, so go here with the
others for now.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:42 -08:00
Rob Norris
292438295d libspl: move compiler attribute macros from zfs_context.h
sys/debug.h is not really the right place for them, but we already have
some there for libspl, so it is at least convenient.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:35 -08:00
Rob Norris
a43edeefaf libzutil: move NN_NUMBUF_SZ from zfs_context.h nearer to nicenum()
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:29 -08:00
Rob Norris
b9d2e7782f libspl: common sysmacros.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:25 -08:00
Rob Norris
248c7ed0d2 libspl: move DTRACE_PROBE macros from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:20 -08:00
Rob Norris
03b2e5c40c libspl: move remaining ddi_* prototypes from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:12 -08:00
Rob Norris
559597b66c zfs_context: remove misc unused
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:08 -08:00
Rob Norris
ee0e86cfb5 libzpool: remove unused userspace ioctl policy functions
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:04 -08:00
Rob Norris
b5af61b569 libspl: move zone definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:00 -08:00
Rob Norris
70a1fadaf2 libspl: move SID implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:56 -08:00
Rob Norris
faa295b9a6 libspl: move SID definitions from zfs_context.h; remove kernel gate
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:48 -08:00
Rob Norris
2b4a0dd6c0 libspl: move callb stubs from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:44 -08:00
Rob Norris
9d609098cd libspl: move random impl from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:39 -08:00
Rob Norris
1911501c7d libspl: move random definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:32 -08:00
Rob Norris
55fb30ebe6 zfs_context: move vn_dumpdir to libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:28 -08:00
Rob Norris
daff6b7e35 libspl: move utsname() etc to sys/misc.h; initialise in libspl_init()
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:21 -08:00
Rob Norris
6cf6f091cf libspl: move physmem to sys/systm.h; initialise at libspl_init()
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:17 -08:00
Rob Norris
d02ea5170a libspl: init/fini
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:08 -08:00
Rob Norris
4e3b88927c libzpool: separate driver-side include
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:04 -08:00
Rob Norris
0c6be03fd7 zfs_context: remove duplicated access control stuff; remove kernel gate
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:52 -08:00
Rob Norris
335f46b219 libspl: move ptob() from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:46 -08:00
Rob Norris
bca4ca7949 libspl: add include guards for sys/string.h
The extra inclusion via xvattr.h appears to upset the linter in CI. I'm
not entirely sure what its complaint is, but removing sys/string.h
entirely is not quite possible yet, and include guards are rarely a bad
idea, so this will do.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:41 -08:00
Rob Norris
db1c58095e libspl: move vattr and xvattr definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:24 -08:00
Rob Norris
cf1044a15f libspl: move kmem implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:21 -08:00
Rob Norris
8b5d919d4e libspl: move kmem definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:17 -08:00
Rob Norris
ee89fefe4d libspl: move procfs_list implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:13 -08:00
Rob Norris
8700fc669b libspl: move procfs_list definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:10 -08:00
Rob Norris
586eba95de libspl: move kstat implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:06 -08:00
Rob Norris
ce7a894af1 libspl: move kstat definitions from zfs_context.h, slim down to basics
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:03 -08:00
Rob Norris
8c022088a7 libspl: move tsd definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:59 -08:00
Rob Norris
c0984c936f libspl: move cred implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:55 -08:00
Rob Norris
52cf8eac42 libspl: move cred definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:51 -08:00
Rob Norris
3823492ca1 libspl: move taskq implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:47 -08:00
Rob Norris
a2e10ebfd3 libspl: move taskq definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:43 -08:00
Rob Norris
0c60920d09 libspl: move thread implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:40 -08:00
Rob Norris
21ae59a53b libspl: move thread definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:14 -08:00
Rob Norris
7234d69748 libspl: move cmn_err definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:09 -08:00
Rob Norris
e7a856d954 libspl: move condvar implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:03 -08:00
Rob Norris
a9f3733376 libspl: move condvar definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:59 -08:00
Rob Norris
40ddba8256 libspl: move rwlock implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:55 -08:00
Rob Norris
c7eb0a7633 libspl: move rwlock definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:50 -08:00
Rob Norris
3e37ea85af libspl: move mutex implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:44 -08:00
Rob Norris
cc119fbb48 libspl: move mutex headers from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:37 -08:00
Rob Norris
ba2ff4b42c libspl: move time definitions from zfs_context_os.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:31 -08:00
Rob Norris
37d5df62e0 libzpool: move ZFS-specific headers from libspl
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:27 -08:00
Rob Norris
f49b93e2c7 libzpool: move zfs_context_os.h from libspl
Keeping the spl/zfs module split, libzpool is the zfs module for
userspace. Headers and functions specific to it belong there.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:18 -08:00
Rob Norris
5588f189a7 libspl: single zfs_context_os.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:53:34 -08:00
Brian Behlendorf
cb6b249f8c Update all ABI files
Refresh all ABI files using the CI generated files to reflect
the library interfaces to be published for the 2.4 release.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17911
2025-11-12 09:39:00 -08:00
Brian Behlendorf
64c4b6d17a libspl: hide zfs_tunable_* symbols
The zfs_tunable_* functions are a public interface which are
part of the internal libspl convenience library.  They should
be hidden to prevent an unnecessary ABI change in installed
libraries which link against libspl (e.g. libzfs_core, libuutil).

We do already leak long standing libspl symbols.  This commit is
solely intended to prevent leaking these new ones until this is
properly sorted out.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17911
2025-11-12 09:38:54 -08:00
Brian Behlendorf
e4fe41a79f Bump SONAME of libzfs and libzpool
The ABI of libzfs and libzpool have breaking changes since the
last major release.  Bump the SONAME for the upcoming 2.4 release
branch to libzfs7 and libzpool7.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17911
2025-11-12 09:38:48 -08:00
Brian Behlendorf
80cfdbb19f Bump SONAME on libnvpair
The nvlist_snprintf() function was added to the ABI of libnvpair.
No other symbols were modified or removed.  Bump the library-info
SONAME current and age args to reflect this is a minor library
version update.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17911
2025-11-12 09:38:20 -08:00
Adi-Goll
24aaf3a3f9
Reduce timeout to zero when running inside a container
Detect container environments and set timeout to zero unless
ZFS_MODULE_TIMEOUT is already set. This avoids an unnecessary ten
second delay after running zfs/zpool commands in a container where
/dev/zfs is unavailable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adi Gollamudi <adigollamudi@gmail.com>
Closes #15165
Closes #17922
2025-11-11 15:01:37 -08:00
Mariusz Zaborski
02fdd26e51
Add knob to disable slow io notifications
Introduce a new vdev property `VDEV_PROP_SLOW_IO_REPORTING` that
allows users to disable notifications for slow devices.
This prevents ZED and/or ZFSD from degrading the pool due to slow
I/O.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@FreeBSD.org>
Closes 17477
2025-11-11 10:42:17 -08:00
Alexander Motin
b4f073b5a6
Add BRT support to zpool prefetch command
Implement BRT (Block Reference Table) prefetch functionality similar
to existing DDT prefetch.  This allows preloading BRT metadata into
ARC to improve performance for block cloning operations and frees
of earlier cloned blocks.

Make -t parameter optional.  When omitted, prefetch all supported
metadata types (both DDT and BRT now).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17890
2025-11-10 16:16:22 -08:00
Alexander Motin
cc5cae5475
BRT: Increase block size from 4KB to 8KB
According to my observations, BRT ZAPs are typically compressible
3:1 for data and 2:1 for indirects.  With ashift=12, typical these
days, it means increasing the block sizes to 8KB we may get most
of possible compression, reducing on-disk and in-ARC BRT footprint
in half by the cost of some compression/decompression overhead,
but without real write inflation, only some dirty data increase.

Increase to 32KB similar to DDT could further increase compression
and storage efficiency, but at the cost of write inflation and
much bigger dirty data increase, which we can not properly control
now.  So lets leave this for a time when BRT log gets implemented.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17916
2025-11-10 15:44:46 -08:00
Alexander Motin
72b2a9571a
ZAP: Remove dmu_object_info_from_dnode() call
dmu_object_info_from_dnode() takes two locks and copies plenty of
data that we don't need in zap_lockdir_impl().  Just read dn_type
directly in this hot path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17921
2025-11-10 14:26:15 -08:00
Rob Norris
6e12f0bd77
spa_misc: add an API for spa_namespace_lock
This is useful as debugging support, as it lets namespace lock
operations be traced directly. It will also be useful for future work to
reduce the use of spa_namespace_lock, traditionally a source of
difficult deadlocks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17906
2025-11-10 14:23:39 -08:00
Alexander Motin
8aaed7dc42
BRT: Fix ranges to blocks conversion math
BRT_RANGESIZE_TO_NBLOCKS() takes number of ranges as its argument.
To get number of blocks we should multiply it by the entry size,
not divide by it, as it was due to missing parentheses.

Before #17875 this could cause small memory corruptions for vdevs
bigger than 64TB, but the change made the bug more noticeable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17886
Closes #17915
2025-11-10 13:58:39 -08:00
Adi-Goll
57b1b99d31
Update man page description of zpool rewind
Update description of zpool import --rewind-to-checkpoint in
man/man7/zpoolconcepts.7 to explain that rewinding automatically
discards a checkpoint.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adi Gollamudi <adigollamudi@gmail.com>
Closes #12646
Closes #17918
2025-11-10 10:13:13 -08:00
Alexander Motin
baefe098ee
ZIO: Set minimum number of free issue threads to 32
Free issue threads might block waiting for synchronous DDT, BRT or
GANG header reads. So unlike other taskqs using ZTI_SCALE to scale
with number of CPUs, here we also need some amount of threads to
potentially saturate pool reads.  I am not sure we always want the
96 threads we had before ZTI_SCALE introduction at #11966 on small
systems, but lets make it at least 32.

While here, make free taskqs configurable, similar to read and
write ones.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17903
2025-11-08 14:41:53 -05:00
rmacklem
e26b9fc871
FreeBSD: Add support for _PC_CASE_INSENSITIVE
FreeBSD now has a pathconf name called _PC_CASE_INSENSITIVE
used to check if a file system performs case insensitive
name lookups.

This patch adds support for this name.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #17908
2025-11-08 13:20:23 -05:00
Brian Behlendorf
962474d1a2
zstd: disable intrinsics
Disable the aarch64 NEON SIMD intrinsics for kernel builds.  Safely
using them in the kernel context requires saving/restoring the FPU
registers which is not currently done.

Additionally, remove the aarch64 optimized PREFETCH_L1 and PREFETCH_L2
instruction.  Rely on the more portable compiler built ins.

This lets us remove the problematic workaround in the aarch64_compat.h
header which undefines the __aarch64__ macro.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17904
Closes #17852
2025-11-07 10:01:12 -08:00
Adi-Goll
54876ee85e
Fix typo in vdev_raidz.c
Change the spelling of "begining" on line 4875 to
"beginning".

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adi Gollamudi <adigollamudi@gmail.com>
Closes #17905
2025-11-07 09:55:03 -08:00
Toomas Soome
fe553581f0
libzfs: ignoring unreachable code
We have infinite loop and on certain condition, we exit this loop
and thread with pthread_exit(). But also after this loop,
we have a code to perform pthread_cleanup_pop() and return from the
thread.

The  problem is that modern compilers are able to recognize that we
actually never get to the statements after loop and therefore
it is dead code there.

I think, instead of pthread_exit(), it is better to break out of loop
and let the last statements to work as intended. This is because
we do need to keep pthread_cleanup_pop() anyhow. Of course,
it is matter of taste if we want to use return or pthread_exit as very
last statement in this function.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #17900
2025-11-07 09:27:18 -05:00
Rob Norris
336c95372d
man: describe zfs-rewrite method and properties
We've heard anecdotes that suggest some
confusion/surprise/disappointment that a changed recordsize is not
applied during rewrite. Until such time as we actually can do that, we
can at least explicitly mention it at something that doesn't work.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17898
2025-11-07 09:01:59 -05:00
Alexander Ziaee
242fdb58e5
zfs-jail.8: Add introductory sentence, refactor
Add an introductory sentance explaining why the reader may want to use
this command, and establishing the requirement that the jail must be
running. Move other requirements from the description of the subcommands
to follow this for flow and structure. Move the caveat that this is for
FreeBSD down to a cannonical CAVEATS section, and crossreference Linux's
equivelant functionality. Mention that this utility can not be used to
delegate the root directory of the jail to that section also.

Reported by: Jan Brankamp <crest@rlwinm.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alexander Ziaee <ziaee@FreeBSD.org>
Closes #17883
2025-11-06 13:53:24 -08:00
Tony Hutter
f93506d1df
Linux 6.17 compat: Fix broken projectquota on 6.17
We need to specifically use the FX_XFLAG_* macros in zpl_ioctl_*attr()
codepaths, and the FS_*_FL macros in the zpl_ioctl_*flags() codepaths.
The earlier code just assumes the FS_*_FL macros for both codepaths.
The 6.17 kernel add a bitmask check in copy_fsxattr_from_user() that
exposed this error via failing 'projectquota' ZTS tests.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17884
Closes #17869
2025-11-05 16:22:03 -08:00
Paul Dagnelie
8c225ff1b4
Fix gang write late_arrival bug
When a write comes in via dmu_sync_late_arrival, its txg is equal to the
open TXG. If that write gangs, and we have not yet activated the new
gang header feature, and the gang header we pick can store a larger gang
header, we will try to schedule the upgrade for the open TXG + 1. In
debug mode, this causes an assertion to trip. This PR sets the TXG for
activating the feature to be the larger of either the current open TXG
or the syncing TXG + 1.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17824
2025-11-05 11:40:22 -08:00
Tino Reichardt
7e7c360256
CI: Update FreeBSD versions and ci-type handling
Update FreeBSD versions:
- add FreeBSD 15.0-STABLE
- add FreeBSD 16.0-CURRENT

So we use the latest versions of each line now:
  - Freebsd 14.3 (RELEASE)
  - FreeBSD 15.0 (STABLE)
  - FreeBSD 16.0 (CURRENT)

In commits - you may specify which type of CI should run:
- ZFS-CI-Type: quick
- ZFS-CI-Type: linux
- ZFS-CI-Type: freebsd
- ZFS-CI-Type: full

Reviewed-by: Alexx Saver <lzsaver@users.noreply.github.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17896
2025-11-05 09:56:17 -08:00
Toomas Soome
5d33801802
get_key_material_https: label 'kfdok' defined but not used
The label 'kfdok' is only used with O_TMPFILE, we need to use
the same #ifdef around this label.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #17894
2025-11-04 13:13:07 -08:00
Robert Evans
d0294aa758
Update dnode_next_offset_level to accept blkid instead of offset
Currently this function uses L0 offsets which:
1. is hard to read since it maps offsets to blkid and back each call
2. necessitates dnode_next_block to handle edge cases at limits
3. makes it hard to tell if the traversal can loop infinitely

Instead, update this and dnode_next_offset to work in (blkid, index).
This way the blkid manipulations are clear, and it's also clear that
the traversal always terminates since blkid goes one direction.

I've also considered updating dnode_next_offset to operate on blkid.
Callers use both patterns, so maybe another PR can split the cases?

While here tidy up dnode_next_offset_level comments.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #17792
2025-11-04 13:12:17 -08:00
jamisiveshkumar
a90f816be6
Fix capitalization typo in README.md
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Sivesh Kumar <siveshjami@gmail.com>
Closes #17889
2025-11-03 16:37:58 -08:00
Alexander Motin
6cfc3dba9c
Cleanup ZIO_FLAG_IO_RETRY vs TRYHARD usage
In cases where all issued ZIOs must succeed, and we can't do
anything clever about the errors, we should just explicitly set
ZIO_FLAG_TRYHARD and let OS to do all the reasonable retries.

In other cases, where retries can be different from the original,
for example, some ZIOs are allowed to fail due to redundancy, or
we can disable aggregation on retrial to get at least some of
the data, we can do first pass without TRYHARD, and only if needed
retry with ZIO_FLAG_IO_RETRY (which implies TRYHARD semantics).

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17877
2025-10-30 16:29:48 -07:00
Alexander Motin
ec268cdf97 Fix caching of DDT log and BRT
Both DDT log and BRT counters we read on pool import and then only
append or overwrite in full blocks.  We don't need them in DMU or
ARC caches.  Fortunately we have DMU_UNCACHEDIO for this now.

Even more we don't need BRT in non-evictable metadata DMU caches,
since it will likely never fit there, while block the cache from
its original users.  Since DMU_OT_IS_METADATA_CACHED() has no way
to differentiate the new metadata types, mark BRT with storage
type of DMU_OT_DDT_ZAP.  As side effect it will also put it on
dedup device, but that should actually be right.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17875
2025-10-30 16:28:28 -07:00
Alexander Motin
ea125eeb5d BRT: Round bv_entcount up to BRT_BLOCKSIZE
Since we set bv_mos_brtvdev block size, and since we keep dirty
bitmap at the same granularity, we should keep the allocations
and writes done with.  Otherwise it makes the last block write
short, that will be odd once we implement writing of only dirty
blocks, but also requires read-modify-write on DMU layer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17875
2025-10-30 16:28:05 -07:00
Joseph Anthony Pasquale Holsten
033dbdc982
autogen.sh: remove workaround for automake <1.14, needed for EL <=7
Ultimately this is a revert of 779ac93, which according to
@nabijaczleweli is to paper over automake <1.14's lack of
%reldir% support.

As I understand it, EL8 is the lowest current build target.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Joseph Holsten <joseph@josephholsten.com>
Closes #17878
2025-10-30 16:26:56 -07:00
Brian Behlendorf
f819b41c78
Retire ZoL patch scripts
Remove the out of date helper scripts originally used to port
Illumos commits to the ZoL repository.  Due to layout changes
made to this repository they're no longer entirely correct.
Remove them to make it clear they're no longer being used or
actively maintained.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17880
2025-10-30 09:07:38 -07:00
Alexander Motin
dcada084b9
Pass flags to more DMU write/hold functions
Over the time many of DMU functions got flags argument to control
prefetch, caching, etc.  Few functions though left without it, even
though closer look shown that many of them do not require prefetch
due to their access pattern.  This patch adds the flags argument to
dmu_write(), dmu_buf_hold_array() and dmu_buf_hold_array_by_bonus(),
passing DMU_READ_NO_PREFETCH where applicable.

I am going to also pass DMU_UNCACHEDIO to some of them later.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17872
2025-10-29 11:17:51 -07:00
Quartz
3caf66c25b
man: Update zpool-event subclass names and document new types
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Quartz <yyhran@163.com>
Closes #17868
2025-10-28 12:49:05 -07:00
Toomas Soome
67e716329b
ZTS: autotrim_config.ksh is missing pool type
functional/trim tests do create pools of different types to test
trim, autotrim_config.ksh is missing the type from zpool
create command line while we are looping over different pool
types.

Sponsored-by: Edgecast Cloud LLC.
Signed-off-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17874
2025-10-28 12:41:36 -04:00
Ryan Libby
0455150f11
FreeBSD zio_crypt.c: initialize uio variables before access
In zio_crypt_key_wrap and zio_crypt_key_unwrap, the cuio_s variable was
not initialized before the calls to zfs_uio_init, leading to
uninitialized access to cuio_s.uio_offset.  Initialize it to avoid gcc
warnings.

Similar issue as fixed in 2bf152021 ("Fix gcc uninitialized warning in
FreeBSD zio_crypt.c")

Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17863
2025-10-23 21:23:25 -04:00
Rob Norris
fc519b2c11
mailmap/AUTHORS: update with recent new contributors
We’re not always on the same page, but at least we’re in the same book.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17860
2025-10-22 09:27:54 -07:00
Shreshth3
44704616b4
zpool: fix conflict with -v and -o options
Right now, the -v and -o options for `zpool list` work independently,
but when paired, the -v "wins out" and the -o effect is lost. This
commit fixes that problem.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Closes #11040
Closes #17839
2025-10-21 15:10:52 -07:00
Rob Norris
72f41454a6
ZTS: fail test run if test runner crashes unexpectedly
zfs-tests.sh executes test-runner.py to do the actual test work. Any
exit code < 4 is interpreted as success, with the actual value
describing the outcome of the tests inside.

If a Python program crashes in some way (eg an uncaught exception), the
process exit code is 1.

Taken together, this means that test-runner.py can crash during setup,
but return a "success" error code to zfs-tests.sh, which will report and
exit 0. This in turn causes the CI runner to believe the test run
completed successfully.

This commit addresses this by making zfs-tests.sh interpret an exit code
of 255 as a failure in the runner itself. Then, in test-runner.py, the
"fail()" function defaults to a 255 return, and the main function gets
wrapped in a generic exception handler, which prints it and calls
fail().

All together, this should mean that any unexpected failure in the test
runner itself will be propagated out of zfs-tests.sh for CI or any other
calling program to deal with.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17858
2025-10-21 14:34:39 -07:00
Jean-Sébastien Pédron
3a55e76b84
FreeBSD: zfs_getpages: Don't zero freshly allocated pages
Initially, `zfs_getpages()` is provided with an array of busy pages by
the vnode pager. It then tries to acquire the range lock, but if there
is a concurrent `zfs_write()` running and fails to acquire that range
lock, it "unbusies" the pages to avoid a deadlock with `zfs_write()`.
After that, it grabs the pages again and retries to acquire the range
lock, and so on.

Once it got the range lock, it filters out valid pages, then copy DMU
data to the remaining invalid pages.

The problem is that freshly allocated zero'd pages it grabbed itself are
marked as valid. Therefore they are skipped by the second part of the
function and DMU data is never copied to these pages. This causes mapped
pages to contain zeros instead of the expected file content.

This was discovered while working on RabbitMQ on FreeBSD. I could
reproduce the problem easily with the following commands:

    git clone https://github.com/rabbitmq/rabbitmq-server.git
    cd rabbitmq-server/deps/rabbit

    gmake distclean-ct RABBITMQ_METADATA_STORE=mnesia \
      ct-amqp_client t=cluster_size_3:leader_transfer_stream_send

The testsuite fails because there is a sendfile(2) that can happen
concurrently to a write(2) on the same file. This leads to sendfile(2)
or read(2) (after the sendfile) sending/returning data with zeros, which
causes a function to crash.

The patch consists of not setting the `VM_ALLOC_ZERO` flag when
`zfs_getpages()` grabs pages again. Then, the last page is zero'd if it
is invalid, in case it would be partially filled with the end of the
file content. Other pages are either valid (and will be skipped) or they
will be entirely overwritten by the file content.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Signed-off-by: Jean-Sébastien Pédron <dumbbell@FreeBSD.org>
Closes #17851
2025-10-20 17:04:21 -07:00
Rob Norris
fe8b50f09f Linux 6.18: generic_drop_inode() and generic_delete_inode() renamed
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-10-20 16:01:04 -07:00
Rob Norris
3651888182 sha256_generic: make internal functions a little more private
Linux 6.18 has conflicting prototypes for various sha256_* and sha512_*
functions, which we get through a very long include chain. That's tough
to fix right now; easier is just to rename our internal functions.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-10-20 16:01:04 -07:00
Rob Norris
8911360a41 Linux 6.18: namespace type moved to ns_common
The namespace type has moved from the namespace ops struct to the
"common" base namespace struct. Detect this and define a macro that does
the right thing for both versions.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-10-20 16:01:04 -07:00
Rob Norris
76c238f1ba Linux 6.18: replace write_cache_pages()
Linux 6.18 removed write_cache_pages() without a usable replacement.
Here we implement a minimal zpl_write_cache_pages() that find the dirty
pages within the mapping, gets them into the expected state and hands
them off to zfs_putpage(), which handles the rest.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-10-20 16:01:04 -07:00
Rob Norris
39db4bda80 Linux 6.18: block_device_operations->getgeo takes struct gendisk*
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-10-20 16:01:04 -07:00
Rob Norris
5de4a297e7 Linux 6.18: convert ida_simple_* calls
ida_simple_get() and ida_simple_remove() are removed in 6.18. However,
since 4.19 they have been simple wrappers around ida_alloc() and
ida_free(), so we can just use those directly.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-10-20 16:01:04 -07:00
Rob Norris
9d50ee59dc Linux 6.18: replace nth_page()
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-10-20 16:01:04 -07:00
Andrew Walker
adacf020ce
Fix return value for setting zvol threading
We must return -1 instead of ENOENT if the special zvol threading
property set function can't locate the dataset (this would typically
happen with an encypted and unmounted zvol) so that the operation
gets inserted properly into the nvlist for operations to set. This
is because we want the property to be set once the zvol is
decrypted again.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #17836
2025-10-20 15:21:40 -07:00
Shreshth3
3ea8ca8c0f
zdb: fix bug with -A flag
Fixes #10544.

According to the manpage, zdb -A should
ignore all assertions. But it currently
does not do that. This commit fixes
this bug.

Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17825
2025-10-20 09:30:20 -04:00
Andrew Walker
783a02b5d3
Fix ZFS_READONLY implementation on Linux
MS-FSCC 2.6 is the governing document for
DOS attribute behavior. It specifies the following:

For a file, applications can read the file but
cannot write to it or delete it. For a directory,
applications cannot delete it, but applications can
create and delete files from the directory.

Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17837
2025-10-20 09:28:57 -04:00
Brian Behlendorf
5a03e358fc
Update device removal documentation
Make a minor update to the 'zpool remove' man page to clarify both
raidz and draid pools do not support removal, and change sector to
ashift which is what we actually care about.

Update the big theory comment in vdev_removal.c to accurately reflect
which types of vdevs can be removed.  Furthermore, I've added some
discussion for the casual reader to briefly explain the top-level
vdev removal restrictions.  This has been a common area of confusion
and it's not intuitive where they come from without understanding
the implementation details.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17847
2025-10-20 09:26:51 -04:00
Rob Norris
6ae99d2692
mmap_seek: print error code and text on failure
If lseek() returns an unexpected error, it's useful to know the error
code to help connect it to the trouble spot inside the module.

Since the two seek functions should be basically identical, lift them
into a single generic function.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Robert Evans <evansr@google.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17843
2025-10-14 15:57:35 -07:00
Ameer Hamza
d1e1b80ffe
CI: Fix FreeBSD 15.0 by staying on ALPHA4 due to broken ALPHA5 image
FreeBSD 15.0-ALPHA5 image fails to boot on cloud VMs due to missing
/boot/efi mount point, causing the system to drop to single user mode
where SSH cannot start. Work around this by staying on ALPHA4 and
setting IGNORE_OSVERSION=yes to bypass pkg's kernel version mismatch
prompt during bootstrap. This allows CI to proceed with ALPHA4 until we
have a stable FreeBSD 15.0 image.

Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17846
2025-10-13 21:19:47 -04:00
Shreshth3
a5af3f2db7
arc: fix small typos
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Closes #17840
2025-10-13 11:23:55 -07:00
Rob Norris
0e62831110
libzpool/cmn_err: remove suppression, add stop option, cleanup
A small uplift of the cmn_err() and panic() calls in userspace:

- remove the suppression on CE_NOTE. We have very few of these calls in
  a standard build, it's convenient for "print debugging".

- make prefixes clear and consistent.

- add LIBZPOOL_PANIC_STOP environment variable to send SIGSTOP to the
  process group on a panic, rather than abort(), so all threads remain
  alive for inspection.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17834
2025-10-13 10:55:03 -07:00
Mark Johnston
a9f2a1f361
Fix the type of the raidz_outlier_check_interval_ms parameter
It's an hrtime_t, which is an unsigned long long.  In practice this is
just a U64.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #17833
2025-10-13 10:47:09 -07:00
Alexander Motin
f4276479c9
Suppress some ashift warnings
Do not warn about vdev ashifts being smaller then physical ashifts
in a pool status if the pool ashift property set and vdev ashift
satisfies it (bigger or equal), since user explicitly requested
this.  The ashift of individual vdevs are still reported.

Do not warn about vdev ashifts in zpool import, since it doesn't
matter much, and we don't even report individual vdevs ashifts
there.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17830
2025-10-13 10:41:42 -07:00
Alexander Motin
51de2d76f8
Explicit set ashift for non-leaf vdevs
Before this change ashift property was applied only to a leaf
vdevs.  As result, it worked only as a minimal value for parent
vdevs, since bigger physical_ashift value reported by any child
could be used instead when deciding parent's ashift, as if the
ashift property was never set.

This change explicitly passes ZPOOL_CONFIG_ASHIFT to all vdevs,
allowing override for parents only if the passed value is below
logical_ashift and so unacceptable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17826
2025-10-13 10:41:02 -07:00
Ameer Hamza
6045740c8e
zpool_reopen_004_pos: Clear label from offline disk after destroy
zpool_reopen_004_pos destroys a pool with an offline disk, leaving its
label intact. In TrueNAS local repo, zpool_reopen_005_pos is skipped,
causing zpool_reopen_007_pos to fail as it doesn't use -f flag when
creating pools unlike zpool_reopen_005_pos.

Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17831
2025-10-10 11:53:11 -04:00
Dag-Erling Smørgrav
6e5b836e9f
FreeBSD: Correct _PC_MIN_HOLE_SIZE
The actual minimum hole size on ZFS is variable, but we always report
SPA_MINBLOCKSIZE, which is 512.  This may lead applications to believe
that they can reliably create holes at 512-byte boundaries and waste
resources trying to punch holes that ZFS ends up filling anyway.

* In the general case, if the vnode is a regular file, return its
  current block size, or the record size if the file is smaller than
  its own block size.  If the vnode is a directory, return the dataset
  record size.  If it is neither a regular file nor a directory,
  return EINVAL.

* In the control directory case, always return EINVAL.

Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17750
2025-10-08 09:13:22 -04:00
Shreshth3
ea914e4a43
Add missing include statement
Resolve a build failure for user applications that include <sys/uio.h>.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Closes #17781
Closes #17814
2025-10-07 09:21:03 -07:00
Tony Hutter
1861a329fb
zvol: verify IO type is supported
ZVOLs don't support all block layer IO request types.  Add a check for
the IO types we do support.  Also, remove references to
io_is_secure_erase() since they are not supported on ZVOLs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17803
2025-10-06 16:54:09 -07:00
Mateusz Guzik
346ecac61b
Annotate arc_buf_is_shared as __maybe_unused
Otherwise the compiler warns about it on production FreeBSD builds.

The routine proved resilient to attempts to ifdef on debug.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #17818
2025-10-06 16:43:20 -07:00
Tino Reichardt
3ef2d5ee45
CI: Switch FreeBSD 15 to 15.0-ALPHA4 and add FreeBSD 16
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17815
2025-10-06 16:41:01 -07:00
Ivan Shapovalov
f8b082b5af zdb: adjust block histogram binning strategy
Previously, a bin included all blocks _starting_ from given size
(e.g., a "4K" bin would include all blocks within the [4K; 8K) region).
This is counter-intuitive and does not match the typical use-case of the
block histogram (that is, to estimate disk usage considering how ZFS'
block allocation works). In other words, if I'm looking at the "4K" row,
I'm interested in records that _fit into_ a 4K block.

Adjust the binning strategy such that a bin includes all blocks _up to_
given size, such that e.g. a "4K" bin would include all blocks within
the (2K; 4K] region.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
2025-10-06 09:35:32 -07:00
Ivan Shapovalov
3a1a22abb4 zdb: factor out block histogram bin number computation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
2025-10-06 09:35:28 -07:00
Ivan Shapovalov
c0a874fced zdb: add --class=(normal|special|...) to filter blocks by alloc class
When counting blocks to generate block size histograms (`-bb`), accept a
`--class=` argument (as a comma-separated list of either "normal",
"special", "dedup" or "other") to only consider blocks that belong to
these metaslab classes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
2025-10-06 09:35:23 -07:00
Ivan Shapovalov
8e97b98140 zdb: add --bin=(lsize|psize|asize) arg to control histogram binning
When counting blocks to generate block size histograms (`-bb`), accept a
`--bin=` argument to force placing blocks into all three bins based on
*this* size.

E.g. with `--bin=lsize`, a block with lsize=512K, psize=128K, asize=256K
will be placed into the "512K" bin in all three output columns. This
way, by looking at the "512K" row the user will be able to determine
how well was ZFS able to compress blocks of this logical size.

Conversely, with `--bin=psize`, by looking at the "128K" row the user
will be able to determine how much overhead was incurred for storage
of blocks of this physical size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
2025-10-06 09:35:02 -07:00
Ivan Shapovalov
1269fa9b79 zdb: convert ALLOCATED_OPT into anonymous enum
We are adding more long-only options, so use an enum for all of them
to avoid manually numbering these constants.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
2025-10-06 09:34:03 -07:00
Rob Norris
5605a6d79b pool_iter_refresh: don't refresh pools twice
In "all pools" mode, pool_iter_refresh() will call zpool_iter(), which
will call zpool_refresh_stats() before calling add_pool(). If we already
have the pool, this is a different handle, so we just release it and
return. Back in pool_iter_refresh(), we then call zpool_stats_refresh()
again for our handle on the same pool.

All together, this means we're doing two ZFS_IOC_POOL_STATS calls into
the kernel for every pool in the system. This isn't wrong, but it does
double the pressure on global locks.

Instead, we add a new function zpool_refresh_stats_from_handle() that
simply copies the pool config and state from one handle to another, and
use it to update our handle before we release it in add_pool(), so we
only have one call per pool per interval.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17807
2025-10-03 14:39:09 -07:00
Rob Norris
5f09781cca pool_iter_refresh: don't flag existing pools as refreshed
zpool_iter() passes the callback a new instance of zpool_handle_t each
time, so the existing handle in the pool_list AVL never actually gets a
refresh. Internally, that means its zpool_config is never updated, and
the old config is never moved to zpool_old_config. As a result,
print_iostat() never sees any updated config, and so repeats the first
line forever.

This is the simplest workaround: just don't mark existing pools as
refreshed. pool_list_refresh() will see this and refresh them.
The downside is a second call to ZFS_IOC_POOL_STATS for existing pools,
because zpool_iter() just called it for the handle we threw away.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17807
2025-10-03 14:39:00 -07:00
Rob Norris
1a32adca0f zpool iostat: update pool counter when skipping boot row
When skipping the boot row (with -y), the early loop meant we weren't
updating the "last_npools" count. That means the count never advanced
past zero, so cb_iteration was always reset to 0, leading to it being
"stuck" on the boot line, printing the header and nothing else forever.

Updating the pool counter on every loop sorts that out: it advances,
cb_iteration moves properly, and normal rows are printed.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17807
2025-10-03 14:38:35 -07:00
Ameer Hamza
ac2d8c80b6
Make mount/share errors non-fatal for zfs create/clone
If zfs_mount_and_share() fails, the error propagates to zfs create/clone
commands despite successful operation. If create/clone operations were
successful, there's no point in making zfs_mount_and_share() failures
fatal.

Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17799
2025-10-02 11:24:26 -04:00
Igor Ostapenko
cb3c18a9a9 ddt prune: Add SCL_ZIO deadlock workaround
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com>
Closes #17793
2025-10-01 15:17:09 -07:00
Igor Ostapenko
e829e2fd04 spa_config: Rename spa_config_enter_mmp() to spa_config_enter_priority()
Originally this was created for MMP, but now new cases are emerging
where the same mechanism is required. Hence the name's generalization.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com>
Closes #17793
2025-10-01 15:16:04 -07:00
Robert Evans
8869caae5f
zinject: Introduce ready delay fault injection
This adds a pause to the ZIO pipeline in the ready stage for
matching I/O (data, dnode, or raw bookmark).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #17787
2025-10-01 12:17:13 -07:00
Paul Dagnelie
fa4d4b1f80
Fix display of default xattr to show 'sa'
When the default value of the xattr property was changed from 'dir' to
'sa', the code that displays the property's value was not affected. The
problem with this state of affairs is that 1) user tooling that
specifically looked for 'sa' before will be confused now that the code
displays 'on' instead. And 2) users may be confused when manually
running the commands about which specific type of xattr is in use unless
they are up to date on the latest zfs changes.

The fix here is to show the actual type always, rather than 'on' if we
happen to be using the default. This turns out to be easy to do, by
simply reordering the list of xattr values in the properties code. When
the property is displayed, we iterate down the table until we find a row
with a matching value, and use that row's name as the
display. Reordering the row fixes the display without affecting any
other code.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17801
2025-10-01 12:14:56 -07:00
Shreshth3
32ce74ff32
docs: fix a few small typos (#17804)
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-10-01 10:15:46 -07:00
nav1s
102ff2a640
manuals: fix typos in zpool-upgrade man page
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: nav1s <nav1s@proton.me>
Closes #17797
2025-09-29 16:43:22 -07:00
hoshinomori
e4a407f29f
range_tree: drop duplicate zfs_ prefix from rs_set_fill_raw
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: hoshinomori <hoshinomori@owarisekai.moe>
Closes #17800
2025-09-29 16:38:52 -07:00
Rob Norris
f0a95e8971
zpool iostat: refresh pool list every interval
When running zpool iostat in interval mode, it would not notice any new
pools created or imported, and would forget any destroyed or exported,
so would not notice if they came back. This leads to outputting "no
pools available" every interval until killed.

It looks like this was at least intended to work; the comment above
zpool_do_iostat() indicates that it is expected to "deal with pool
creation/destruction" and that pool_list_update() would detect new
pools. That call however was removed in 3e43edd2c5, though its unclear
if that broke this behaviour and it wasn't noticed, or if it never
worked, or if something later broke it. That said, the lack of
pool_list_update() is only part of the reason it doesn't work properly.

The fundamental problem is that the various things involved in
refreshing or updating the list of pools would aggressively ignore,
remove, skip or fail on pools that stop existing, or that already exist.
Mostly this meant that once a pool is removed from the list, it will
never be seen again. Restoring pool_list_update() to the
zpool_do_iostat() loop only partially fixes this - it would find "new"
pools again, but only in the "all pools" (no args) mode, and because its
iterator callback add_pool() would abort the iterator if it already has
a pool listed, it would only add pools if there weren't any already.

So, this commit reworks the structure somewhat. pool_list_update()
becomes pool_list_refresh(), and will ensure the state of all pools in
the list are updated. In the "all pools" mode, it will also add new
pools and remove pools that disappear, but when a fixed list of pools is
used, the list doesn't change, only the state of the pools within it.

The rest of the commit is adjusting things for this much simpler
structure. Regardless of the mode in use, pool_list_refresh() will
always do the right thing, so the driver code can just get on with the
display.

Now that pools can appear and disappear, I've made it so the header (if
enabled) is re-printed when the list changes, so that its easier to see
what's happening if the column widths change.

Since this is all rather complicated, I've included tests for the "all
pools" and "set of pools" modes.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17786
2025-09-29 16:35:27 -07:00
Tony Hutter
75be5f2973
CI: Add ZTS -O option, log Setup Testing Machines step
Add a -O option to zfs-test.sh to dump debug information on test
timeout.  The debug info includes:

- 30 lines from 'top'
- /proc/<PID>/stack output of process with highest CPU usage
- Last lines strace-ing process with highest CPU usage
- /proc/sysrq-trigger kernel stack traces

All debug information gets dumped to /dev/kmsg (Linux only).

In addition, print out the VM console lines from the "Setup Testing
Machines" step.  We have often see VMs timeout at this step and don't
know why.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17753
2025-09-29 16:32:05 -07:00
Tony Hutter
8d4c3ee9e6
zvol: Fix blk-mq sync
The zvol blk-mq codepaths would erroneously send FLUSH and TRIM
commands down the read codepath, rather than write.  This fixes
the issue, and updates the zvol_misc_fua test to verify that
sync writes are actually happening.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17761
Closes #17765
2025-09-29 16:29:20 -07:00
Brian Behlendorf
4ff25e9013
CI: Switch FreeBSD 15 to 15.0-ALPHA3
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17795
2025-09-26 20:52:57 -04:00
Brian Behlendorf
a44985315e
CI: Remove Buildbot references
The Buildbot CI infrastructure has been fully replaced by GitHub
Actions.  Remove any lingering references from the repository.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17794
2025-09-26 15:32:41 -07:00
Brian Behlendorf
79be201806
Linux 6.17 compat: META
Update the META file to reflect compatibility with the 6.17
kernel.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17789
2025-09-26 10:00:18 -07:00
Brian Behlendorf
aecd6deeb3
CI: update perf and bpftools with the kernel packages
When updating a Fedora instance to an experimental kernel make sure
to include the matching versioned perf and bpftool packages.  This
helps ensure there are no unexpected conflicts which would prevent
the new packages from being installed.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17791
2025-09-25 17:47:32 -07:00
patrickxia
5c38029f4b
zdb: add ZFS_KEYFORMAT_RAW support for -K option
This change adds support for ZFS_KEYFORMAT_RAW to zdb_derive_key in 
zdb.c. The implementation reads the raw key from the file specified 
by the -K option which is consistent with how raw keys are handled in 
the other parts of ZFS, along with a check to ensure that the keyfile 
doesn't have too many bytes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Patrick Xia <patrickx@google.com>
Closes #17783
2025-09-25 12:05:42 -07:00
Robert Evans
26b0f561be
dnode_next_offset: backtrack if lower level does not match
This changes the basic search algorithm from a single search up and down
the tree to a full depth-first traversal to handle conditions where the
tree matches at a higher level but not a lower level.

Normally higher level blocks always point to matching blocks, but there
are cases where this does not happen:

1. Racing block pointer updates from dbuf_write_ready.

   Before f664f1ee7f (#8946), both dbuf_write_ready and
   dnode_next_offset held dn_struct_rwlock which protected against
   pointer writes from concurrent syncs.

   This no longer applies, so sync context can f.e. clear or fill all
   L1->L0 BPs before the L2->L1 BP and higher BP's are updated.

   dnode_free_range in particular can reach this case and skip over L1
   blocks that need to be dirtied. Later, sync will panic in
   free_children when trying to clear a non-dirty indirect block.

   This case was found with ztest.

2. txg > 0, non-hole case. This is #11196.

   Freeing blocks/dnodes breaks the assumption that a match at a higher
   level implies a match at a lower level when filtering txg > 0.

   Whenever some but not all L0 blocks are freed, the parent L1 block is
   rewritten. Its updated L2->L1 BP reflects a newer birth txg.

   Later when searching by txg, if the L1 block matches since the txg is
   newer, it is possible that none of the remaining L1->L0 BPs match if
   none have been updated.

   The same behavior is possible with dnode search at L0.

   This is reachable from dsl_destroy_head for synchronous freeing.
   When this happens open context fails to free objects leaving sync
   context stuck freeing potentially many objects.

   This is also reachable from traverse_pool for extreme rewind where it
   is theoretically possible that datasets not dirtied after txg are
   skipped if the MOS has high enough indirection to trigger this case.

In both of these cases, without backtracking the search ends prematurely
as ESRCH result implies no more matches in the entire object.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16025
Closes #11196
2025-09-25 11:06:28 -07:00
Brian Behlendorf
c722bf8812
Add interface to interface spa_get_worst_case_min_alloc() function
Provide an interface to retrieve the lowest and highest minimum
allocation size for the normal allocation class.  This can be used
by external consumers of the DMU to estimate potential wasted
capacity when setting the recordsize for an object.

The new "min_alloc" and "max_alloc" keys are added to the pool
configuration and used by default_volblocksize() to warn when
an ineffecient block size is requested.  For older kmods which
don't yet include the new keys fallback to the previous logic.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17758
2025-09-25 09:35:35 -07:00
Brian Behlendorf
0e1a53a8c0
Fix 'zpool add' safety check corner cases
Three cases were discovered where 'zpool add' would fail to
warn when adding vdevs to a pool with a mismatched replication
level.  These are:

  1. When a pool contains mixed file and disk vdevs.
  2. When a pool contains an active dRAID distributed spare
  3. When a pool contains an active hot spare

The lack of warnings are caused by get_replication() assessing
the current pool configuration an inconsistent and disabling
the mismatched replication check for the new pool configuration
after 'zpool add'.  This change updates get_replication() to
be slightly more tolerant in the non-fatal case.

The zpool_add_010_pos.ksh test case was split in to separate
tests: zpool_add_warn_create.ksh, pool_add_warn_degraded.ksh,
and zpool_add_warn_removal.  These test were extended to
include coverage for dRAID pools and the three scenarios
described above.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17780
2025-09-25 09:32:59 -07:00
Brian Behlendorf
3e9347c9f7
ZTS: update upgrade_readonly_pool.ksh
Modify the test case to use the `zfs mount` command instead
of directly calling the mount command, create a dedicated dataset,
and use the default mount point.  These changes are intended to
preserve the intent of the original test case and resolve some
spurious mount failures which have been observed by the CI.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17785
2025-09-25 09:31:10 -07:00
jozzsi
b2196fbedf
contrib: dracut: install dependent kernel modules
Eliminates the need for the following workaround

> Add other drivers to dracut:

```
if grep mpt3sas /proc/modules; then
  echo 'force_drivers+=" mpt3sas "'  >> /etc/dracut.conf.d/zfs.conf
fi
if grep virtio_blk /proc/modules; then
  echo 'filesystems+=" virtio_blk "' >> /etc/dracut.conf.d/fs.conf
fi
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jo Zzsi <jozzsicsataban@gmail.com>
Closes #17762
2025-09-23 16:58:38 -07:00
Alexander Motin
ea37c30fcb
zdb: Fix asize overflow in verify_livelist_allocs()
Spacemap entry might be too big to fit into a block pointer ashift.
We hit an assertion trying to run `zdb -bvy` on a large pool.  But
it seems the code does not really need size there, since we only
need to search for a range of offsets, so setting it to zero should
just make btree return position just before the first entry.  I
suspect the previous code could actually miss the first entry
due to this if its size was smaller.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17764
2025-09-23 16:09:37 -07:00
trick2011
876f705cc4
Use "vdev" instead of "devices" when referring to vdevs
Update documentation to use the correct terminology.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: trick2011 <trick2011@users.noreply.github.com>
Closes #17734
Closes #17755
2025-09-23 16:08:07 -07:00
Tony Hutter
11787965e0
ZTS: Fix stale symlinks with zfs-helpers.sh
zfs-helpers.sh is a utility script that sets up udev symlinks so you
can run ZTS from a local ZFS git workspace.  However, it doesn't check
that the udev symlinks point to the current workspace.  They may point
to an old workspace that has been deleted.  This means the udev rules
never get executed, which in turn causes the zvol tests to fail.

This commit removes old symlinks that do not point to the current
ZFS workspace.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17766
2025-09-23 12:58:14 -07:00
jozzsi
6ba51da93b
contrib: dracut: always include zfs kernel module
This commit fixes the issue and includes the zfs kernel
module even when dracut is used in hostonly mode.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jo Zzsi <jozzsicsataban@gmail.com>
Closes #17754
2025-09-18 16:56:45 -07:00
Alan Somers
545d66204d
Fix a printf format specifier on FreeBSD/i386
This is breaking the build on FreeBSD/i386.  Originally committed
downstream as https://github.com/freebsd/freebsd-src/commit/2d76470b701

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Sponsored by:	ConnectWise
Closes #17705
2025-09-17 16:32:29 -07:00
Alan Somers
3387d34093
Fix atomic-alignment warnings in libspl on FreeBSD/i386
On i386, Clang complains about misaligned atomic operations.  Silence
these warnings to fix the build on FreeBSD/i386.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Sponsored by:	ConnectWise
Closes #17708
2025-09-17 16:31:27 -07:00
Rob Norris
ab8cc63c77 linux/super: add tunable to request immediate reclaim of unused dentries
Traditionally, unused dentries would be cached in the dentry cache until
the associated entry is no longer on disk. The cached dentry continues
to hold an inode reference, causing the inode to be pinned (see previous
commit).

Here we implement the dentry op d_delete, which is roughly analogous to
the drop_inode superblock op, and add a zfs_delete_dentry tunable to
control its behaviour. By default it continues the traditional
behaviour, but when the tunable is enabled, we signal that an unused
dentry should be freed immediately, releasing its inode reference, and
so allowing that inode to be deleted if no longer in use.

Sponsored-by: Klara, Inc.
Sponsored-by: Fastmail Pty Ltd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17746
2025-09-17 08:16:32 -07:00
Rob Norris
ab93b4b70e linux/super: add tunable to request immediate reclaim of unused inodes
Traditionally, unused inodes would be held on the superblock inode cache
until the associated on-disk file is removed or the kernel requests
reclaim.  On filesystems with millions of rarely-used files, this can be
a lot of unusable memory.

Here we implement the superblock drop_inode method, and add a
zfs_delete_inode tunable to control its behaviour. By default it
continues the traditional behaviour, but when the tunable is enabled, we
signal that the inode should be deleted immediately when the last
reference is dropped, rather than cached. This releases the associated
data to the dbuf cache and ARC, allowing them to be reclaimed normally.

Sponsored-by: Klara, Inc.
Sponsored-by: Fastmail Pty Ltd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17746
2025-09-17 08:15:56 -07:00
buzzingwires
ffe93aee0a Add typesets to zhack label repair test scripts
As a quality assurance measure, `typeset` is added to local variable
declarations to actually enforce their intended scope.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: buzzingwires <buzzingwires@outlook.com>
Closes #17732
2025-09-17 08:13:24 -07:00
buzzingwires
1d2d812986 Refactor zhack label repair and fix -c regression on nonzero TXG
This commit fixes a likely regression introduced by 64db435 where the
checksum repair functionality (`-c` or default behavior) will perform
checks and access data associated with the newer undetach (`-u`)
functionality, resulting in a failure when an uberblock's TXG is not 0
as required by `-u` but not `-c`

Additionally, code is refactored for better separation of tasks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: buzzingwires <buzzingwires@outlook.com>
Closes #17732
2025-09-17 08:13:06 -07:00
Rob Norris
d36684201f man: add silent rules for mancheck
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17747
2025-09-17 08:10:32 -07:00
Rob Norris
45ac6045cc mancheck: allow single files
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17747
2025-09-17 08:10:28 -07:00
Rob Norris
faf2db3435 Shellcheck.am: add silent rules for shellcheck and checkbashisms
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17747
2025-09-17 08:10:08 -07:00
Alexander Motin
d147ed7d26
CI: Switch FreeBSD 15 to 15.0-ALPHA2
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17749
2025-09-15 12:15:31 -07:00
Igor Ostapenko
58b84289e8
Fix txg_log_time ZAP key typo
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com>
Sponsored-by: Klara, Inc.
Closes #17748
2025-09-15 10:43:43 -07:00
Kyle Evans
5c46baa1ce
zfsprops(7): attempt to clarify the keylocation description
The current description is somewhat difficult to parse through, and in
some cases is a little unclear as to the behavior.

Split it into a paragraphs based on the three distinct behaviors you
may get: prompt, file URL, HTTP(S) URL.  The descriptions of the file
and HTTP(s) behavior seems fine, but prompt is a little vague- expand
on it and make it clear that the behavior is actively based on whether
the inquisitor of key-data is provided with a tty for stdin or not.

Also clarify *why* one shouldn't "place keys which should be kept secret
on the command line" and note that you *have* to supply the key via
stdin if it's a raw key, just to be sure.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #17742
2025-09-15 10:26:17 -07:00
Brian Behlendorf
f330b463de
ZTS: default to random data in fill_fs
Update the fill_fs helper function to request a random fill pattern
when the "data" argument isn't specified.  This ensures the default
behavior is to perform a more realistic fill of incompressible blocks.

Additionally, update a few test cases to specify a random fill.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17739
2025-09-15 09:37:33 -07:00
Brian Behlendorf
4b764fb01a
ZTS: Fix zfs_send_delegation_user test
Correct the path in the common.run file.  The zfs_send_delegation_user
test is installed under cli_user not cli_root.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17740
2025-09-15 09:30:57 -07:00
Rob Norris
f319ff3570
vdev_disk_close: take disk write lock before destroying it
Many IO operations are submitted to the kernel async, and so the zio can
complete and followup actions before the submission call returns. If one
of the followup actions closes the disk (eg during pool create/import),
the initiator may be left holding a lock on the disk at destruction.

Instead, take the write lock before finishing up and decoupling the disk
state from the vdev proper. The caller will hold until all IO is
submitted and locks released.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17719
2025-09-15 09:12:24 -07:00
Alexander Motin
3f4312a0a4
Fix two infinite loops if dmu_prefetch_max set to zero
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17692
Closes #17729
2025-09-13 12:58:48 -04:00
Paul Dagnelie
9b772f328b
Fix time database update calculations
The time database update math assumed that the timestamps were in
nanoseconds, but at some point in the development or review process they
changed to seconds. This PR fixes the math to use seconds instead.
    
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17735
2025-09-12 16:33:36 -07:00
Brian Behlendorf
455c36156c
ZTS: refreserv/refreserv_raidz improvements
Several small changes intended to make this test reliable.

- Leave the default compression enabled for the pool and switch
  to using /dev/urandom as the data source.  Functionally this
  shouldn't impact the test but it's preferable to test with
  the pool defaults when possible.

- Verify the device is created and removed as required.  Switch
  to a unique volume name for a more clarity in the logs.

- Use the ZVOL_DEVDIR to specify the device path.

- Speed up the test by creating the pool with an ashift=12 and
  testing 4K, 8K, 128K volblocksizes.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17725
2025-09-12 11:08:53 -07:00
Alexander Motin
bc8bcfc71a
Fix type in dbrrd_closest()
For ABS() to work, the argument must be signed, but rrdd_time is
uint64_t.  Clang noticed it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Fixes #16853
Closes #17733
2025-09-12 11:05:38 -07:00
Alexander Motin
cb5f9aa582
FreeBSD: Satisfy ASSERT_VOP_IN_SEQC()
zfs_aclset_common() might be called for newly created or not even
created vnodes, that triggers assertions on newer FreeBSD versions
with DEBUG_VFS_LOCKS included into INVARIANTS.  In the first case
make sure to call vn_seqc_write_begin()/_end(), in the second just
skip the assertion.

The similar has to be done for project management IOCTL and file-
bases extended attributes, since those are not going through VFS.

Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17722
2025-09-12 13:29:27 -04:00
Paul Dagnelie
35f47cb4f4
Make new zhack test a little more reliable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17728
2025-09-12 10:07:24 -07:00
Chunwei Chen
37cd30f714
Fix ddle memleak in ddt_log_load
In ddt_log_load(), when removing dup entry from flushing tree, it doesn't
free the entry causing memleak.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Closes #17657
Closes #17730
2025-09-12 10:05:06 -07:00
JT Pennington
955fbc5ade Add send:encrypted test
Create tests for the new send:encrypted permission

Sponsored-by: Klara, Inc.
Sponsored-by: Karakun AG
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: JT Pennington <jt.pennington@klarasystems.com>
Closes #17543
2025-09-12 09:53:54 -07:00
Allan Jude
7b1cc9eb61 ZFS allow send:encrypted
A new `zfs allow` permissions that ONLY allows sending replication
streams in raw (encrypted) mode, so encrypted data will not be
decrypted as part of the replication process.

Sponsored-by: Klara, Inc.
Sponsored-by: Karakun AG
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Co-authored-by: JT Pennington <jt.pennington@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #17543
2025-09-12 09:53:31 -07:00
Tony Hutter
654f2dcb42
zed: Add synchronous zedlets
Historically, ZED has blindly spawned off zedlets in parallel and never
worried about their completion order.  This means that you can
potentially have zedlets for event number 2 starting before zedlets for
event number 1 had finished.  Most of the time this is fine, and it
actually helps a lot when the system is getting spammed with hundreds
of events.

However, there are times when you want your zedlets to be executed
in sequence with the event ID.  That is where synchronous zedlets
come in.

ZED will wait for all previously spawned zedlets to finish before
running a synchronous zedlet.  Synchronous zedlets are guaranteed to be
the only zedlet running.  No other zedlets may run in parallel with a
synchronous zedlet.  Users should be careful to only use synchronous
zedlets when needed, since they decrease parallelism.

To make a zedlet synchronous, simply add a "-sync-" immediately
following the event name in the zedlet's file name:

	EVENT_NAME-sync-ZEDLETNAME.sh

For example, if you wanted a synchronous statechange script:

	statechange-sync-myzedlet.sh

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17335
2025-09-11 11:34:07 -07:00
Brian Behlendorf
bc0b5318aa
Prevent scrubbing a read-only pool
While it would be nice to be able to scrub a pool imported read-only
this will currently trip an ASSERT.  Before we can support this there
are some designs challenges which need to be thought through first.

For starters, a read-only import skips reading certain information 
from disk which it knows won't be needed, such as the space maps.
Furthermore, the scrub process expects to be checkpoint it's progress, 
update the on disk error log, and issue repair IO.  None of which 
would be possible when the pool is imported read-only.  

Each of these wrinkles can certainly be handled, but that will take 
some signifcant work.  In the meanwhile we disable the 'zpool scrub' 
command when the pool is imported read-only.

Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #17527
Closes #17717
2025-09-11 10:58:46 -07:00
Paul Dagnelie
d64711c202 Detect a slow raidz child during reads
A single slow responding disk can affect the overall read
performance of a raidz group.  When a raidz child disk is
determined to be a persistent slow outlier, then have it
sit out during reads for a period of time. The raidz group
can use parity to reconstruct the data that was skipped.

Each time a slow disk is placed into a sit out period, its
`vdev_stat.vs_slow_ios count` is incremented and a zevent
class `ereport.fs.zfs.delay` is posted.

The length of the sit out period can be changed using the
`raid_read_sit_out_secs` module parameter.  Setting it to
zero disables slow outlier detection.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Contributions-by: Don Brady <don.brady@klarasystems.com>
Contributions-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17227
2025-09-10 15:25:03 -07:00
Paul Dagnelie
0620c979a5 Remove RAIDZ reconstruct flags from debug defaults
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17227
2025-09-10 15:24:50 -07:00
Tony Hutter
7f7c58389e
ZTS: Print warning if running ZTS user_run test locally
Print a warning if you're attempting to run a ZTS test that calls
'user_run', and the ephemeral user doesn't have permissions to
access the test binaries.

This can happen if you're running ZTS from a local git repo.  In
that case the test user (say, 'testuser1') may need access to the
ZTS binaries in:

/home/<your_username>/zfs/tests/zfs-tests/bin/

... but 'testuser1' doesn't have permission to enter your home dir:

/home/<your_username>

The warning will help alert users to what is going on.  This will
not be an issue when ZTS is actually installed on the system
(via 'make install' or from packages).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17721
2025-09-10 14:55:58 -07:00
Alan Somers
cd6db758f3
Fix the build of crypto_test on LP32 architectures
test->id is a uint64_t, not a long.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Sponsored by:	ConnectWise
Closes #17707
2025-09-10 11:27:39 -07:00
Paul Dagnelie
bc4aac0395 Enable zhack to work properly with 4k sector size disks
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17576
2025-09-10 11:13:55 -07:00
Paul Dagnelie
8f15d2e4d5 Add allocation profile export and zhack subcommand for import
When attempting to debug performance problems on large systems, one of
the major factors that affect performance is free space
fragmentation. This heavily affects the allocation process, which is an
area of active development in ZFS. Unfortunately, fragmenting a large
pool for testing purposes is time consuming; it usually involves filling
the pool and then repeatedly overwriting data until the free space
becomes fragmented, which can take many hours. And even if the time is
available, artificial workloads rarely generate the same fragmentation
patterns as the natural workloads they're attempting to mimic.

This patch has two parts. First, in zdb, we add the ability to export
the full allocation map of the pool. It iterates over each vdev,
printing every allocated segment in the ms_allocatable range tree. This
can be done while the pool is online, though in that case the allocation
map may actually be from several different TXGs as new ones are loaded
on demand.

The second is a new subcommand for zhack, zhack metaslab leak (and its
supporting kernel changes). This is a zhack subcommand that imports a
pool and then modified the range trees of the metaslabs, allowing the
sync process to write them out normall. It does not currently store
those allocations anywhere to make them reversible, and there is no
corresponding free subcommand (which would be extremely dangerous); this
is an irreversible process, only intended for performance testing. The
only way to reclaim the space afterwards is to destroy the pool or roll
back to a checkpoint.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17576
2025-09-10 11:13:24 -07:00
Shengqi Chen
92ca3ae56a contrib/debian: install files into merged /usr
This commit synchronizes the debian packaging files with the distro
version (also maintained by me) as much as possible.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Colm Buckley <colm@tuatha.org>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #17712
2025-09-10 10:45:26 -07:00
Shengqi Chen
9ae20cf03d cmd: rename arcstat to zarcstat
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Colm Buckley <colm@tuatha.org>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16357
Closes #17712
2025-09-10 10:45:21 -07:00
Shengqi Chen
a5571a0dd1 cmd: rename arc_summary to zarcsummary
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Colm Buckley <colm@tuatha.org>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16357
Closes #17712
2025-09-10 10:45:13 -07:00
Shengqi Chen
d3429a75b0 Remove renaming notice and symlinks for arcstat and arc_summary
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Colm Buckley <colm@tuatha.org>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16357
Closes #17712
2025-09-10 10:44:17 -07:00
Tony Hutter
c6fe41cac5
CI: Increase setup timeout to 20min, add timestamps
- Increase qemu-1-setup.sh timeout to 20min since it sometimes
  fails to complete after 15min.

- Timestamp all qemu-1-setup.sh lines to look for hangs.

- Add a 'watchdog' process to print out the top running process every
  30sec to help with debugging.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17714
2025-09-10 10:25:58 -07:00
Rob Norris
fe2f7cf6d7
linux/rw_destroy: assert no holders before destroying
While rw_destroy() may do nothing on Linux, we still want to make sure
that we don't have any holders outstanding like we do for mutexes.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17718
2025-09-10 08:59:57 -07:00
Rob Norris
7939bad5e7 Linux 6.17: d_set_d_op() is no longer available
We only have extremely narrow uses, so move it all into a single
function that does only what we need, with and without d_set_d_op().

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17621
2025-09-09 13:44:43 -07:00
Rob Norris
9e5e95c24d config: restore ZFS_AC_KERNEL_DENTRY tests
Accidentally removed calls in ed048fdc5b.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17621
2025-09-09 13:44:18 -07:00
Tony Hutter
4b83891db0
ZTS: Fix fault_limits timeouts
fault_limits would often hit the 10min timeout and be killed on Fedora
41-42.  Investigation showed that the 'fill_fs' portion of the test,
which would fill the pool with junk data before vdev replacement, was
writing highly compressible data (~126x), which would have taxed the
CPUs, potentially causing the timeout.

The fix is to write random data and reduce the number of writes.
This has an added benefit that more real data being is written to the
pool (~1GB) vs the old way (~300-400MB).  It also speeds up the test.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17709
2025-09-09 13:42:01 -07:00
Alan Somers
e29bfa5bd0
Fix warnings about sha2_is_supported on FreeBSD/i386
This is one problem currently preventing OpenZFS from building on
FreeBSD/i386.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Sponsored by:	ConnectWise
Closes #17704
2025-09-09 09:56:38 -07:00
Alan Somers
a2424312c4
Fix the build on 32-bit FreeBSD with GCC
GCC complains about casting a 64-bit integer to a 32-bit pointer.
Originally committed downstream as
https://github.com/freebsd/freebsd-src/commit/2d76470b701

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Sponsored by:	ConnectWise
Closes #17706
2025-09-09 08:56:43 -07:00
rmacklem
59f8f5dfe1
zfs_vnops_os.c: Add support for the _PC_CLONE_BLKSIZE name
FreeBSD now has a pathconf name called _PC_CLONE_BLKSIZE
which is the block size supported for block cloning for
the file system.  Since ZFS's block size varies per file,
return the largest size likely to be used, or zero if block
cloning is not supported.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #17645
2025-09-09 08:52:40 -07:00
Rob Norris
8266fa5858
cmd: force zarcstat/zarc_summary recreation at install
If the target already exists, lt will fail. Force it to recreate the
symlinks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17702
2025-09-08 14:36:14 -07:00
Chunwei Chen
e3c3e86c04
Fix wrong dedup_table_size for legacy dedup
If we call ddt_log_load() for legacy ddt, we will end up going into
ddt_log_update_stats() and filling uninitialized value into ddo_dspace.
This value will then get added to dedup_table_size during
ddt_get_dedup_object_stats().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17019
Closes #17699

Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
2025-09-08 14:02:51 -07:00
Rob Norris
ced72fdd69
tunables: remove legacy FreeBSD aliases
These are old pre-OpenZFS tunable names that have long been
available via either conventional ZFS_MODULE_PARAM tunables or through
kstats. There's no point doubling up anymore, so delete them.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17375
2025-09-08 10:03:01 -07:00
Shengqi Chen
b9c6b0e09b Install zarcstat and zarcsummary in deb / rpm build rules
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16357
Closes #17695
2025-09-05 10:00:48 -07:00
Shengqi Chen
c69b7ea6ca Install zarcstat and zarcsummary symlinks in Makefile
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16357
Closes #17695
2025-09-05 10:00:48 -07:00
Shengqi Chen
ffba31c236 Add upcoming renaming notice for arc_summary and arcstat
They will become zarcsummary and zarcstat in 2.4.0.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16357
Closes #17695
2025-09-05 10:00:48 -07:00
Shengqi Chen
dfc2c32590 ci: fix syntax issues in zfs-qemu.yml
Otherwise it might become `if [ == "" ]` which is ill-formed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #17695
2025-09-05 10:00:45 -07:00
Shengqi Chen
11b5c50238 ci: use real head sha instead of GITHUB_SHA when generating CI type
Because GitHub creates a merge commit on top of real head, so the check
on HEAD will fail regardlessly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #17695
2025-09-05 10:00:37 -07:00
Tony Hutter
0e88a0e1ea
CI: Increase 'Setup QEMU' timeout to 15 minutes
We've seen Fedora 42 still setting up after 10 min.  Change the timeout
to 15 min.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17697
2025-09-05 09:08:15 -07:00
Maksym Shkolnyi
69b65dda8a
config: Add warning if ARCH environment variable is set
If ARCH environment variable is set it can cause the failure of the
kernel modules check during the configure step. The resulting error
will be confusing, and may looks like this:

>    checking for kernel config option compatibility... done
>    checking whether CONFIG_MODULES is defined... no
>    configure: error:
>        *** This kernel does not include the required loadable module
>        *** support!

Detect when ARCH is print a warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Maksym Shkolnyi <maksym.shkolnyi@workato.com>
Closes #17680
2025-09-03 11:24:17 -07:00
Rob Norris
64d3143e82
zvol: reject suspend attempts when zvol is shutting down
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17690
2025-09-03 11:13:09 -07:00
Brian Behlendorf
9acedbacee
config: Fix LLVM-21 -Wuninitialized-const-pointer warning
LLVM-21 enables -Wuninitialized-const-pointer which results in the
following compiler warning and the bdev_file_open_by_path() interface
not being detected for 6.9 and newer kernels.  The blk_holder_ops
are not used by the ZFS code so we can safely use a NULL argument
for this check.

    bdev_file_open_by_path/bdev_file_open_by_path.c:110:54: error:
    variable 'h' is uninitialized when passed as a const pointer
    argument here [-Werror,-Wuninitialized-const-pointer]

Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17682
Closes #17684
2025-09-02 09:34:08 -07:00
Alexander Ziaee
5a8ba4520b
manuals: Audit/bump dates for last content change
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Ziaee <ziaee@FreeBSD.org>
Closes #17676
2025-08-28 16:26:16 -07:00
classabbyamp
ccf5a8a6fc
linux: use sys/stat.h instead of linux/stat.h
glibc includes linux/stat.h for statx, but musl defines its own statx
struct and associated constants, which does not include STATX_MNT_ID
yet. Thus, including linux/stat.h directly should be avoided for
maximum libc compatibility.

Tested on:
  - glibc: x86_64, i686, aarch64, armv7l, armv6l
  - musl: x86_64, aarch64, armv7l, armv6l

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-By: Achill Gilgenast <achill@achill.org>
Signed-off-by: classabbyamp <dev@placeviolette.net>
Closes #17675
2025-08-27 14:42:32 -07:00
Eric A. Borisch
1da2c30bed
Update pam_zfs_key.c defaultt path for FreeBSD
As described in https://github.com/freebsd/freebsd-src/pull/1305,
FreeBSD's installer defaults to zroot/home for user home directories.

For FreeBSD only, set the default prefix for pam_zfs_key to match.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Eric A. Borisch <eborisch@gmail.com>
Closes #17600
2025-08-27 09:36:37 -07:00
ofthesun9
976f765341
Update compatibility.d files
Add an openzfs-2.4 compatibility file for the next release.

While there are no compatibility difference between Linux and
FreeBSD for 2.4 symlinks for the -linux and -freebsd names are
created for any scripts expecting that convention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: ofthesun9 <olivier@ofthesun.net>
Closes #17672
Closes #17673
2025-08-25 16:47:19 -07:00
Shawn Bayern
ee7c362645
Add description of default sorting behavior to zfs_list.8
The sorting logic is all in cmd/zfs/zfs_iter.c.  I borrowed
where I could from the comments in the source code, but please
note that the comment to zfs_sort() is a little imprecise, or at
least incomplete, because it doesn't give any indication of the
chronological sort that will be used by default for snapshots in
zfs_compare().

While adding this description, I took the liberty to copy-edit
the rest of the file lightly.

In those edits, I've removed "If specified, you can list
property information by the absolute pathname or the relative
pathname" because, in context, it seems more confusing than
helpful.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Shawn Bayern <sbayern@law.fsu.edu>
Closes #15713
Closes #15869
2025-08-25 16:45:47 -07:00
Ivan Shapovalov
14bad10f96 config: add and use KERNEL_CC check for -Wno-format-zero-length
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16997
2025-08-25 11:26:13 -07:00
Ivan Shapovalov
e903177b56 config: cleanup KERNEL_CC checks, fix broken status output
If $KERNEL_CC was not defined, configure status output would print an
empty string where the kernel compiler should have been. Fix this and
simplify the code generally.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16997
2025-08-25 11:26:13 -07:00
Tony Hutter
d247538e15
ZTS: add mount_loopback to test zfs behind loop dev
Add a test case to reproduce issue #17277:

1. Make a pool
2. Write a file to the pool
3. Mount the file as a loopback device
4. Make an XFS filesystem on the loopback device
5. Mount the XFS filesystem... <hangs>

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Issue #17277
Closes #17329
2025-08-25 11:20:46 -07:00
Mark Johnston
0d54ae2880
zdb: Fix format strings on 32-bit systems
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #17665
2025-08-25 08:59:41 -07:00
youzhongyang
b6bd3228bb
Synchronize the update of feature refcount
The concurrent execution of feature_sync() can lead to a panic due 
to an unprotected update of the feature refcount.  Resolve this by
using the spa->spa_feat_stats_lock to synchronize the update of the 
refcount.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #17184
Closes #17632
2025-08-22 16:35:58 -07:00
Cong Zhang
e7485d04f1
Prompt user to unlock when login from dropbear
Update the zfsunlock initramfs hook to provide instructions on how
to unlock the root filesystem when appropriate.  The intent is to
make the dropbear ssh MOTD more user friendly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Cong Zhang <13283869+congzhangzh@users.noreply.github.com>
Closes #17661
Closes #17662
2025-08-22 13:11:41 -07:00
Brian Behlendorf
f1f74577cb Update META
Increase the version to 2.4.99 to indicate the master branch is
newer than the 2.4.x release.  This ensures packages built from
master branch are considered to be newer than the last release.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-08-22 12:01:19 -07:00
Brian Behlendorf
00dfa094ac Tag 2.4.0-rc1
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-08-22 11:27:48 -07:00
Konstantin Belousov
28ff57505b
FreeBSD: satisfy VFS requirements for readdir()
zfsctl_root_readdir(): properly set eof.
readdir(): set *eofp to 1 on eof.
If there were no dirents to copy out, return EINVAL same as UFS.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Konstantin Belousov <kib@FreeBSD.org>
Closes #17655
2025-08-22 09:14:36 -04:00
Alexander Motin
94413bc75d
zdb: Filter log spacemaps by vdev
When requested to dump metaslabs only for specific vdev, apply the
filter also to log spacemaps to reduce the output.  Unfortunately
filtering by metaslab numbers is more difficult so leave those.

While there, tune the output formatting.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17643
2025-08-21 11:19:46 -04:00
Rob Norris
574eec2964 dnode: remove dn_dirtyctx and dnode_dirtycontext
Only used for a couple of debug assertions which had very little value.

Setting it required taking certain locks, so we can remove all that too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Robert Evans <evansr@google.com>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16297
Closes #17652
Closes #17658
2025-08-21 06:05:38 -07:00
Rob Norris
aa6f0f878b dnode: remove dn_dirtyctx_firstset
Old debug param, not used for anything.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Robert Evans <evansr@google.com>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16297
Closes #17652
Closes #17658
2025-08-21 06:05:36 -07:00
Rob Norris
eecff1b4a9 dnode: remove dn_dirty_txg and DNODE_IS_DIRTY
dn_dirty_txg only existed for DNODE_IS_DIRTY(). In turn, that only
existed to ensure that a dnode was clean before making it eligible for
removal from the array of cached dnodes attached to the object 0 L0
dbuf.

dn_dirtycnt is enough to check that now, so use it directly and remove
the rest.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Robert Evans <evansr@google.com>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16297
Closes #17652
Closes #17658
2025-08-21 06:05:35 -07:00
Rob Norris
f3e49b0cf5 dnode_is_dirty: reimplement in terms of dn_dirtycnt
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Robert Evans <evansr@google.com>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16297
Closes #17652
Closes #17658
2025-08-21 06:05:33 -07:00
Rob Norris
3abf72b251 dnode: add dn_dirtycnt, count of number of txgs this dnode is dirty on
Bumped when we take the dirty hold in dnode_setdirty(), dropped when the
dnode is finally cleaned up after sync in dnode_rele_task() or
userquota_updates_task().

This gives us a way to check if the dnode is dirty on any txg without
having to rely on outside information (eg presence on a dirty list),
which has been a rich source of bugs in the past.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Suggested-by: Robert Evans <evansr@google.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Robert Evans <evansr@google.com>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16297
Closes #17652
Closes #17658
2025-08-21 06:05:29 -07:00
Tino Reichardt
14d2480708
CI: Add Debian 13 to the FULL_OS runner list
This commit adds Debian 13 alias Trixie to the checked operating
systems. The image needs to be run with UEFI support.

Current Debian version overview:
- Debian 11 (Bullseye) -> "oldoldstable"
- Debian 12 (Bookworm) -> "oldstable"
- Debian 13 (Trixie) -> new "stable"

The CI will be run on Debian 12 and Debian 13 now.
Debian 11 is kept, but won't be used automatically.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17648
2025-08-20 08:03:34 -07:00
Shengqi Chen
79a59ae787 Debian rules: install scripts/objtool-wrapper.in into dkms tree
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #17633
Closes #17646
2025-08-20 07:32:25 -07:00
Attila Fülöp
2d7928428b objtool-wrapper: Update Debian packaging
6cf17f65 (#17456) introduced a change to `configure.ac` which
breaks the patching done in the Debian packages DKMS source
installation phase. This results in a failed module build.

Adapt the awk script doing the patching to handle the added
`AC_CONFIG_FILE` entry.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-by: Shengqi Chen <harry-chen@outlook.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #17633
Closes #17646
2025-08-20 07:31:05 -07:00
Alexander Motin
a9410ccbd9
Make zpool_find_config() report errors
All of zpool_find_config() callers now set lpc_printerr.  Actually
printing the errors when pool can not be found should make zdb a
half percent less confusing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17642
2025-08-19 13:09:25 -07:00
Dag-Erling Smørgrav
2c877e8453
FreeBSD: Set st_rdev to NODEV, not 0, when not a device
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Closes #17649
2025-08-19 12:42:21 -07:00
Rob Norris
dcd73069f0 zvol_remove_minors_impl: remove all async fallbacks
Since both ZFS- and OS-sides of a zvol now take care of their own
locking and don't get in each other's way, there's no need for the very
complicated removal code to fall back to async tasks if the locks needed
at each stage can't be obtained right now.

Here we change it to be a linear three-step process: select zvols of
interest and flag them for removal, then wait for them to shed activity
and then remove them, and finally, free them.

Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17625
2025-08-19 10:06:47 -07:00
Rob Norris
8a0e5e8b54 zvol: stop using zvol_state_lock to protect OS-side private data
zvol_state_lock is intended to protect access to the global name->zvol
lists (zvol_find_by_name()), but has also been used to control access to
OS-side private data, accessed through whatever kernel object is used to
represent the volume (gendisk, geom, etc).

This appears to have been necessary to some degree because the OS-side
object is what's used to get a handle on zvol_state_t, so zv_state_lock
and zv_suspend_lock can't be used to manage access, but also, with the
private object and the zvol_state_t being shutdown and destroyed at the
same time in zvol_os_free(), we must ensure that the private object
pointer only ever corresponds to a real zvol_state_t, not one in partial
destruction. Taking the global lock seems like a convenient way to
ensure this.

The problem with this is that zvol_state_lock does not actually protect
access to the zvol_state_t internals, so we need to take zv_state_lock
and/or zv_suspend_lock. If those are contended, this can then cause
OS-side operations (eg zvol_open()) to sleep to wait for them while hold
zvol_state_lock. This then blocks out all other OS-side operations which
want to get the private data, and any ZFS-side control operations that
would take the write half of the lock. It's even worse if ZFS-side
operations induce OS-side calls back into the zvol (eg creating a zvol
triggers a partition probe inside the kernel, and also a userspace
access from udev to set up device links). And it gets even works again
if anything decides to defer those ops to a task and wait on them, which
zvol_remove_minors_impl() will do under high load.

However, since the previous commit, we have a guarantee that the private
data pointer will always be NULL'd out in zvol_os_remove_minor()
_before_ the zvol_state_t is made invalid, but it won't happen until all
users are ejected. So, if we make access to the private object pointer
atomic, we remove the need to take a global lockout to access it, and so
we can remove all acquisitions of zvol_state_lock from the OS side.

While here, I've rewritten much of the locking theory comment at the top
of zvol.c. It wasn't wrong, but it hadn't been followed exactly, so I've
tried to describe the purpose of each lock in a little more detail, and
in particular describe where it should and shouldn't be used.

Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17625
2025-08-19 10:06:34 -07:00
Rob Norris
96f9d271ea zvol: remove the OS-side minor before freeing the zvol
When destroying a zvol, it is not "unpublished" from the system (that
is, /dev/zd* node removed) until zvol_os_free(). Under Linux, at the
time del_gendisk() and put_disk() are called, the device node may still
be have an active hold, from a userspace program or something inside the
kernel (a partition probe). As it is currently, this can lead to calls
to zvol_open() or zvol_release() while the zvol_state_t is partially or
fully freed. zvol_open() has some protection against this by checking
that private_data is NULL, but zvol_release does not.

This implements a better ordering for all of this by adding a new
OS-side method, zvol_os_remove_minor(), which is responsible for fully
decoupling the "private" (OS-side) objects from the zvol_state_t. For
Linux, that means calling put_disk(), nulling private_data, and freeing
zv_zso.

This takes the place of zvol_os_clear_private(), which was a nod in that
direction but did not do enough, and did not do it early enough.

Equivalent changes are made on the FreeBSD side to follow the API
change.

Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17625
2025-08-19 10:06:21 -07:00
Rob Norris
b2c792778c zvol: generalise zvol_remove_minors_impl() for single zvol case
zvol_remove_minor_impl() and zvol_remove_minors_impl() should be
identical except for how they select zvols to remove, so lets just use
the same function with a flag to indicate if we should include children
and snapshots or not.

Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17625
2025-08-19 10:06:11 -07:00
Rob Norris
6bb8fe5528 ZTS: stress test concurrent zvol create/destroy
Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17625
2025-08-19 10:05:34 -07:00
r-ricci
30a915efed
zfs-send.8: mention combination of -c/-e flags and zstd_compress feature
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Roberto Ricci <io@r-ricci.it>
Closes #17647
2025-08-19 10:56:58 -04:00
Ameer Hamza
7b54567c1f
trace_zil.h: rename zcw_zio_error to zcw_error
Rename `zcw_zio_error` to `zcw_error` in `trace_zil.h` that was missed
in commit f562e0f69. This fixes compilation errors exposed when building
with `--with-linux=`.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17654
2025-08-19 10:54:50 -04:00
Brian Behlendorf
f65321e30c
Add missing AC_MSG_RESULT
Output the result of the "iops->mkdir() returns struct dentry*"
check to cleanup the configure output.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17641
2025-08-15 13:18:37 -07:00
Tony Hutter
9b0a9b410e
CI: Add optional patch level, fix hostname on F42
In the past there have been times when we need to generate new RPMs
for an existing ZFS release.  Typically this happens when a new RHEL
version comes out and the kernel symbols no longer match.  To get
users to auto-update we just bump the patch number.  For example, we
had to create zfs-2.1.13-1 for EL8.8 and zfs-2.1.13-2 for EL8.9.

This commit adds an optional patch level text box to the github
package builder runner.

In addition, this commit also uses `hostnamectl` instead of `hostname`
for F42+ compatibility, if available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17638
2025-08-15 09:21:23 -07:00
Brian Behlendorf
5061f959d1
Retire zfs_autoimport_disable kmod option
Back in 2014 the zfs_autoimport_disable module option was added to
control whether the kmods should load the pool configs from the cache
file on module load.  The default value since that time has been for
the kernel to not process the cache file.

Detecting and importing pools during boot is now controlled outside
of the kmod on both Linux and FreeBSD.  By all accounts this has been
working well and we can remove this dormant code on the kernel side.

The spa_config_load() function is has been moved to userspace, it is
now only used by libzpool.  Additionally, the spa_boot_init() hook
which was used by FreeBSD now looks to be used and was removed.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17618
2025-08-14 14:58:58 -07:00
Alexander Motin
d151432073
ZIL: Make allocations more flexible
When ZIL allocates space for new LWBs without knowing how much it
will require, it can use new metaslab_alloc_range() function to
allocate slightly more or less than it predicted.  It allows to
improve space efficiency by allocating bigger LWBs on RAIDZ/dRAID
instead of padding and possibly packing more ZIL records there.
It may also allow to reduce ganging in some cases by allowing to
allocate smaller LWBs when we are not sure we'll need bigger.

On the opposite side, when we allocate space for already closed
LWBs, when we precisely know how much space we need, we may just
allocate what we need instead of relying on writing less than
allocated, that does not work for RAIDZ.

Space for LWBs in open state (still being filled) is allocated
same as before.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17613
2025-08-14 08:50:17 -07:00
Rob Norris
8d35a022e4
AUTHORS/mailmap: update with new contributors
Welcome to the house of fun! 🥳

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17634
2025-08-13 15:56:49 -07:00
Rob Norris
28433c4547 simd_stat: expose availability of VAES and VPCLMULQDQ
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Joel Low <joel@joelsplace.sg>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17058
2025-08-13 14:53:24 -07:00
Rob Norris
930f9cc66c crypto_test: include AVX2 GCM implementation in tests
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Joel Low <joel@joelsplace.sg>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17058
2025-08-13 14:52:46 -07:00
Joel Low
bb9225ea86 Backport AVX2 AES-GCM implementation from BoringSSL
This uses the AVX2 versions of the AESENC and PCLMULQDQ instructions; on
Zen 3 this provides an up to 80% performance improvement.

Original source:
d5440dd2c2/gen/bcm/aes-gcm-avx2-x86_64-linux.S

See the original BoringSSL commit at
3b6e1be439.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Joel Low <joel@joelsplace.sg>
Closes #17058
2025-08-13 14:51:20 -07:00
Alexander Motin
885d929cf8
Fix missed assertion update in physical rewrite patch
Physical rewrite patch changed the meaning of BP_GET_BIRTH(), but
I missed update one of its occurences, ending up asserting equal
logical birth times instead of equal physical birth times.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Fixes #17565
Closes #17631
2025-08-13 15:56:25 -04:00
Patrick Fasano
3cfd670e74
Add conflict/replacement with older SONAME libzfs and libzpool packages
In e8f0aa143e, the SONAMEs and package
names for libzfs and libzpool were bumped. The `contrib/debian/control`
file did not declare a conflict/replacement with the old package name.
This can cause dpkg to leave a system in an inconsistent state if the
old package is not manually uninstalled first.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Shengqi Chen <harry-chen@outlook.com>
Signed-off-by: Patrick Fasano <patrick@patrickfasano.com>
Closes #17586
2025-08-13 09:53:24 -07:00
Jitendra Patidar
077269bfed
Fix Assert in dbuf_undirty, which triggers during usage zap shrink
Usage zap's (DMU_*USED_OBJECT) are updated in syncing context via
do_userquota_cacheflush(). zap shrink triggers,
ASSERT(db->db_objset == dmu_objset_pool(db->db_objset)->dp_meta_objset
    || txg != spa_syncing_txg(dmu_objset_spa(db->db_objset)));

DMU_*USED_OBJECT are special object (DMU_OBJECT_IS_SPECIAL), gets
updated in syncing context only. So, relax assert for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #17602
2025-08-12 14:19:05 -07:00
Alan Somers
d3c1d27afd
zdb: better handling for corrupt block pointers
When dumping indirect blocks, attempt to print corrupt block pointers
rather than abort the program.  When corruption is detected zdb will
exit with an error code of 3.

Sponsored by:	ConnectWise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Alek Pinchuk <alek.pinchuk@connectwise.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #17166
2025-08-12 14:16:37 -07:00
Rob Norris
f8bc01c79f
contrib/initramfs/scripts/zfs: shellcheck fixup
I got a newer shellcheck, and it pointed out that read without a target
variable is not POSIXly. The var was removed in c3ef9f7528, so I put it
back, and now shellcheck complains about an unused var. That's actually
correct, but necessary, so I've added a suppression for that, probably
better.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17626
2025-08-12 14:09:59 -07:00
Colin Percival
22671c4da4
FreeBSD 15.0 is now "PRERELEASE"
Chase URL change from the FreeBSD project.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colin Percival <cperciva@tarsnap.com>
Closes #17617
2025-08-12 13:38:55 -07:00
Brian Behlendorf
152e34822b
Silence zstd large allocation warning
Allow zstd_mempool_init() to allocate using vmem_alloc() instead
of kmem_alloc() to silence the large allocation warning on Linux
during module load when the system has a large number of CPUs.

It's not at all clear to me that scaling the allocation size with
the number of CPUs is beneficial and that should be evaluated.
But for the moment this should resolve the warning without
introducing any unexpected side effects.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17620
Closes #11557
2025-08-12 13:38:08 -07:00
Brian Behlendorf
1ccae433e9
Allow vmem_alloc backed multilists
Systems with a large number of CPU cores (192+) may trigger the large
allocation warning in multilist_create() on Linux.  Silence the warning
by converting the allocation to vmem_alloc().

On Linux this results in a call to kvalloc() which will alloc vmem
for large allocations and kmem for small allocations.

On FreeBSD both vmem_alloc and kmem_alloc internally use the same
allocator so there is no functional change.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17616
2025-08-12 13:36:03 -07:00
Alexander Motin
e0e60d319c
Better pack struct zio_prop
By using precisely sized fields it is possible to reduce the size
of this structure and respectively struct zio it is included into
by 40 bytes (from 92 to 52).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17619
2025-08-12 13:28:46 -07:00
Rob Norris
531568f438 zil_suspend: fix cookie leak if ZIL crashes during wait
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17622
2025-08-12 13:24:32 -07:00
Rob Norris
7c9adc6858 zil_process_commit_list: fail better if the pool suspends in stall
Make sure we properly inform the nolwb waiters of the error, and don't
keep trying.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17622
2025-08-12 13:24:27 -07:00
Rob Norris
f562e0f691 ZIL: single zil_commit_waiter_done() function to complete a waiter
Just making it easier to not get the locking and broadcast wrong.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17622
2025-08-12 13:24:22 -07:00
Rob Norris
92da3e18c8 ZIL: flag crashed LWBs so we know not to process them
If the ZIL crashed, any outstanding LWBs are no longer interesting, so
if they return, we need to just clean them up and return, not try to do
any work on them. This is true even if they return success, as that may
be long after the pool suspended and resumed, depending on when/if the
kernel decides to return the IO to us. In particular, we must not try to
get the "next" LWB from zl_lwb_list, since they're no longer on that
list.

So, we put a flag on in-flight LWBs in zil_crash() when we move them
from zl_lwb_list to zl_lwb_crash_list, so we know what's going on when
they return.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17622
2025-08-12 13:24:16 -07:00
Rob Norris
508c546975 ZIL: use a bitfield for LWB "slog" and "slim" state flags
I'm soon about to need another LWB flag, and boolean_t is just so big
for only storing a single bit. Changing to a bitfield is far less
wasteful.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17622
2025-08-12 13:23:59 -07:00
achill
6b3333de2d
Linux 6.16 compat: META
Update the META file to reflect compatibility with the 6.16
kernel.

Tested with 6.16.0-0-stable of Alpine Linux edge, see
<https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/87929>.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Achill Gilgenast <achill@achill.org>
Closes #17578
2025-08-11 16:30:09 -07:00
René Wirnata
1d0b94c4e7
zed: prettify slack notification message
This converts the body of a ZED slack notification from
plain text to code block style to help with readability.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: René Wirnata <rene.wirnata@pandascience.net>
Closes #17610
2025-08-11 09:44:51 -07:00
Rob Norris
2fd145b578
zvol: cleanup error handling and passthrough
This is trying to get all the uses and non-uses of SET_ERROR correct
(being: only call it if we're the originator of an error _within ZFS_),
and correctly negating errors going to/from the kernel. And/or both.

Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17605
2025-08-08 17:04:01 -07:00
Rob Norris
90a1e13df2 Linux: zfs_sync: remove explicit suspend check
Since zil_commit_flags(NOW) will always return error if the pool is
suspended, there's no need for a separate suspend check here.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:43:50 -07:00
Rob Norris
ef4058fcdc FreeBSD: zfs_putpage: handle page writeback errors
Page writeback is considered completed when the associated itx callback
completes. A syncing writeback will receive the error in its callback
directly, but an in-flight async writeback that was promoted to sync by
the ZIL may also receive an error.

Writeback errors, even syncing writeback errors, are not especially
serious on their own, because the error will ultimately be returned to
the zil_commit() caller, either zfs_fsync() for an explicit sync op (eg
msync()) or to zfs_putpage() itself for a syncing (VM_PAGER_PUT_SYNC)
writeback.

The only thing we need to do when a page writeback fails is to skip
marking the page clean ("undirty"), since we don't know if it made it to
disk yet. This will ensure that it gets written out again in the future,
either some scheduled async writeback or another explicit syncing call.

On the other side, we need to make sure that if a syncing op arrives,
any changes on dirty pages are written back to the DMU and/or the ZIL
first. We do this by starting an async writeback on the vnode cache
first, so any dirty data has been recorded in the ZIL, ready for the
followup zfs_sync()->zil_commit() to find.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:43:44 -07:00
Rob Norris
3d6ee9a68c Linux: zfs_putpage: handle page writeback errors
Page writeback is considered completed when the associated itx callback
completes. A syncing writeback will receive the error in its callback
directly, but an in-flight async writeback that was promoted to sync by
the ZIL may also receive an error.

Writeback errors, even syncing writeback errors, are not especially
serious on their own, because the error will ultimately be returned to
the zil_commit() caller, either zfs_fsync() for an explicit sync op (eg
msync()) or to zfs_putpage() itself for a syncing (WB_SYNC_ALL) writeback
(kernel housekeeping or sync_file_range(SYNC_FILE_RANGE_WAIT_AFTER).

The only thing we need to do when a page writeback fails is to re-mark
the page dirty, since we don't know if it made it to disk yet. This will
ensure that it gets written out again in the future, either some
scheduled async writeback or another explicit syncing call.

On the other side, we need to make sure that if a syncing op arrives,
any changes on dirty pages are written back to the DMU and/or the ZIL
first. We do this by starting an _async_ (WB_SYNC_NONE) writeback on the
file mapping at the start of the sync op (fsync(), msync(), etc). An
async op will get an async itx created and logged, ready for the
followup zfs_fsync()->zil_commit() to find, while avoiding a zil_commit()
call for every page in the range.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:43:38 -07:00
Rob Norris
391e85f519 ZIL: add zil_commit_flags() to make honouring failmode= optional
The vast majority of calls to zil_commit() follow VFS ops, and should
honour the failmode= setting - either wait for sync, or return error.
Some calls however are part of a larger syncing op, and shouldn't ever
block if something goes wrong.

To allow this, we introduce zil_commit_flags(), with a flag
ZIL_COMMIT_FAILMODE to indicate whether or not the pool failmode should
be honoured. zil_commit() is now a wrapper that always sets this flag,
but any caller wanting a different behaviour can request ZIL_COMMIT_NOW
instead to have the call return failure if the pool suspends, regardless
of the failmode= setting.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:43:33 -07:00
Rob Norris
72602f6ad9 ZIL: "crash" the ZIL if the pool suspends during fallback
If the ZIL runs into trouble, it calls txg_wait_synced(), which blocks
on suspend. We want it to not block on suspend, instead returning an
error. On the surface, this is simple: change all calls to
txg_wait_synced_flags(TXG_WAIT_SUSPEND), and then thread the error
return back to the zil_commit() caller.

Handling suspension means returning an error to all commit waiters. This
is relatively straightforward, as zil_commit_waiter_t already has
zcw_zio_error to hold the write IO error, which signals a fallback to
txg_wait_synced_flags(TXG_WAIT_SUSPEND), which will fail, and so the
waiter can now return an error from zil_commit().

However, commit waiters are normally signalled when their associated
write (LWB) completes. If the pool has suspended, those IOs may not
return for some time, or maybe not at all. We still want to signal those
waiters so they can return from zil_commit(). We have a list of those
in-flight LWBs on zl_lwb_list, so we can run through those, detach them
and signal them. The LWB itself is still in-flight, but no longer has
attached waiters, so when it returns there will be nothing to do.

(As an aside, ITXs can also supply completion callbacks, which are
called when they are destroyed. These are directly connected to LWBs
though, so are passed the error code and destroyed there too).

At this point, all ZIL waiters have been ejected, so we only have to
consider the internal state. We potentially still have ITXs that have
not been committed, LWBs still open, and LWBs in-flight. The on-disk ZIL
is in an unknown state; some writes may have been written but not
returned to us. We really can't rely on any of it; the best thing to do
is abandon it entirely and start over when the pool returns to service.
But, since we may have IO out that won't return until the pool resumes,
we need something for it to return to.

The simplest solution I could find, implemented here, is to "crash" the
ZIL: accept no new ITXs, make no further updates, and let it empty out
on its normal schedule, that is, as txgs complete and zil_sync() and
zil_clean() are called. We set a "restart txg" to three txgs in the
future (syncing + TXG_CONCURRENT_STATES), at which point all the
internal state will have been cleared out, and the ZIL can resume
operation (handled at the top of zil_clean()).

This commit adds zil_crash(), which handles all of the above:
 - sets the restart txg
 - capture and signal all waiters
 - zero the header

zil_crash() is called when txg_wait_synced_flags(TXG_WAIT_SUSPEND)
returns because the pool suspended (ESHUTDOWN).

The rest of the commit is just threading the errors through, and related
housekeeping.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:43:26 -07:00
Rob Norris
99a5f5d1ba ZIL: pass commit errors back to ITX callbacks
ITX callbacks are used to signal that something can be cleaned up after
a itx is committed. Presently that's only used when syncing out mapped
pages (msync()) to mark dirty pages clean.

This extends the callback interface so it can be passed an error, and
take a different cleanup action if necessary.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:43:20 -07:00
Rob Norris
967b15b888 ZIL: allow zil_commit() to fail with error
This changes zil_commit() to have an int return, and updates all callers
to check it. There are no corresponding internal changes yet; it will
always return 0.

Since zil_commit() is an indication that the caller _really_ wants the
associated data to be durability stored, I've annotated it with the
__warn_unused_result__ compiler attribute (via __must_check), to emit a
warning if it's ever ussd without doing something with the return code.
I hope this will mean we never misuse it in the future.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:43:09 -07:00
Rob Norris
1f8c39ddb2 ZTS: test response of various sync methods under different failmodes
These are all the same shape: set up the pool to suspend on first write,
then perform some write+sync operation. The pool should suspend, and the
sync operation should respond according to the failmode= property.

We test fsync(), msync() and two forms of write() (open with O_SYNC, and
async with sync=always), which all take slightly different paths to
zil_commit() and back.

A helper function is included to do the write+sync sequence with mmap()
and msync(), since I didn't find a convenient tool to do that.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
2025-08-08 16:42:35 -07:00
Rob Norris
b270663e8a
linux/zvol_os: fix crash with blk-mq on Linux 4.19
03987f71e3 (#16069) added a workaround to get the blk-mq hardware
context for older kernels that don't cache it in the struct request.
However, this workaround appears to be incomplete.

In 4.19, the rq data context is optional. If its not initialised, then
the cached rq->cpu will be -1, and so using it to index into mq_map
causes a crash.

Given that the upstream 4.19 is now in extended LTS and rarely seen,
RHEL8 4.18+ has long carried "modern" blk-mq support, and the cached
hardware context has been available since 5.1, I'm not going to huge
lengths to get queue selection correct for the very few people that are
likely to feel it. To that end, we simply call raw_smp_processor_id() to
get a valid CPU id and use that instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17597
2025-08-08 09:39:14 -07:00
Rob Norris
82d6f7b047 Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17591
2025-08-07 11:41:42 -07:00
Rob Norris
f7bdd84328 Prefer VERIFY0P(n) over VERIFY(n == NULL)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17591
2025-08-07 11:41:37 -07:00
Rob Norris
611b95da18 Prefer VERIFY0(n) over VERIFY3S(n, ==, 0)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17591
2025-08-07 11:41:32 -07:00
Rob Norris
5c7df3bcac Prefer VERIFY0(n) over VERIFY3U(n, ==, 0)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17591
2025-08-07 11:41:25 -07:00
Rob Norris
c39e076f23 Prefer VERIFY0(n) over VERIFY(n == 0)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17591
2025-08-07 11:40:59 -07:00
Todd Zullinger
2564308cb2
rpm: don't list /sbin/zgenhostid twice in %files
The location of zgenhostid was changed in 0ae733c7a (Install zgenhostid
to sbindir, 2021-01-21).  We include all files within sbindir two lines
earlier, which causes rpmbuild to report:

    File listed twice: /sbin/zgenhostid

Drop the redundant entry from the %files section.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Closes #17601
2025-08-07 11:39:56 -07:00
Rob Norris
e44e51f28d zvol_task_report_status: gate behind ZFS_DEBUG
dprintf() is a no-op in production builds, giving a compile warning. So,
refactor it a little to keep all the strings inside the function, and
then make the function a no-op when ZFS_DEBUG is not set.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Closes #17596
2025-08-07 11:36:15 -07:00
Rob Norris
e6eb03a991 zvol_check_volblocksize: fix spa ref leak
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Closes #17596
2025-08-07 11:36:09 -07:00
Rob Norris
3e671f2353 zvol: remove void return casts on void-returning functions
Casting unused returns to (void) is already of dubious value, but it's
entirely meaningless on functions that are defined as void return.
Remove the clutter.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Closes #17596
2025-08-07 11:34:20 -07:00
Mariusz Zaborski
0c376d0f59
Document the new '-a' zpool option
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Mariusz Zaborski <oshogbo@FreeBSD.org>
Closes #17585
2025-08-06 17:11:47 -07:00
Alek P
3e004369f7
Removed unused zio_decompress_fail_fraction variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alek Pinchuk <alek.pinchuk@connectwise.com>
Closes #17599
2025-08-06 17:10:03 -07:00
Attila Fülöp
25930cb8a1 config: Avoid void main() in toolchain-simd.m4
Be standard-compliant by using `int main()`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13303
Closes #17590
2025-08-06 14:35:37 -07:00
Attila Fülöp
03592417cb SIMD: Don't require definition of HAVE_XSAVE
Currently we fail the compilation via the #error directive if
`HAVE_XSAVE` isn't defined. This breaks i586 builds since we check
the toolchains SIMD support only on i686 and onward.

Remove the requirement to fix the build on i586.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13303
Closes #17590
2025-08-06 14:34:53 -07:00
Alexander Motin
8302b6e32b
Some documentation polishing for log vdevs
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17592
2025-08-06 10:45:45 -07:00
Alexander Motin
60f714e6e2 Implement physical rewrites
Based on previous commit this implements `zfs rewrite -P` flag,
making ZFS to keep blocks logical birth times while rewriting
files.  It should exclude the rewritten blocks from incremental
sends, snapshot diffs, etc.  Snapshots space usage same time will
reflect the additional space usage from newly allocated blocks.

Since this begins to use new "rewrite" flag in the block pointers,
this commit introduces a new read-compatible per-dataset feature
physical_rewrite.  It must be enabled for the command to not fail,
it is activated on first use and deactivated on deletion of the
last affected dataset.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17565
2025-08-06 10:36:56 -07:00
Alexander Motin
4ae8bf406b Allow physical rewrite without logical
During regular block writes ZFS sets both logical and physical
birth times equal to the current TXG.  During dedup and block
cloning logical birth time is still set to the current TXG, but
physical may be copied from the original block that was used.
This represents the fact that logically user data has changed,
but the physically it is the same old block.

But block rewrite introduces a new situation, when block is not
changed logically, but stored in a different place of the pool.
From ARC, scrub and some other perspectives this is a new block,
but for example for user applications or incremental replication
it is not.  Somewhat similar thing happen during remap phase of
device removal, but in that case space blocks are still acounted
as allocated at their logical birth times.

This patch introduces a new "rewrite" flag in the block pointer
structure, allowing to differentiate physical rewrite (when the
block is actually reallocated at the physical birth time) from
the device reval case (when the logical birth time is used).

The new functionality is not used at this point, and the only
expected change is that error log is now kept in terms of physical
physical birth times, rather than logical, since if a block with
logged error was somehow rewritten, then the previous error does
not matter any more.

This change also introduces a new TRAVERSE_LOGICAL flag to the
traverse code, allowing zfs send, redact and diff to work in
context of logical birth times, ignoring physical-only rewrites.
It also changes nothing at this point due to lack of those writes,
but they will come in a following patch.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17565
2025-08-06 10:36:07 -07:00
Mariusz Zaborski
894edd084e
Add TXG timestamp database
This feature enables tracking of when TXGs are committed to disk,
providing an estimated timestamp for each TXG.

With this information, it becomes possible to perform scrubs based
on specific date ranges, improving the granularity of data
management and recovery operations.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16853
2025-08-06 10:31:21 -07:00
Rob Norris
c3496b5cc6 Linux: zfs_putpage: document (and fix!) confusing sync/commit modes
The structure of zfs_putpage() and its callers is tricky to follow.
There's a lot more we could do to improve it, but at least now we have
some description of one of the trickier bits.

Writing this exposed a very subtle bug: most async pages pushed out
through zpl_putpages() would go to the ZIL with commit=false, which can
yield a less-efficient write policy. So this commit updates that too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17584
2025-08-06 09:55:58 -07:00
Rob Norris
fb7a8503bc Linux: zfs_putpage: complete async page writeback immediately
For async page writeback, we do not need to wait for the page to be on
disk before returning to the caller; it's enough that the data from the
dirty page be on the DMU and in the in-memory ZIL, just like any other
write.

So, if this is not a syncing write, don't add a callback to the itx, and
instead just unlock the page immediately.

(This is effectively the same concept used for FreeBSD in d323fbf49c).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17584
Closes #14290
2025-08-06 09:55:50 -07:00
Rob Norris
a18c9edda6 Linux: sync: remove async/sync accounting
All this machinery is there to try to understand when there an async
writeback waiting to complete because the intent log callbacks are still
outstanding, and force them with a timely zil_commit(). The next commit
fixes this properly, so there's no need for all this extra housekeeping.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17584
2025-08-06 09:54:30 -07:00
Rob Norris
7ac5440ecf ZTS: mmap_ftruncate test to confirm async writeback behaviour
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17584
2025-08-06 09:54:05 -07:00
Paul Dagnelie
31c4fa93bb Fix dynamic gang block headers on raidz and mirror devices
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17587
2025-08-06 09:50:58 -07:00
Paul Dagnelie
8ecf044d62 Improve and fix gang blocks dyn header test
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17587
2025-08-06 09:50:24 -07:00
Rob Norris
2c8beeece0 CI: match and trim out internal timestamp for test prefix
Adjust the regexes to match the test line with timestamps, then remove
them for the summary. The internal timestamp is still in the full logs.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17045
2025-08-06 09:45:17 -07:00
Rob Norris
48c9b2e79d ZTS: include microsecond timestamps on all output
When reviewing test output after a failure, it's often quite difficult
to work out the order and timing of events, and to correlate test suite
output with kernel logs.

This adds timestamps to ZTS output to help with this, in three places:

- all of the standard log_XXX functions ultimately end up in _printline,
  which now prefixes output with a timestamp. An escape hatch
  environment variable is provided for user_cmd, which often calls the
  logging functions while also depending on the captured output.

- the test runner logging function log() also now prefixes its output
  with a timestamp.

- on failure, when capturing the kernel log in zfs_dmesg.ksh, the "iso"
  time format is requested.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17045
2025-08-06 09:44:49 -07:00
Fedor Uporov
0b6fd024a7
ZVOL: Unify zvol minors operations and improve error handling
Now zvol minors creation logic is passed thru spa_zvol_taskq, like it
is doing for remove/rename zvol minors functions. Appropriate
zvol minors creation functions are refactored:
- The zvol_create_minor()/zvol_minors_create_recursive() were removed.
- The single zvol_create_minors() is added instead.

Also, it become possible to collect zvol minors subtasks status, to
detect, if some zvol minor subtask is failed in the subtasks chain.
The appropriate message is reported to zfs_dbgmsg buffer in this case.

Sponsored-by: vStack, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #17575
2025-08-06 10:10:52 -04:00
khoang98
0f8a1105ee
Skip dbuf_evict_one() from dbuf_evict_notify() for reclaim thread
Avoid calling dbuf_evict_one() from memory reclaim contexts (e.g. Linux
kswapd, FreeBSD pagedaemon). This prevents deadlock caused by reclaim
threads waiting for the dbuf hash lock in the call sequence:
dbuf_evict_one -> dbuf_destroy -> arc_buf_destroy

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Kaitlin Hoang <kthoang@amazon.com>
Closes #17561
2025-08-01 16:47:41 -07:00
Rob Norris
1aec627c60
linux/atomic: fill out API for atomic pointer ops
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17580
2025-07-31 15:51:47 -07:00
Fedor Uporov
92da9e0e93
ZVOL: Implement zvol_alloc() function on FreeBSD side
Implement zvol_alloc() function on FreeBSD side to increase code base
compatibility with Linux. Also, fix issue with late returning in case
if volmode=none.

Sponsored-by: vStack, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #17482
2025-07-31 11:02:09 -04:00
Igor Ostapenko
cb5e7e097d
range_tree: Provide more debug details upon unexpected add/remove
Sponsored-by: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com>
Closes #17581
2025-07-31 10:44:42 -04:00
rmacklem
2957eabbef
Add support for FreeBSD's Solaris style extended attribute interface
FreeBSD commit 2ec2ba7e232d added the Solaris style syscall interface
for extended attributes.  This patch wires this interface into the
FreeBSD ZFS port, since this style of extended attributes is supported
by OpenZFS internally when the "xattr" property is set to "dir".

Some specific changes:
LOOKUP_NAMED_ATTR is defined to indicate the need to set V_NAMEDATTR
for calls to zfs_zaccess().
V_NAMEDATTR indicates that the access checking does need to be done
for FreeBSD.

The access checking code for extended attributes was copy/pasted from
the Linux port into zfs_zaccess() in the FreeBSD port.

Most of the changes are in zfs_freebsd_lookup() and
zfs_freebsd_create().
The semantics of these functions should remain unchanged unless named
attributes are being manipulated.

All the code changes are enabled for __FreeBSD_version 1500040 and
newer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #17540
2025-07-30 09:49:43 -07:00
Fedor Uporov
dea0fc969b
ZVOL: Return early, if volmode is ZFS_VOLMODE_NONE on FreeBSD side
Return from zvol_os_create_minor() function immediately after
dsl_prop_get_integer() call if volmode property value is set to
'none', like it is doing on Linux side.

Sponsored-by: vStack, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #17405
2025-07-30 09:46:34 -07:00
Richard Yao
ce9c3b4b94
Add CodeQL mismatched dsl_dataset_hold/_rele pairs check
This check is currently limited to checking mismatches that occur in the
same stack frame. It does not detect across stack frames.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard@ryao.dev>
Closes #17352
2025-07-30 09:45:28 -07:00
Alexander Motin
f70c85086b
BRT: Fix ZAP entry endianness
During original block cloning implementation a mistake was made,
making BRT ZAP entries an array of 8 1-byte entries instead of 1
entry of 8 bytes. This makes the pools non-endian-safe.

This commit introduces a new read-compatible pool feature
"com.truenas:block_cloning_endian", fixing the endianness issue
for new pools while maintaining compatibility with existing ones.

The feature is automatically activated when creating the first BRT
ZAP (ensuring we don't activate it on pools that already have BRT
entries in the old format).  When active, BRT entries are stored
as single 8-byte values.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17572
2025-07-30 09:42:47 -07:00
Tino Reichardt
10a78e2647
Faster checksum benchmark on system boot
While booting, only the needed 256KiB benchmarks are done now.

The delay for checking all checksums occurs when requested via:
- Linux: cat /proc/spl/kstat/zfs/chksum_bench
- FreeBSD: sysctl kstat.zfs.misc.chksum_bench

Reported by: Lahiru Gunathilake <gunathilakebllg@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Co-authored-by: Colin Percival <cperciva@tarsnap.com>
Closes #17563
Closes #17560
2025-07-29 17:09:48 -07:00
Akash B
b6e8db509d
zpool/zfs: Add '-a|--all' option to scrub, trim, initialize
Add support for the '-a | --all' option to perform trim,
scrub, and initialize operations on all pools.
Previously, specifying a pool name was mandatory for
these operations. With this enhancement, users can now
execute these operations across all pools at once,
without needing to manually iterate over each pool
from the command line.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #17524
2025-07-29 14:50:44 -07:00
Paul Dagnelie
fc885f308f
Don't use wrong weight when passivating group
When we're passivating a metaslab group we start by passivating the 
metaslabs that have been activated for each of the allocators.  To do 
that, we need to provide a weight. However, currently this erroneously 
always uses a segment-based weight, even if segment-based weighting is 
disabled.

Use the normal weight function, which will decide which type of weight 
to use.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17566
2025-07-29 14:28:01 -07:00
Brian Behlendorf
f23e040a37
CI: Remove Debian backports
The latest Debian 11 image includes bullseye-backports as a default
repository in the /etc/apt/sources.list.  However, this repository
has gone end of life which effectively breaks the default install.

We shouldn't need anything in backports so lets unconditionally
remove backports on all Debian builders to resolve the issue.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17569
2025-07-25 15:47:21 -07:00
Brian Behlendorf
cf146460c1
Default to zfs_bclone_wait_dirty=1
Update the default FICLONE and FICLONERANGE ioctl behavior to wait
on dirty blocks.  While this does remove some control from the
application, in practice ZFS is better positioned to the optimial
thing and immediately force a TXG sync.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17455
2025-07-25 10:42:23 -04:00
Andriy Tkachuk
4bd7a2eaa5
zdb: fix checksum calculation for decompressed blocks
Currently, when reading compressed blocks with -R and decompressing
them with :d option and specifying lsize, which is normally bigger
than psize for compressed blocks, the checksum is calculated on
decompressed data. But it makes no sense since zfs always calculates
checksum on physical, i.e. compressed data. So reading the same block
produces different checksum results depending on how we read it,
whether we decompress it or not, which, again, makes no sense.

Fix: use psize instead of lsize when calculating the checksum so that
it is always calculated on the physical block size, no matter was it
compressed or not.

Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17547
2025-07-24 21:24:15 -04:00
Ameer Hamza
a8646a8186
ZED: Fix device type detection and pool iteration logic
During hotplug REMOVED events, devid matching fails for partition-based
spares because devid information is not stored in pool config for
partitioned devices. However, when devid is populated by the hotplug
event, the original code skipped the search logic entirely, skipping
vdev_guid matching and resulting in wrong device type detection that
caused spares to be incorrectly identified as l2arc devices.
Additionally, fix zfs_agent_iter_pool() to use the return value from
zfs_agent_iter_vdev() instead of relying on search parameters, which
was previously ignored. Also add pool_guid optimization to enable
targeted pool searching when pool_guid is available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17545
2025-07-24 15:47:46 -07:00
Coleman Kane
5a9b9c7f87
linux: Fix out-of-src builds
The linux kernel modules haven't been building successfully when the
build occurs in a separate directory than the source code, which is a
common build pattern in Linux. Was not able to determine the root cause,
but the %.o targets in subdirectories are no longer being matched by the
pattern targets in the Linux Kbuild system. This change fixes the issue
by dynamically creating the missing ones inside our Kbuild.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #17517
2025-07-24 15:38:58 -07:00
Rob Norris
00ce064d8f
spa: update blkptr diagram to include vdev padding on encrypted blocks
Probably just an oversight in 4d044c4c1d. SPA_VDEVBITS is always 24,
regardless of whether or not the bp is for an encrypted block, and it
wouldn't make sense for it to be different anyway.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17564
2025-07-24 09:50:23 -04:00
Rob Norris
bf38c15071 everywhere: misc unnecessary var init/update
These are all cases where we initialise or update a variable, and then
never use it. None of them particularly matter, as the compiler should
optimise them all away during dead store elimination, but some static
analysers complain about them and they are extra work for casual readers
to follow, so worth removing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17551
2025-07-22 15:23:58 -07:00
Rob Norris
d2b9e66b88 vdev_raidz: asize/psize: remove unnecessary var initialisation
It would have been optimised away anyway so it doesn't matter, but it
does make things a little tougher to read.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17551
2025-07-22 15:23:51 -07:00
Rob Norris
e9d249d7e4 test/draid: fix error return
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17551
2025-07-22 15:23:47 -07:00
Rob Norris
2755e2aa60 spa_activity_check: narrow scope of MMP vars
They aren't used outside these very small blocks, and their initial
values are never used at all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17551
2025-07-22 15:23:07 -07:00
Rob Norris
9292071565 linux/kmem: remove HAVE_ATOMIC64_T and kmem_alloc_used wrappers
Seems like we haven't set it since the SPL was pulled into the main ZFS
tree. In removing the define, I've taken the 64-bit version (ie the one
that _hasn't_ been running since back then) because it looks like its
closer to the intended width by the way its used.

Since the macros ar eno longer needed as a selector, pull those too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17551
2025-07-22 15:08:07 -07:00
Rob Norris
1c483cf3d0 linux/kmem: remove long-obsolete __GFP compat flags
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17551
2025-07-22 15:07:53 -07:00
Rob Norris
96d20d7d59 linux/kmem: remove PF_FSTRANS and PF_MEMALLOC_NOIO compat
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17551
2025-07-22 15:07:36 -07:00
shodanshok
cecff09faa
add uncompressed_size to arc_summary
Add uncompressed ARC size to statistics reported by arc_summary.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #17556
2025-07-22 15:06:09 -07:00
shodanshok
a7a144e655
enforce arc_dnode_limit
Linux kernel shrinker in the context of null/root memcg does not scan
dentry and inode caches added by a task running in non-root memcg. For
ZFS this means that dnode cache routinely overflows, evicting valuable
meta/data and putting additional memory pressure on the system.

This patch restores zfs_prune_aliases as fallback when the kernel
shrinker does nothing, enabling zfs to actually free dnodes. Moreover,
it (indirectly) calls arc_evict when dnode_size > dnode_limit.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #17487
Closes #17542
2025-07-21 10:32:01 -07:00
Alexander Motin
be1e991a1a
Allow and prefer special vdevs as ZIL
Before this change ZIL blocks were allocated only from normal or
SLOG vdevs.  In typical situation when special vdevs are SSDs and
normal are HDDs it could cause weird inversions when data blocks
are written to SSDs, but ZIL referencing them to HDDs.

This change assumes that special vdevs typically have much better
(or at least not worse) latency than normal, and so in absence of
SLOGs should store ZIL blocks.  It means similar to normal vdevs
introduction of special embedded log allocation class and updating
the allocation fallback order to: SLOG -> special embedded log ->
special -> normal embedded log -> normal.

The code tries to guess whether data block is going to be written
to normal or special vdev (it can not be done precisely before
compression) and prefer indirect writes for blocks written to a
special vdev to avoid double-write.  For blocks that are going to
be written to normal vdev, special vdev by default plays as SLOG,
reducing write latency by the cost of higher special vdev wear,
but it is tunable via module parameter.

This should allow HDD pools with decent SSD as special vdev to
work under synchronous workloads without requiring additional
SLOG SSD, impractical in many scenarios.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17505
2025-07-18 18:44:14 -07:00
Chunwei Chen
2669b00f13
Define sops->free_inode() to prevent use-after-free during lookup
On Linux, when doing path lookup with LOOKUP_RCU, dentry and inode can
be dereferenced without refcounts and locks. For this reason, dentry and
inode must only be freed after RCU grace period.

However, zfs currently frees inode in zfs_inode_destroy synchronously
and we can't use GPL-only call_rcu() in zfs directly. Fortunately, on
Linux 5.2 and after, if we define sops->free_inode(), the kernel will do
call_rcu() for us.

This issue may be triggered more easily with init_on_free=1 boot
parameter:

BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:selinux_inode_permission+0x10e/0x1c0
Call Trace:
 ? show_trace_log_lvl+0x1be/0x2d9
 ? show_trace_log_lvl+0x1be/0x2d9
 ? show_trace_log_lvl+0x1be/0x2d9
 ? security_inode_permission+0x37/0x60
 ? __die_body.cold+0x8/0xd
 ? no_context+0x113/0x220
 ? exc_page_fault+0x6d/0x130
 ? asm_exc_page_fault+0x1e/0x30
 ? selinux_inode_permission+0x10e/0x1c0
 security_inode_permission+0x37/0x60
 link_path_walk.part.0.constprop.0+0xb5/0x360
 ? path_init+0x27d/0x3c0
 path_lookupat+0x3e/0x1a0
 filename_lookup+0xc0/0x1d0
 ? __check_object_size.part.0+0x123/0x150
 ? strncpy_from_user+0x4e/0x130
 ? getname_flags.part.0+0x4b/0x1c0
 vfs_statx+0x72/0x120
 ? ioctl_has_perm.constprop.0.isra.0+0xbd/0x120
 __do_sys_newlstat+0x39/0x70
 ? __x64_sys_ioctl+0x8d/0xd0
 do_syscall_64+0x30/0x40
 entry_SYSCALL_64_after_hwframe+0x62/0xc7

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Closes #17546
2025-07-18 08:45:13 -07:00
Alexander Motin
d7ab07dfb4
ZIL: Force writing of open LWB on suspend
Under parallel workloads ZIL may delay writes of open LWBs that
are not full enough.  On suspend we do not expect anything new to
appear since zil_get_commit_list() will not let it pass, only
returning TXG number to wait for.  But I suspect that waiting for
the TXG commit without having the last LWB issued may not wait for
its completion, resulting in panic described in #17509.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17521
2025-07-17 15:31:19 -07:00
Paul Dagnelie
c1e51c55f5
Correct weight recalculation of space-based metaslabs
Currently, after a failed allocation, the metaslab code recalculates the
weight for a metaslab. However, for space-based metaslabs, it uses the
maximum free segment size instead of the normal weighting
algorithm. This is presumably because the normal metaslab weight is
(roughly) intended to estimate the size of the largest free segment, but
it doesn't do that reliably at most fragmentation levels. This means
that recalculated metaslabs are forced to a weight that isn't really
using the same units as the rest of them, resulting in undesirable
behaviors. We switch this to use the normal space-weighting function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Wasabi Technology, Inc.
Sponsored-by: Klara, Inc.
Closes #17531
2025-07-16 10:20:57 -07:00
Paul Dagnelie
b21e04e8d9
Fix zdb pool/ with -k
When examining the root dataset with zdb -k, we get into a mismatched
state. main() knows we are not examining the whole pool, but it strips
off the trailing slash. import_checkpointed_state() then thinks we are
examining the whole pool, and does not update the target path
appropriately. The fix is to directly inform import_checkpointed_state
that we are examining a filesystem, and not the whole pool.

Sponsored-by: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17536
2025-07-15 17:01:49 -07:00
Rob Norris
d323fbf49c FreeBSD: zfs_putpages: don't undirty pages until after write completes
In syncing mode, zfs_putpages() would put the entire range of pages onto
the ZIL, then return VM_PAGER_OK for each page to the kernel. However,
an associated zil_commit() or txg sync had not happened at this point,
so the write may not actually be on disk.

So, we rework that case to use a ZIL commit callback, and do the
post-write work of undirtying the page and signaling completion there.
We return VM_PAGER_PEND to the kernel instead so it knows that we will
take care of it.

The original version of this (238eab7dc1) copied the Linux model and did
the cleanup in a ZIL callback for both async and sync. This was a
mistake, as FreeBSD does not have a separate "busy for writeback" flag
like Linux which keeps the page usable. The full sbusy flag locks the
entire page out until the itx callback fires, which for async is after
txg sync, which could be literal seconds in the future.

For the async case, the data is already on the DMU and the in-memory
ZIL, which is sufficient for async writeback, so the old method of
logging it without a callback, undirtying the page and returning is more
than sufficient and reclaims that lost performance.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17533
2025-07-15 15:58:15 -07:00
Mark Johnston
ee2a2d941a Revert "FreeBSD: zfs_putpages: don't undirty pages until after write completes"
This causes async putpages to leave the pages sbusied for a long time,
which hurts concurrency.  Revert for now until we have a better
approach.

This reverts commit 238eab7dc1.

Reported by:    Ihor Antonov <ngor@hugpoint.tech>
Discussed with: Rob Norris <rob.norris@klarasystems.com>

References: freebsd/freebsd-src@738a9a7
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Ported-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17533
2025-07-15 15:58:11 -07:00
Rob Norris
1b84bd1dff ZTS: test that zdb can work with libzpool tunables
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17537
2025-07-15 15:47:08 -07:00
Rob Norris
fce18e04d5 libzpool: tunable-based option interface for zdb/ztest
Removes the old dlsym() based option setter and adds a new
function handle_tunable_option() that can set, get and list all the
tunables in the system. And then wire it up to zdb and ztest.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17537
2025-07-15 15:47:03 -07:00
Rob Norris
cb9742e532 libspl: add API for manipulating tunables
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17537
2025-07-15 15:46:58 -07:00
Rob Norris
967ce75669 libspl: implement ZFS_MODULE_PARAM for userspace
For each tunable declaration, we create a zfs_tunable_t with its
details, and then a pointer to it in the 'zfs_tunables' ELF section,
that we can access later with a little support from the linker.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17537
2025-07-15 15:46:51 -07:00
Rob Norris
3a494c6d2a mod.h: make consistent across all three platforms
mod.h only exists to include the platform-specific mod_os.h, so we can
get rid of it and just call the platform header mod.h.

Then, create a libspl mod.h, and move the relevant items to it so we can
start building on it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17537
2025-07-15 15:46:14 -07:00
Carl George
fe3b2b76cf
CI: Add CentOS Stream 9/10 to the FULL_OS runner list
Testing on CentOS Stream provides several months advance notice of
changes coming to the RHEL kernel.  This should help OpenZFS be
proactive instead of reactive to new RHEL minor versions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Carl George <carlwgeorge@gmail.com>
ZFS-CI-Type: full
Closes #16904
Closes #17526
2025-07-15 10:00:35 -07:00
Attila Fülöp
8de8e0df9f
objtool wrapper: use absolute path to call the wrapper
Older kernel versions run make outside of the build directory. This
works since all paths are absolute. Relative paths will fail in such
a scenario.

Use an absolute path to the objtool wrapper as well, since the
relative path breaks the build on older kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #17541
2025-07-14 15:10:02 -07:00
Tino Reichardt
2461e6f636
Delete unused .cirrus.yml
The Cirrus_CI was planned for testing FreeBSD, but never really used I
think. Currently it's not needed anymore, so remove it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17155
Closes #17535
2025-07-11 08:49:06 -07:00
Tino Reichardt
d6dcae3166
ZTS: Fix FreeBSD 15.0 ksh errors
The package ksh93 is replaced by ksh now.
This works for FreeBSD 13 and 14 also.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17523
2025-07-09 14:40:32 -07:00
Alexander Motin
f66b57c87d
CI: Switch from FreeBSD 13.4 to 13.5
FreeBSD 13.4 is EOL since June 30, 2025.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Closes #17519
2025-07-09 14:38:32 -07:00
Brian Behlendorf
ea38787f2e
Revert "Fix incorrect expected error in ztest"
This reverts commit 2076011e0c.  The
comment which explains EINVAL should be expected for this case was
wrong, not the code.  The kernel will return ENOTSUP when attaching
a distributed spare to the wrong top-level dRAID vdev.  See the
check for this in spa_vdev_attach().

Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17503
2025-07-09 14:34:02 -07:00
Paul Dagnelie
a981cb69e4 Implement dynamic gang header sizes
ZFS gang block headers are currently fixed at 512 bytes. This is
increasingly wasteful in the era of larger disk sector sizes. This PR
allows any size allocation to work as a gang header. It also contains
supporting changes to ZDB to make gang headers easier to work with.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17004
2025-07-09 14:02:53 -07:00
Paul Dagnelie
e845be28e7 Add no-upgrade featureflag
Adds a featureflag that is not enabled during upgrades unless listed
explicitly. This is useful for features that could cause issues unless
applied carefully; for example, a feature that could make a root pool
unbootable if bootloaders don't yet have support for it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17004
2025-07-09 14:01:59 -07:00
rmacklem
4c2a7f85d5
FreeBSD: Add support for _PC_HAS_HIDDENSYSTEM
In FreeBSD there is now a pathconf name _PC_HAS_HIDDENSYSTEM.
This patch adds support for it to OpenZFS.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #17518
2025-07-08 22:11:22 -04:00
Ameer Hamza
523d9d6007
Validate mountpoint on path-based unmount using statx
Use statx to verify that path-based unmounts proceed only if the
mountpoint reported by statx matches the MNTTAB entry reported by
libzfs, aborting the operation if they differ. Align
`zfs umount /path` behavior with `zfs umount dataset`.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17481
2025-07-08 22:10:00 -04:00
Rob Norris
6af8db61b1
metaslab: don't pass whole zio to throttle reserve APIs
They only need a couple of fields, and passing the whole thing just
invites fiddling around inside it, like modifying flags, which then
makes it much harder to understand the zio state from inside zio.c.

We move the flag update to just after a successful throttle in zio.c.

Rename ZIO_FLAG_IO_ALLOCATING to ZIO_FLAG_ALLOC_THROTTLED
Better describes what it means, and makes it look less like
IO_IS_ALLOCATING, which means something different.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17508
2025-07-04 23:22:22 -04:00
Rob Norris
92d3b4ee2c
zio: rename io_reexecute as io_post; use it for the direct IO checksum error flag
We're not supposed to modify someone else's io_flags, so we need another
way to propagate DIO_CHKSUM_ERR.

If we squint, we can see that io_reexecute is really just recording
exceptional events that a parent (or its parents) will need to do
something about. It just happens that the only things we've had
historically are two forms of reexecution: now or later (suspend).

So, rename it to io_post, as in, post-IO info/events/actions. And now we
have a few spare bits for other conditions.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17507
2025-07-04 23:16:14 -04:00
Igor Ostapenko
ee0cb4cb89
ztest: Fix false positive of ENOSPC handling
Before running a pass zs_enospc_count is checked to free up some space
by destroying a random dataset. But the space freed may still be not
re-usable during the TXG_DEFER window breaking the next dataset creation
in ztest_generic_run().
    
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com>
Closes #17506
2025-07-03 16:00:13 -07:00
Meriel Luna Mittelbach
d411ea2e4d
Add templated zfs-mount@.service
Runs `zfs mount -R <dataset>` at boot, after `zfs mount -a`.
Intended to replace `mountpoint=legacy` in certain mount setups.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Meriel Luna Mittelbach <lunarlambda@gmail.com>
Closes #17483
2025-07-03 14:24:07 -07:00
Brian Behlendorf
c98a393cb6
CI: run ztest on compressed zpool
When running ztest under the CI a common failure mode is for the
underlying filesystem to run out of available free space.  Since
the storage associated with a GitHub-hosted running is fixed, we
instead create a pool and use a compressed ZFS dataset to store
the ztest vdev files.  This significantly increases the available
capacity since the data written by ztest is highly compressible.
A compression ratio of over 40:1 is conservatively achieved using
the default lz4 compression.  Autotrimming is enabled to ensure
freed blocks are discarded from the backing cipool vdev file.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17501
2025-07-03 10:27:05 -07:00
Alexander Motin
4e92aee233
Relax special_small_blocks restrictions
special_small_blocks is applied to blocks after compression, so it
makes no sense to demand its values to be power of 2.  At most
they could be multiple of 512, but that would still buy us nothing,
so lets allow them be any within SPA_MAXBLOCKSIZE.

Also special_small_blocks does not really need to depend on the
set recordsize, enabled pool features or presence of special vdev.
At worst in any of those cases it will just do nothing, so we
should not complicate users lives by artificial limitations.

While there, polish comments for recordsize and volblocksize.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17497
2025-07-02 11:11:37 -07:00
Martin Rüegg
17ee0fd4fa pyzfs: Adapt python lib directory evaluation from ax_python_devel.m4
71216b91d2 introduced a regression
on debian/ubuntu systems during build.

The reason being, that building the RPM for pyzfs was using
a different library path than building the library itself.
This is now harmonized.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch>
Closes #16155
Closes #17480
2025-07-02 09:58:22 -07:00
Martin Rüegg
6d838ec0b6 pyzfs: Update ax_python_devel.m4 to serial 37
Fixes an obvious typo, where a variable was missing the required
leading dollar sign ($)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch>
Closes #17480
2025-07-02 09:57:50 -07:00
Alexander Motin
bf846dcb7d
Release topology restrictions on special/dedup
Special vdevs were originally designed as a small blocks storage
for dRAID, for which role RAIDZ/dRAID topologies are not good.
But it is more often used as SSD storage for metadata and hot
data of HDD pools.  In these use cases narrow RAIDZ of SSDs might
be fine, so we should not introduce unnecessary restrictions,
and ZFS internally does not care.

Similar applies to dedup vdevs.  Original DDT used 4KB blocks,
for which anything but mirror was a terrible storage.  But new
FDT implementation uses 32KB blocks by default, which are much
less demanding even including compression, and which could be
increased even higher now, if needed.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17496
2025-07-02 09:33:47 -07:00
Chunwei Chen
eacf618a65
Missing tests in make pkg
```
Warning: TestGroup '/var/tmp/tests/functional/ctime' not added to this
run. Auxiliary script '/var/tmp/tests/functional/ctime/setup' failed
verification.
```

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #17491
2025-06-30 16:16:27 -07:00
Olivier Certner
dee62e074a
spa: ZIO_TASKQ_ISSUE: Use symbolic priority
This allows to change the meaning of priority differences in FreeBSD
without requiring code changes in ZFS.

This upstreams commit fd141584cf89d7d2 from FreeBSD src.

Sponsored-by: The FreeBSD Foundation
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Olivier Certner <olce@FreeBSD.org>
Closes #17489
2025-06-30 10:24:23 -04:00
Paul Dagnelie
69ee01aa4b
Fix bug caused by rounding in vdev_raidz_asize_to_psize
When an allocation is happening on a raidz vdev, the number of sectors
allocated is rounded up to a multiple of nparity + 1. If this results in
the allocation spilling into an extra row, then the corresponding call
to vdev_raidz_asize_to_psize will incorrectly assume that parity sectors
were allocated for that spilled row, even though no data is stored
there.

If we determine that happened, we need to subtract out those extra
sectors before performing the rest of the capacity calculation.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17490
2025-06-27 14:54:20 -04:00
Rob Norris
ea076d6921
vdev_raidz_asize_to_psize: return psize, not asize
Since 246e588, gang blocks written to raidz vdevs will write past the
end of their allocation, corrupting themselves, other data, or both.

The reason is simple - when allocating the gang children, we call
vdev_psize_to_asize() to find out how much data we should load into the
allocation we just did. vdev_raidz_asize_to_psize() had a bug; it
computed the psize, but returned the original asize. The raidz layer
dutifully writes that much out, into space beyond the end of the
allocation.

If there's existing data there, it gets overwritten, causing checksum
errors when that data is read. Even there's not data there (unlikely,
given that gang blocks are in play at all), that area is not considered
allocated, so can be allocated and overwritten later.

The fix is simple: return the psize we just computed.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17488
2025-06-26 10:19:59 -04:00
Mark Johnston
0a2163d194
FreeBSD: Ensure that z_pflags is initialized for new znodes
The field is subsequently accessed in zfs_mknode(), in
zfs_inherit_projid().  The Linux implementation of zfs_create_fs() has
this initialization already; there is no counterpart to
zfs_create_share_dir() that I can see.

Reported-by: KMSAN
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #17486
2025-06-25 12:07:17 -04:00
Paul Dagnelie
d461a67d0a
Ensure that gang_copies is always at least as large as copies
As discussed in the comments of PR #17004, you can theoretically run
into a case where a gang child has more copies than the gang header,
which can lead to some odd accounting behavior (and even trip a
VERIFY). While the accounting code could be changed to handle this, it
fundamentally doesn't seem to make a lot of sense to allow this to
happen. If the data is supposed to have a certain level of reliability,
that isn't actually achieved unless the gang_copies property is set to
match it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17484
2025-06-25 12:05:36 -04:00
Rob Norris
46a4075100
Linux 6.16: remove writepage and readahead_page
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17443
2025-06-23 15:51:02 -04:00
Brian Behlendorf
48ce292ea0
Clarify and restrict dmu_tx_assign() errors
There are three possible cases where dmu_tx_assign() may
encounter a fatal error.  When there is a true lack of free
space (ENOSPC), when there is a lack of quota space (EDQUOT),
or when data required to perform the transaction cannot be
read from disk (EIO).  See the dmu_tx_check_ioerr() function
for additional details of on the motivation for check for
I/O error early.

Prior to this change dmu_tx_assign() would return the
contents of tx->tx_err which covered a wide range of possible
error codes (EIO, ECKSUM, ESRCH, etc).  In practice, none
of the callers could do anything useful with this level of
detail and simply returned the error.

Therefore, this change converts all tx->tx_err errors to EIO,
adds ASSERTs to dmu_tx_assign() to cover the only possible
errors, and clarifies the function comment to include EIO as
a possible fatal error.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian D Behlendorf <behlendo@slag12.llnl.gov>
Closes #17463
2025-06-23 15:48:30 -04:00
Paul Dagnelie
8170eb6ebc
Fix TestGroup warning due to missing tags
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17473
2025-06-19 14:41:31 -07:00
Alexander Motin
5e5253be84
FreeBSD: Wire projects support
While FreeBSD itself does not support projects, there is no reason
why it can't be controlled via `zfs project` and other subcommands.
Most of the code is actually already there and just needs some
revival and sync with Linux, plus enabling some tests not depending
on the OS support.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17423
2025-06-19 14:39:20 -07:00
Paul Dagnelie
717213d431
Fix other nonrot bugs
There are still a variety of bugs involving the vdev_nonrot property
that will cause problems if you try to run the test suite with
segment-based weighting disabled, and with other things in the weighting
code. Parents' nonrot property need to be updated when children are
added. When vdevs are expanded and more metaslabs are added, the weights
have to be recalculated (since the number of metaslabs is an input to
the lba bias function). When opening, faulted or unopenable children
should not be considered for whether a vdev is nonrot or not (since the
nonrot property is determined during a successful open, this can cause
false negatives). And draid spares need to have the nonrot property set
correctly.

Sponsored-by: Eshtek, creators of HexOS
Sponsored-by: Klara, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17469
2025-06-19 09:25:58 -04:00
Tino Reichardt
585dbbf13b
ZTS: Use FreeBSD cloudinit images
FreeBSD provides CI-IMAGES since some time. These images are
based on nuageinit, which does not support fqdn and sudo for
example. So we need currently some workarounds to get it
working.

The FreeBSD images will be more compatible with cloud-init in
some near future. Then we can remove the workaround things.

These versions are used for testing:
- freebsd13-4r (RELEASE)
- freebsd14-3s (STABLE)
- freebsd15-0c (CURRENT)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17462
2025-06-18 10:19:21 -04:00
Attila Fülöp
6cf17f6538
Linux build: handle CONFIG_OBJTOOL_WERROR=y
Linux 5.16 by default fails the build on objtool warnings. We have
known and understood objtool warnings we can't fix without
involving Linux maintainers.

To work around this we introduce an objtool wrapper script which
removes the `--Werror` flag.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #17456
2025-06-16 08:12:09 -07:00
Alexander Motin
bd27b75401
ZIL: Relax parallel write ZIOs processing
ZIL introduced dependencies between its write ZIOs to permit flush
defer, when we flush vdev caches only once all the write ZIOs has
completed.  But it was recently spotted that it serializes not only
ZIO completions handling, but also their ready stage.  It means ZIO
pipeline can't calculate checksums for the following ZIOs until all
the previous are checksumed, even though it is not required.  On a
systems where memory throughput of a single CPU core is limited,
it creates single-core CPU bottleneck, which is difficult to see
due to ZIO pipeline design with many taskqueue threads.

While it would be great to bypass the ready stage waits, it would
require changes to ZIO code, and I haven't found a clean way to do
it.  But I've noticed that we don't need any dependency between
the write ZIOs if the previous one has some waiters, which means
it won't defer any flushes and work as a barrier for the earlier
ones.

Bypassing it won't help large single-thread writes, since all the
write ZIOs except the last in that case won't have waiters, and
so will be dependent.  But in that case the ZIO processing might
not be a bottleneck, since there will be only one thread populating
the write buffers, that will likely be the bottleneck.

But bypassing the ZIO dependency on multi-threaded write workloads
really allows them to scale beyond the checksuming throughput of
one CPU core.

My tests with writing 12 files on a same dataset on a pool with
4 striped NVMes as SLOGs from 12 threads with 1MB blocks on a
system with Xeon Silver 4114 CPU show total throughput increase
from 4.3GB/s to 8.5GB/s, increasing the SLOGs busy from ~30% to
~70%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17458
2025-06-14 09:37:18 -04:00
Germano Massullo
b4ebba0e04
Fix mixed-use-of-spaces-and-tabs rpmlint warning
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Germano Massullo <germano.massullo@gmail.com>
Closes #17461
2025-06-13 12:36:17 -07:00
Rob Norris
238eab7dc1 FreeBSD: zfs_putpages: don't undirty pages until after write completes
zfs_putpages() would put the entire range of pages onto the ZIL, then
return VM_PAGER_OK for each page to the kernel. However, an associated
zil_commit() or txg sync had not happened at this point, so the write
may not actually be on disk.

So, we rework it to use a ZIL commit callback, and do the post-write
work of undirtying the page and signaling completion there. We return
VM_PAGER_PEND to the kernel instead so it knows that we will take care
of it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17445
2025-06-12 14:45:18 -07:00
Rob Norris
aa964ce61b zfs_log_write: only put the callback on the last itx
If a write is split across mutliple itxs, we only want the callback on
the last one, otherwise it will be called for every itx associated with
this single write, which makes it very hard to know what to clean up.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17445
2025-06-12 14:44:33 -07:00
Rob Norris
d1c88cbd4c zpl_sync_fs: work around kernels that ignore sync_fs errors
If the kernel will honour our error returns, use them. If not, fool it
by setting a writeback error on the superblock, if available.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-12 14:42:32 -07:00
Rob Norris
e3f5e317e0 zfs_sync: return error when pool suspends
If the pool is suspended, we'll just block in zil_commit(). If the
system is shutting down, blocking wouldn't help anyone. So, we should
keep this test for now, but at least return an error for anyone who is
actually interested.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-12 14:42:27 -07:00
Rob Norris
52352dd748 zfs_sync: remove support for impossible scenarios
The superblock pointer will always be set, as will z_log, so remove code
supporting cases that can't occur (on Linux at least).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-12 14:42:21 -07:00
Rob Norris
c1b7bc52fe zts: test syncfs() behaviour when pool suspends
Fairly coarse, but if it returns while the pool suspends, it must be
with an error.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-12 14:39:35 -07:00
Alexander Motin
e0ef4d2768
Improve block cloning transactions accounting
Previous dmu_tx_count_clone() was broken, stating that cloning is
similar to free.  While they might be from some points, cloning
is not net-free.  It will likely consume space and memory, and
unlike free it will do it no matter whether the destination has
the blocks or not (usually not, so previous code did nothing).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17431
2025-06-11 11:59:16 -07:00
Alexander Motin
66ec7fb269
Reduce zfs_dmu_offset_next_sync penalty
Looking on txg_wait_synced(, 0) I've noticed that it always syncs
5 TXGs: 3 TXG_CONCURRENT_STATES + 2 TXG_DEFER_SIZE.  But in case
of dmu_offset_next() we do not care about deferred frees. And even
concurrent TXGs we might not need sync all 3 if the dnode was not
dirtied in last few TXGs.

This patch makes dmu_offset_next() to sync one TXG at a time until
the dnode is clean, but no more than 3 TXG_CONCURRENT_STATES times.
My tests with random simultaneous writes and seeks over many files
on HDD pool show 7-14% performance increase.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17434
2025-06-11 11:50:49 -07:00
Alexander Motin
4ae931aa93
Polish db_rwlock scope
dbuf_verify(): Don't need the lock, since we only compare pointers.

dbuf_findbp(): Don't need the lock, since aside of unneeded assert
we only produce the pointer, but don't de-reference it.

dnode_next_offset_level(): When working on top level indirection
should lock dnode buffer's db_rwlock, since it is our parent.  If
dnode has no buffer, then it is meta-dnode or one of quotas and we
should lock the dataset's ds_bp_rwlock instead.

Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17441
2025-06-11 11:13:48 -07:00
Rob Norris
3ff2eca0be zfs-program(8): document zfs.sync.clone()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17426
2025-06-10 14:53:18 -07:00
Rob Norris
1987498b66 ZTS: test zfs.sync.clone() for filesystems and volumes
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17426
2025-06-10 14:53:15 -07:00
Rob Norris
fbfda270d5 zcp_synctask: add zfs.sync.clone()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17426
2025-06-10 14:53:10 -07:00
Rob Norris
560e3170ef dsl_dataset: rename dmu_objset_clone* to dsl_dataset_clone*
And make its check and sync functions visible, so I can hook them up to
zcp_synctask. Rename not strictly necessary, but it definitely looks
more like a dsl_dataset thing than a dmu_objset thing, to the extent
that those things even have a meaningful distinction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17426
2025-06-10 14:52:43 -07:00
Alexander Motin
ba227e2cc2
Make TX abort after assign safer
It is not right, but there are few examples when TX is aborted
after being assigned in case of error.  To handle it better on
production systems add extra cleanup steps.

While here, replace couple dmu_tx_abort() in simple cases.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17438
2025-06-10 09:30:06 -07:00
Alexander Motin
bcd0430236
Allow zero compression if dedup is enabled
Having high-refcount dedup entries for zero blocks is inefficient
when they could be recorded as a holes instead.  Normally, zero
compression is not done if compression is disabled to not confuse
naive benchmarks.  But with dedup enabled, it is expected that the
write will be skipped anyway, so we are just optimizing the way it
is skipped.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17435
2025-06-10 09:28:14 -07:00
Tino Reichardt
0e9e2e2501
ZTS: Enable io_uring on CentOS Stream 9 and 10 also
The io_uring interface is available as a Technology Preview.
Details: https://access.redhat.com/solutions/4723221

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17447
2025-06-09 16:26:57 -07:00
Mariusz Zaborski
46b82de618
scrub: generate scrub_finish event
The `scn_min_txg` can now be used not only with resilver. Instead
of checking `scn_min_txg` to determine whether it’s a resilver or
a scrub, simply check which function is defined. Thanks to this
change, a scrub_finish event is generated when performing a scrub
from the saved txg.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Closes #17432
2025-06-06 22:43:10 -04:00
Rob Norris
af7d609592
zpl: handle suspend from two remaining calls to txg_wait_synced()
* zfs_link: allow tempfile sync to fail if pool suspends

4653e2f7d3 (#17355) allows dmu_tx_assign() to fail if the pool suspends
when failmode=continue, but zfs_link() can fall back to
txg_wait_synced() if it has to wait for a tempfile to be fully created
before continuing, which will block if the pool suspends.

Handle this by requesting an error return if the pool suspends when
failmode=continue, and if that happens, return EIO.

* zfs_clone_range: allow dirty wait to fail if pool suspends

4653e2f7d3 (#17355) allows dmu_tx_assign() to fail if the pool suspends
when failmode=continue, but zfs_clone_range() can fall back to
txg_wait_synced() if it has to wait for a dirty block to be written out,
which will block if the pool suspends.

Handle this by requesting an error return if the pool suspends when
failmode=continue, and if that happens, return EIO.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17413
2025-06-05 15:38:26 -04:00
Attila Fülöp
b96f1a4b1f
Linux build: silence objtool warnings
After #17401 the Linux build produces some stack related warnings.

Silence them with the `STACK_FRAME_NON_STANDARD` macro.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17410
2025-06-04 17:40:09 -07:00
Alexander Motin
b7f919d228
Relax zfs_vnops_read_chunk_size limitations
It makes no sense to limit read size below the block size, since
DMU will any way consume resources for the whole block, while the
current zfs_vnops_read_chunk_size is only 1MB, which is smaller
that maximum block size of 16MB.  Plus in case of misaligned
Uncached I/O the buffer may get evicted between the chunks,
requiring repeating I/Os.

On 64-bit platforms increase zfs_vnops_read_chunk_size to 32MB.
It allows to less depend on speculative prefetcher if application
requests specific size, first not waiting for prefetcher to start
and later not prefetching more than needed.

Also while there, we don't need to align reads to the chunk size,
but only to a block size, which is smaller and so more forgiving.

My profiles show ~4% of CPU time saving when reading 16MB blocks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17415
2025-06-04 11:24:15 -04:00
Alexander Motin
68817d28c5
Include class name into struct metaslab_class
With increasing number of metaslab classes it can be helpful for
debugging to know what we are looking at.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17409
2025-06-03 11:12:59 -04:00
Alexander Motin
108562344c
Improve allocation fallback handling
Before this change in case of any allocation error ZFS always fallen
back to normal class.  But with more of different classes available
we migth want more sophisticated logic.  For example, it makes sense
to fall back from dedup first to special class (if it is allowed to
put DDT there) and only then to normal, since in a pool with dedup
and special classes populated normal class likely has performance
characteristics unsuitable for dedup.

This change implements general mechanism where fallback order is
controlled by the same spa_preferred_class() as the initial class
selection.  And as first application it implements the mentioned
dedup->special->normal fallbacks.  I have more plans for it later.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17391
2025-05-31 19:12:16 -04:00
Fedor Uporov
e0edfcbd4e
ZVOL: Make zvol_volmode module parameter platform-independent
The module parameter name was not changed in FreeBSD sysctls
list: 'vfs.zfs.vol.mode'. Also, on Linux side the name is:
/sys/module/zfs/parameters/zvol_volmode.

Sponsored-by: vStack, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #17386
2025-05-31 19:09:50 -04:00
Fedor Uporov
e1677d9ee1
ZVOL: Make zvol_prefetch_bytes module parameter platform-independent
The module parameter now is represented in FreeBSD sysctls list
with name: 'vfs.zfs.vol.prefetch_bytes'. The default value is 131072,
same as on Linux side.

Sponsored-by: vStack, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #17385
2025-05-31 09:58:54 -04:00
Rob Norris
e8e602d987
zio_add_child: collapse into a single function
The child locking difference is simple enough to handle with a boolean.
The actual work is more involved, and it's easy to forget to change
things in both places when experimenting. Just collapse them.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17382
2025-05-30 21:18:10 -04:00
Brian Behlendorf
90011644ce
CI: Retire Fedora 40 builder
Fedora 40 has gone EOL as of May 2025, retire the CI builder.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17408
2025-05-30 21:15:11 -04:00
Alexander Motin
2d33c8edb6
Make rewrite use Uncached I/O
Rewrite is a one-time/rare bulk administrative operation, which
should minimally affect payload caching.  Plus some avoided memory
copies in its data path allow to significantly increase its speed.

My tests show reduction of time to rewrite 28GB of uncompressed
files on NVMe pool from 17 to 9 seconds and minimal ARC usage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17407
2025-05-30 21:13:49 -04:00
Tino Reichardt
f03f9c9bde ZTS: Enable io_uring support on el9/el10
The io_uring interface is available as a Technology Preview.
Details: https://access.redhat.com/solutions/4723221

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17397
2025-05-30 13:57:35 -07:00
Tino Reichardt
08bf660ac4 ZTS: Add AlmaLinux 10
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17397
2025-05-30 13:56:20 -07:00
Fedor Uporov
a38376b37a
Rename zvol kernel module parameters sysctls on FreeBSD side
Make 'zvol_threads', 'zvol_num_taskqs' and 'zvol_request_sync' names
compatible with FreeBSD sysctl naming convention. Now the sysctls are
have a next form:

$ sysctl vfs.zfs.vol.threads
vfs.zfs.vol.threads: 0

$ sysctl vfs.zfs.vol.num_taskqs
vfs.zfs.vol.num_taskqs: 0

$ sysctl vfs.zfs.vol.request_sync
vfs.zfs.vol.request_sync: 0

Sponsored-by: vStack, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #17406
2025-05-30 13:41:15 -07:00
Rob Norris
1bd225ed8a
abd_os: move headers from libzpool to libspl
5b9e695 added specific userspace versions of abd_os.h and abd_impl_os.h
for libzpool. However, abd.h and abd_impl.h, which include them, are
packaged with libzfs, so other programs building against libzfs can
fail to build, either because the headers aren't installed, or because
they aren't on any standard include path.

So, move abd_os.h and abd_impl_os.h to libspl, where they we will be
installed alongside abd.h and abd_impl.h in a known path.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16940
Closes #17390
Closes #17394
2025-05-30 13:38:20 -07:00
Alexander Motin
008c9666ef
Set spa_final_txg in spa_unload()
I've noticed that after some dedup tests system reboot ends up in
assertion about ms_defer tree not free.  It seems to be caused by
DDT flushing still freeing some blocks while ZFS trying to reach
a final steady state due to spa_final_txg, while being set by
spa_export_common() on pool export, is not set when spa_unload()
is called by spa_evict_all() on system reboot/shutdown.  Setting
spa_final_txg in spa_unload() fixes this issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17395
2025-05-30 14:44:45 -04:00
Ameer Hamza
f5a6dd8b70 zpool: clarify ZPOOL_STATUS_REMOVED_DEV status message
Disks can be removed either by the administrator via hotplug or by the
kernel when a disk failure occurs. The previous message implied that
removal was always manual, which could be confusing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17400
2025-05-30 09:15:10 -07:00
Ameer Hamza
b3b3cd1e4f vdev: skip faulting disks pending removal
This patch fixes a race where vdev_remove_wanted may be set after probe
initiation, which could otherwise trigger redundant fault and removal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17400
2025-05-30 09:14:37 -07:00
Brian Behlendorf
1d482ca6e3
CI: Retire Ubuntu 20.04 builder
Ubuntu 20.04 has gone EOL as of April 2025, retire the CI builder.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17403
2025-05-30 10:32:24 -04:00
Rob Norris
5764e218ba
vdev_disk: remove classic IO submission
Since it was disabled for 2.3, there's been no confirmed sightings of
strange IO errors, misalignments or related shenanigans. Absence of
evidence and all that, but I'd rather fix bugs in the new code than in
the old.

  "It isn't hubris until he's failed."
    -- Chrisjen Avasarala

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17399
2025-05-30 10:31:02 -04:00
Rob Norris
44e3266894
events: include zio type in IO error reports
Usually the IO type can be inferred from the other fields (in
particular, priority and flags) sometimes it's not easy to see. This is
just another little debug helper.

    May 27 2025 00:54:54.024110493 ereport.fs.zfs.data
            class = "ereport.fs.zfs.data"
            ena = 0x1f5ecfae600801
            ...
            zio_delta = 0x0
            zio_type = 0x2 [WRITE]
            zio_priority = 0x3 [ASYNC_WRITE]
            zio_objset = 0x0

Document zio_type and zio_priority.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17381
2025-05-30 10:29:29 -04:00
Rob Norris
0c94d3838d
linux/zvol_os: don't try to set disk ops if alloc fails
If the kernel fails to allocate the gendisk, zvo_disk will be NULL, and
derefencing it will explode. So don't do that.

Sponsored-by: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17396
2025-05-30 10:25:09 -04:00
Attila Fülöp
3084336ae4
Linux build: always use objtool
We silence `objtool` warnings on some object files using
`OBJECT_FILES_NON_STANDARD_some_file.o`. Nowadays `objtool` is
needed for CPU vulnerability mitigations and a lot more
functionality so its use is desirable.

Just remove the `OBJECT_FILES_NON_STANDARD` definitions. A follow-up
commit is needed to make the offending files standard and address
the compile time warnings.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #17401
Closes #17364
2025-05-29 18:04:20 -07:00
Fedor Uporov
3dfa98d013
ZVOL: Make zvol_inhibit_dev module parameter platform-independent
The module parameter now is represented in FreeBSD sysctls list with
name: 'vfs.zfs.vol.inhibit_dev'. The default value is '0', same as on
Linux side.

Sponsored-by: vStack, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #17384
2025-05-29 09:37:41 -04:00
Alexander Motin
fa697b94e6
FreeBSD: Add posix_fadvise(POSIX_FADV_WILLNEED) support
As commit 320f0c6 did for Linux, connect POSIX_FADV_WILLNEED
up to dmu_prefetch() on FreeBSD.

While there, fix portability problems in tests/functional/fadvise.

1.  Instead of relying on the numerical values of POSIX_FADV_XXX macros,
    accept macro names as arguments to the file_fadvise program.  (The
    numbers happen to match on Linux and FreeBSD, but future systems may
    vary and it seems a little strange/raw to count on that.)

2.  For implementation reasons, SEQUENTIAL doesn't reach ZFS via FreeBSD
    VFS currently (perhaps something that should be investigated in
    FreeBSD).  Since on Linux we're treating SEQUENTIAL and WILLNEED the
    same, it doesn't really matter which one we use, so switch the test
    over to WILLNEED exercise the new prefetch code on both OSes the
    same way.

Reviewed-by: Mateusz Guzik <mjg@FreeBSD.org>
Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Thomas Munro <tmunro@FreeBSD.org>
Co-authored-by: Alexander Motin <mav@FreeBSD.org>
Closes #17379
2025-05-29 09:34:07 -04:00
Rob Norris
00360efa35 tunables: fix spelling
Three occurences with an 'e', and all of them mine. Maybe it's an
British thing?

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-05-28 16:50:22 -07:00
Rob Norris
9392be427e tunables: remove __check_old_set_param workaround
This was fully removed from Linux in 4.15, so we won't be seeing it
again.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-05-28 16:50:22 -07:00
Rob Norris
dd4e2f99f0 tunables: remove unused param get/set aliases
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-05-28 16:50:22 -07:00
Rob Norris
589d99171f tunables: use Linux ullong param ops for u64
Since 3.17 Linux has provided param ops for 64-bit ints, so we don't
need to use our own anymore.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-05-28 16:50:22 -07:00
Rob Norris
6e7e7ea7ef tunables: remove support for s64 tunables
Nothing uses them now.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-05-28 16:50:22 -07:00
Rob Norris
58235f52af tunables: remove direct use of module_param_cb
The use for spl_taskq_kick was the only use, and the comment that
module_param_call is obsolete is no longer true - it's still very much
used even in recent kernels.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-05-28 16:50:22 -07:00
Rob Norris
7b183f1918 tunables: remove FreeBSD compat macros for Linux module params
Nothing in any FreeBSD code uses them.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-05-28 16:50:22 -07:00
Rob Norris
b0e053a10d tunables: ensure tunable and variable have same define gate
If a variable is only available in the kernel, then the tunable should
also only be available there.

This matters very little so long as we don't have userspace tunables,
but its still good hygeine.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-05-28 16:50:22 -07:00
Rob Norris
d1724b59dc tunables: don't assert initialisation in impl getters
It actually doesn't matter if it's not initialised when we first query
the current value; it just returns empty-string. A crash is quite
obnoxious even if it is a rare case.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-05-28 16:50:22 -07:00
Rob Norris
5ef91c2bee zfs_log: make zfs_immediate_write_sz uint
Likely it's only int64 for comparison with ssize_t, which is signed.
However, it would make no sense for it to be less than 0 or greater than
4G, so making it a regular uint will make it safe for comparison and
remove the only S64 tunable in core.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-05-28 16:50:22 -07:00
Tony Hutter
906ced88df
Linux 6.15 compat: META
Update the META file to reflect compatibility with the 6.15
kernel.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17393
2025-05-28 16:28:02 -07:00
Paul Dagnelie
c464f1d014
Only interrupt active disk I/Os in failmode=continue
failmode=continue is in a sorry state. Originally designed to fix a very
specific problem, it causes crashes and panics for most people who end
up trying to use it. At this point, we should either remove it entirely,
or try to make it more usable.

With this patch, I choose the latter. While the feature is fundamentally
unpredictable and prone to race conditions, it should be possible to get
it to the point where it can at least sometimes be useful for some
users. This patch fixes one of the major issues with failmode=continue:
it interrupts even ZIOs that are patiently waiting in line behind stuck
IOs.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17372
2025-05-28 15:31:32 -07:00
Rob Norris
0372def8c9 vdev_geom: converted injected EIO errors to ENXIO
By the assertion, vdev_geom_io_done() only expects ENXIO on an error
when the geom is a top-level (allocating) vdev[1][2]. However, zinject
currently can't insert ENXIO directly, possibly because on Solaris
outright disk failures were reported with EIO[2][3].

This is a narrow workaround to convert EIO to ENXIO when injections are
enabled, to avoid the assertion and allow the test suite to test
behaviour related to probe failure on FreeBSD.

1. freebsd/freebsd-src@37ec52ca7a
2. freebsd/freebsd-src@cd730bd6b2
3. illumos/illumos-gate@ea8dc4b6d2

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17355
2025-05-28 10:29:11 -07:00
Rob Norris
2303775fea ZTS: test dmu_tx response when pool suspends
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17355
2025-05-28 10:29:06 -07:00
Rob Norris
55d035e866 dmu_tx_assign: make all VERIFY0 calls use DMU_TX_SUSPEND
This is the cheap way to keep non-user functions working after
break-on-suspend becomes default.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17355
2025-05-28 10:28:59 -07:00
Rob Norris
4653e2f7d3 dmu_tx: break tx assign/wait when pool suspends
This adjusts dmu_tx_assign/dmu_tx_wait to be interruptable if the pool
suspends while they're waiting, rather than just on the initial check
before falling back into a wait.

Since that's not always wanted, add a DMU_TX_SUSPEND flag to ignore
suspend entirely, effectively returning to the previous behaviour.

With that, it shouldn't be possible for anything with a standard
dmu_tx_assign/wait/abort loop to block under failmode=continue.

Also should be a bit tighter than the old behaviour, where a
VERIFY0(dmu_tx_assign(DMU_TX_WAIT)) could technically fail if the pool
is already suspended and failmode=continue.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17355
2025-05-28 10:28:51 -07:00
Rob Norris
ac2e579521 dmu_tx: make DMU_TX_* flags an enum
Mostly for a little more type checking and debugging visibility.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17355
2025-05-28 10:28:46 -07:00
Rob Norris
468d22d60c txg_wait_synced_flags: add TXG_WAIT_SUSPEND flag to not wait if pool suspended
This allows a caller to request a wait for txg sync, with an appropriate
error return if the pool is suspended or becomes suspended during the
wait.

To support this, txg_wait_kick() is added to signal the sync condvar,
which wakes up the waiters, causing them to loop and reconsider their
wait conditions again. zio_suspend() now calls this to trigger the break
if the pool suspends while waiting.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17355
2025-05-28 10:27:46 -07:00
Pavel Snajdr
8487945034
zcp: get_prop: fix encryptionroot and encryption
It was reported that channel programs' zfs.get_prop doesn't work for
dataset properties encryption and encryptionroot.

They are handled in get_special_prop due to the need to call
dsl_dataset_crypt_stats to load those dsl props.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Co-authored-by: Graham Christensen <graham@grahamc.com>
Closes #17280
2025-05-27 20:04:37 -04:00
Rob Norris
06fa8f3f69
zfs_cmd: reorganise zfs_cmd_t to match original size
2aa3fbe761 extended zinject_record_t, and in doing so inadvertently
extended zfs_cmd_t, which broke compatibility with userspace tools
without the change.

This fixes that by using some of the unused space in zfs_cmd_t for the
extra fields. We also add an assert to trigger a compile error if the
size ever changes.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17367
2025-05-27 20:01:06 -04:00
Fedor Uporov
087d7d80c7
ZVOL: Comment platform-specific empty functions bodies on FreeBSD side
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #17383
2025-05-27 20:00:25 -04:00
Rob Norris
fc617645a3 vdev_disk: remove zfs_vdev_scheduler option
It has existed as a warning since 0.8.3, 5+ years ago. I think people
have had enough time.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17376
2025-05-27 15:06:15 -07:00
Rob Norris
284580c878 dmu_traverse: remove 'ignore_hole_birth' tunable alias
It's been many years, we can probably do without.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17376
2025-05-27 15:05:09 -07:00
Ameer Hamza
2a91d577b1
Expose dataset encryption status via fast stat path
In truenas_pylibzfs, we query list of encrypted datasets several times,
which is expensive. This commit exposes a public API zfs_is_encrypted()
to get encryption status from fast stat path without having to refresh
the properties.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17368
2025-05-26 22:11:03 -04:00
Don Brady
b048bfa9c1
Allow opt-in of zvol blocks in special class
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Don Brady <dev.fs.zfs@gmail.com>
Closes #14876
2025-05-24 16:44:26 -04:00
Alexander Motin
9d76950d67
ZIL: Improve write log size accounting
Before this change write log size TXG throttling mechanism was
accounting only user payload bytes.  But the actual ZIL both on
disk and especially in memory include headers of hundred(s) of
bytes.  Not accouting those may allow applications like
bonnie++, in their wisdom writing one byte at a time, to consume
excessive amount of memory and ZIL/SLOG in one TXG.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17373
2025-05-23 21:48:46 -04:00
George Amanakis
b01d7bd32d
ZTS: testing for leaked key mappings in encrypted non-raw send
This test covers a bug fixed by commit ea74cde: performing an
incremental non-raw send from an encrypted filesystem followed by
exporting the pool. Before that commit, exporting the sending pool
in this scenario would trigger a panic:

VERIFY(avl_is_empty(&sk->sk_dsl_keys)) failed
PANIC at dsl_crypt.c:353:spa_keystore_fini()
Call Trace:
 spl_dumpstack+0x29/0x2f [spl]
 spl_panic+0xd1/0xe9 [spl]
 spl_assert.constprop.0+0x1a/0x30 [zfs]
 spa_keystore_fini+0xc2/0xf0 [zfs]
 spa_deactivate+0x25f/0x610 [zfs]
 spa_evict_all+0xf4/0x200 [zfs]
 spa_fini+0x13/0x140 [zfs]
 zfs_kmod_fini+0x72/0xc0 [zfs]
 openzfs_fini_os+0x13/0x3a [zfs]
 openzfs_fini+0x9/0x6b8 [zfs]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #17366
2025-05-23 21:46:51 -04:00
Cameron Harr
92157c840c Refactor man page and CLI help output per mandoc
The man page and the usage statement from the CLI have been refactored
to abide by the ManDoc standard. Style changes include:
 * Upper-case letters before lower-case
 * List short options w/o arguments first
 * Then list short options w/ arguments
 * Then list long arguments

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #17357
2025-05-23 09:10:30 -07:00
Cameron Harr
cdb4c44684 Reformat cli help and man page to be in sync
The man page and CLI usage statements were both a little out
of sync and neither fully alphabetized correctly. That has
been fixed. One outstanding question is whether to get rid of
the ellipses on the CLI usage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #16004
Closes #17357
2025-05-23 09:10:21 -07:00
Paul Dagnelie
ddf28f27c5
Fix off-by-one bug in range tree code
Without this fix, zfs_range_tree_find_in could return an overlap when
the found range starts immediately after the searched range, with no
actual overlap.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17363
2025-05-23 10:33:33 -04:00
Alexander Motin
5c30b24381
Fix null dereference in spa_vdev_remove_cancel_sync()
We don't really need to access space map to know where the metaslab
ends, while msp->ms_sm might be NULL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Fixes #17164
Fixes #17359
Closes #17361
2025-05-22 10:47:43 -04:00
Andres
a6f20250de
Update 69-vdev.rules.in
Add support to alias md-type devices in udev rules.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andres <a-d-j-i@users.noreply.github.com>
Closes #17345
2025-05-21 10:05:11 -07:00
Rob Norris
a387b7599c lzc_ioctl_fd: add ZFS_IOC_TRACE envvar to enable ioctl tracing
When set, dumps all ZFS ioctl calls and returns and their nvlists to
STDERR, to make debugging and understanding a lot easier.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17344
2025-05-21 09:23:53 -07:00
Rob Norris
c4c3917b2a lzc: move lzc_ioctl_fd() into lzc proper
Name the OS-specific call lzc_ioctl_fd_os(), and make lzc_ioctl_fd()
wrap it, so we can do more in the wrapper.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17344
2025-05-21 09:23:46 -07:00
Rob Norris
f454cc1723 libzfs: ensure all ioctl calls go through lzc_ioctl_fd()
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17344
2025-05-21 09:23:23 -07:00
Richard Yao
2e5e4bb0f8
Add Quality Assurance to pull request template
PRs like #17352 have no applicable checkbox, so let us add one.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Richard Yao <richard@ryao.dev>
Closes #17354
2025-05-20 08:49:22 -07:00
Richard Yao
83fa80a550
dmu_objset_hold_flags() should call dsl_dataset_rele_flags() on error
This was caught when doing a manual check to see if #17352 needed to be
improved to catch mismatches across stack frames of the kind that were
first found in #17340.

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard@ryao.dev>
Closes #17353
2025-05-20 08:35:45 -07:00
Ameer Hamza
f0baaa329a
arcstat: prevent ZeroDivisionError when L2ARC becomes empty
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard@ryao.dev>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17348
2025-05-19 16:27:24 -07:00
Rob Norris
841be1d049 Linux 6.2/6.15: del_timer_sync() renamed to timer_delete_sync()
Renamed in 6.2, and the compat wrapper removed in 6.15. No signature or
functional change apart from that, so a very minimal update for us.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17229
2025-05-19 11:13:00 -07:00
Rob Norris
bb740d66de Linux 6.15: mkdir now returns struct dentry *
The intent is that the filesystem may have a reference to an "old"
version of the new directory, eg if it was keeping it alive because a
remote NFS client still had it open.

We don't need anything like that, so this really just changes things so
we return error codes encoded in pointers.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17229
2025-05-19 11:12:49 -07:00
Richard Yao
d8a33bc0a5
icp: Use explicit_memset() exclusively in gcm_clear_ctx()
d634d20d1b had been intended to fix a
potential information leak issue where the compiler's optimization
passes appeared to remove `memset()` operations that sanitize sensitive
data before memory is freed for use by the rest of the kernel.

When I wrote it, I had assumed that the compiler would not remove the
other `memset()` operations, but upon reflection, I have realized that
this was a bad assumption to make. I would rather have a very slight
amount of additional overhead when calling `gcm_clear_ctx()` than risk a
future compiler remove `memset()` calls. This is likely to happen if
someone decides to try doing link time optimization and the person will
not think to audit the assembly output for issues like this, so it is
best to preempt the possibility before it happens.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Richard Yao <richard@ryao.dev>
Closes #17343
2025-05-19 10:04:05 -07:00
George Amanakis
ea74cdedda
Fix 2 bugs in non-raw send with encryption
Bisecting identified the redacted send/receive as the source of the bug
for issue #12014. Specifically the call to
dsl_dataset_hold_obj(&fromds) has been replaced by
dsl_dataset_hold_obj_flags() which passes a DECRYPT flag and creates
a key mapping. A subsequent dsl_dataset_rele_flag(&fromds) is missing
and the key mapping is not cleared. This may be inadvertedly used, which
results in arc_untransform failing with ECKSUM in:
 arc_untransform+0x96/0xb0 [zfs]
 dbuf_read_verify_dnode_crypt+0x196/0x350 [zfs]
 dbuf_read+0x56/0x770 [zfs]
 dmu_buf_hold_by_dnode+0x4a/0x80 [zfs]
 zap_lockdir+0x87/0xf0 [zfs]
 zap_lookup_norm+0x5c/0xd0 [zfs]
 zap_lookup+0x16/0x20 [zfs]
 zfs_get_zplprop+0x8d/0x1d0 [zfs]
 setup_featureflags+0x267/0x2e0 [zfs]
 dmu_send_impl+0xe7/0xcb0 [zfs]
 dmu_send_obj+0x265/0x360 [zfs]
 zfs_ioc_send+0x10c/0x280 [zfs]

Fix this by restoring the call to dsl_dataset_hold_obj().

The same applies for to_ds: here replace dsl_dataset_rele(&to_ds) with
dsl_dataset_rele_flags().

Both leaked key mappings will cause a panic when exporting the
sending pool or unloading the zfs module after a non-raw send from
an encrypted filesystem.

Contributions-by: Hank Barta <hbarta@gmail.com>
Contributions-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12014 
Closes #17340
2025-05-19 09:55:00 -07:00
Alexander Motin
e55225be3e
Add explicit DMU_DIRECTIO checks
UIO_DIRECT means we can do Direct I/O, while DMU_DIRECTIO we want
to do it.  First does not automatically means second.  Add few
checks to not use Direct I/O in few cases we don't want it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17342
2025-05-16 23:27:05 -04:00
Alexander Motin
d5616ad34a
Increase meta-dnode redundancy in "some" mode
Loss of one indirect block of the meta dnode likely means loss of
the whole dataset.  It is worse than one file that the man page
promises, and in my opinion is not much better than "none" mode.

This change restores redundancy of the meta-dnode indirect blocks,
while same time still corrects expectations in the man page.

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17339
2025-05-16 13:23:32 -04:00
Paul Dagnelie
086105f4c4
Cause zpool scan resume commands to get logged in history
Currently, commands that resume a scrub/errorscrub from a paused state
don't get logged in the pool history. This is because resumes actually
return ECANCELED, instead of 0. This causes the tsd code in the common
ioctl logic to not think the ioctl succeeded, which causes the
log_history ioctl to fail with EPERM. However, for resuming a scrub from
a paused state, ECANCELED is success.

There are two options for how to deal with this. The first is the one
that I implemented here; I can't find a good reason for dmu_scan to
return ECANCELED on resume instead of 0, so let's just not. The only
place we check for the ECANCELED value is in zpool_scan, where we just
convert it back to zero.  However, I am aware that this is changing an
ioctl interface, which I believe is a breaking change. I don't think
it's an important change, but maybe there is someone who relies on it.

The other option that could be implemented is to either allow ECANCELED
specifically from dsl_scan in the common ioctl code, or add a generic
facility to the common ioctl code that allows each command to specify
whether or not success happened, regardless of the return values. I am
open to feedback on which option people think would be better.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17301
2025-05-16 13:19:04 -04:00
Allan Jude
b6916f995e
ARC: parallel eviction
On systems with enormous amounts of memory, the single arc_evict thread
can become a bottleneck if reads and writes are stuck behind it, waiting
for old data to be evicted before new data can take its place.

This commit adds support for evicting from multiple ARC lists in
parallel, by farming the evict work out to some number of threads and
then accumulating their results.

A new tuneable, zfs_arc_evict_threads, sets the number of threads. By
default, it will scale based on the number of CPUs.

Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Youzhong Yang <youzhong@gmail.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Signed-off-by: Alexander Stetsenko <alex.stetsenko@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Co-authored-by: Rob Norris <rob.norris@klarasystems.com>
Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Co-authored-by: Alexander Stetsenko <alex.stetsenko@klarasystems.com>
Closes #16486
2025-05-14 10:38:32 -04:00
Alexander Motin
89a8a91582
ARC: Notify dbuf cache about target size reduction
ARC target size might drop significantly under memory pressure,
especially if current ARC size was much smaller than the target.
Since dbuf cache size is a fraction of the target ARC size, it
might need eviction too.  Aside of memory from the dbuf eviction
itself, it might help ARC by making more buffers evictable.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17314
2025-05-14 10:34:14 -04:00
Alexander Motin
0aa83dce99
Linux: Stop using NR_FILE_PAGES for ARC scaling
I've found that QEMU/KVM guest memory accounted as shared also
included into NR_FILE_PAGES.  But it is actually a non-evictable
anonymous memory.  Using it as a base for zfs_arc_pc_percent
parameter makes ARC to ignore shrinker requests while page cache
does not really have anything to evict, ending up in OOM killer
killing the QEMU process.

Instead use of NR_ACTIVE_FILE + NR_INACTIVE_FILE should represent
the part of a page cache that is actually evictable, which should
be safer to use as a reference for ARC scaling.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17334
2025-05-14 09:29:02 -04:00
Tony Hutter
b55256e5bb
runners: Add option to install custom kernel on Fedora
Allow installing a custom kernel version from the Fedora experimental
kernel repos onto the github runners.  This is useful for testing if
ZFS works against a newer kernel.

Fedora has a number of repos with experimental kernel packages. This
PR allows installs from kernels in these repos:

@kernel-vanilla/stable
@kernel-vanilla/mainline
(https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories)

You will need to manually kick of a github runner to test with a custom
kernel version.  To do that, go to the github actions tab under
'zfs-qemu' and click the drop-down for 'run workflow'.  In there you
will see a text box to specify the version (like '6.14').  The scripts
will do their best to match the version to the newest matching version
that the repos support (since they're may be multiple nightly versions
of, say, '6.14').  A full list of kernel versions can be seen in the
dependency stage output if you kick off a manual run.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17156
2025-05-13 14:43:35 -07:00
Alexander Motin
734eba251d
Wire O_DIRECT also to Uncached I/O (#17218)
Before Direct I/O was implemented, I've implemented lighter version
I called Uncached I/O.  It uses normal DMU/ARC data path with some
optimizations, but evicts data from caches as soon as possible and
reasonable.  Originally I wired it only to a primarycache property,
but now completing the integration all the way up to the VFS.

While Direct I/O has the lowest possible memory bandwidth usage,
it also has a significant number of limitations.  It require I/Os
to be page aligned, does not allow speculative prefetch, etc.  The
Uncached I/O does not have those limitations, but instead require
additional memory copy, though still one less than regular cached
I/O.  As such it should fill the gap in between.  Considering this
I've disabled annoying EINVAL errors on misaligned requests, adding
a tunable for those who wants to test their applications.

To pass the information between the layers I had to change a number
of APIs.  But as side effect upper layers can now control not only
the caching, but also speculative prefetch.  I haven't wired it to
VFS yet, since it require looking on some OS specifics.  But while
there I've implemented speculative prefetch of indirect blocks for
Direct I/O, controllable via all the same mechanisms.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Fixes #17027
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-05-13 14:26:55 -07:00
diwakar-kristappagari
e2ba0f7643
vdev_id: symlinks creation for multipath disk partitions (#17331)
It has been observed that the symlinks are not being created
for the disk partitions on multipath enabled systems.
This fix addresses the issue.

Signed-off-by: Diwakar Kristappagari <diwakar-k@hpe.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
2025-05-13 13:49:13 -07:00
Rob Norris
485a2d0112 AUTHORS/mailmap: update with new contributors
Thanks everyone!

Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-05-13 09:24:03 -07:00
Rob Norris
ae2caf9cb0 update_authors: output possible mailmap additions
Once we've selected a best ident for the AUTHORS file, it makes sense to
set up a corresponding mailmap entry for any other ident for that
committer, to ensure the git history also reflects this into the future.

So, here we output potential mailmap updates for a human to consider.

For the moment, this needs to be done by a human, because update_authors
uses git to get the author names, and thus is reliant on the mailmap
contents to generate its output, so having it update mailmap directly
would introduce a circular dependency that I'm not totally sure about.
It's definitely better than having to go back through the history and
check each commit by hand though.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-05-13 09:24:03 -07:00
Rob Norris
8e318fda80 update_authors: consider Signed-off-by trailers for committer idents
I've increasingly found that commits from new contributors have the
author set in the "Github noreply" obfuscated style. If they do have a
better canonical choice, it's usually in the Signed-off-by: trailer in
the commit message.

I had avoided using these in the first version of this program because
they aren't always present, aren't always correct, and some commits have
multiple signoffs. It seems however that requiring either the name or
the email address to match the commit author sufficiently narrows the
scope to be useful for the "Github noreply" situation, which is really
the main sticking point. And of course, if it gets it wrong, overriding
in .mailmap or AUTHORS is always an option.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-05-13 09:24:03 -07:00
Rob Norris
b2284aedab
metaslab_alloc: make hint BP and DVA const (#17324)
Nothing modifies them, and nothing should, so lets try to enforce that.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2025-05-12 10:52:46 -07:00
Alexander Motin
49fbdd4533
Introduce zfs rewrite subcommand (#17246)
This allows to rewrite content of specified file(s) as-is without
modifications, but at a different location, compression, checksum,
dedup, copies and other parameter values.  It is faster than read
plus write, since it does not require data copying to user-space.
It is also faster for sync=always datasets, since without data
modification it does not require ZIL writing.  Also since it is
protected by normal range range locks, it can be done under any
other load.  Also it does not affect file's modification time or
other properties.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
2025-05-12 10:22:17 -07:00
Rob Norris
9aae14a14a
test-runner: rework output dir construction
The old code would compare all the test group names to work out some
sort of common path, but it didn't appear to work consistently,
sometimes placing output in a top-level dir, other times in one or more
subdirs. (I confess, I do not quite understand what it's supposed to
do).

This is a very simple rework that simply looks at all the test group
paths, removes common leading components, and uses the remainder as the
output directory. This should work because groups paths are unique, and
means we get a output dir tree of roughly the same shape as the test
groups in the runfiles and the test source dirs themselves.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: @ImAwsumm
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17167
2025-05-11 15:24:05 -04:00
Mariusz Zaborski
8b9c4e643b
spa: clear checkpoint information during retry
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mariusz Zaborski <oshogbo@FreeBSD.org>
Closes #17319
2025-05-11 12:49:38 -04:00
Rob Norris
2ee5b51a57
linux/uio: remove "skip" offset for UIO_ITER
For UIO_ITER, we are just wrapping a kernel iterator. It will take care
of its own offsets if necessary. We don't need to do anything, and if we
do try to do anything with it (like advancing the iterator by the skip
in zfs_uio_advance) we're just confusing the kernel iterator, ending up
at the wrong position or worse, off the end of the memory region.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17298
2025-05-11 12:46:40 -04:00
Alan Somers
c17bdc4914
More aggressively assert that db_mtx protects db.db_data
db.db_mtx must be held any time that db.db_data is accessed.  All of
these functions do have the lock held by a parent; add assertions to
ensure that it stays that way.

See https://github.com/openzfs/zfs/discussions/17118

* Refactor dbuf_read_bonus to make it obvious why db_rwlock isn't
required.

* Refactor dbuf_hold_copy to eliminate the db_rwlock
Copy data into the newly allocated buffer before assigning it to the db.
That way, there will be no need to take db->db_rwlock.

* Refactor dbuf_read_hole
In the case of an indirect hole, initialize the newly allocated buffer
before assigning it to the dmu_buf_impl_t.

Sponsored by:	ConnectWise
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #17209
2025-05-09 09:02:26 -04:00
Fedor Uporov
1a8f5ad3b0
zvol: Enable zvol threading functionality on FreeBSD
Make zvol I/O requests processing asynchronous on FreeBSD side in some
cases. Clone zvol threading logic and required module parameters from
Linux side. Make zvol threadpool creation/destruction logic shared for
both Linux and FreeBSD.
The IO requests are processed asynchronously in next cases:
- volmode=geom: if IO request thread is geom thread or cannot sleep.
- volmode=cdev: if IO request passed thru struct cdevsw .d_strategy
routine, mean is AIO request.
In all other cases the IO requests are processed synchronously. The
volthreading zvol property is ignored on FreeBSD side.

Sponsored-by: vStack, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: @ImAwsumm
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #17169
2025-05-08 15:25:40 -04:00
Alan Somers
f13d760aa8
Delete dead code: dbuf_loan_arcbuf
It's been dead ever since 5fa356ea44

Sponsored by:	ConnectWise
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #17119
2025-05-08 10:34:11 -04:00
Rob Norris
4800181b3b
ioctl: remove FICLONE/FICLONERANGE/FIDEDUPERANGE compat
These are only required to support these ioctls on Linux <4.5. Since
4.18 is our cutoff, we don't need this code anymore.

Also removing related test things that will never match again.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17308
2025-05-08 10:32:52 -04:00
Olivier Certner
78628a5c15
FreeBSD: Use new SYSCTL_SIZEOF()
SYSCTL_SIZEOF() has been introduced in FreeBSD by commit "sysctl(9):
Ease exporting struct sizes; Discourage doing that" (713abc9880aa) in
branch 'main'.  It will soon be backported to 'stable/14'.  We will thus
be able to remove the old, alternate version left in the '#else' branch
as soon as 'stable/13' goes out of support (April 30, 2026).

Sponsored-by:   The FreeBSD Foundation
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Olivier Certner <olce@FreeBSD.org>
Closes #17309
2025-05-08 10:31:43 -04:00
Alexander Motin
b1ccab1721
ARC: Avoid overflows in arc_evict_adj() (#17255)
With certain combinations of target ARC states balance and ghost
hit rates it was possible to get the fractions outside of allowed
range.  This patch limits maximum balance adjustment speed, which
should make it impossible, and also asserts it.

Fixes #17210
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-05-06 09:31:38 -07:00
Rob Norris
c5a6b4417d
zfs_valstr: update zio_flag strings for ZIO_FLAG_PREALLOCATED
Missed in 246e5883bb.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17306
2025-05-06 11:35:30 -04:00
Paul Dagnelie
246e5883bb
Implement allocation size ranges and use for gang leaves (#17111)
When forced to resort to ganging, ZFS currently allocates three child
blocks, each one third of the size of the original. This is true
regardless of whether larger allocations could be made, which would
allow us to have fewer gang leaves. This improves performance when
fragmentation is high enough to require ganging, but not so high that
all the free ranges are only just big enough to hold a third of the
recordsize. This is also useful for improving the behavior of a future
change to allow larger gang headers.

We add the ability for the allocation codepath to allocate a range of
sizes instead of a single fixed size. We then use this to pre-allocate
the DVAs for the gang children. If those allocations fail, we fall back
to the normal write path, which will likely re-gang.

Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-05-02 15:32:18 -07:00
Rob Norris
a7de203c86
txg: generalise txg_wait_synced_sig() to txg_wait_synced_flags() (#17284)
txg_wait_synced_sig() is "wait for txg, unless a signal arrives". We
expect that future development will require similar "wait unless X"
behaviour.

This generalises the API as txg_wait_synced_flags(), where the provided
flags describe the events that should cause the call to return.

Instead of a boolean, the return is now an error code, which the caller
can use to know which event caused the call to return.

The existing call to txg_wait_synced_sig() is now
txg_wait_synced_flags(TXG_WAIT_SIGNAL).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-05-02 15:29:50 -07:00
Alexander Motin
f85c96edf7 ZTS: Restore some delays in online_offline tests
After more CI runs and code reading after #17259 I've found that
online starts resilver via async mechanism, which does not provide
wait primitives at this time.  Restore some delays to restore CI
until this is properly fixed.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
2025-05-02 15:19:24 -07:00
Alexander Motin
f86d9af16b Fix race between resilver wait and offline/detach
We should not clear scn_state and notify waiters until we call
vdev_dtl_reassess(), otherwise following offline/detach request
may fail with "no valid replicas".

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
2025-05-02 15:19:24 -07:00
José Luis Salvador Rufo
634c172ee8
tests: fix S_IFMT undeclared at statx.c
`S_IFMT` is declared in `sys/stat.h`, but we cannot include this header
because it redeclares the `statx` function with different argument
types. Therefore, we define `S_IFMT` ourselves, in the same way as the
other definitions.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes #17293
Closes #17294
2025-05-02 17:49:25 -04:00
Tony Hutter
a6cca8a7da
ZTS: Stop zpool_status tests from spamming stdout (#17292)
zpool_status_003 and zpool_status_004_pos use 'dd' to trigger a read of
a file without specifying 'of=/dev/null'.  This spams the ZTS logs
with ~20MB of garbage data. This commit adds 'of=/dev/null'.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-05-02 11:18:19 -07:00
Tony Hutter
f40ab9e399
Fix double spares for failed vdev
It's possible for two spares to get attached to a single failed vdev.
This happens when you have a failed disk that is spared, and then you
replace the failed disk with a new disk, but during the resilver
the new disk fails, and ZED kicks in a spare for the failed new
disk.  This commit checks for that condition and disallows it.

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes: #16547
Closes: #17231
2025-05-02 09:03:11 -07:00
Tino Reichardt
3b18877269
ZTS: Fix replacement/resilver_restart_001 on FreeBSD
Decrease the RESILVER_MIN_TIME_MS variable from 50 to 20.
So the test, which expects two 2 resilver starts will see them.

Logfile of the seen failures before this fix:
log: NOTE: expected 2 resilver start(s) after offline/online, found 1
log: expected 2 resilver start(s) after offline/online, found 1

The test time decreases also from around 00:42 to 00:24 seconds.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16822
Closes #17279
2025-05-02 12:02:14 -04:00
Artem
27f3d94940
Sort the blocking snapshots list #12751 (#17264)
When multiple snapshots prevent the destruction/rollback of the
respective dataset/snapshot/volume via zfs destroy or zfs rollback,
the error message does not list the blocking snapshots sorted
according to their order of creation. This causes inconvenience and can
lead to confusion, and also creates a contrast with a returned message
from zfs list -t snap function.

Closes: #12751

Signed-off-by: Artem-OSSRevival <artem.vlasenko@ossrevival.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-05-01 17:40:23 -07:00
Aleksandr Liber
aa46cc9812
Double quote variables to prevent globbing and word splitting
This change goes through and quotes variables where appropriate to
avoid issues with incorrect splitting. The performance tests ran into
an issue with $SUDO_COMMAND splitting incorrectly because it was not
quoted. This change fixes that issue and hopefully gets ahead of any
other similar problems.

Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Aleksandr Liber <aleksandr.liber@perforce.com>
Closes #17235
2025-04-30 21:15:19 -04:00
Tony Hutter
81dec433b0
RPM: Hold back incompatible kernel packages on Fedora
A user reported that when your upgrade your kernel packages on Fedora
with ZFS installed, only the kernel-devel package gets held back to the
ZFS-supported version, but not the other kernel packages. So if ZFS only
supports the 6.13 kernel, Fedora will still happily upgrade the kernel
RPM to 6.14, but hold back kernel-devel at 6.13, for example.

This commit includes version checks for the 'kernel-uname-r' dependency,
typically provided by the 'kernel-core' package.

Original-patch-by: @jkool702
Reviewed-by: @jkool702
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17265
Closes #17271
2025-04-30 10:52:20 -07:00
Rob Norris
c8fa39b46c
cred: properly pass and test creds on other threads (#17273)
### Background

Various admin operations will be invoked by some userspace task, but the
work will be done on a separate kernel thread at a later time. Snapshots
are an example, which are triggered through zfs_ioc_snapshot() ->
dsl_dataset_snapshot(), but the actual work is from a task dispatched to
dp_sync_taskq.

Many such tasks end up in dsl_enforce_ds_ss_limits(), where various
limits and permissions are enforced. Among other things, it is necessary
to ensure that the invoking task (that is, the user) has permission to
do things. We can't simply check if the running task has permission; it
is a privileged kernel thread, which can do anything.

However, in the general case it's not safe to simply query the task for
its permissions at the check time, as the task may not exist any more,
or its permissions may have changed since it was first invoked. So
instead, we capture the permissions by saving CRED() in the user task,
and then using it for the check through the secpolicy_* functions.

### Current implementation

The current code calls CRED() to get the credential, which gets a
pointer to the cred_t inside the current task and passes it to the
worker task. However, it doesn't take a reference to the cred_t, and so
expects that it won't change, and that the task continues to exist. In
practice that is always the case, because we don't let the calling task
return from the kernel until the work is done.

For Linux, we also take a reference to the current task, because the
Linux credential APIs for the most part do not check an arbitrary
credential, but rather, query what a task can do. See
secpolicy_zfs_proc(). Again, we don't take a reference on the task, just
a pointer to it.

### Changes

We change to calling crhold() on the task credential, and crfree() when
we're done with it. This ensures it stays alive and unchanged for the
duration of the call.

On the Linux side, we change the main policy checking function
priv_policy_ns() to use override_creds()/revert_creds() if necessary to
make the provided credential active in the current task, allowing the
standard task-permission APIs to do the needed check. Since the task
pointer is no longer required, this lets us entirely remove
secpolicy_zfs_proc() and the need to carry a task pointer around as
well.

Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kyle Evans <kevans@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-29 16:27:48 -07:00
Tino Reichardt
ba17cedf65
ZTS: Optimize KSM on Linux and remove it for FreeBSD
Don't use KSM on the FreeBSD VMs and optimize KSM settings for
Linux to have faster run times.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17247
2025-04-29 15:27:47 -04:00
Quentin Thébault
63de2d2dbd
zfs-rollback.8: fix typo in example number
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alexander Ziaee <ziaee@FreeBSD.org>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Quentin Thébault <quentin.thebault@defenso.fr>
Closes #17282
2025-04-28 15:38:08 -04:00
Tino Reichardt
88ec6c4f40
ZTS: Use Ubuntu default url for cloud-image
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17278
2025-04-28 15:35:26 -04:00
Alexander Motin
d947b9aedd
ZTS: Make zvol_stress write some more
Sometimes it fails unable to see any injected write errors.
I guess writing 25KB of zeroes might be not enough to trigger
errors with probability set to 10%.  Lets try to write more.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17270
2025-04-24 20:49:09 -04:00
Alexander Motin
1ef706c4ad
ZTS: Reduce extra caching in pool_checkpoint (#17268)
Those tests are write-mostly at the nested pool.  Considering we have
3 more layers of caching underneath, we can hint ZFS how to use the
memory better by setting primarycache=metadata.

While there, add missing zpool sync after rm in checkpoint_capacity
before we could potentially see the freed space, would not there be
a pool checkpoint.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-24 16:31:10 -07:00
Sebastian Pauka
1b4826b9a2
Support using llvm-libunwind
This commit adds support for using llvm-libunwind for kernels built
using llvm and clang. The two differences are that the largest register
index is given by _LIBUNWIND_HIGHEST_DWARF_REGISTER, we need to check
whether the register is a floating point register and the prototype
for unw_regname takes the unwind cursor as the first argument.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Sebastian Pauka <me@spauka.se>
Closes #17230
2025-04-24 13:58:48 -04:00
Brian Atkinson
7031a48c70
Export correct symbols for Lustre Direct I/O
Originally the Lustre ZFS OSD code was going to use zfs_uio_t structs
for supporting Direct I/O with ZFS. However, this has changed to using
abd_t structs instead. This exports the proper symbols that will be used
by the Lustre ZFS OSD code.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #17256
2025-04-24 13:55:21 -04:00
Artem-OSSRevival
37a3e26552
Add more descriptive destroy error message
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Artem-OSSRevival <artem.vlasenko@ossrevival.org>
Fixes: #14538
Closes: #17234
2025-04-23 21:17:52 -04:00
Alexander Motin
38c3a8be83
ZTS: Fix 256MB file leak in zed_cksum_reported
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes: #17267
2025-04-23 21:08:39 -04:00
Tino Reichardt
6afb405d96
ZTS: Update FreeBSD version numbers
All defined variants:
- freebsd13-4r, freebsd13-5r, freebsd14-1r, freebsd14-2r (RELEASE)
- freebsd13-5s, freebsd14-2s (STABLE)
- freebsd15-0c (CURRENT)

Used for testing:
- freebsd13-4r (RELEASE)
- freebsd14-2s (STABLE)
- freebsd15-0c (CURRENT)

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes: #17260
2025-04-22 21:17:29 -04:00
Alexander Motin
8f2c2dea3c
ZTS: Remove fixed sleeps from slog_006_pos
Replace `sleep 15` with `zpool wait`, which should take much less
than the 15 seconds.  And considering it is called 16 times, this
should save us up to 4 minutes total.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes: #17257
2025-04-22 21:03:52 -04:00
Alexander Motin
cb49e7701f
ZTS: Polish online_offline tests
- Kill workload first for faster cleanup.
 - Use `zpool wait` for resilver instead of `sleep`.
 - Remove irrelevant workload from `online_offline_003_neg`.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes: #17259
2025-04-22 21:02:31 -04:00
Paul Dagnelie
5f5321effa
Handle interaction between gang blocks, copies, and FDT.
With the advent of fast dedup, there are no longer separate dedup tables
for different copies values. There is now logic that will add DVAs to
the dedup table entry if more copies are needed for new writes. However,
this interacts poorly with ganging. There are two different cases that
can result in mixed gang/non-gang BPs, which are illegal in ZFS.

This change modifies updates of existing FDT; if there are already gang
DVAs in the FDT, we prevent the new write from extending the DDT
entry. We cannot safely mix different gang trees in one block
pointer. if there are non-gang DVAs in the FDT, then this allocation may
not be gangs. If it would gang, we have to redo the whole write as a
non-dedup write.

This change also fixes a refcount leak that could occur if the lead DDT
write failed.

Sponsored by: iXsystems, Inc.
Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes: #17123
2025-04-21 11:26:30 -04:00
Alexander Motin
1d8f625233
ZTS: Remove ashift setting from dedup_quota test (#17250)
The test writes 1M of 1KB blocks, which may produce up to 1GB of
dirty data.  On top of that ashift=12 likely produces additional
4GB of ZIO buffers during sync process.  On top of that we likely
need some page cache since the pool reside on files.  And finally
we need to cache the DDT.  Not surprising that the test regularly
ends up in OOMs, possibly depending on TXG size variations.

Also replace fio with pretty strange parameter set with a set of
dd writes and TXG commits, just as we neeed here.

While here, remove compression.  It has nothing to do here, but
waste CI CPU time.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-18 14:42:52 -07:00
Tony Hutter
8d1489735b
nvlist: Add nvlist_snprintf() and zfs_dbgmsg_nvlist()
Add nvlist_snprintf() to print a nvlist to a buffer.  This is basically
the snprintf() version of dump_nvlist().  Along with that, add a
zfs_dbgmsg_nvlist() to print out an nvlist to dbgmsg.  This will aid in
debugging.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17215
2025-04-18 09:22:16 -04:00
Tony Hutter
ba03054c83
CI: Add Fedora 42 runner (#17249)
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-04-16 17:00:59 -07:00
Tony Hutter
155847c72d
GCC 15: Fix unterminated-string-initialization (#17244)
Fix build errors on Fedora 42 like:

  module/zcommon/zfs_valstr.c:193:16: error: initializer-string for
  array of 'char' truncates NUL terminator but destination lacks
  'nonstring' attribute (3 chars into 2 available)

The arrays in zpool_vdev_os.c and zfs_valstr.c don't need to be
NULL terminated, but we do so to make GCC happy.

Closes: #17242

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:33:29 -07:00
Alexander Motin
4866c2fabf
Cleanup VERIFY() macros (#17163)
- Fix VERIFY3B() when given non-boolean values.
 - Map EQUIV() into VERIFY3B(,==,) as equivalent.
 - Tune messages for better readability and to closer match source
code for easier search.  Unify user-space messages with kernel.
 - Tune printed types and remove %px outside of Linux kernel.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: @ImAwsumm
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-16 09:01:32 -07:00
Rob Norris
131df3bbf2
vdev_to_nvlist_iter: ignore draid parameters when matching names (#17228)
Various tools will display draid vdev names with parameters embedded in
them, but would not accept them as valid vdev names when looking them
up, making it difficult to build pipelines involving draid vdevs.

This commit makes it so that if a full draid name is offered for match,
it gets truncated at the first ':' character.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-14 17:10:48 -07:00
IIIPr0t0typ3III
189dc26296
Fixed zfs_notify_email for programs like sendmail
zfs_notify_email will now include an empty line separating the header
from the body of the email in case the subject is not provided via a
command line argument. This is necessary for programs like sendmail to
function correctly (everything up to the first empty line is interpreted
as header, which previously resulted in either missing message parts or
unsent emails)

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Felix Schmidt <felixschmidt20@aol.com>
Closed #17238
2025-04-12 11:58:19 -04:00
Rob Norris
5ab601771c
config: fix ZFS_LINUX_TEST_RESULT_SYMBOL with --enable-linux-builtin
The tiniest typo in dd2a46b5e6 (#17106) broke it, by setting the wrong
var with the test var, resulting in it always producing "no".

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17236
2025-04-11 14:56:14 -04:00
Tony Hutter
ab9bb193f9
Linux 6.0 compat: Check for migratepage VFS (#17217)
The 6.0 kernel removes the 'migratepage' VFS op. Check for
migratepage.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org
2025-04-10 17:36:31 -07:00
Alexander Motin
a497c5fc8b
Improve L2 caching control for prefetched indirects
dbuf_prefetch_impl() should look on level of current indirect, not
the target prefetch level.  dbuf_prefetch_indirect_done() should
call dnode_level_is_l2cacheable() if we have dpa_dnode to pass it.
It should fix some both false positive and negative L2ARC caching.

While there, fix redacted feature activation assertions.  One was
always true, while another could give false positive if dpa_dnode
is NULL.

George Amanakis <gamanakis@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17204
2025-04-08 19:43:32 -04:00
Tony Hutter
8f08dbfbe1
debian: Add libtirpc-dev dependency (#17220)
Debian requires libtirpc-dev.  Update our debian/control file to
match Debian's upstream one.

Closes: #17197

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: @manfromafar
2025-04-07 17:06:44 -07:00
Martin Matuška
88e3885cf4
freebsd: unbreak module/Makefile.bsd build on 15-CURRENT-arm64
- don't include foreign machine assembly files
- reduce diff to FreeBSD module Makefile

Discovered in FreeBSD port filesystems/openzfs-kmod

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #17219
2025-04-05 19:43:41 -04:00
Richard Kojedzinszky
09fc7bb47e
Fix memory leaks in pool properties handling
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Kojedzinszky <richard@kojedz.in>
Closes #17208
2025-04-05 19:40:55 -04:00
Syed Shahrukh Hussain
78a7c78bdf
Added fix for zpool get state segfaults with two or more vdevs (#15972). (#17213)
The problem was identified in handling of the zpool get state command
line arguments. A pointer vdev was used to point to the argv[1], and
its address set to cb.cb_vdevs.cb_names(pointer to array of strings)
so any increment to cb_names resulted in a segfault. Fix covers a
special case of root parameter at argv[1] and remaining cases are
handled by passing in the argv + 1, which allows cb_names iteration
of next command line arguments (vdevs).

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>

Signed-off-by: Syed Shahrukh Hussain <syed.shahrukh@ossrevival.org>
2025-04-04 15:34:38 -07:00
Paul Dagnelie
b14b3e3985
Fix FDT rollback to not overwrite unnecessary fields (#17205)
When a dedup write fails, we try to roll the DDT entry back to a known
good state. However, this also rolls the refcounts and the last-update
time back to the state they were at when we started this write. This
doesn't appear to be able to cause any refcount leaks (after the fix in
17123). This PR prevents that from happening by only rolling back the
parts of the DDT entry that have been updated by the write so far.

Sponsored-by: iXsystems, Inc.
Sponsored-by: Klara, Inc.

Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-04 11:10:44 -07:00
Ameer Hamza
c050b7315d zts: add spdx license tags to default_quota tests
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-03 10:37:09 -07:00
Ameer Hamza
7bb13950b4 Add tests for defaultprojectquota
Extend project quota test coverage to verify defaultprojectquota
behavior. These build on existing project quota tests with additional
cases specific to defaultprojectquota functionality.

Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-03 10:37:04 -07:00
Todd Seidelmann
c967faf19e Add tests for default user/group quota functionality
Extend test coverage to verify default user and group quota
functionality. These build on existing user/group quota tests with
additional cases specific to default quotas functionality.
Added on top of: https://github.com/openzfs/zfs/pull/16283/commits/e08cd97

Signed-off-by: Todd Seidelmann <seidelma@wharton.upenn.edu>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-03 10:36:53 -07:00
Ameer Hamza
7c4ff2a051 zfsprops.7 manpage changes for default quotas
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-03 10:36:49 -07:00
Ameer Hamza
6f6c504700 Show default quotas in zfs userspace tools
Update zfs userspace, groupspace, and projectspace to display the
default quotas when no per-ID specific quota is configured. This
ensures tool outputs align with enforced limits.

Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-03 10:36:45 -07:00
Ameer Hamza
9cb9a59e1c Report default quotas via kernel interfaces
Ensure default user/group/project quotas are visible through quota
tools and filesystem stats when no per-ID quota is configured. This
maintains consistency between quota visibility and configured defaults.

Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-03 10:36:38 -07:00
Ameer Hamza
20705a8430 Enforce default quotas when no per-ID quota is set
Update zfs_id_overobjquota() and zfs_id_overblockquota() to enforce
default user/group/project quotas (block and object-based) when no
per-user, per-group, or per-project quota exists. If a specific quota
is not configured for an ID, the default quota value is applied.

Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-03 10:36:25 -07:00
Ameer Hamza
2a8d9d9607 Add default user/group/project quota properties
This adds default userquota, groupquota, and projectquota properties to
MASTER_NODE_OBJ to make them accessible during zfsvfs_init() (regular
DSL properties require dsl_config_lock, which cannot be safely acquired
in this context). The zfs_fill_zplprops_impl() logic is updated to read
these default properties directly from MASTER_NODE_OBJ.

Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-03 10:35:22 -07:00
Paul Dagnelie
7be9fa259e
Fix nonrot property being incorrectly unset (#17206)
When opening a vdev and setting the nonrot property, we used to wait for
each child to be opened before examining its nonrot property. When the
change was made to open vdevs asynchronously, we didn't move the nonrot
check out of the main loop. As a result, the nonrot property is almost
always set to false, regardless of the actual type of the underlying
disks. The fix is simply to move the nonrot check to a separate loop
after the taskq has been waited for.

Sponsored-by: Klara, Inc.
Sponsored-by: Eshtek, Inc.

Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-04-02 12:11:33 -07:00
Attila Fülöp
5b0c27cd14 ZTS: Fix zpool dry run tests output formating
Signed-off-by: Attila Fülöp <attila@fueloep.org>
2025-04-01 16:11:17 -07:00
Attila Fülöp
029c4ae03a ZTS: Fix zpool dry run tests depending on output format
Signed-off-by: Attila Fülöp <attila@fueloep.org>
2025-04-01 16:11:11 -07:00
Friedrich Weber
047803e906
contrib/initramfs: use LVM autoactivation for activating VGs (#17125)
Currently, the zfs initramfs-tools boot script under local-top calls
`vgchange -ay`, which unconditionally activates all logical volumes
(LVs) in all discovered volume groups (VGs). This causes all LVs to be
active after boot. However, users may prefer to not activate certain
VGs/LVs on boot. They might normally use the `--setautoactivation n`
VG/LV flag or the `auto_activation_volume_list` LVM config option to
achieve this, but since the script unconditionally activates all LVs,
neither has an effect.

To fix this, call `vgchange -aay` instead. This triggers LVM
autoactivation, which honors autoactivation settings such as the
`--setautoactivation` flag. It is also more in line with the LVM
documentation, which says autoactivation is "meant to be used by
activation commands that are run automatically by the system" [1].

Note that this change might break misconfigured setups that have ZFS
on top of an LV for which autoactivation is disabled.

[1] https://gitlab.com/lvmteam/lvm2/-/blob/cff93e4d/conf/example.conf.in#L1579


Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>

Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-01 16:07:46 -07:00
Martin Matuška
87f8bf6b0c
Multiple printf() size fixes (#17199)
cmd/zinject/zinject.c:
 - use PRIu64 when printing uint64_t

tests/zfs-tests/cmd/clonefile.c:
 - use an unsigned long long to store result from strtoull()
 - use %jd for printing off_t, %zu for size_t, %zd for ssize_t

tests/zfs-tests/tests/functional/vdev_disk/page_alignment.c:
 - use %zx to print size_t

Discovered when compiling on FreeBSD i386.

Signed-off-by: Martin Matuska <mm@FreeBSD.org>

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: @ImAwsumm
2025-04-01 15:27:03 -07:00
Alexander Motin
301da593ad
Fix lock reversal on device removal cancel
FreeBSD kernel's WITNESS code detected lock ordering violation in
spa_vdev_remove_cancel_sync().  It took svr_lock while holding
ms_lock, which is opposite to other places.  I was thinking to
resolve it similar to #17145, but looking closer I don't think
we even need svr_lock at that point, since we already asserted
svr_allocd_segs is empty, and we don't need to add there segments
we are going to call free_mapped_segment_cb for.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17164
2025-04-01 09:31:24 -04:00
Paul Dagnelie
367d34b3aa
Fix dspace underflow bug
Since spa_dspace accounts only normal allocation class space,
spa_nonallocating_dspace should do the same.  Otherwise we may get
negative overflow or respective assertion spa_update_dspace() if
removed special/dedup vdev is bigger than all normal class space.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17183
2025-04-01 09:23:43 -04:00
Piotr Kubaj
11ca12dbd3
simd_powerpc.h: enable FPU on FreeBSD
FreeBSD nowadays supports FPU in the kernel on powerpc*, so enable it.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Piotr Kubaj <pkubaj@FreeBSD.org>
Closes #17191
2025-04-01 09:18:38 -04:00
Rob Norris
75e921da6f
kstat: silence "maybe uninitialized" warnings
Firmly in the "shouldn't happen" camp, but at least GCC 7.4 (Ubuntu
18.04) complained about them, and it's easy to shut up, so do so.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17189
2025-03-28 22:25:47 -04:00
Alexander Motin
5b29e70ae1
Remove mg_allocators (#17192)
Previous code allowed each metaslab group to have different number
of allocators.  But in practice it worked only for embedded SLOGs,
relying on a number of conditions and creating a significant mine
field if any of those change.  I just stepped on one myself.

This change makes all groups to have spa_alloc_count allocators.
It may cost us extra 192 bytes of memory per normal top-level vdev
on large systems, but I find it a small price for cleaner and more
reliable code.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Fixes #17188
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
2025-03-28 13:11:10 -07:00
Ameer Hamza
30cc2331f4
zed: Ensure spare activation after kernel-initiated device removal
In addition to hotplug events, the kernel may also mark a failing vdev
as REMOVED. This was observed in a customer report and reproduced by
forcing the NVMe host driver to disable the device after a failed reset
due to command timeout. In such cases, the spare was not activated
because the device had already transitioned to a REMOVED state before
zed processed the event.
To address this, explicitly attempt hot spare activation when the
kernel marks a device as REMOVED.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17187
2025-03-28 15:48:38 -04:00
Rob Norris
dd2a46b5e6
config: cache results of kernel checks (#17106)
Kernel checks are the heaviest part of the configure checks. This allows
the results to be cached through the normal autoconf cache.

Since we don't want to reuse cached values for different kernels, but
don't want to discard the entire cache on every kernel, we instead add a
short checksum to kernel config cache keys, based on the version and
path, so the cache can hold results for multiple different kernels.

Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-03-27 16:44:54 -07:00
Alexander Motin
4abc21b28c
Block remap for cloned blocks on device removal
When after device removal we handle block pointers remap, skip blocks
that might be cloned.  BRTs are indexed by vdev id and offset from
block pointer's DVA[0].  So if we start addressing the same block by
some different DVA, we won't get the proper reference counter.  As
result, we might either remap the block twice, that may result in
assertion during indirect mapping condense, or free it prematurely,
that may result in data overwrite, or free it twice, that may result
in assertion in spacemap code.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #15604
Closes #17180
2025-03-26 16:45:34 -07:00
Tony Hutter
50d87fed6a
runners: Fix tarball build for zfs-qemu-packages workflow (#17158)
The initial tarballs we built for for zfs-2.3.1 were incorrect since
they did not have a ./configure script, and their files were not
in a top level zfs-2.3.1/ directory.  This commit copies the way we
built them on buildbot so the tarballs are created as expected.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-03-26 13:13:33 -07:00
Tony Hutter
240fc4a6d1
runners: Fix zfs-release RPM creation (#17173)
The zfs-qemu-packages workflow was incorrectly copying the built
zfs-release RPMs to ~/zfsonlinux.github.com rather than ~/zfs.  This
meant that the RPMs were not being correctly picked in the artifacts
files.  This fixes the issue.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: @ImAwsumm
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2025-03-26 12:57:07 -07:00
Pavel Snajdr
a0e62718cf
Linux: Fix zfs_prune panics v2 (#17121)
It turns out that approach taken in the original version of the patch
was wrong. So now, we're taking approach in-line with how kernel
actually does it - when sb is being torn down, access to it
is serialized via sb->s_umount rwsem, only when that lock is taken
is it okay to work with s_flags - and the other mistake I was doing
was trying to make SB_ACTIVE work, but apparently the kernel checks
the negative variant - not SB_DYING and not SB_BORN.

Kernels pre-6.6 don't have SB_DYING, but check if sb is hashed
instead.

Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-03-25 15:20:16 -07:00
Tony Hutter
9611dfdc70
Linux 6.14 compat: META (#17098) (#17172)
Update the META file to reflect compatibility with the 6.14
kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: @ImAwsumm
2025-03-25 10:35:01 -07:00
Tony Hutter
885f87fa3e
ZTS: Fix zpool_status_features_001_pos local test (#17174)
Update 'zfs-helpers.sh -i' to install the compatibility.d/ file
symlinks. These are need to run the zpool_status_features_001_pos test
from a local workspace (as opposed to running ZTS from a formal
'make install' or install from RPMs, which are unaffected).

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: @ImAwsumm
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-03-25 10:30:48 -07:00
Simon Howard
fd018248d5 Disambiguate reference to kibibytes, not kilobytes
A minor nitpick that is kind of obvious based on the surrounding context
and reference to powers of two. It's better to be explicit, though.

Signed-off-by: Simon Howard <fraggle@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-03-24 14:37:43 -07:00
Simon Howard
ef81812726 Fix spelling errors
Unlike some of my other fixes which are more subtle, these are
unambigously spelling errors.

Signed-off-by: Simon Howard <fraggle@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-03-24 14:37:40 -07:00
Simon Howard
e759a86fa5 Correct "umount" to "unmount" in a couple of places
This is admittedly a nitpicky change, but `umount` is the command that
performs an *unmount*. So if we are talking about unmounting something
we should phrase it that way.

Signed-off-by: Simon Howard <fraggle@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-03-24 14:37:36 -07:00
Simon Howard
1d4505d7a1 Capitalize in various places where appropriate
These are mostly acronyms (CPUs; ZILs) but also proper nouns such as
"Unix" and "Unicode" which should also be capitalized.

Signed-off-by: Simon Howard <fraggle@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-03-24 14:37:34 -07:00
Simon Howard
b386bf87c1 Fix cases where "descendent" is used as a noun
As per Wiktionary: "descendent" may be used as an adjective (e.g.
"a descendent dataset") but for nouns (e.g. "descendants of this
dataset"), "descendant" is the correct spelling.

Signed-off-by: Simon Howard <fraggle@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-03-24 14:37:31 -07:00
Simon Howard
73494f3352 Make use of "i.e." (id est) consistent
This is the most common way it is written throughout the manpages, but
there are a few cases where it is written slightly differently.

Signed-off-by: Simon Howard <fraggle@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-03-24 14:37:26 -07:00
Simon Howard
530ddcd5f1 Harmonize on American spelling in several places
Most of the documentation is written in American English, so it makes
sense to be consistent.

Signed-off-by: Simon Howard <fraggle@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-03-24 14:36:34 -07:00
Alexander Motin
94a3fabcb0
Unified allocation throttling (#17020)
Existing allocation throttling had a goal to improve write speed
by allocating more data to vdevs that are able to write it faster.
But in the process it completely broken the original mechanism,
designed to balance vdev space usage.  With severe vdev space use
imbalance it is possible that some with higher use start growing
fragmentation sooner than others and after getting full will stop
any writes at all.  Also after vdev addition it might take a very
long time for pool to restore the balance, since the new vdev does
not have any real preference, unless the old one is already much
slower due to fragmentation.  Also the old throttling was request-
based, which was unpredictable with block sizes varying from 512B
to 16MB, neither it made much sense in case of I/O aggregation,
when its 32-100 requests could be aggregated into few, leaving
device underutilized, submitting fewer and/or shorter requests,
or in opposite try to queue up to 1.6GB of writes per device.

This change presents a completely new throttling algorithm. Unlike
the request-based old one, this one measures allocation queue in
bytes.  It makes possible to integrate with the reworked allocation
quota (aliquot) mechanism, which is also byte-based.  Unlike the
original code, balancing the vdevs amounts of free space, this one
balances their free/used space fractions.  It should result in a
lower and more uniform fragmentation in a long run.

This algorithm still allows to improve write speed by allocating
more data to faster vdevs, but does it in more controllable way.
On top of space-based allocation quota, it also calculates minimum
queue depth that vdev is allowed to maintain, and respectively the
amount of extra allocations it can receive if it appear faster.
That amount is based on vdev's capacity and space usage, but also
applied only when the pool is busy.  This way the code can choose
between faster writes when needed and better vdev balance when not,
with the choice gradually reducing together with the free space.

This change also makes allocation queues per-class, allowing them
to throttle independently and in parallel.  Allocations that are
bounced between classes due to allocation errors will be able to
properly throttle in the new class.  Allocations that should not
be throttled (ZIL, gang, copies) are not, but may still follow
the rotor and allocation quota mechanism of the class without
disrupting it.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
2025-03-24 09:25:01 -07:00
Alexander Motin
3862ebbf1f
CI: Remove FreeBSD 13.3 and 14.1 tests (#17162)
They are out of support and we are really low on CI resources.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2025-03-20 17:10:32 -07:00
Rob Norris
45e9b54e9e freebsd/kstat: allow multi-level module names
This extends the existing special-case for zfs/poolname to split and
create any number of intermediate sysctl names, so that multi-level
module names are possible.

Sponsored-by: Klara, Inc.
Sponsored-by: Syneto
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-03-20 16:25:32 -07:00
Rob Norris
d28d2e3007 linux/kstat: allow multi-level module names
Module names are mapped directly to directory names in procfs, but
nothing is done to create the intermediate directories, or remove them.
This makes it impossible to sensibly present kstats about sub-objects.

This commit loops through '/'-separated names in the full module name,
creates a separate module for each, and hooks them up with a parent
pointer and child counter, and then unrolls this on the other side when
deleting a module.

Sponsored-by: Klara, Inc.
Sponsored-by: Syneto
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-03-20 16:24:50 -07:00
Rob Norris
5b5a514955
zts: add spdx license tags to gang_blocks tests (#17160)
Missed in #17073, probably because that PR was branched before #17001
was landed and never rebased.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-03-20 09:01:11 -07:00
Paul Dagnelie
9250403ba6
Make ganging redundancy respect redundant_metadata property (#17073)
The redundant_metadata setting in ZFS allows users to trade resilience
for performance and space savings. This applies to all data and metadata
blocks in zfs, with one exception: gang blocks. Gang blocks currently
just take the copies property of the IO being ganged and, if it's 1,
sets it to 2. This means that we always make at least two copies of a
gang header, which is good for resilience. However, if the users care
more about performance than resilience, their gang blocks will be even
more of a penalty than usual.

We add logic to calculate the number of gang headers copies directly,
and store it as a separate IO property. This is stored in the IO
properties and not calculated when we decide to gang because by that
point we may not have easy access to the relevant information about what
kind of block is being stored. We also check the redundant_metadata
property when doing so, and use that to decide whether to store an extra
copy of the gang headers, compared to the underlying blocks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-03-19 15:58:29 -07:00
Brian Atkinson
94b9cbbe1e
Updating dio_read_verify ZTS test (#16830)
There was a recent CI ZTS test failure on FreeBSD 14 for the
dio_read_verify test case. The failure reported there was no ARC reads
while the buffer wes being manipulated. All checksum verify errors for
Direct I/O reads are rerouted through the ARC, so there should be ARC
reads accounted for. In order to help debug any future failures of this
test case, the order of checks has been changed. First there is a check
for DIO verify failures for the reads and then ARC read counts are
checked.

This PR also contains general cleanup of the comments in the test
script.

Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2025-03-19 13:37:49 -07:00
Alexander Motin
676b7ef104
Fix deadlock on I/O errors during device removal
spa_vdev_remove_thread() should not hold svr_lock while loading a
metaslab.  It may block ZIO threads, required to handle metaslab
loading, at least in case of read errors causing recovery writes.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17145
2025-03-19 14:48:47 -04:00
aokblast
83fa051ceb
spl_vfs: fix vrele task runner signature mismatch
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: SHENGYI HONG <aokblast@FreeBSD.org>
Closes #17101
2025-03-19 11:26:45 -04:00
Alan Somers
d033f26765
Always perform bounds-checking in metaslab_free_concrete
The vd->vdev_ms access can overflow due to on-disk corruption, not just
due to programming bugs.  So it makes sense to check its boundaries even
in production builds.

Sponsored by:	ConnectWise
Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #17136
2025-03-19 11:24:43 -04:00
Alexander Motin
3cd9934a48
Some arc_release() cleanup
- Don't drop L2ARC header if we have more buffers in this header.
Since we leave them the header, leave them the L2ARC header also.
Honestly we are not required to drop it even if there are no other
buffers, but then we'd need to allocate it a separate header, which
we might drop soon if the old block is really deleted.  Multiple
buffers in a header likely mean active snapshots or dedup, so we
know that the block in L2ARC will remain valid.  It might be rare,
but why not?
 - Remove some impossible assertions and conditions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17126
2025-03-18 21:25:50 -04:00
Rob Norris
6e89095873 convert_wycheproof: don't check tag len on invalid tests
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
2025-03-18 17:09:49 -07:00
Rob Norris
cb4b854838 convert_wycheproof: fix compile failure
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
2025-03-18 17:07:32 -07:00
Rob Norris
f69631992d
dmu_tx: rename dmu_tx_assign() flags from TXG_* to DMU_TX_* (#17143)
This helps to avoids confusion with the similarly-named
txg_wait_synced().

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-03-18 16:04:22 -07:00
Alexander Motin
21850f519b
ZTS: Remove TXG_TIMEOUT from dedup_quota test (#17150)
It seems `fio` in `ddt_dedup_vdev_limit` overwhelms the system
with the amount of dirty data caused by DDT updates within one
TXG due to tiny 1KB records used, while I see no reason for this
test to extend the TXGs beyond default.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: @ImAwsumm
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-03-17 09:54:50 -07:00
Rob Norris
f5312d2996 spdxcheck: program to check SPDX license tags
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-03-13 17:57:51 -07:00
Rob Norris
a8847a7e4f SPDX: license tags: LicenseRef-OpenZFS-ThirdParty-PublicDomain
SPDX have repeatedly rejected the creation of a tag for a public domain
dedication, as not all dedications are clear and unambiguious in their
meaning and not all jurisdictions permit relinquishing a copyright
anyway.

A reasonably common workaround appears to be to create a local
(project-specific) identifier to convey whatever meaning the project
wishes it to. To cover OpenZFS' use of third-party code with a public
domain dedication, we use this custom tag.

Further reading:
- https://github.com/spdx/old-wiki/blob/main/Pages/Legal%20Team/Decisions/Dealing%20with%20Public%20Domain%20within%20SPDX%20Files.md
- https://spdx.github.io/spdx-spec/v2.3/other-licensing-information-detected/
- https://cr.yp.to/spdx.html

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-03-13 17:57:31 -07:00
Rob Norris
bf754d6010 SPDX: license tags: OpenSSL-standalone
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-03-13 17:57:27 -07:00
Rob Norris
44dc09614e SPDX: license tags: Brian-Gladman-3-Clause
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-03-13 17:57:24 -07:00
Rob Norris
9fd7401d75 SPDX: license tags: BSD-2-Clause OR GPL-2.0-only
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-03-13 17:57:21 -07:00
Rob Norris
ff28d9fc6f SPDX: license tags: BSD-3-Clause OR GPL-2.0-only
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-03-13 17:57:17 -07:00
Rob Norris
f8d29099e7 SPDX: license tags: LGPL-2.1-or-later
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-03-13 17:57:12 -07:00
Rob Norris
f83431b3bd SPDX: license tags: GPL-2.0-or-later
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-03-13 17:57:09 -07:00
Rob Norris
c404ffb446 SPDX: license tags: Apache-2.0
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-03-13 17:57:05 -07:00
Rob Norris
7d8dd8d9a5 SPDX: license tags: MIT
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-03-13 17:56:54 -07:00
Rob Norris
4eafa9e5e8 SPDX: license tags: BSD-3-Clause
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-03-13 17:56:50 -07:00
Rob Norris
137045be98 SPDX: license tags: BSD-2-Clause
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-03-13 17:56:46 -07:00
Rob Norris
eb9098ed47 SPDX: license tags: CDDL-1.0
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-03-13 17:56:27 -07:00
shodanshok
201d262949
Add receive:append permission for limited receive
Force receive (zfs receive -F) can rollback or destroy snapshots and
file systems that do not exist on the sending side (see zfs-receive man
page). This means an user having the receive permission can effectively
delete data on receiving side, even if such user does not have explicit
rollback or destroy permissions.

This patch adds the receive:append permission, which only permits
limited, non-forced receive. Behavior for users with full receive
permission is not changed in any way.

Fixes #16943
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #17015
2025-03-13 13:54:14 -04:00
Paul Dagnelie
1b495eeab3
FDT dedup log sync -- remove incremental
This PR condenses the FDT dedup log syncing into a single sync
pass. This reduces the overhead of modifying indirect blocks for the
dedup table multiple times per txg. In addition, changes were made to
the formula for how much to sync per txg. We now also consider the
backlog we have to clear, to prevent it from growing too large, or
remaining large on an idle system.

Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Authored-by: Don Brady <don.brady@klarasystems.com>
Authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17038
2025-03-13 13:47:03 -04:00
Alan Somers
084531bcce
Update FreeBSD CI images
* FreeBSD 12 is EoL.  Drop it.
* Use the latest FreeBSD 13 and 14 versions.
* Add FreeBSD 15.0-CURRENT.
* Use the current python version.

Sponsored by:	ConnectWise
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #17139
2025-03-13 13:31:31 -04:00
Alexander Motin
0ea44e576b
Fix deduplication of overridden blocks
Implementation of DDT pruning introduced verification of DVAs in
a block pointer during ddt_lookup() to not by mistake free previous
pruned incarnation of the entry.  But when writing a new block in
zio_ddt_write() we might have the DVAs only from override pointer,
which may never have "D" flag to be confused with pruned DDT entry,
and we'll abandon those DVAs if we find a matching entry in DDT.

This fixes deduplication for blocks written via dmu_sync() for
purposes of indirect ZIL write records, that I have tested.  And
I suspect it might actually allow deduplication for Direct I/O,
even though in an odd way -- first write block directly and then
delete it later during TXG commit if found duplicate, which part
I haven't tested.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17120
2025-03-13 13:27:57 -04:00
Alexander Motin
09f4dd06c3
Prefer embedded blocks to dedup
Since embedded blocks introduction 11 years ago, their writing was
blocked if dedup is enabled.  After searching through the modern
code I see no reason for this restriction to exist.  Same time
embedded blocks are dramatically cheaper.  Even regular write of
so small blocks would likely be cheaper than deduplication, even
if the last is successful, not mentioning otherwise.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17113
2025-03-13 13:27:15 -04:00
Rob Norris
13ec35ce3b
Linux/vnops: implement STATX_DIOALIGN
This statx(2) mask returns the alignment restrictions for O_DIRECT
access on the given file.

We're expected to return both memory and IO alignment. For memory, it's
always PAGE_SIZE. For IO, we return the current block size for the file,
which is the required alignment for an arbitrary block, and for the
first block we'll fall back to the ARC when necessary, so it should
always work.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16972
2025-03-13 13:15:14 -04:00
Alan Somers
0433523ca2
Verify every block pointer is either embedded, hole, or has a valid DVA
Now instead of crashing when attempting to read the corrupt block
pointer, ZFS will return ECKSUM, in a stack that looks like this:

```
none:set-error
zfs.ko`arc_read+0x1d82
zfs.ko`dbuf_read+0xa8c
zfs.ko`dmu_buf_hold_array_by_dnode+0x292
zfs.ko`dmu_read_uio_dnode+0x47
zfs.ko`zfs_read+0x2d5
zfs.ko`zfs_freebsd_read+0x7b
kernel`VOP_READ_APV+0xd0
kernel`vn_read+0x20e
kernel`vn_io_fault_doio+0x45
kernel`vn_io_fault1+0x15e
kernel`vn_io_fault+0x150
kernel`dofileread+0x80
kernel`sys_read+0xb7
kernel`amd64_syscall+0x424
kernel`0xffffffff810633cb
```

This patch should hopefully also prevent such corrupt block pointers
from being written to disk in the first place.

And in zdb, don't crash when printing a block pointer with no valid
DVAs.  If a block pointer isn't embedded yet doesn't have any valid
DVAs, that's a data corruption bug.  zdb should be able to handle the
situation gracefully.

Finally, remove an extra check for gang blocks in SNPRINTF_BLKPTR.  This
check, which compares the asizes of two different DVAs within the same
BP, was added by illumos-gate commit b24ab67[^1], and I can't understand
why.  It doesn't appear to do anything useful, so remove it.

[^1]: b24ab67627

Fixes		#17077
Sponsored by:	ConnectWise
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #17078
2025-03-13 13:07:48 -04:00
Rob Norris
4ddaf45ba4
AUTHORS: refresh with recent new contributors
The reward for good work is more work.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closed #17141
2025-03-13 10:35:31 -04:00
Fabian-Gruenbichler
41823a0ede
linux: zvols: correctly detect flush requests (#17131)
since 4.10, bio->bi_opf needs to be checked to determine all kinds of
flush requests. this was the case prior to the commit referenced below,
but the order of ifdefs was not the usual one (newest up top), which
might have caused this to slip through.

this fixes a regression when using zvols as Qemu block devices, but
might have broken other use cases as well. the symptoms are that all
sync writes from within a VM configured to use such a virtual block
devices are ignored and treated as async writes by the host ZFS layer.

this can be verified using fio in sync mode inside the VM, for example
with

 fio \
 --filename=/dev/sda --ioengine=libaio --loops=1 --size=10G \
 --time_based --runtime=60 --group_reporting --stonewall --name=cc1 \
 --description="CC1" --rw=write --bs=4k --direct=1 --iodepth=1 \
 --numjobs=1 --sync=1

which shows an IOPS number way above what the physical device underneath
supports, with "zpool iostat -r 1" on the hypervisor side showing no
sync IO occuring during the benchmark.

with the regression fixed, both fio inside the VM and the IO stats on
the host show the expected numbers.

Fixes: 846b598519
"config: remove HAVE_REQ_OP_* and HAVE_REQ_*"

Signed-off-by: Fabian-Gruenbichler <f.gruenbichler@proxmox.com>
Co-authored-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-03-12 14:39:01 -07:00
Tony Hutter
62a9d372f8
zed: Print return code on failed zpool_prepare_disk
We had a case where we were autoreplacing a disk and
zpool_prepare_disk failed for some reason, and ZED
didn't log the return code.  This commit logs the code.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17124
2025-03-12 09:52:36 -04:00
Alexander Motin
fe674998bb
Check portable objset MAC even if local is zeroed
PR #14161 made spa_do_crypt_objset_mac_abd() to ignore MAC errors
if local MAC can not be calculated at the time.  But it does not
mean we should also ignore portable MAC errors there.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17122
2025-03-08 21:15:11 -05:00
Tony Hutter
57f192fcaa
Add 'zfs-qemu-packages' workflow for RPM building
Add a new 'zfs-qemu-packages' GH workflow for manually building RPMs
and test installing ZFS RPMs from a yum repo. The workflow has a
dropdown menu in the Github runners tab with two options:

Build RPMs - Build release RPMs and tarballs and put them into an
             artifact ZIP file.  The directory structure used in
             the ZIP file mirrors the ZFS yum repo.

Test repo -  Test install the ZFS RPMs from the ZFS repo.  On
             Almalinux, this will do a DKMS and KMOD test install
             from both the regular and testing repos.  On Fedora,
             it will do a DKMS install from the regular repo.  All
             test install results will be displayed in the Github
             runner Summary page. Note that the workflow provides an
             optional text box where you can specify the full URL to
             an alternate repo.  If left blank, it will install from
             the default repo from the zfs-release RPM.

Most developers will never need to use this workflow.  It is intended
to be used by the ZFS admins for building and testing releases.

This commit also modularizes many of the runner scripts so they can
be used by both the zfs-qemu and zfs-qemu-packages workflows.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17005
2025-03-05 09:19:56 -08:00
Paul Dagnelie
fc460bfbaf
Add more DDT tests
The new Fast Dedup feature has a lot of moving parts, and only some of
them have tests. We have some tests for prefetch and quota, and a
generic ZAP shrinking test, but we don't have anything for the pruning
command or specific to DDT zap shrinking. Here we add a couple small new
tests for zpool ddtprune and DDT-specific ZAP shrinking.

Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17049
2025-03-04 20:02:34 -05:00
Rob Norris
a44f423b00 ZTS: replace uses of TMPDIR with mktemp
Most of these are trying to use TMPDIR to put their work files somewhere
sensible. Now that we've set up correctly, they can all just use mktemp
to do the job.

In a couple of places cleaning up temp files wasn't being done
correctly, which has been fixed.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
2025-02-27 14:39:23 -08:00
Rob Norris
72c0fde609 ZTS: make uses of mktemp consistent
In all cases, rely on mktemp itself to make the best decision about
where to place the file or directory. In all cases, that decision will
be $TMPDIR, which we have set globally.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
2025-02-27 14:39:17 -08:00
Rob Norris
60031906b4 ZTS: zfs-tests: set TMPDIR to FILEDIR
Many tests use mktemp to create temporary files and dirs, which will
usually put them in /tmp unless instructed otherwise. This had led to
many tests trying to give mktemp a useful temp path in ad-hoc ways, and
others just using it directly without knowing they're potentially
leaving stuff lying around.

So we set TMPDIR to FILEDIR, which makes the simplest uses of mktemp put
things in the wanted work dir.

Included here is a hack to get TMPDIR into the test. If a test has to be
run as a different user (most of them), it is run through sudo. ld.so
from glibc will not pass TMPDIR to a setuid program, so instead we
re-set TMPDIR after sudo before running the target command.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
2025-02-27 14:39:10 -08:00
Rob Norris
2c897e0666 ZTS: test-runner: always apply timestamp to outputdir before updating
The default outputdir had a timestamp appended in TestRun.__init__, and
then the timestamp was unconditionally applied again after the runfile
had been loaded, assuming that an outputdir would be set in the runfile
too. If the runfile didn't have an outputdir, then the outputdir would
get a second timestamp appended.

Further, if test groups or individual tests themselves specificed an
outputdir, those would be set on their config, but would not get a
timestamp appended. It's not entirely clear if that's wrong or not, but
it is certainly not consistent with the rest.

To clean all this up, change things to append a timestamp to a received
outputdir (from arg or runfile) before setting it in any TestRun,
TestGroup or Test object.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
2025-02-27 14:39:03 -08:00
Rob Norris
4581c4fcbe ZTS: runfiles: remove explicit outputdir
The config file value overrides any set by the operator, making it quite
difficult to put the test output elsewhere. The default is
/var/tmp/test_results (via BASEDIR in test-runner) so this shouldn't
change anything for the default case.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
2025-02-27 14:38:57 -08:00
Rob Norris
cc9180d338 ZTS: zfs-tests: use configured FILEDIR for all temp paths
The default file vdevs, constrained binpath and temporary runfiles were
all explicitly places in /var/tmp. Instead, put them under FILEDIR,
which is set from -d and defaults to /var/tmp. TEST_BASE_DIR is also
initialised from FILEDIR, which means all data for the run will now end
up under the operator-specified data dir.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
2025-02-27 14:38:47 -08:00
Rob Norris
4afec534cc ZTS: replace all uses of /var/tmp with TEST_BASE_DIR
The operator can override TEST_BASE_DIR by setting its source var
FILEDIR through zfs-tests.sh -d. There were a handful of cases where
this was not honoured.

By default FILEDIR (and so TEST_BASE_DIR) is /var/tmp, so there should
be no functional change if the operator does nothing.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
2025-02-27 14:37:13 -08:00
Tony Hutter
f65fc98a8c
Linux 6.13 compat: META (#17098)
Update the META file to reflect compatibility with the 6.13 kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-02-27 11:24:36 -08:00
Rob Norris
7f05fface3 gcm_avx_init: zero the ghash state after hashing the IV
IVs != 96 bits get hashed with GHASH to bring them to 96 bits. Any call
to GHASH will mix the ghash state in gcm_ghash. This is expected to be
zero at first use in an encrypt or decrypt operation, so it needs to be
zeroed after using GHASH in setup.

gcm_init() does this, but gcm_avx_init() zeroed it before setup, not
after, resulting in incorrect encrypt/decrypt results when using AVX GCM
with an IV != 96 bits.

OpenZFS _always_ uses a 96 bit IV (ZIO_DATA_IV_LEN) so this will never
have been hit in any real-world use, which is extremely fortunate, as we
would have incorrectly-encrypted data on-disk. Still, as long as we have
this code here we should make sure it's correct.

Thanks-to: Joel Low <joel@joelsplace.sg>
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
2025-02-25 17:31:08 -08:00
Rob Norris
88b0594f93 ZTS: ICP encryption tests
This commit adds tests that ensure that the ICP crypto_encrypt() and
crypto_decrypt() produce the correct results for all implementations
available on this platform.

The actual ZTS scripts are simple drivers for the crypto_test program in
it's "correctness" mode. This mode takes a file full of test vectors
(inputs and expected outputs), runs them, and checks that the results
are expected. It will run the tests for each implementation of the
algorithm provided by the ICP.

The test vectors are taken from Project Wycheproof, which provides a
huge number of tests, including exercising many edge cases and common
implementation mistakes. These tests are provided are JSON files, so a
program is included here to convert them into a simpler line-based
format for crypto_test to consume.

crypto_test also has a "performance" mode, which will run simple
benchmarks against all implementations provded by the ICP and output
them for comparison. This is not used by ZTS, but is available to assist
with development of new implementations of the underlying primitives.

Thanks-to: Joel Low <joel@joelsplace.sg>
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
2025-02-25 17:29:57 -08:00
Tony Hutter
ece35e0e66
zpool: allow relative vdev paths
`zpool create` won't let you use relative paths to disks.  This is
annoying when you want to do:

	zpool create tank ./diskfile

But have to do..

	zpool create tank `pwd`/diskfile

This fixes it.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17042
2025-02-25 14:40:20 -05:00
Ameer Hamza
ab3db6d15d
arc: avoid possible deadlock in arc_read
In l2arc_evict(), the config lock may be acquired in reverse order
(e.g., first the config lock (writer), then a hash lock) unlike in
arc_read() during scenarios like L2ARC device removal. To avoid
deadlocks, if the attempt to acquire the config lock (reader) fails
in arc_read(), release the hash lock, wait for the config lock, and
retry from the beginning.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17071
2025-02-25 14:32:12 -05:00
Paul Dagnelie
701093c44f
Don't try to get mg of hole vdev in removal
Don't try to get mg of hole vdev in removal

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17080
2025-02-25 14:30:51 -05:00
aokblast
a5fb5c55be
spa: fix signature mismatch for spa_boot_init as eventhandler required
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: SHENGYI HONG <aokblast@FreeBSD.org>
Closes #17088
2025-02-25 14:28:57 -05:00
Alexander Motin
d7d2744711
Better fill empty metaslabs
Before this change zfs_metaslab_switch_threshold tunable switched
metaslabs each time ones index reduced by two (which means biggest
contiguous chunk reduced to 1/4).  It is a good idea to balance
metaslabs fragmentation.  But for empty metaslabs (having power-
of-2 sizes) this means switching when they get just below the half
of their capacity.  Inspection with zdb after filling new pool to
half capacity shown most of its metaslabs filled to half capacity.
I consider this sub-optimal for pool fragmentation in a long run.

This change blocks the metaslabs switching if most of the metaslab
free space (15/16) is represented by a single contiguous range.
Such metaslab should not be considered fragmented until it actually
fail some big allocation.  More contiguous filling should improve
data locality and increase time before previously filled and
partially freed metaslab is touched again, giving it more time to
free more contiguous chunks for lower fragmentation.  It should
also slightly reduce spacemap traffic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17081
2025-02-25 14:26:34 -05:00
Rob Norris
523e3adac9
suspend_resume_single: clear pool errors on fail
If the timing is unfortunate, the pool can suspend just as we're failing
because it didn't suspend. If we don't resume the pool, we hang trying
to destroy it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17054
2025-02-23 15:22:00 -05:00
Rob Norris
ecc44c45cb
include: move zio_priority_t into zfs.h
It's included so it's effectively already part of it, but it's not
always installed as a userspace header, making zfs.h effectively
useless. Might as well just combine it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Close #17066
2025-02-22 18:46:26 -05:00
Rob Norris
ee8803adc2
vdev_file: make FLUSH and TRIM asynchronous
zfs_file_fsync() and zfs_file_deallocate() are both blocking ops, so the
zio_taskq thread is active and blocked both while waiting for the IO
call and then while calling zio_execute() for the next stage. This is a
particular issue for FLUSH, as the z_flush_iss queue typically only has
one thread; multiple flushes arriving at once can cause long delays if
the underlying fsync() response is particularly slow.

To fix this, we dispatch both FLUSH and TRIM to the z_vdev_file taskq,
just as we do for reads and writes. Further, we return all results
through zio_interrupt(), so neither the issue nor the file taskqs are
blocked.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17064
2025-02-22 14:16:54 -05:00
Chunwei Chen
682c5f6a0a
Fix wrong free function in arc_hdr_decrypt
Need to use arc_free_data_abd to free abd type buffer.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #17079
2025-02-22 13:50:33 -05:00
Rob Norris
c43df8bbbf
vdev_file: unify FreeBSD and Linux implementations (#17046)
Kernel & userspace specifics are in zfs_file_os.c, so there's no
particular reason these have to be separate.

The one platform-specific part is in the Linux kernel part, to offload
flushes to a taskq if we're already inside a filesystem transaction.
This would be normally be an unsatisfying wart, but I'm intending to
remove this shortly, so I'm content to leave it gated for the moment.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2025-02-20 10:42:42 -08:00
Alexander Motin
6a2f7b3844
Fix metaslab group fragmentation math (#17037)
Since we are calculating a free space fragmentation, we should
weight metaslabs by the amount of their free space, not a full
size.  Fragmentation of full metaslabs may not matter in presence
empty ones.  The old algorithm did not differentiate metaslabs
having only one free 4KB block from metaslabs having 50% of space
free in 4KB blocks, reporting higher fragmentation.

While there, move metaslab_group_alloc_update() call after setting
mg_fragmentation, otherwise the effect may be delayed by one TXG.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-18 10:45:42 -08:00
Rob Norris
68473c4fd8 range_tree: convert remaining range_* defs to zfs_range_*
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
2025-02-14 15:37:56 -08:00
Ivan Volosyuk
d4a5a7e3aa Linux 6.12 compat: Rename range_tree_* to zfs_range_tree_*
Linux 6.12 has conflicting range_tree_{find,destroy,clear} symbols.

Signed-off-by: Ivan Volosyuk <Ivan.Volosyuk@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
2025-02-14 15:37:48 -08:00
vandanarungta
db62886d98
Free memory in an error path in spl-kmem-cache.c
skc->skc_name also needs to be freed in an error path.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Vandana Rungta <vrungta@amazon.com>
Closes #17041
2025-02-11 20:37:17 -05:00
Umer Saleem
b901d4a0b6
Update the dataset name in handle after zfs_rename (#17040)
For zfs_rename, after the dataset name is successfully updated,
the dataset handle that was passed to zfs_rename, still contains
the old name, due to which, the dataset handle becomes invalid.
The following operations performed using this handle result in
error since the dataset with old name cannot be found anymore.

changelist_rename does update the names in dataset handles,
but those are temporary handles that were created during
changelist_gather. The original handle that was used to call
zfs_rename is not updated.

We should update the name in original ZFS handle after the IOCTL
for rename returns success for the operation.

Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-11 09:07:29 -08:00
Rob Norris
b8c73ab780
zio: do no-op injections just before handing off to vdevs
The purpose of no-op is to simulate a failure between a device cache and
its permanent store. We still want it to go through the queue and
respond in the same way to everything else.

So, inject "success" as the very last thing, and then move on to
VDEV_IO_DONE to be dequeued and so any followup work can occur.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17029
2025-02-07 20:42:24 -05:00
Dr. Christian Kohlschütter
d2147de319
Fix "make install" with DESTDIR set (#16995)
"DESTDIR=/path/to/target/root/ make install" may fail when installing to
a root that contains an existing lib/modules structure. When run as root
we may even affect the wrong kernel (the build system's one, or, if
running a different version, some other directory in /lib/modules, but
not the desired one installed in DESTDIR).

Add a missing reference to the INSTALL_MOD_PATH root when calling
"depmod" during "make install"

Also add a switch "DONT_DELETE_MODULES_FILES=1" that skips the removal
of files named "modules.*" prior to running depmod.

Signed-off-by: Christian Kohlschütter <christian@kohlschutter.com>
Closes #16994

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-07 16:38:58 -08:00
George Amanakis
c2458ba921
optimize recv_fix_encryption_hierarchy()
recv_fix_encryption_hierarchy() in its present state goes through all
stream filesystems, and for each one traverses the snapshots in order to
find one that exists locally. This happens by calling guid_to_name() for
each snapshot, which iterates through all children of the filesystem.
This results in CPU utilization of 100% for several minutes (for ~1000
filesystems on a Ryzen 4350G) for 1 thread at the end of a raw receive
(-w, regardless whether encrypted or not, dryrun or not).

Fix this by following a different logic: using the top_fs name, call
gather_nvlist() to gather the nvlists for all local filesystems. For
each one filesystem, go through the snapshots to find the corresponding
stream's filesystem (since we know the snapshots guid and can search
with it in stream_avl for the stream's fs). Then go on to fix the
encryption roots and locations as in its present state.

Avoiding guid_to_name() iteratively makes
recv_fix_encryption_hierarchy() significantly faster (from several
minutes to seconds for ~1000 filesystems on a Ryzen 4350G).

Another problem is the following: in case we have promoted a clone of
the filesystem outside the top filesystem specified in zfs send, zfs
receive does not fail but returns an error:
recv_incremental_replication() fails to find its origin and errors out
with needagain=1. This results in recv_fix_hierarchy() not being called
which may render some children of the top fs not mountable since their
encryption root was not updated. To circumvent this make
recv_incremental_replication() silently ignore this error.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #16929
2025-02-06 15:43:47 -05:00
Paul Dagnelie
88020b993c
Add kstats tracking gang allocations
Gang blocks have a significant impact on the long and short term
performance of a zpool, but there is not a lot of observability into
whether they're being used.  This change adds gang-specific kstats to
ZFS, to better allow users to see whether ganging is happening.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17003
2025-02-06 15:42:50 -05:00
Paul Dagnelie
40496514b8
Expand fragmentation table to reflect larger possibile allocation sizes
When you are using large recordsizes in conjunction with raidz, with
incompressible data, you can pretty reliably be making 21 MB
allocations. Unfortunately, the fragmentation metric in ZFS considers
any metaslabs with 16 MB free chunks completely unfragmented, so you can
have a metaslab report 0% fragmented and be unable to satisfy an
allocation. When using the segment-based metaslab weight, this is
inconvenient; when using the space-based one, it can seriously degrade
performance.

We expand the fragmentation table to extend up to 512MB, and redefine
the table size based on the actual table, rather than having a static
define. We also tweak the one variable that depends on fragmentation
directly.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #16986
2025-02-06 15:40:01 -05:00
Mateusz Piotrowski
2cccbacefc
Fix typos in zpool_do_scrub() error messages (#17028)
Sponsored-by: Wasabi Technology, Inc.
Sponsored-by: Klara, Inc.

Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-05 15:53:33 -08:00
mnrx
0be1da26cb
Clarify documentation of zfs destroy on snapshots (#17021)
The current documentation of `zfs destroy` in application to snapshots
is particularly difficult to understand. The following changes are made:

- Remove circular reference to `zfs destroy` in the documentation of
  that command.
- Remove use of "for example", which implies there are more,
  undocumented reasons that ZFS may fail to destroy a snapshot
  immediately.
- Mention properties `defer_destroy` and `userrefs`.
- Add `zfsprops(8)` to "SEE ALSO" list.
- Clarify meaning of `-d` option.



Requires-builders: none

Signed-off-by: mnrx <83848843+mnrx@users.noreply.github.com>
Co-authored-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-05 14:47:03 -08:00
Rob Norris
2ca91ba3cf Linux 6.14: BLK_MQ_F_SHOULD_MERGE was removed
According to the upstream change, all callers set it, and all block
devices either honoured it or ignored it, so removing it entirely allows
a bunch of handling for the "unset" case to be removed, and it becomes
effectively implied.

We follow suit, and keep setting it for older kernels.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-05 09:43:45 -08:00
Rob Norris
7ef6b70e96 Linux 6.14: dops->d_revalidate now takes four args
This is a convenience for filesystems that need the inode of their
parent or their own name, as its often complicated to get that
information. We don't need those things, so this is just detecting which
prototype is expected and adjusting our callback to match.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-02-05 09:42:37 -08:00
Rob Norris
390f6c1190
zio: lock parent zios when updating wait counts on reexecute
As zios are reexecuted after resume from suspension, their ready and
wait states need to be propagated to wait counts on all their parents.

It's possible for those parents to have active children passing through
READY or DONE, which then end up in zio_notify_parent(), take their
parent's lock, and decrement the wait count. Without also taking a lock
here, it's possible for an increment race to occur, which leads to
either there being no references left (tripping the assert in
zio_notify_parent()), or a parent waiting forever for a nonexistent
child to complete.

To protect against this, we simply take the appropriate zio locks in
zio_reexecute() before updating the wait counts.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17016
2025-02-04 08:47:50 -05:00
Jaydeep Kshirsagar
21205f6488
Avoid ARC buffer transfrom operations in prefetch
This change will prevent prefetch to perform unnecessary ARC buffer
fill when reading from disk.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Jaydeep Kshirsagar <jkshirsagar@maxlinear.com>
Co-authored-by: Alexander Motin <mav@FreeBSD.org>
Closes #17013
2025-02-01 11:15:24 -05:00
Jerzy Kołosowski
387ed5ca41
Add recursive dataset mounting and unmounting support to pam_zfs_key (#16857)
Introduced functionality to recursively mount datasets with a new
config option `mount_recursively`. Adjusted existing functions to
handle the recursive behavior and added tests to validate the feature.
This enhances support for managing hierarchical ZFS datasets within
a PAM context.

Signed-off-by: Jerzy Kołosowski <jerzy@kolosowscy.pl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-01-31 11:00:59 -08:00
Brian Atkinson
1e32c57893
Update pin_user_pages() calls for Direct I/O
Originally #16856 updated Linux Direct I/O requests to use the new
pin_user_pages API. However, it was an oversight that this PR only
handled iov_iter's of type ITER_IOVEC and ITER_UBUF. Other iov_iter
types may try and use the pin_user_pages API if it is available. This
can lead to panics as the iov_iter is not being iterated over correctly
in zfs_uio_pin_user_pages().

Unfortunately, generic iov_iter API's that call pin_user_page_fast() are
protected as GPL only. Rather than update zfs_uio_pin_user_pages() to
account for all iov_iter types, we can simply just call
zfs_uio_get_dio_page_iov_iter() if the iov_iter type is not ITER_IOVEC
or ITER_UBUF. zfs_uio_get_dio_page_iov_iter() calls the
iov_iter_get_pages() calls that can handle any iov_iter type.

In the future it might be worth using the exposed iov_iter iterator
functions that are included in the header iov_iter.h since v6.7. These
functions allow for any iov_iter type to be iterated over and advanced
while applying a step function during iteration. This could possibly be
leveraged in zfs_uio_pin_user_pages().

A new ZFS test case was added to test that a ITER_BVEC is handled
correctly using this new code path. This test case was provided though
issue #16956.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16956 
Closes #17006
2025-01-30 15:53:59 -08:00
Alan Somers
12f0baf348
Make the vfs.zfs.vdev.raidz_impl sysctl cross-platform
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Sponsored by:	ConnectWise
Closes #16980
2025-01-29 09:18:09 -05:00
rmacklem
34205715e1
FreeBSD: Add setting of the VFCF_FILEREV flag
The flag VFCF_FILEREV was recently defined in FreeBSD
so that a file system could indicate that it increments
va_filerev by one for each change.

Since ZFS does do this, set the flag if defined for the
kernel being built.  This allows the NFSv4.2 server to
reply with the correct change_attr_type attribute value.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closed #16976
2025-01-22 19:33:43 -05:00
Rob Norris
26e38aec46 zinject: add "probe" device injection type
Injecting a device probe failure is not possible by matching IO types,
because probe IO goes to the label regions, which is explicitly excluded
from injection. Even if it were possible, it would be awkward to do,
because a probe is sequence of reads and writes.

This commit adds a new IO "type" to match for injection, which looks for
the ZIO_FLAG_PROBE flag instead. Any probe IO will be match the
injection record and recieve the wanted error.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16947
2025-01-22 16:13:21 -08:00
Rob Norris
dfdc5ea993 zinject: make iotype extendable
I'm about to add a new "type", and I need somewhere to put it!

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16947
2025-01-22 16:12:31 -08:00
Rob Norris
59a77512ca ZTS: remove get_arcstat
It's now a simple wrapper, so lets just call kstat direct.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-01-21 10:13:12 -08:00
Rob Norris
4acce93fde ZTS: update existing kstat users to new helper
Removes other custom helpers and direct accesses to /proc.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-01-21 10:12:42 -08:00
Rob Norris
722cf7bfd5 ZTS: reimplement kstat helper function
The old kstat helper function was barely used, I suspect in part because
it was very limited in the kinds of kstats it could gather.

This adds new functions to replace it, for each kind of thing that can
have stats: global, pool and dataset. There's options in there to get a
single stat value, or all values within a group.

Most importantly, the interface is the same for both platforms.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-01-21 10:12:07 -08:00
Tim Smith
b8c0c154ad
Fix several typos in the man pages
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tim Smith <tsmith84@gmail.com>
Closes #16965
2025-01-21 10:30:17 -05:00
Alexander Ziaee
919bc4d10e
zfs-destroy.8: Fix minor formatting typo
The warning at the end of the second example in the description section
was actually inside the options table. Move the El macro to match what
is done in the first section for improved readability.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Ziaee <ziaee@FreeBSD.org>
Closes #16962
2025-01-20 14:37:52 -05:00
Tony Hutter
788e69ca5d
Update RELEASES.md LTS release to 2.2
2.3.0 is out now, so make 2.2.x the LTS release.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16945
Closes #16948
2025-01-17 11:04:36 -05:00
Peng Liu
aaeb7fa3d0
style: remove unnecessary spaces in sa.h
Removed three unnecessary spaces in the definition of the
sa_attr_reg_t structure to improve code style consistency
and adhere to OpenZFS coding standards.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Peng Liu <littlenewton6@gmail.com>
Closes #16955
2025-01-16 21:09:45 -05:00
Rob Norris
fe44c5ae27
Makefile.in: pass ARCH for modules_install as well
To do a cross-build using only kbuild rather than a full source tree,
ARCH= needs to be passed for the kbuild Makefile to find the
archspecific Makefile.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16944
2025-01-13 16:51:37 -08:00
Rob Norris
2aa3fbe761
zinject: count matches and injections for each handler
When building tests with zinject, it can be quite difficult to work out
if you're producing the right kind of IO to match the rules you've set
up.

So, here we extend injection records to count the number of times a
handler matched the operation, and how often an error was actually
injected (ie after frequency and other exclusions are applied).

Then, display those counts in the `zinject` output.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16938
2025-01-13 08:33:31 -05:00
Alexander Motin
fae4c664a4
FreeBSD: Use ashift in vdev_check_boot_reserve()
We should not hardcode 512-byte read size when checking for loader
in the boot area before RAIDZ expansion.  Disk might be unable to
handle that I/O as is, and the code zio_vdev_io_start() handling
the padding asserts doing it only for top-level vdev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16942
2025-01-11 04:26:42 -05:00
Rob Norris
b8e09c7007
ZTS: remove empty zpool_add--allow-ashift-mismatch test
Added in b1e46f869, but empty, so no point keeping it around.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16931
2025-01-07 15:43:01 -08:00
n0-1
18c67d2418
Support for cross-compiling kernel modules
In order to correctly cross-compile, one has to pass ARCH and
CROSS_COMPILE make flags to kernel module build calls. Facilitate this
in the same way as for custom CC flag by recognizing KERNEL_-prefixed
configure environment variables of same name.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Closes #16924
2025-01-05 17:27:19 -08:00
Robert Evans
3a445f2ef5
Remove duplicate dedup_legacy_create in common.run
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16926
2025-01-05 17:25:22 -08:00
Richard Kojedzinszky
dc0324bfa9
fix: make zfs_strerror really thread-safe and portable
#15793 wanted to make zfs_strerror threadsafe, unfortunately, it
turned out that strerror_l() usage was wrong, and also, some libc 
implementations dont have strerror_l().

zfs_strerror() now simply calls original strerror() and copies the 
result to a thread-local buffer, then returns that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Kojedzinszky <richard@kojedz.in>
Closes #15793
Closes #16640
Closes #16923
2025-01-04 10:33:27 -08:00
Don Brady
939e0237c5
Too many vdev probe errors should suspend pool
Similar to what we saw in #16569, we need to consider that a
replacing vdev should not be considered as fully contributing
to the redundancy of a raidz vdev even though current IO has
enough redundancy.

When a failed vdev_probe() is faulting a disk, it now checks
if that disk is required, and if so it suspends the pool until
the admin can return the missing disks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16864
2025-01-04 10:28:33 -08:00
Robert Evans
50cbb14641
Add Makefile dependencies for scripts/zfs-tests.sh -c
This updates the Makefile to be more correct for parallel make.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16030
Closes #16922
2025-01-03 19:04:01 -08:00
Toomas Soome
ee3bde9dad
ZTS: checkpoint_discard_busy should use save_tunable/restore_tunable
Instead of using hardwired value for SPA_DISCARD_MEMORY_LIMIT,
use save_tunable and restore_tunable to restore the pre-test state.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16919
2025-01-03 14:48:30 -08:00
Rob Norris
c02e1cf055
vdev_open: clear async remove flag after reopen
It's possible for a vdev to be flagged for async remove after the pool
has suspended. If the removed device has been returned when the pool is
resumed, the ASYNC_REMOVE task will still run at the end of txg, and
remove the device from the pool again.

To fix, we clear the async remove flag at reopen, just as we did for the
async fault flag in 5de3ac223.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16921
2025-01-03 14:42:06 -08:00
Toomas Soome
e94549d868
ZTS: remove unused TESTDIRS from pam/cleanup.ksh
Remove TESTDIRS as it is not set for pam tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16920
2025-01-03 14:41:03 -08:00
pstef
478b09577a
zfs_vnops_os.c: fallocate is valid but not supported on FreeBSD
This works around
/usr/lib/go-1.18/pkg/tool/linux_amd64/link:
mapping output file failed: invalid argument

It's happened to me under a Linux jail, but it's also happened to other
people, see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270247#c4

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: pstef <pstef@users.noreply.github.com>
Closes #16918
2025-01-03 09:03:14 -08:00
Toomas Soome
d35f9f2e84
ZTS: checkpoint_discard_busy does not set 16M on cleanup
Originally hex value is used as decimal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16917
2025-01-02 15:57:24 -08:00
Toomas Soome
d6b4110d71
ZTS: functional/mount scripts are not removing /var/tmp/testdir.X dirs
cleanup.ksh is assuming we have TESTDIRS set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16915
2025-01-02 15:53:53 -08:00
Toomas Soome
8dc15ef4b3
ZTS: zfs_mount_all_fail leaves /var/tmp/testrootPIDNUM directory around
Before we can remove test files, we need to unmount datasets
used by test first.

See also: zfs_mount_all_mountpoints.ksh

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16914
2025-01-02 13:29:12 -08:00
James Reilly
3c2267a873
ZTS: add centos stream10 (#16904)
Added centos as optional runners via workflow_dispatch

removed centos-stream9 from the FULL_OS runner list as CentOS is not
officially support by ZFS. This commit will add preliminary support for
EL10 and allow testing ZFS ahead of EL10 codebase solidifying in ~6
months

Signed-off-by: James Reilly <jreilly1821@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2025-01-02 09:28:56 -08:00
Andrew Walker
25238baad5
Add missing zfs_exit() when snapdir is disabled (#16912)
zfs_vget doesn't zfs_exit when erroring out due to snapdir
being disabled.

Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Reviewed-by: @bmeagherix
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-12-30 17:06:48 -08:00
shodanshok
54126fdb5b
set zfs_arc_shrinker_limit to 0 by default
zfs_arc_shrinker_limit was introduced to avoid ARC collapse due to
aggressive kernel reclaim. While useful, the current default (10000) is
too prone to OOM especially when MGLRU-enabled kernels with default
min_ttl_ms are used. Even when no OOM happens, it often causes too much
swap usage.

This patch sets zfs_arc_shrinker_limit=0 to not ignore kernel reclaim
requests. ARC now plays better with both kernel shrinker and pagecache
but, should ARC collapse happen again, MGLRU behavior can be tuned or
even disabled.

Anyway, zfs should not cause OOM when ARC can be released.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16909
2024-12-29 11:50:19 -08:00
Ameer Hamza
9dd5fe1095
zvol: implement platform-independent part of block cloning
In Linux, block devices currently lack support for `copy_file_range`
API because the kernel does not provide the necessary functionality.
However, there is an ongoing upstream effort to address this
limitation: https://patchwork.kernel.org/project/dm-devel/cover/20240520102033.9361-1-nj.shetty@samsung.com/.
We have adopted this upstream kernel patch into the TrueNAS kernel and
made some additional modifications to enable block cloning specifically
for the zvol block device. This patch implements the platform-
independent portions of these changes for inclusion in OpenZFS.
This patch does not introduce any new functionality directly into
OpenZFS. The `TX_CLONE_RANGE` replay capability is only relevant when
zvols are migrated to non-TrueNAS systems that support Clone Range
replay in the ZIL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16901
2024-12-29 11:41:30 -08:00
Alexander Motin
a153397f41 ZTS: Reduce file size in redacted_panic to 1GB
This test takes 3 minutes on RELEASE FreeBSD bots, but on CURRENT,
probably due to debugging it has in kernel, it does not complete
within 10 minutes, ending up killed.  As I see all the redacting
here happens within the first ~128MB of the file, so I hope it
won't matter if there is 1GB of data instead of 2GB.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:19:32 -08:00
Alexander Motin
b66d910113 ZTS: Remove procfs use from zpool_import_status
procfs might be not mounted on FreeBSD.  Plus checking for specific
PID might be not exactly reliable.  Check for empty list of jobs
instead.

Premature loop exit can result in failed test and failed cleanup,
failing also some following tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:19:25 -08:00
Alexander Motin
8bf1e83eef ZTS: Remove non-standard awk hex numbers usage
FreeBSD recently removed non-standard hex numbers support from awk.
Neither it supports -n argument, enabling it in gawk.  Instead of
depending on those rewrite list_file_blocks() fuction to handle the
hex math in shell instead of awk.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:17:27 -08:00
Rob Norris
c4e5fa5e17 ZTS: test clearing pool and vdev userprops
Confirming that clearing pool and vdev userprops produce the same
result: an empty value, with default source.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16887
2024-12-29 11:12:16 -08:00
Rob Norris
03b7cfdef3 spa_sync_props: remove pool userprops by setting empty-string
People have noted there's no way to remove a pool userprop, only zero
it. Turns vdev userprops had a method, by setting empty-string. So this
makes pool userprops follow the same behaviour.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16887
2024-12-29 11:12:04 -08:00
Rob Norris
779c5a5deb zpool_get_vdev_prop_value: show missing vdev userprops
If a vdev userprop is not found, present it as value '-', default
source, so it matches the output from pool userprops.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16887
2024-12-29 11:11:40 -08:00
Alexander Motin
89f796dec6
ZTS: Increase write sizes for RAIDZ/dRAID tests
Many RAIDZ/dRAID tests filled files doing millions of 100 or even
10 byte writes.  It makes very little sense since we are not
micro-benchmarking syscalls or VFS layer here, while before the
blocks reach the vdev layer absolute majority of the small writes
will be aggregated.  In some cases I see we spend almost as much
time creating the test files as actually running the tests.  And
sometimes the tests even time out after that.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16905
2024-12-27 10:01:22 -05:00
Rob Norris
c37a2ddaaa
microzap: set hard upper limit of 1M
The count of chunks in a microzap block is stored as an uint16_t
(mze_chunkid). Each chunk is 64 bytes, and the first is used to store a
header, so there are 32767 usable chunks, which is just under 2M. 1M is
the largest power-2-rounded block size under 2M, so we must set the
limit there.

If it goes higher, the loop in mzap_addent can overflow and fall into
the PANIC case.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16888
2024-12-26 17:10:09 -05:00
Alexander Motin
1acd246964
Fix readonly check for vdev user properties
VDEV_PROP_USERPROP is equal do VDEV_PROP_INVAL and so is not a real
property.  That's why vdev_prop_readonly() does not work right for
it.  In particular it may declare all vdev user properties readonly
on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16890
2024-12-20 17:25:35 -05:00
Umer Saleem
219a89cbbf
Skip iterating over snapshots for share properties
Setting sharenfs and sharesmb properties on a dataset can become costly
if there are large number of snapshots, since setting the share
properties iterates over all snapshots present for a dataset. If it is
the root dataset for which we are trying to set the share property,
snapshots for all child datasets and their children will also be
iterated.

There is no need to iterate over snapshots for share properties
because we do not allow share properties or any other property,
to be set on a snapshot itself execpt for user properties.

This commit skips iterating over snapshots for share properties,
instead iterate over all child dataset and their children for share
properties.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16877
2024-12-19 15:02:58 -05:00
Rob Norris
f00a57a786
zfs_main: fix alignment on props usage output
I guess we've got some long property names since this was first set up!

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16883
2024-12-19 11:04:56 -05:00
Tino Reichardt
e5ac7786bd
CI: Fix FreeBSD 13.4 STABLE build
In #16869 we added FreeBSD 13.4 STABLE, but forget the special
thing, that the virtio nic within FreeBSD 13.x is buggy.

This fix adds the needed rtl8139 nic to the VM.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16885
2024-12-19 11:01:34 -05:00
Rob Norris
ab7cbbe789
zprop: fix value help for ZPOOL_PROP_CAPACITY
It's a percentage and documented as such, but we were showing it as
<size>.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16881
2024-12-18 15:25:12 -08:00
Brian Behlendorf
830a531249
CI: Add FreeBSD 14.2 RELEASE+STABLE builds
Update the CI to include FreeBSD 14.2 as a regularly tested platform.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16869
2024-12-17 08:58:33 -08:00
Brian Atkinson
882a809983 Use pin_user_pages API for Direct I/O requests
As of kernel v5.8, pin_user_pages* interfaced were introduced. These
interfaces use the FOLL_PIN flag. This is preferred interface now for
Direct I/O requests in the kernel. The reasoning for using this new
interface for Direct I/O requests is explained in the kernel
documenetation:
Documentation/core-api/pin_user_pages.rst

If pin_user_pages_unlocked is available, the all Direct I/O requests
will use this new API to stay uptodate with the kernel API requirements.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16856
2024-12-16 10:24:30 -08:00
Brian Atkinson
c6442bd3b6 Removing old code outside of 4.18 kernsls
There were checks still in place to verify we could completely use
iov_iter's on the Linux side. All interfaces are available as of kernel
4.18, so there is no reason to check whether we should use that
interface at this point. This PR completely removes the UIO_USERSPACE
type. It also removes the check for the direct_IO interface checks.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16856
2024-12-16 10:23:45 -08:00
Shengqi Chen
acda137d8c
simd_stat: fix undefined CONFIG_KERNEL_MODE_NEON error on armel
CONFIG_KERNEL_MODE_NEON depends on CONFIG_NEON. Neither is defined
on armel. Add a guard to avoid compilation errors.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16871
2024-12-16 09:40:41 -08:00
Brian Behlendorf
22259fb24d
Fix stray "no" in configure output
This is purely a cosmetic fix which removes a stray "no" from
the configure output.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16867
2024-12-14 14:05:12 -08:00
Alexander Motin
ff6266ee9b
Fix use-afer-free regression in RAIDZ expansion
We should not dereference rra after the last zio_nowait() is called.
It seems very unlikely, but ASAN in ztest managed to catch it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16868
2024-12-14 14:02:11 -08:00
kotauskas
586304ac44
Remount datasets on soft-reboot
The one-shot zfs-mount.service is incorrectly deemed active by 
Systemd after a systemctl soft-reboot. As such, soft-rebooting
prevents zfs mount -a from being ran automatically.

This commit makes it so that zfs-mount.service is marked as being 
undone by the time umount.target is reached, so that zfs.target then 
pulls it in again and gets it restarted after a soft reboot.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: kotauskas <v.toncharov@gmail.com>
Closes #16845
2024-12-13 13:50:50 -08:00
Rob Norris
46e06feded flush: only detect lack of flush support in one place
It seems there's no good reason for vdev_disk & vdev_geom to explicitly
detect no support for flush and set vdev_nowritecache.  Instead, just
signal it by setting the error to ENOTSUP, and let zio_vdev_io_assess()
take care of it in one place.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16855
2024-12-13 12:19:54 -08:00
Rob Norris
fbea92432a flush: don't report flush error when disabling flush support
The first time a device returns ENOTSUP in repsonse to a flush request,
we set vdev_nowritecache so we don't issue flushes in the future and
instead just pretend the succeeded. However, we still return an error
for the initial flush, even though we just decided such errors are
meaningless!

So, when setting vdev_nowritecache in response to a flush error, also
reset the error code to assume success.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16855
2024-12-13 12:19:20 -08:00
Poscat
76f57ab9f7
build: use correct bashcompletiondir on arch
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: poscat <poscat@poscat.moe>
Closes #16861
2024-12-13 12:06:40 -08:00
Rob Norris
ecc0970e3e
backtrace: fix off-by-one on string output
sizeof("foo") includes the trailing null byte, so all the output had
nulls through it. Most terminals quietly ignore it, but it makes some
tools misdetect file types and other annoyances.

Easy fix: subtract 1.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16862
2024-12-13 10:12:14 -08:00
Chunwei Chen
6c9b4f18d3
Fix DR_OVERRIDDEN use-after-free race in dbuf_sync_leaf
In dbuf_sync_leaf, we clone the arc_buf in dr if we share it with db
except for overridden case. However, this exception causes a race where
dbuf_new_size could free the arc_buf after the last dereference of
*datap and causes use-after-free. We fix this by cloning the buf
regardless if it's overridden.

The race:
--
P0                                     P1

                                       dbuf_hold_impl()
                                         // dbuf_hold_copy passed
                                         // because db_data_pending NULL

dbuf_sync_leaf()
  // doesn't clone *datap
  // *datap derefed to db_buf
  dbuf_write(*datap)

                                       dbuf_new_size()
                                         dmu_buf_will_dirty()
                                           dbuf_fix_old_data()
                                             // alloc new buf for P0 dr
                                             // but can't change *datap

                                         arc_alloc_buf()
                                         arc_buf_destroy()
                                           // alloc new buf for db_buf
                                           // and destroy old buf

  dbuf_write() // continue
    abd_get_from_buf(data->b_data,
    arc_buf_size(data))
      // use-after-free
--

Here's an example when it happens:

BUG: kernel NULL pointer dereference, address: 000000000000002e
RIP: 0010:arc_buf_size+0x1c/0x30 [zfs]
Call Trace:
 dbuf_write+0x3ff/0x580 [zfs]
 dbuf_sync_leaf+0x13c/0x530 [zfs]
 dbuf_sync_list+0xbf/0x120 [zfs]
 dnode_sync+0x3ea/0x7a0 [zfs]
 sync_dnodes_task+0x71/0xa0 [zfs]
 taskq_thread+0x2b8/0x4e0 [spl]
 kthread+0x112/0x130
 ret_from_fork+0x1f/0x30

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Closes #16854
2024-12-12 16:18:45 -08:00
Alexander Motin
19a04e5ad2
BRT: Check bv_mos_entries in brt_entry_lookup()
When vdev first sees some block cloning, there is a window when
brt_maybe_exists() might already return true since something was
cloned, but bv_mos_entries is still 0 since BRT ZAP was not yet
created.  In such case we should not try to look into the ZAP
and dereference NULL bv_mos_entries_dnode.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16851
2024-12-12 10:22:41 -08:00
Rob Norris
e0039c7057 Remove unnecessary CSTYLED escapes on top-level macro invocations
cstyle can handle these cases now, so we don't need to disable it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 08:53:57 -08:00
Rob Norris
0de8ae56f7 cstyle: ignore old non-POSIX types in macro invocations
In code generation macros, we often use names like `uint` when
constructing handler functions. These are not being used as types, so
exclude them from the admonishment to use POSIX type names.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 08:53:53 -08:00
Rob Norris
ba00a6f9a3 cstyle: understand macro params can be empty
It's not uncommon to have empty parameters in code generator macros,
usually when multiple parameters are concatenated or stringified into a
single token or literal. So, exclude the space-before-comma check, which
will allow construction like `MACRO_CALL(foo, , baz)`.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 08:53:36 -08:00
Rob Norris
903895ea5f cstyle: understand basic top-level macro invocations
We quite often invoke macros outside of functions, usually to generate
functions or data. cstyle inteprets these as function headers, which at
least have opinions for indenting.

This introduces a separate state for top-level macro invocations, and
excludes it from matching functions. For the moment, most of the
existing rules will continue to apply, but this gives us a way to add or
removes rules targeting macros specifically.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 08:53:12 -08:00
Alexander Motin
a44eaf1690
Optimize RAIDZ expansion
- Instead of copying one ashift-sized block per ZIO, copy as much
as we have contiguous data up to 16MB per old vdev.  To avoid data
moves use gang ABDs, so that read ZIOs can directly fill buffers
for write ZIOs.  ABDs have much smaller overhead than ZIOs in both
memory usage and processing time, plus big I/Os do not depend on
I/O aggregation and scheduling to reach decent performance on HDDs.
 - Reduce raidz_expand_max_copy_bytes to 16MB on 32bit platforms.
 - Use 32bit range tree when possible (practically always now) to
slightly reduce memory usage.
 - Use ZIO_PRIORITY_REMOVAL for early stages of expansion, same as
for main ones.
 - Fix rate overflows in `zpool status` reporting.

With these changes expanding RAIDZ1 from 4 to 5 children I am able
to reach 6-12GB/s rate on SSDs and ~500MB/s on HDDs, both are
limited by devices instead of CPU.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15680
Closes #16819
2024-12-06 08:50:16 -08:00
Alexander Motin
e8b333e4d3
Fix false assertion in dmu_tx_dirty_buf() on cloning
Same as writes block cloning can increase block size and number of
indirection levels.  That means it can dirty block 0 at level 0 or
at new top indirection level without explicitly holding them.

A block cloning test case for large offsets has been added.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16825
2024-12-05 11:48:08 -08:00
Don Brady
44446dccdb
During pool export flush the ARC asynchronously
This also includes removing L2 vdevs asynchronously.

This commit also guarantees that spa_load_guid is unique.

The zpool reguid feature introduced the spa_load_guid, which is a
transient value used for runtime identification purposes in the ARC.
This value is not the same as the spa's persistent pool guid.

However, the value is seeded from spa_generate_load_guid() which
does not check for uniqueness against the spa_load_guid from other
pools.  Although extremely rare, you can end up with two different
pools sharing the same spa_load_guid value! So we guarantee that
the value is always unique and additionally not still in use by an
async arc flush task.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16215
2024-12-05 08:58:20 -08:00
Rob Norris
2507db612d
zdb_il: use flex array member to access ZIL records
In 6f50f8e16 we added flex arrays to lr_XX_t structs to silence kernel
bounds check warnings. Userspace code was mostly not updated to use them
though.

It seems that in the right circumstances, compilers can get confused
about sizes in the same way, and throw warnings. This commits switch
those uses over to use the flex array fields also.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16832
2024-12-04 19:03:20 -05:00
Alexander Motin
a01504b35c
Improve speculative prefetcher for block cloning
- Issue prescient prefetches for demand indirect blocks after the
first one.  It should be quite rare for reads/writes, but much more
useful for cloning due to much bigger (up to 1022 blocks) accesses.
It covers the gap during the first couple accesses when we can not
speculate yet, but we know what is needed right now.  It reduces
dbuf_hold() sync read delays in dmu_buf_hold_array_by_dnode().
 - Increase maximum prefetch distance for indirect blocks from 64
to 128MB.  It should cover the maximum 1022 blocks of block cloning
access size in case of default 128KB recordsize used.  In case of
bigger recordsize the above prescient prefetch should also help.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16814
2024-12-04 15:19:05 -08:00
Alexander Motin
c33a55b0c2
Allow dsl_deadlist_open() return errors
In some cases like dsl_dataset_hold_obj() it is possible to handle
those errors, so failure to hold dataset should be better than
kernel panic.  Some other places where these errors are still not
handled but asserted should be less dangerous just as unreachable.

We have a user report about pool corruption leading to assertions
on these errors.  Hopefully this will make behavior a bit nicer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16836
2024-12-04 15:15:58 -08:00
Mark Johnston
0e020bf3e1
FreeBSD: Remove an incorrect assertion in zfs_getpages()
The pages in the array may become valid after this initial unbusying,
so the assertion only holds during the first iteration of the outer
loop.

Later in zfs_getpages(), the dmu_read_pages() loop handles already-valid
pages.  Just drop the assertion, it's not terribly useful.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: Klara, Inc.
Closes #16810
Closes #16834
2024-12-04 14:24:50 -05:00
Mariusz Zaborski
4b4e346b9f
Add ability to scrub from last scrubbed txg
Some users might want to scrub only new data because they would like
to know if the new write wasn't corrupted.  This PR adds possibility
scrub only newly written data.

This introduces new `last_scrubbed_txg` property, indicating the
transaction group (TXG) up to which the most recent scrub operation
has checked and repaired the dataset, so users can run scrub only
from the last saved point. We use a scn_max_txg and scn_min_txg
which are already built into scrub, to accomplish that.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Sponsored-By: Klara Inc.
Closes #16301
2024-12-04 14:21:45 -05:00
shodanshok
1cd2419ece
Fix race in libzfs_run_process_impl
When replacing a disk, a child process is forked to run a script called
zfs_prepare_disk (which can be useful for disk firmware update or health
check). The parent than calls waitpid and checks the child error/status
code.

However, the _reap_children thread (created from zed_exec_process to
manage zedlets) also waits for all children with the same PGID and can
stole the signal, causing the replace operation to be aborted.

As waitpid returns -1, the parent incorrectly assume that the child
process had an error or was killed. This, in turn, leaves the newly
added disk in REMOVED or UNAVAIL status rather than completing the
replace process.

This patch changes the PGID of the child process execuing the
prepare script, shielding it from the _reap_children thread.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16801
2024-12-04 05:36:10 -05:00
Alexander Motin
654ade8ca2 FreeBSD: Remove some illumos compat from vnode.h
Should make no difference, just some dead code cleanup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16808
2024-12-03 09:32:54 -08:00
Alexander Motin
ae00c807dc FreeBSD: Return ifndef IN_BASE back to fix the build
FreeBSD's libprocstat seems to build kernel code in user space,
which does not work here due to undefined vnode_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16808
2024-12-03 09:32:07 -08:00
Rob Norris
027b3e06ed
zinject(8): rename "ioctl" to "flush"
Doc bug missed in d7605ae77.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16827
2024-12-02 17:21:55 -08:00
Alexander Motin
6e3c109bc0
Fix regression in dmu_buf_will_fill()
Direct I/O implementation added condition to call dbuf_undirty()
only in case of block cloning.  But the condition is not right if
the block is no longer dirty in this TXG, but still in DB_NOFILL
state.  It resulted in block not reverting to DB_UNCACHED and
following NULL de-reference on attempt to access absent db_data.

While there, add assertions for db_data to make debugging easier.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16829
2024-12-02 17:08:40 -08:00
Alexander Motin
3e9ba0f223
Add missing parenthesis in VERIFYF()
Without them the order of operations might get unexpected.

Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16826
2024-12-02 16:55:51 -08:00
Pavel Snajdr
c8a326aab7
Linux: fix zfs_uio_dio_check_for_zero_page
The intent here is to replace the zero page pointer in the array of
pointers to pages in the struct.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #16812 
Closes #16689
Closes #16642
2024-12-02 16:25:35 -08:00
Ivan Volosyuk
f29dcc25c7
Linux: Fix detection of register_sysctl_sz
Adjust the m4 function to mimic sentinel we use in spl-proc.c
This fixes the detection on kernels compiled with CONFIG_RANDSTRUCT=y

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ivan Volosyuk <Ivan.Volosyuk@gmail.com>
Closes: #16620
Closes: #16805
2024-11-29 22:50:37 -05:00
Rob Norris
0ffa6f3464
zdb: show dedup table and log attributes
There's interesting info in there that is going to help with
understanding dedup behavior at any given moment.

Since this is a format change, tests that rely on that output have been
modified to match.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16755
2024-11-26 19:28:19 -05:00
Pavel Snajdr
d2b0ca953f
Assert if we're logging after final txg was set
This allowed to debug #16714, fixed in #16782.  Without assertions
added here it is difficult to figure out what logs cause the problem,
since the assertion happens in sync thread context.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Co-authored-by: Alexander Motin <mav@FreeBSD.org>
Closes #16795
2024-11-25 18:37:56 -05:00
Alexander Motin
d0a91b9f88
FreeBSD: Reduce copy_file_range() source lock to shared
Linux locks copy_file_range() source as shared.  FreeBSD was doing
it also, but then was changed to exclusive, partially because KPI
of that time was doing so, and partially seems out of caution.
Considering zfs_clone_range() uses range locks on both source and
destination, neither should require exclusive vnode locks. But one
step at a time, just sync it with Linux for now.

Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16789
Closes #16797
2024-11-23 14:29:03 -08:00
Alexander Motin
b3b0ce64d5
FreeBSD: Lock vnode in zfs_ioctl()
Previously vnode was not locked there, unlike Linux.  It required
locking it in vn_flush_cached_data(), which recursed on the lock
if called from zfs_clone_range(), having the vnode locked.

Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16789
Closes #16796
2024-11-23 14:26:52 -08:00
Pavel Snajdr
38c0324c0f
Linux: Fix zfs_prune panics
by protecting against sb->s_shrink eviction on umount with newer kernels

deactivate_locked_super calls shrinker_free and only then
sops->kill_sb cb, resulting in UAF on umount when trying
to reach for the shrinker functions in zpl_prune_sb of
in-umount dataset

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #16770
2024-11-21 15:30:43 -08:00
Alexander Motin
ae1d11882d
BRT: Clear bv_entcount_dirty on destroy
This fixes assertion in brt_sync_table() on debug builds when last
cloned block on the vdev is freed and bv_meta_dirty is cleared,
while bv_entcount_dirty is not.  Should not matter in production.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16791
2024-11-21 08:20:22 -08:00
Brian Behlendorf
225e76cd7d
Linux 6.12 compat: META
Update the META file to reflect compatibility with the 6.12 kernel.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16793
2024-11-21 07:56:30 -08:00
Alexander Motin
9a81484e35
ZAP: Reduce leaf array and free chunks fragmentation
Previous implementation of zap_leaf_array_free() put chunks on the
free list in reverse order.  Also zap_leaf_transfer_entry() and
zap_entry_remove() were freeing name and value arrays in reverse
order.  Together this created a mess in the free list, making
following allocations much more fragmented than necessary.

This patch re-implements zap_leaf_array_free() to keep existing
chunks order, and implements non-destructive zap_leaf_array_copy()
to be used in zap_leaf_transfer_entry() to allow properly ordered
freeing name and value arrays there and in zap_entry_remove().

With this change test of some writes and deletes shows percent of
non-contiguous chunks in DDT reducing from 61% and 47% to 0% and
17% for arrays and frees respectively.  Sure some explicit sorting
could do even better, especially for ZAPs with variable-size arrays,
but it would also cost much more, while this should be very cheap.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16766
2024-11-20 13:37:52 -08:00
tleydxdy
d02257c280
fix: block incompatible kernel from being installed
The current "Requires" lines only ensure the old kernel is
available on the system but it does not prevent fedora from
updating to an incompatible and breaking user's system.

Set Conflicts to block incompatible kernels from being installed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: tleydxdy <shironeko.github@tesaguri.club>
Closes #16139
2024-11-20 13:19:07 -08:00
Mark Johnston
d76d79fd27
zio: Avoid sleeping in the I/O path
zio_delay_interrupt(), apparently used for fault injection, is executed
in the I/O pipeline.  It can cause the calling thread to go to sleep,
which is not allowed on FreeBSD.  This happens only for small delays,
though, and there's no apparent reason to avoid deferring to a taskqueue
in that case, as it already does otherwise.

Simply go to sleep unconditionally.  This fixes an occasional panic I
see when running the ZTS on FreeBSD.  Also remove an unhelpful comment
referencing the non-existent timeout_generic().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16785
2024-11-20 08:23:08 -08:00
Alexander Motin
457f8b76e7
BRT: More optimizations after per-vdev splitting
- With both pending and current AVL-trees being per-vdev and having
effectively identical comparison functions (pending tree compared
also birth time, but I don't believe it is possible for them to be
different for the same offset within one transaction group), it
makes no sense to move entries from one to another.  Instead inline
dramatically simplified brt_entry_addref() into brt_pending_apply().
It no longer requires bv_lock, since there is nothing concurrent
to it at the time.  And it does not need to search the tree for the
previous entries, since it is the same tree, we already have the
entry and we know it is unique.
 - Put brt_vdev_lookup() and brt_vdev_addref() into different tree
traversals to avoid false positives in the first due to the second
entcount modifications.  It saves dramatic amount of time when a
file cloned first time by not looking for non-existent ZAP entries.
 - Remove avl_is_empty(bv_tree) check from brt_maybe_exists().  I
don't think it is needed, since by the time all added entries are
already accounted in bv_entcount. The extra check must be producing
too many false positives for no reason.  Also we don't need bv_lock
there, since bv_entcount pointer must be table at this point, and
we don't care about false positive races here, while false negative
should be impossible, since all brt_vdev_addref() have already
completed by this point.  This dramatically reduces lock contention
on massive deletes of cloned blocks.  The only remaining one is
between multiple parallel free threads calling brt_entry_decref().
 - Do not update ZAP if net change for a block over the TXG was 0.
In combination with above it makes file move between datasets as
cheap operation as originally intended if it fits into one TXG.
 - Do not allocate vdevs on pool creation or import if it did not
have active block cloning. This allows to save a bit in few cases.
 - While here, add proper error handling in brt_load() on pool
import instead of assertions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16773
2024-11-20 06:20:14 -08:00
Brian Behlendorf
49a377aa30
ZTS: Fix zpool_status_008_pos false positive
Increase the injected delay to 1000ms and the ZIO_SLOW_IO_MS threshold
to 750ms to avoid false positives due to unrelated slow IOs which may
occur in the CI environment.  Additionally, clear the fault injection as
soon as it is no longer required for the test case.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16769
2024-11-20 06:13:32 -08:00
Alexander Motin
0ca82c5680
L2ARC: Stop rebuild before setting spa_final_txg
Without doing that there is a race window on export when history
log write by completed rebuild dirties transaction beyond final,
triggering assertion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16714
Closes #16782
2024-11-20 06:11:51 -08:00
Alexander Motin
534688948c
Remove hash_elements_max accounting from DBUF and ARC
Those values require global atomics to get current hash_elements
values in few of the hottest code paths, while in all the years I
never cared about it.  If somebody wants, it should be easy to
get it by periodic sampling, since neither ARC header nor DBUF
counts change so fast that it would be difficult to catch.

For now I've left hash_elements_max kstat for ARC, since it was
used/reported by arc_summary and it would break older versions,
but now it just reports the current value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16759
2024-11-19 07:00:16 -08:00
Rob Norris
ffe2112795
Move "no name changes" from compression to checksum table
Compression names actually aren't used in dedup table names, but
checksum names are.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16776
2024-11-19 06:55:27 -08:00
Steve Mokris
e08e832b10
Expand zpool-remove.8 manpage with example results
Also fix comment cross-referencing to zpool.8.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Steve Mokris <smokris@softpixel.com>
Closes #16777
2024-11-19 06:52:04 -08:00
Alexander Motin
0d6306be8c
Fix few __VA_ARGS typos in assertions
It should be __VA_ARGS__, not __VA_ARGS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16780
2024-11-19 06:48:06 -08:00
Ameer Hamza
ff3df1211c
zed: prevent automatic replacement of offline vdevs
When an OFFLINE device is physically removed, a spare is automatically
activated. However, this behavior differs in FreeBSD, where we do not
transition from OFFLINE state to REMOVED.
Our support team has encountered cases where customers experienced
unexpected behavior during drive replacements, with multiple spares
activating for the same VDEV due to a single disk replacement. This
patch ensures that a drive in an OFFLINE state remains in that state,
preventing it from transitioning to REMOVED and being automatically
replaced by a spare.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16751
2024-11-15 15:08:16 -08:00
Alexander Motin
fd6e8c1d2a BRT: Rework structures and locks to be per-vdev
While block cloning operation from the beginning was made per-vdev,
before this change most of its data were protected by two pool-
wide locks.  It created lots of lock contention in many workload.

This change makes most of block cloning data structures per-vdev,
which allows to lock them separately.  The only pool-wide lock now
it spa_brt_lock, protecting array of per-vdev pointers and in most
cases taken as reader.  Also this splits per-vdev locks into three
different ones: bv_pending_lock protects the AVL-tree of pending
operations in open context, bv_mos_entries_lock protects BRT ZAP
object from while being prefetched, and bv_lock protects the rest
of per-vdev context during TXG commit process.  There should be
no functional difference aside of some optimizations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-15 15:04:11 -08:00
Alexander Motin
309ce6303f ZAP: Add by_dnode variants to lookup/prefetch_uint64
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-15 15:04:02 -08:00
Alexander Motin
1ee251bdde BRT: Don't call brt_pending_remove() on holes/embedded
We are doing exactly the same checks around all brt_pending_add().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-15 15:03:57 -08:00
Alexander Motin
483087b06f ZTS: Avoid embedded blocks in bclone/bclone_prop_sync
If we write less than 113 bytes with enabled compression we get
embeded block, which then fails check for number of cloned blocks
in bclone_test.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-15 15:02:44 -08:00
Rob Norris
648873f020
AUTHORS: refresh with recent new contributors
Welcome to the party 🎉

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16762
2024-11-15 15:00:06 -08:00
José Luis Salvador Rufo
de2e9a5c6b
tests: fix uClibc for getversion.c
This patch fixes compilation with uClibc by applying the same fallback
as commit e12d76176d to the `getversion.c`
file, which was previously overlooked.
 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes #16735
Closes #16741
2024-11-14 16:56:12 -08:00
Ameer Hamza
3462f3bd50
zvol_os.c: Increase optimal IO size
Since zvol read and write can process up to (DMU_MAX_ACCESS / 2) bytes
in a single operation, the current optimal I/O size is too low. SCST
directly reports this value as the optimal transfer length for the
target SCSI device. Increasing it from the previous volblocksize results
in performance improvement for large block parallel I/O workloads.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16750
2024-11-14 14:14:33 -08:00
Mark Johnston
8dc452d907
Fix some nits in zfs_getpages()
- If we don't want dmu_read_pages() to perform extra readahead/behind,
  pass a pointer to 0 instead of a null pointer, as dum_read_pages()
  expects rahead and rbehind to be non-null.
- Avoid unneeded iterations in a loop.

Sponsored-by: Klara, Inc.
Reported-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16758
2024-11-14 14:12:57 -08:00
Rob Norris
46c4f2ce0b
dsl_dataset: put IO-inducing frees on the pool deadlist
dsl_free() calls zio_free() to free the block. For most blocks, this
simply calls metaslab_free() without doing any IO or putting anything on
the IO pipeline.

Some blocks however require additional IO to free. This at least
includes gang, dedup and cloned blocks. For those, zio_free() will issue
a ZIO_TYPE_FREE IO and return.

If a huge number of blocks are being freed all at once, it's possible
for dsl_dataset_block_kill() to be called millions of time on a single
transaction (eg a 2T object of 128K blocks is 16M blocks). If those are
all IO-inducing frees, that then becomes 16M FREE IOs placed on the
pipeline. At time of writing, a zio_t is 1280 bytes, so for just one 2T
object that requires a 20G allocation of resident memory from the
zio_cache. If that can't be satisfied by the kernel, an out-of-memory
condition is raised.

This would be better handled by improving the cases that the
dmu_tx_assign() throttle will handle, or by reducing the overheads
required by the IO pipeline, or with a better central facility for
freeing blocks.

For now, we simply check for the cases that would cause zio_free() to
create a FREE IO, and instead put the block on the pool's freelist. This
is the same place that blocks from destroyed datasets go, and the async
destroy machinery will automatically see them and trickle them out as
normal.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #6783
Closes #16708
Closes #16722 
Closes #16697
2024-11-13 07:38:42 -08:00
Alexander Motin
a60ed3822b
L2ARC: Move different stats updates earlier
..., before we make the header or the log block visible to others.
It should fix assertion on allocated space going negative if the
header is freed once the lock is dropped, while the write is still
going.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16040
Closes #16743
2024-11-13 07:31:50 -08:00
Mark Johnston
178682506f Grab the rangelock unconditionally in zfs_getpages()
As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16643
2024-11-13 07:25:39 -08:00
Mark Johnston
25eb538778 Fix a potential page leak in mappedread_sf()
mappedread_sf() may allocate pages; if it fails to populate a page
can't free it, it needs to ensure that it's placed into a page queue,
otherwise it can't be reclaimed until the vnode is destroyed.

I think this is quite unlikely to happen in practice, it was noticed by
code inspection.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16643
2024-11-13 07:24:14 -08:00
Umer Saleem
1c9a4c8cb4 Fix user properties output for zpool list
In zpool_get_user_prop, when called from zpool_expand_proplist and
collect_pool, we often have zpool_props present in zpool_handle_t equal
to NULL. This mostly happens when only one user property is requested
using zpool list -o <user_property>. Checking for this case and
correctly initializing the zpool_props field in zpool_handle_t fixes
this issue.

Interestingly, this issue does not occur if we query any other property
like name or guid along with a user property with -o flag because while
accessing properties like guid, zpool_prop_get_int is called which
checks for this case specifically and calls zpool_get_all_props.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16734
2024-11-11 09:46:45 -08:00
Umer Saleem
3a0a142f1c JSON: fix user properties output for zpool list
This commit fixes JSON output for zpool list when user properties are
requested with -o flag. This case needed to be handled specifically
since zpool_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16734
2024-11-11 09:46:19 -08:00
Umer Saleem
57fc5971f6
JSON: fix user properties output for zfs list
This commit fixes JSON output for zfs list when user properties are
requested with -o flag. This case needed to be handled specifically
since zfs_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16732
2024-11-07 11:31:14 -08:00
Sam James
4a7a0a0290
Use <fcntl.h> instead of <sys/fcntl.h>
When building on musl, we get:

```
In file included from tests/zfs-tests/cmd/getversion.c:22:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>

In file included from module/os/linux/zfs/vdev_file.c:36:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
```

Bug: https://bugs.gentoo.org/925235
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sam James <sam@gentoo.org>
Closes #15925
2024-11-07 11:20:37 -08:00
Brian Atkinson
187f931372
Update ABD stats for linear page Linux
a10e552 updated abd_free_linear_page() to no longer call
abd_update_scatter_stat(). This meant that linear pages that were not
attached to Direct I/O requests were not doing waste accounting for the
ARC. This led to performance issues due to incorrect ARC accounting that
resulted in 100% of CPU time being spent in arc_evict() during prolonged
I/O workloads with the ARC.

The call to abd_update_scatter_stats() is now conditionally called in
abd_free_linear_page() when the ABD is not from a Direct I/O request.

Reviewed-by: Mark Maybee <mmaybee@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16729
2024-11-07 11:11:05 -08:00
Chunwei Chen
5945676bcc
ZFS send should use spill block prefetched from send_reader_thread
Currently, even though send_reader_thread prefetches spill block,
do_dump() will not use it and issues its own blocking arc_read. This
causes significant performance degradation when sending datasets with
lots of spill blocks.

For unmodified spill blocks, we also create send_range struct for them
in send_reader_thread and issue prefetches for them. We piggyback them
on the dnode send_range instead of enqueueing them so we don't break
send_range_after check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: david.chen <david.chen@nutanix.com>
Closes #16701
2024-11-06 11:52:01 -08:00
tstabrawa
7b6e9675da Use simple folio migration function
Avoids using fallback_migrate_folio, which starts unnecessary writeback
(leading to BUG in migrate_folio_extra).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes #16568
Closes #16723
2024-11-06 11:44:10 -08:00
tstabrawa
f38e2d239f Revert "Avoid BUG in migrate_folio_extra"
This reverts commit b052035990.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes #16568
Closes #16723
2024-11-06 11:43:05 -08:00
наб
60c202cca4 module: unicode: remove unused tolower transformations
With the previous patch this yields
  $ size -G ./module/zfs.ko ./module/zfs.new.ko
        text       data        bss      total filename
     2865126    1597982     755768    5218876 ./module/zfs.ko
     2864038    1429784     755768    5049590 ./module/zfs.new.ko
       -1088    -168198
         -1k      -164k

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #16704
2024-11-04 17:26:35 -08:00
наб
e17a9698fa module: unicode: #ifdef out unused U8_UNICODE_320
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #16704
2024-11-04 17:26:17 -08:00
наб
5e726779f5 module: unicode: u8_textprep_data.h tables: 2 => U8_UNICODE_LATEST+1
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #16704
2024-11-04 17:25:37 -08:00
Alexander Motin
5a2333b10f
CI: Automate some GitHub PR status labels manipulations
- Set/remove "Work in Progress"/"Code Review Needed" for drafts.
 - Remove "Accepted", "Inactive", "Revision Needed" and "Stale" on
pushes and reopens.

I hope this reduce chances of PRs being forgotten after requested
modifications done due to stale labels.  It is better to have no
labels than incorrect ones saying there is nothing to look at.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16721
2024-11-04 17:16:32 -08:00
Uglymotha
c8aed9f973
Verify parent_dev before calling udev_device_get_sysattr_value
Not all udev devices have parent devices.
Calling udev_device_get_ functions yield an assertion error
if called with a NULL pointer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Sietse <sietse@wizdom.nu>
Co-authored-by: Sietse <sietse@wizdom.nu>
Closes #16705 
Closes #16717
2024-11-04 16:44:38 -08:00
Alexander Motin
b16e096198
Reduce dirty records memory usage
Small block workloads may use a very large number of dirty records.
During simple block cloning test due to BRT still using 4KB blocks
I can easily see up to 2.5M of those used.  Before this change
dbuf_dirty_record_t structures representing them were allocated via
kmem_zalloc(), that rounded their size up to 512 bytes.

Introduction of specialized kmem cache allows to reduce the size
from 512 to 408 bytes.  Additionally, since override and raw params
in dirty records are mutually exclusive, puting them into a union
allows to reduce structure size down to 368 bytes, increasing the
saving to 28%, that can be a 0.5GB or more of RAM.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16694
2024-11-04 16:42:06 -08:00
Rob Norris
91bd12dfeb
zfs(4): remove "experimental" from zfs_bclone_enabled
I think we've done enough experiments.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16189 
Closes #16712
2024-11-01 14:43:25 -07:00
наб
1c7d4b4c94
module: unicode: remove unused uconv.c
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #16702
2024-11-01 12:12:13 -07:00
Tony Hutter
db2354534d
ZTS: Add Fedora 41, remove Fedora 39
Fedora 41 was released 10/29/24, and Fedora 39 will be EOL on 11/12/24.
Update Fedora runners in the test suite.  Some minor tweaks also needed
to support ksh 1.0.10.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16700
2024-11-01 10:02:03 -07:00
Rob Norris
673efbbf5d
zdb: add extra -T flag to show histograms of BRT refcounts
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16692
2024-11-01 12:08:33 -04:00
Rich Ercolani
acb6e71eda
Added output to zpool online and offline
I was surprised to discover today that `zpool online` and
`zpool offline` don't print any information about why they failed in
many cases, they just return 1 with no information about why.

Let's improve that where we can without changing the library function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #16244
2024-10-31 22:32:31 -04:00
Rob Norris
3c650bec15 Revert "Workaround issue of Linux vdev_disk.c, (#16678)"
Now that we can handle these different alignments, we don't this
workaround.

This reverts commit aefc2da8a5.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16687
2024-10-31 17:00:53 -07:00
Rob Norris
e7425ae624 vdev_disk: move abd return and free off the interrupt handler
Freeing an ABD can take sleeping locks to update various stats. We
aren't allowed to sleep on an interrupt handler. So, move the free off
to the io_done callback.

We should never have been freeing things in the interrupt handler, but
we got away with it because we were usually freeing a linear ABD, which
at most is returning two objects to a cache and never sleeping. Scatter
ABDs can be used now, and those have more complex locking.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16687
2024-10-31 17:00:53 -07:00
Rob Norris
63bafe60ec vdev_disk: try harder to ensure IO alignment rules
It seems out our notion of "properly" aligned IO was incomplete. In
particular, dm-crypt does its own splitting, and assumes that a logical
block will never cross an order-0 page boundary (ie, the physical page
size, not compound size). This effectively means that it needs to be
possible to split a BIO at any page or block size boundary and have it
work correctly.

This updates the alignment check function to enforce these rules (to the
extent possible).

Our response to misaligned data is to make some new allocation that is
properly aligned, and copy the data into it. It turns out that
linearising (via abd_borrow_buf()) is not enough, because we allocate eg
4K blocks from a general purpose slab, and so may receive (or already
have) a 4K block that crosses pages.

So instead, we allocate a new ABD, which is guaranteed to be aligned
properly to block sizes, and then copy everything into it, and back out
on the way back.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16687 #16631 #15646 #15533 #14533
2024-10-31 17:00:42 -07:00
Serapheim Dimitropoulos
ae93aeb849
Add warning for external consumers of dmu_tx_callback_register
While reading some code @grwilson came across the above function that
seemingly had no consumers besides a ztest callback that ensures that
the tx_callback infrastructure works correctly. It turns out that Lustre
is the main (and potentially the only) consumer of this. Refer to
`osd_trans_commit_cb` of `lustre/osd-zfs/osd_handler.c` in the Lustre
repo for more info. Let's add a comment highlighting this before someone
removes it by mistake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Serapheim Dimitropoulos <serapheimd@gmail.com>
Closes #16698
2024-10-30 20:11:40 -04:00
Alexander Motin
6187b19434
On the first vdev open ignore impossible ashift hints
If on the first open device's logical ashift is bigger than set
by pool's ashift property, ignore the last as unusable instead of
creating vdev that will fail most of I/Os due to misalignment.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #16690
2024-10-29 15:23:24 -04:00
Dimitry Andric
2bf1520211
Fix gcc uninitialized warning in FreeBSD zio_crypt.c
In FreeBSD's `zio_do_crypt_data()`, ensure that two `struct uio`
variables are cleared before copying data out of them. This avoids
accessing garbage data, and fixes gcc `-Wuninitialized` warnings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #16688
2024-10-29 15:05:02 -04:00
Dimitry Andric
c480e06d88
Fix gcc unused value warning in FreeBSD simd.h
The macros `simd_stat_init()` and `simd_stat_fini()` in FreeBSD's
`simd.h` are defined as zero, but they are actually only used as
statements. Replace the definitions with `do {} while (0)` instead, to
avoid gcc `-Wunused-value` warnings.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Toomas Soome <tsoome@me.com>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #16693
2024-10-29 14:49:54 -04:00
Tony Hutter
e5d1f68167
ZTS: Add LUKS sanity test
Add a LUKS sanity test to trigger: #16631

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16681
2024-10-25 09:07:44 -07:00
Alexander Motin
94a03dd1e4
Pack dmu_buf_impl_t by 16 bytes
On 64bit FreeBSD this reduces one from 296 to 280 bytes.  On small
block workloads dbufs may consume gigabytes of ARC, and this saves
5% of it.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16684
2024-10-25 09:03:37 -07:00
Tino Reichardt
2e4e092822
Fix dependency install on Debian 11 (#16683)
Adding cryptsetup breaks some dialog things on Debian 11.
Apply some workaround for it.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-10-24 14:05:45 -07:00
Alexander Motin
aefc2da8a5
Workaround issue of Linux vdev_disk.c, (#16678)
in some cases not linearizing buffers with disk sector crossing a
page boundary. It is fine for hardware, but somehow required by LUKS.
It is not typical for ZFS to produce such buffers, but it may happen
if 6KB block is compressed to 4KB, while still having 2KB alignment.
Banning the 6KB buffers helps vdevs with ashifh=12.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2024-10-23 10:19:46 -07:00
Brian Behlendorf
152ae5c9bc
ZTS: Add additional exceptions
The following tests have been observed to occasionally fail when
running under the CI.  Updated our exceptions list to track them.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16670
2024-10-21 11:46:20 -07:00
Rob Norris
96f382d113
spl/thread: explicitly define thread_func_t as noreturn
All of our thread entry functions have this signature:

    void (*)(void*) __attribute__((noreturn))

The low-level `__thread_create()` function accepts a `thread_func_t` as
the entry point, which is defined more simply as:

    void (*)(void *)

And then the `thread_create()` and `thread_create_named()` macros cast
the passed-in function point down to `thread_func_t`, that is, casting
away the `noreturn` attribute.

Clang considers casting between these two types to be invalid because
both the caller and the callee may have elided parts of the stack frame
save and restore, knowing that they won't be needed.

Recent Linux appears to be setting `-Wcast-function-type-strict`, which
causes this invalid cast to emit a warning, which with `-Werror` is
converted to an error, breaking the build.

This commit fixes this in the simplest possible way: adding `noreturn`
to the `thread_func_t` attribute. Since all our thread entry functions
already have this attribute, it's arguably a just a consistency fix
anyway.

I considered removing the casts in the macros, which silences the
warnings, but it turns out that Clang has a bug that won't emit this
error for implicit conversions, only explicit casts. So leaving them
there seems like a reasonable belt-and-suspenders approach. Also,
frankly, this whole mechanism seems a little undercooked inside LLVM, so
I'm content go with my intuition about the smallest, least invaisve
change.

**NOTE**: `__thread_create` is exported by `spl.ko` and has a
`thread_func_t` arg, so this is an ABI break. Whether that matters in
practice, I have no idea.

Further reading:
- 1aad641c79
- https://github.com/llvm/llvm-project/issues/7325
- https://github.com/llvm/llvm-project/issues/41465

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16672 
Closes #16673
2024-10-21 11:38:30 -07:00
Rob Norris
21cba06bef
config: fix dequeue_signal check for kernels <4.20
Before 4.20, kernel_siginfo_t was just called siginfo_t. This was
causing the kthread_dequeue_signal_3arg_task check, which uses
kernel_siginfo_t, to fail on older kernels.

In d6b8c17f1, we started checking for the "new" three-arg
dequeue_signal() by testing for the "old" version. Because that test is
explicitly using kernel_siginfo_t, it would fail, leading to the build
trying to use the new three-arg version, which would then not compile.

This commit fixes that by avoiding checking for the old 3-arg
dequeue_signal entirely. Instead, we check for the new one, as well as
the 4-arg form, and we use the old form as a fallback. This way, we
never have to test for it explicitly, and once we're building
HAVE_SIGINFO will make sure we get the right kernel_siginfo_t for it, so
everything works out nice.

Original-patch-by: Finix <yancw@info2soft.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16666
2024-10-20 19:50:13 -07:00
Rob Norris
b2f6de7b58
zdb: show bp in uberblock dump
Just another useful nugget of info in times of strife.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16667
2024-10-20 10:01:49 -07:00
Tomohiro Kusumi
a9851ea3dd
Fix compile-time warnings caused by duplicate struct typedefs
Some compiler/versions warn these typedefs according to #16660.

The platform specific header sys/abd_os.h shouldn't define or use abd_t,
as it's defined in its non-platform specific consumer sys/abd.h.
Do the same as what FreeBSD header does.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #16660 
Closes #16665
2024-10-20 09:43:16 -07:00
Alexander Motin
fba6a90696
zfs_debug: Restore log size limit for userspace
For some reason it was dropped when split from kernel, that makes
raidz_test to accumulate in RAM up to 100GB of logs we don't need.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by:  Rob Norris <robn@despairlabs.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16492
Closes #16566
Closes #16664
2024-10-20 09:39:05 -07:00
Rob Norris
b85c564161 libspl/backtrace: comment and harden libunwind backtracer
This is the sort of code that we get right once and never look at again.
Anyone reading this code is already likely in the middle of a debugging
nightmare, and then they have a wall of manual string construction and
an unfamiliar and idiosyncratic library to deal with. So, comment the
whole thing to try to make it clear what's going on.

In pursuit of the above, I've added return checks to some of the
libunwind calls, fixed the frame loop to not skip the "top" frame
(however unseful it may be), and fix a couple of calls to
spl_bt_u64_to_hex_str() which requested 18 digits instead of 16.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-20 09:36:02 -07:00
Rob Norris
2596a75306 libspl/backtrace: rename and document hex conversion function
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-20 09:36:00 -07:00
Rob Norris
c7e47b3d9a libspl/backtrace: helper macros for output
My eyes are going blurry looking at all those write calls. This is much
nicer.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Close #16653
2024-10-20 09:35:55 -07:00
Rob Norris
0a001f3088 libspl/backtrace: dump registers in libunwind backtraces
More useful stuff, especially when trying to follow a disassembly.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-20 09:35:43 -07:00
Umer Saleem
27e8f56102
Fix inconsistent mount options for ZFS root
While mounting ZFS root during boot on Linux distributions from initrd,
mount from busybox is effectively used which executes mount system call
directly. This skips the ZFS helper mount.zfs, which checks and enables
the mount options as specified in dataset properties. As a result,
datasets mounted during boot from initrd do not have correct mount
options as specified in ZFS dataset properties.

There has been an attempt to use mount.zfs in zfs initrd script,
responsible for mounting the ZFS root filesystem (PR#13305). This was
later reverted (PR#14908) after discovering that using mount.zfs breaks
mounting of snapshots on root (/) and other child datasets of root have
the same issue (Issue#9461).

This happens because switching from busybox mount to mount.zfs correctly
parses the mount options but also adds 'mntpoint=/root' to the mount
options, which is then prepended to the snapshot mountpoint in
'.zfs/snapshot'. '/root' is the directory on Debian with initramfs-tools
where root filesystem is mounted before pivot_root. When Linux runtime
is reached, trying to access the snapshots on root results in
automounting the snapshot on '/root/.zfs/*', which fails.

This commit attempts to fix the automounting of snapshots on root, while
using mount.zfs in initrd script. Since the mountpoint of dataset is
stored in vfs_mntpoint field, we can check if current mountpoint of
dataset and vfs_mntpoint are same or not. If they are not same, reset
the vfs_mntpoint field with current mountpoint. This fixes the
mountpoints of root dataset and children in respective vfs_mntpoint
fields when we try to access the snapshots of root dataset or its
children. With correct mountpoint for root dataset and children stored
in vfs_mntpoint, all snapshots of root dataset are mounted correctly
and become accessible.

This fix will come into play only if current process, that is trying to
access the snapshots is not in chroot context. The Linux kernel API
that is used to convert struct path into char format (d_path), returns
the complete path for given struct path. It works in chroot environment
as well and returns the correct path from original filesystem root.

However d_path fails to return the complete path if any directory from
original root filesystem is mounted using --bind flag or --rbind flag
in chroot environment. In this case, if we try to access the snapshot
from outside the chroot environment, d_path returns the path correctly,
i.e. it returns the correct path to the directory that is mounted with
--bind flag. However inside the chroot environment, it only returns the
path inside chroot.

For now, there is not a better way in my understanding that gives the
complete path in char format and handles the case where directories from
root filesystem are mounted with --bind or --rbind on another path which
user will later chroot into. So this fix gets enabled if current
process trying to access the snapshot is not in chroot context.

With the snapshots issue fixed for root filesystem, using mount.zfs in
ZFS initrd script, mounts the datasets with correct mount options.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16646
2024-10-17 09:09:39 -04:00
Warner Losh
38a04f0a7c
freebsd: Use compiler.h from FreeBSD's base's linuxkpi
The FreeBSD linux/compiler.h in OpenZFS was copied from a very old
version of FreeBSD's linuxkpi's linux/compiler.h. There's no need for
this duplication. Use FreeBSD's linuxkpi version instead, and provide
zfs_fallthrough to augment it (it's all that's needed). Use #pragma once
to avoid naming issues for guard variables. Since this is a complete
rewrite, use my copyright here (the original code in FreeBSD still
credits everybody). This works back at least to FreeBSD 12.4, which
is not out of support, and all newer releases.

Remove extra copies of macros that were defined elsewhere, but are now
properly defined in LinuxKPI so are redundant.

Sponsored-by: Netflix
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@bsdimp.com>
Closes #16650
2024-10-16 13:00:40 -04:00
Tino Reichardt
e0bf43d64e ZTS: Make use of optimal CPU pinning
With CPU pinning, we should get some speedup because of better
cpu cache re-use.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16641
2024-10-13 19:20:49 -07:00
Tino Reichardt
e7b64159f8 ZTS: Optimize Kernel Same-page Merging (KSM)
Kernel same-page Merging (KSM) allows KVM guests to share identical
memory pages. These shared pages are usually common libraries or other
identical, high-use data.

The current configuration was a bit to lazy - so KSM didn't work very
well. With the new configuration I could run 3 Linux VMs in parralel.

FreeBSD can't benefit from it. But FreeBSD is not so memory hungry in
general, so there is no need for it ;)

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16641
2024-10-13 19:19:34 -07:00
Brian Behlendorf
c642e985e5
Revert "Temporarily disable Direct IO by default"
This partially reverts commit 41210597.  Now that b4e4cbeb2 has
been merged Direct IO can be enabled by default for Linux, but
for FreeBSD there still remains a potentially insufficient range
locking in zfs_getpages() which needs to be resolved.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16629
2024-10-12 13:51:35 -07:00
Brian Behlendorf
48dfe39747
Fallback to strerror() when strerror_l() isn't available
Some C libraries, such as uClibc, do not provide strerror_l() in
which case we fallback to strerror().

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16636 
Closes #16640
2024-10-12 13:48:56 -07:00
Brian Behlendorf
97ba7c210c
ZTS: Increase zpool_import_parallel_pos import margin
Increase the pool import time allowed by assuming a minimum reduction
to 1/2 instead of 1/3 when comparing sequential to parallel import
times.  This is sufficient to verify parallel imports are working as
intended and should address the occasional false positive failure
when the time is slightly exceeded.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16638
2024-10-11 16:12:16 -07:00
Brian Behlendorf
9f3f80c0cc
ZTS: Slightly increase dedup_quota limit
As described in the comment above this check the space used by
logged entries is not accounted for and some margin needs to be
added in.  While uncommon we have slightly exceeded the 600,000
threshold on some CI run so we increase the limit a bit more.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16637
2024-10-11 14:22:24 -07:00
Brian Behlendorf
34efa8e2d8
CI: Stick with ubuntu-22.04 for CodeQL analysis
The ubuntu-latest alias now refers to ubuntu-24.04 instead of
ubuntu-22.04 which causes CodeQL's autobuild to fail with:

    cpp/autobuilder: deptrace not supported in ubuntu 24.04

Until deptrace is supported by ubuntu-24.04 hosted runners request
ubuntu-22.04 which is supported.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Closes #16639
2024-10-11 14:16:00 -07:00
Martin Matuška
7e4be92750
zdb: fix printf format in dump_zap()
When compiling zdb.c on 32-bit platforms, a format conversion error 
is reported for a printf() in dump_zap().  Change %l to macro 
%" PRIu64 " to match the platform size of a 64-bit unsigned integer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16635
2024-10-11 09:55:17 -07:00
Rob Norris
7bf525530a
zpool/zfs: allow --json wherever -j is allowed
Mostly so that with the JSON formatting options are also used, they all
look the same. To my eye, `-j --json-flat-vdevs` suggests that they are
different or unrelated, while `--json --json-flat-vdevs` invites no
further questions.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16632
2024-10-11 09:37:57 -07:00
Brian Atkinson
b4e4cbeb20
Always validate checksums for Direct I/O reads
This fixes an oversight in the Direct I/O PR. There is nothing that
stops a process from manipulating the contents of a buffer for a
Direct I/O read while the I/O is in flight. This can lead checksum
verify failures. However, the disk contents are still correct, and this
would lead to false reporting of checksum validation failures.

To remedy this, all Direct I/O reads that have a checksum verification
failure are treated as suspicious. In the event a checksum validation
failure occurs for a Direct I/O read, then the I/O request will be
reissued though the ARC. This allows for actual validation to happen and
removes any possibility of the buffer being manipulated after the I/O
has been issued.

Just as with Direct I/O write checksum validation failures, Direct I/O
read checksum validation failures are reported though zpool status -d in
the DIO column. Also the zevent has been updated to have both:
1. dio_verify_wr -> Checksum verification failure for writes
2. dio_verify_rd -> Checksum verification failure for reads.
This allows for determining what I/O operation was the culprit for the
checksum verification failure. All DIO errors are reported only on the
top-level VDEV.

Even though FreeBSD can write protect pages (stable pages) it still has
the same issue as Linux with Direct I/O reads.

This commit updates the following:
1. Propogates checksum failures for reads all the way up to the
   top-level VDEV.
2. Reports errors through zpool status -d as DIO.
3. Has two zevents for checksum verify errors with Direct I/O. One for
   read and one for write.
4. Updates FreeBSD ABD code to also check for ABD_FLAG_FROM_PAGES and
   handle ABD buffer contents validation the same as Linux.
5. Updated manipulate_user_buffer.c to also manipulate a buffer while a
   Direct I/O read is taking place.
6. Adds a new ZTS test case dio_read_verify that stress tests the new
   code.
7. Updated man pages.
8. Added an IMPLY statement to zio_checksum_verify() to make sure that
   Direct I/O reads are not issued as speculative.
9. Removed self healing through mirror, raidz, and dRAID VDEVs for
   Direct I/O reads.

This issue was first observed when installing a Windows 11 VM on a ZFS
dataset with the dataset property direct set to always. The zpool
devices would report checksum failures, but running a subsequent zpool
scrub would not repair any data and report no errors.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16598
2024-10-09 12:28:08 -07:00
Martin Matuška
efeb60b86a
FreeBSD: ignore some includes when not building kernel
The function abd_alloc_from_pages() is used only in kernel.
Excluding sys/vm.h, and vm/vm_page.h includes avoids dependency
problems.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16616
2024-10-09 09:27:46 -07:00
Brian Behlendorf
4319e71402
ztest: Fix scrub check in ztest_raidz_expand_check()
The scrub code may return EBUSY under several possible scenarios
causing ztest to incorrectly ASSERT when verifying the result of
a raidz expansion.  Update the test case to allow EBUSY since it
does not indicate pool damage.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16627
2024-10-08 20:41:17 -07:00
Matthew Heller
cefef28e98
vdev_id: multi-lun disks & slot num zero pad
Add ability to generate disk names that contain both a slot number
and a lun number in order to support multi-actuator SAS hard drives
with multiple luns. Also add the ability to zero pad slot numbers to
a desired digit length for easier sorting.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Heller <matthew.f.heller@accre.vanderbilt.edu>
Closes #16603
2024-10-08 17:43:04 -07:00
Brian Behlendorf
75dda92dc3
ZTS: resilver_restart_001.ksh restore defaults
Update resilver_restart_001.ksh to restore the default
resilver_defer_percent when the test completes.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16618
2024-10-08 09:35:13 -07:00
Umer Saleem
65a94ffa80
Only serialize native-deb* targets
.NOTPARALLEL target is being forced on userspace as well. This commit
removes .NOTPARALEL target and only serializes the execution of
native-deb* targets.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16622
2024-10-08 09:27:38 -07:00
Rob Norris
ca0141f325
zpool/zfs: restore -V & --version options
The -j option added a round of getopt, which didn't know the magic
version flags. So just bypass the whole thing and go straight to the
human output function for the special case.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16615 
Closes #16617
2024-10-07 13:09:08 -07:00
Martin Matuška
ab777f436c
Return boolean_t in inline functions of lib/libspl/include/sys/uio.h
The inline functions zfs_dio_offset_aligned(), zfs_dio_size_aligned()
and zfs_dio_aligned() are declared as boolean_t but return the bool
type.

This fixes the build of FreeBSD.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16613
2024-10-07 10:31:46 -07:00
Shengqi Chen
e8f0aa143e Bump SONAME of libzfs and libzpool
The ABI of libzfs and libzpool have breaking changes since last
SONAME bump in commit fe6babc:

* libzfs: `zpool_print_unsup_feat` removed (used by zpool cmd).
* libzpool: multiple `ddt_*` symbols removed (used by zdb cmd).

Bump them to avoid ABI breakage.

See: https://github.com/openzfs/zfs/pull/11817
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16609
2024-10-06 14:49:33 -07:00
Shengqi Chen
c59d5495fe contrib/debian: add new manpages to installation list
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16609
2024-10-06 14:49:06 -07:00
JKDingwall
0b4dcbe5b4
Fix generation of kernel uevents for snapshot rename on linux
`zvol_rename_minors()` needs to be given the full path not just the
snapshot name.  Use code removed in a0bd735ad as a guide
to providing the necessary values.

Add ZTS check for /dev changes after snapshot rename.  After
renaming a snapshot with 'snapdev=visible' ensure that the /dev
entries are updated to reflect the rename.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Dingwall <james@dingwall.me.uk>
Closes #14223 
Closes #16600
2024-10-06 14:36:33 -07:00
Tino Reichardt
995a3a61fd
ZTS: Fix summary page creation again - second try
In PR #16599 I used 'return' like in C - which is wrong :/
This fix generates the summary as needed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16611
2024-10-06 14:32:08 -07:00
Tino Reichardt
87ca6ba9a8
ZTS: Remove FreeBSD 13.4-STABLE
Current CI is failing on FreeBSD 13.4-STABLE, because samba4 can't be
installed there. Lets remove it for now.

Update also the FreeBSD version definitions a bit.

The naming is like this now:

FreeBSD variants:
- freebsd13-3r, freebsd13-4r, freebsd14-0r, freebsd14-1r (RELEASE)
- freebsd13-4s, freebsd14-1s (STABLE)
- freebsd15-0c (CURRENT)

RHL based distros:
- almalinux8, almalinux9, centos-stream9, fedora39, fedora40

Debian based:
- debian11, debian12, ubuntu20, ubuntu22, ubuntu24

Misc Linux distros:
- archlinux, tumbleweed

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16610
2024-10-06 14:29:20 -07:00
Brian Behlendorf
437227a9cc Update META
Increase the version to 2.3.99 to indicate the master branch is
newer than the 2.3.x release.  This ensures packages built from
master branch are considered to be newer than the last release.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-10-04 14:20:10 -07:00
3791 changed files with 114344 additions and 39047 deletions

View File

@ -1,21 +0,0 @@
env:
CIRRUS_CLONE_DEPTH: 1
ARCH: amd64
build_task:
matrix:
freebsd_instance:
image_family: freebsd-12-4
freebsd_instance:
image_family: freebsd-13-2
freebsd_instance:
image_family: freebsd-14-0-snap
prepare_script:
- pkg install -y autoconf automake libtool gettext-runtime gmake ksh93 py39-packaging py39-cffi py39-sysctl
configure_script:
- env MAKE=gmake ./autogen.sh
- env MAKE=gmake ./configure --with-config="user" --with-python=3.9
build_script:
- gmake -j `sysctl -n kern.smp.cpus`
install_script:
- gmake install

View File

@ -14,7 +14,7 @@ Please check our issue tracker before opening a new feature request.
Filling out the following template will help other contributors better understand your proposed feature.
-->
### Describe the feature would like to see added to OpenZFS
### Describe the feature you would like to see added to OpenZFS
<!--
Provide a clear and concise description of the feature.

View File

@ -2,11 +2,6 @@
<!--- Provide a general summary of your changes in the Title above -->
<!---
Documentation on ZFS Buildbot options can be found at
https://openzfs.github.io/openzfs-docs/Developer%20Resources/Buildbot%20Options.html
-->
### Motivation and Context
<!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here. -->
@ -27,6 +22,7 @@ https://openzfs.github.io/openzfs-docs/Developer%20Resources/Buildbot%20Options.
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Performance enhancement (non-breaking change which improves efficiency)
- [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Quality assurance (non-breaking change which makes the code more robust against bugs)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs\_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation)

View File

@ -2,3 +2,4 @@ name: "Custom CodeQL Analysis"
queries:
- uses: ./.github/codeql/custom-queries/cpp/deprecatedFunctionUsage.ql
- uses: ./.github/codeql/custom-queries/cpp/dslDatasetHoldReleMismatch.ql

View File

@ -0,0 +1,34 @@
/**
* @name Detect mismatched dsl_dataset_hold/_rele pairs
* @description Flags instances of issue #12014 where
* - a dataset held with dsl_dataset_hold_obj() ends up in dsl_dataset_rele_flags(), or
* - a dataset held with dsl_dataset_hold_obj_flags() ends up in dsl_dataset_rele().
* @kind problem
* @severity error
* @tags correctness
* @id cpp/dslDatasetHoldReleMismatch
*/
import cpp
from Variable ds, Call holdCall, Call releCall, string message
where
ds.getType().toString() = "dsl_dataset_t *" and
holdCall.getASuccessor*() = releCall and
(
(holdCall.getTarget().getName() = "dsl_dataset_hold_obj_flags" and
holdCall.getArgument(4).(AddressOfExpr).getOperand().(VariableAccess).getTarget() = ds and
releCall.getTarget().getName() = "dsl_dataset_rele" and
releCall.getArgument(0).(VariableAccess).getTarget() = ds and
message = "Held with dsl_dataset_hold_obj_flags but released with dsl_dataset_rele")
or
(holdCall.getTarget().getName() = "dsl_dataset_hold_obj" and
holdCall.getArgument(3).(AddressOfExpr).getOperand().(VariableAccess).getTarget() = ds and
releCall.getTarget().getName() = "dsl_dataset_rele_flags" and
releCall.getArgument(0).(VariableAccess).getTarget() = ds and
message = "Held with dsl_dataset_hold_obj but released with dsl_dataset_rele_flags")
)
select releCall,
"Mismatched release: held with $@ but released with " + releCall.getTarget().getName() + " for dataset $@",
holdCall, holdCall.getTarget().getName(),
ds, ds.toString()

View File

@ -19,7 +19,7 @@ jobs:
run: |
# for x in lxd core20 snapd; do sudo snap remove $x; done
sudo apt-get purge -y snapd google-chrome-stable firefox
ONLY_DEPS=1 .github/workflows/scripts/qemu-3-deps.sh ubuntu22
ONLY_DEPS=1 .github/workflows/scripts/qemu-3-deps-vm.sh ubuntu22
sudo apt-get install -y cppcheck devscripts mandoc pax-utils shellcheck
sudo python -m pipx install --quiet flake8
# confirm that the tools are installed

View File

@ -11,7 +11,7 @@ concurrency:
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
permissions:
actions: read
contents: read

49
.github/workflows/labels.yml vendored Normal file
View File

@ -0,0 +1,49 @@
name: labels
on:
pull_request_target:
types: [ opened, synchronize, reopened, converted_to_draft, ready_for_review ]
permissions:
pull-requests: write
jobs:
open:
runs-on: ubuntu-latest
if: ${{ github.event.action == 'opened' && github.event.pull_request.draft }}
steps:
- env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
ISSUE: ${{ github.event.pull_request.html_url }}
run: |
gh pr edit $ISSUE --add-label "Status: Work in Progress"
push:
runs-on: ubuntu-latest
if: ${{ github.event.action == 'synchronize' || github.event.action == 'reopened' }}
steps:
- env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
ISSUE: ${{ github.event.pull_request.html_url }}
run: |
gh pr edit $ISSUE --remove-label "Status: Accepted,Status: Inactive,Status: Revision Needed,Status: Stale"
draft:
runs-on: ubuntu-latest
if: ${{ github.event.action == 'converted_to_draft' }}
steps:
- env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
ISSUE: ${{ github.event.pull_request.html_url }}
run: |
gh pr edit $ISSUE --remove-label "Status: Accepted,Status: Code Review Needed,Status: Inactive,Status: Revision Needed,Status: Stale" --add-label "Status: Work in Progress"
rfr:
runs-on: ubuntu-latest
if: ${{ github.event.action == 'ready_for_review' }}
steps:
- env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
ISSUE: ${{ github.event.pull_request.html_url }}
run: |
gh pr edit $ISSUE --remove-label "Status: Accepted,Status: Inactive,Status: Revision Needed,Status: Stale,Status: Work in Progress" --add-label "Status: Code Review Needed"

View File

@ -3,13 +3,16 @@
"""
Determine the CI type based on the change list and commit message.
Prints "quick" if (explicity required by user):
- the *last* commit message contains 'ZFS-CI-Type: quick'
or if (heuristics):
- the files changed are not in the list of specified directories, and
- all commit messages do not contain 'ZFS-CI-Type: full'
Output format: "<type> <source>" where source is "manual" (from
ZFS-CI-Type commit tag) or "auto" (from file change heuristics).
Otherwise prints "full".
Prints "quick manual" if:
- the *last* commit message contains 'ZFS-CI-Type: quick'
or "quick auto" if (heuristics):
- the files changed are not in the list of specified directories, and
- all commit messages do not contain 'ZFS-CI-Type: (full|linux|freebsd)'
Otherwise prints "full auto" (or "<type> manual" if explicitly requested).
"""
import sys
@ -29,6 +32,7 @@ FULL_RUN_IGNORE_REGEX = list(map(re.compile, [
Patterns of files that are considered to trigger full CI.
"""
FULL_RUN_REGEX = list(map(re.compile, [
r'\.github/workflows/scripts/.*',
r'cmd.*',
r'configs/.*',
r'META',
@ -57,19 +61,21 @@ if __name__ == '__main__':
head, base = sys.argv[1:3]
def output_type(type, reason):
print(f'{prog}: will run {type} CI: {reason}', file=sys.stderr)
print(type)
def output_type(type, source, reason):
print(f'{prog}: will run {type} CI ({source}): {reason}',
file=sys.stderr)
print(f'{type} {source}')
sys.exit(0)
# check last (HEAD) commit message
last_commit_message_raw = subprocess.run([
'git', 'show', '-s', '--format=%B', 'HEAD'
'git', 'show', '-s', '--format=%B', head
], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
for line in last_commit_message_raw.stdout.decode().splitlines():
if line.strip().lower() == 'zfs-ci-type: quick':
output_type('quick', f'explicitly requested by HEAD commit {head}')
output_type('quick', 'manual',
f'requested by HEAD commit {head}')
# check all commit messages
all_commit_message_raw = subprocess.run([
@ -82,8 +88,15 @@ if __name__ == '__main__':
for line in all_commit_message:
if line.startswith('ZFS-CI-Commit:'):
commit_ref = line.lstrip('ZFS-CI-Commit:').rstrip()
if line.strip().lower() == 'zfs-ci-type: freebsd':
output_type('freebsd', 'manual',
f'requested by commit {commit_ref}')
if line.strip().lower() == 'zfs-ci-type: linux':
output_type('linux', 'manual',
f'requested by commit {commit_ref}')
if line.strip().lower() == 'zfs-ci-type: full':
output_type('full', f'explicitly requested by commit {commit_ref}')
output_type('full', 'manual',
f'requested by commit {commit_ref}')
# check changed files
changed_files_raw = subprocess.run([
@ -99,9 +112,10 @@ if __name__ == '__main__':
for r in FULL_RUN_REGEX:
if r.match(f):
output_type(
'full',
'full', 'auto',
f'changed file "{f}" matches pattern "{r.pattern}"'
)
# catch-all
output_type('quick', 'no changed file matches full CI patterns')
output_type('quick', 'auto',
'no changed file matches full CI patterns')

View File

@ -6,73 +6,117 @@
set -eu
# The default 'azure.archive.ubuntu.com' mirrors can be really slow.
# Prioritize the official Ubuntu mirrors.
#
# The normal apt-mirrors.txt will look like:
#
# http://azure.archive.ubuntu.com/ubuntu/ priority:1
# https://archive.ubuntu.com/ubuntu/ priority:2
# https://security.ubuntu.com/ubuntu/ priority:3
#
# Just delete the 'azure.archive.ubuntu.com' line.
sudo sed -i '/azure.archive.ubuntu.com/d' /etc/apt/apt-mirrors.txt
echo "Using mirrors:"
cat /etc/apt/apt-mirrors.txt
# install needed packages
export DEBIAN_FRONTEND="noninteractive"
sudo apt-get -y update
sudo apt-get install -y axel cloud-image-utils daemonize guestfs-tools \
ksmtuned virt-manager linux-modules-extra-$(uname -r) zfsutils-linux
virt-manager linux-modules-extra-$(uname -r) zfsutils-linux
# generate ssh keys
rm -f ~/.ssh/id_ed25519
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -q -N ""
# we expect RAM shortage
cat << EOF | sudo tee /etc/ksmtuned.conf > /dev/null
# https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-ksm
KSM_MONITOR_INTERVAL=60
# Millisecond sleep between ksm scans for 16Gb server.
# Smaller servers sleep more, bigger sleep less.
KSM_SLEEP_MSEC=10
KSM_NPAGES_BOOST=300
KSM_NPAGES_DECAY=-50
KSM_NPAGES_MIN=64
KSM_NPAGES_MAX=2048
KSM_THRES_COEF=25
KSM_THRES_CONST=2048
LOGFILE=/var/log/ksmtuned.log
DEBUG=1
EOF
sudo systemctl restart ksm
sudo systemctl restart ksmtuned
# not needed
sudo systemctl stop docker.socket
sudo systemctl stop multipathd.socket
# remove default swapfile and /mnt
sudo swapoff -a
sudo umount -l /mnt
DISK="/dev/disk/cloud/azure_resource-part1"
sudo sed -e "s|^$DISK.*||g" -i /etc/fstab
sudo wipefs -aq $DISK
sudo systemctl daemon-reload
# Special case:
#
# For reasons unknown, the runner can boot-up with two different block device
# configurations. On one config you get two 75GB block devices, and on the
# other you get a single 150GB block device. Here's what both look like:
#
# --- Two 75GB block devices ---
# NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
# sda 8:0 0 150G 0 disk
# ├─sda1 8:1 0 149G 0 part /
# ├─sda14 8:14 0 4M 0 part
# ├─sda15 8:15 0 106M 0 part /boot/efi
# └─sda16 259:0 0 913M 0 part /boot
#
# lrwxrwxrwx 1 root root 9 Jan 29 18:07 azure_root -> ../../sda
# lrwxrwxrwx 1 root root 10 Jan 29 18:07 azure_root-part1 -> ../../sda1
# lrwxrwxrwx 1 root root 11 Jan 29 18:07 azure_root-part14 -> ../../sda14
# lrwxrwxrwx 1 root root 11 Jan 29 18:07 azure_root-part15 -> ../../sda15
# lrwxrwxrwx 1 root root 11 Jan 29 18:07 azure_root-part16 -> ../../sda16
#
# --- One 150GB block device ---
# NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
# sda 8:0 0 75G 0 disk
# ├─sda1 8:1 0 74G 0 part /
# ├─sda14 8:14 0 4M 0 part
# ├─sda15 8:15 0 106M 0 part /boot/efi
# └─sda16 259:0 0 913M 0 part /boot
# sdb 8:16 0 75G 0 disk
# └─sdb1 8:17 0 75G 0 part
#
# lrwxrwxrwx 1 root root 9 Jan 29 18:07 azure_resource -> ../../sdb
# lrwxrwxrwx 1 root root 10 Jan 29 18:07 azure_resource-part1 -> ../../sdb1
# lrwxrwxrwx 1 root root 9 Jan 29 18:07 azure_root -> ../../sda
# lrwxrwxrwx 1 root root 10 Jan 29 18:07 azure_root-part1 -> ../../sda1
# lrwxrwxrwx 1 root root 11 Jan 29 18:07 azure_root-part14 -> ../../sda14
# lrwxrwxrwx 1 root root 11 Jan 29 18:07 azure_root-part15 -> ../../sda15
#
# If we have the azure_resource-part1 partition, umount it, partition it, and
# use it as our ZFS disk and swap partition. If not, just create a file VDEV
# and swap file and use that instead.
# remove default swapfile and /mnt
if [ -e /dev/disk/cloud/azure_resource-part1 ] ; then
sudo umount -l /mnt
DISK="/dev/disk/cloud/azure_resource-part1"
sudo sed -e "s|^$DISK.*||g" -i /etc/fstab
sudo wipefs -aq $DISK
sudo systemctl daemon-reload
fi
sudo modprobe loop
sudo modprobe zfs
# partition the disk as needed
DISK="/dev/disk/cloud/azure_resource"
sudo sgdisk --zap-all $DISK
sudo sgdisk -p \
if [ -e /dev/disk/cloud/azure_resource-part1 ] ; then
echo "We have two 75GB block devices"
# partition the disk as needed
DISK="/dev/disk/cloud/azure_resource"
sudo sgdisk --zap-all $DISK
sudo sgdisk -p \
-n 1:0:+16G -c 1:"swap" \
-n 2:0:0 -c 2:"tests" \
$DISK
sync
sleep 1
$DISK
sync
sleep 1
# swap with same size as RAM
sudo mkswap $DISK-part1
sudo swapon $DISK-part1
sudo fallocate -l 12G /test.ssd2
DISKS="$DISK-part2 /test.ssd2"
# 60GB data disk
SSD1="$DISK-part2"
SWAP=$DISK-part1
else
echo "We have a single 150GB block device"
sudo fallocate -l 72G /test.ssd2
SWAP=/swapfile.ssd
sudo fallocate -l 16G $SWAP
sudo chmod 600 $SWAP
DISKS="/test.ssd2"
fi
# 10GB data disk on ext4
sudo fallocate -l 10G /test.ssd1
SSD2=$(sudo losetup -b 4096 -f /test.ssd1 --show)
# swap with same size as RAM (16GiB)
sudo mkswap $SWAP
sudo swapon $SWAP
# adjust zfs module parameter and create pool
exec 1>/dev/null
@ -81,11 +125,11 @@ ARC_MAX=$((1024*1024*512))
echo $ARC_MIN | sudo tee /sys/module/zfs/parameters/zfs_arc_min
echo $ARC_MAX | sudo tee /sys/module/zfs/parameters/zfs_arc_max
echo 1 | sudo tee /sys/module/zfs/parameters/zvol_use_blk_mq
sudo zpool create -f -o ashift=12 zpool $SSD1 $SSD2 \
-O relatime=off -O atime=off -O xattr=sa -O compression=lz4 \
-O mountpoint=/mnt/tests
sudo zpool create -f -o ashift=12 zpool $DISKS -O relatime=off \
-O atime=off -O xattr=sa -O compression=lz4 -O sync=disabled \
-O redundant_metadata=none -O mountpoint=/mnt/tests
# no need for some scheduler
for i in /sys/block/s*/queue/scheduler; do
echo "none" | sudo tee $i > /dev/null
echo "none" | sudo tee $i
done

View File

@ -12,19 +12,23 @@ OS="$1"
# OS variant (virt-install --os-variant list)
OSv=$OS
# compressed with .zst extension
REPO="https://github.com/mcmilk/openzfs-freebsd-images"
FREEBSD="$REPO/releases/download/v2024-09-16"
URLzs=""
# FreeBSD urls's
FREEBSD_REL="https://download.freebsd.org/releases/CI-IMAGES"
FREEBSD_SNAP="https://download.freebsd.org/snapshots/CI-IMAGES"
URLxz=""
# Ubuntu mirrors
#UBMIRROR="https://cloud-images.ubuntu.com"
UBMIRROR="https://cloud-images.ubuntu.com"
#UBMIRROR="https://mirrors.cloud.tencent.com/ubuntu-cloud-images"
UBMIRROR="https://mirror.citrahost.com/ubuntu-cloud-images"
#UBMIRROR="https://mirror.citrahost.com/ubuntu-cloud-images"
# default nic model for vm's
NIC="virtio"
# additional options for virt-install
OPTS[0]=""
OPTS[1]=""
case "$OS" in
almalinux8)
OSNAME="AlmaLinux 8"
@ -34,16 +38,30 @@ case "$OS" in
OSNAME="AlmaLinux 9"
URL="https://repo.almalinux.org/almalinux/9/cloud/x86_64/images/AlmaLinux-9-GenericCloud-latest.x86_64.qcow2"
;;
almalinux10)
OSNAME="AlmaLinux 10"
OSv="almalinux9"
URL="https://repo.almalinux.org/almalinux/10/cloud/x86_64/images/AlmaLinux-10-GenericCloud-latest.x86_64.qcow2"
;;
alpine3-23)
OSNAME="Alpine Linux 3.23.2"
# Alpine Linux v3.22 and v3.23 are unknown to osinfo as of 2025-12-26.
OSv="alpinelinux3.21"
URL="https://dl-cdn.alpinelinux.org/alpine/v3.23/releases/cloud/generic_alpine-3.23.2-x86_64-bios-cloudinit-r0.qcow2"
;;
archlinux)
OSNAME="Archlinux"
URL="https://geo.mirror.pkgbuild.com/images/latest/Arch-Linux-x86_64-cloudimg.qcow2"
# dns sometimes fails with that url :/
echo "89.187.191.12 geo.mirror.pkgbuild.com" | sudo tee /etc/hosts > /dev/null
;;
centos-stream9)
OSNAME="CentOS Stream 9"
URL="https://cloud.centos.org/centos/9-stream/x86_64/images/CentOS-Stream-GenericCloud-9-latest.x86_64.qcow2"
;;
centos-stream10)
OSNAME="CentOS Stream 10"
OSv="centos-stream9"
URL="https://cloud.centos.org/centos/10-stream/x86_64/images/CentOS-Stream-GenericCloud-10-latest.x86_64.qcow2"
;;
debian11)
OSNAME="Debian 11"
URL="https://cloud.debian.org/images/cloud/bullseye/latest/debian-11-generic-amd64.qcow2"
@ -52,47 +70,67 @@ case "$OS" in
OSNAME="Debian 12"
URL="https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-generic-amd64.qcow2"
;;
fedora39)
OSNAME="Fedora 39"
OSv="fedora39"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/39/Cloud/x86_64/images/Fedora-Cloud-Base-39-1.5.x86_64.qcow2"
debian13)
OSNAME="Debian 13"
# TODO: Overwrite OSv to debian13 for virt-install until it's added to osinfo
OSv="debian12"
URL="https://cloud.debian.org/images/cloud/trixie/latest/debian-13-generic-amd64.qcow2"
OPTS[0]="--boot"
OPTS[1]="uefi=on"
;;
fedora40)
OSNAME="Fedora 40"
OSv="fedora39"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/40/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-1.14.qcow2"
fedora42)
OSNAME="Fedora 42"
OSv="fedora-unknown"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/42/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-42-1.1.x86_64.qcow2"
;;
freebsd13r)
OSNAME="FreeBSD 13.4-RELEASE"
fedora43)
OSNAME="Fedora 43"
OSv="fedora-unknown"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/43/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-43-1.6.x86_64.qcow2"
;;
freebsd13-5r)
FreeBSD="13.5-RELEASE"
OSNAME="FreeBSD $FreeBSD"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.4-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
URLxz="$FREEBSD_REL/$FreeBSD/amd64/Latest/FreeBSD-$FreeBSD-amd64-BASIC-CI.raw.xz"
KSRC="$FREEBSD_REL/../amd64/$FreeBSD/src.txz"
NIC="rtl8139"
;;
freebsd13)
OSNAME="FreeBSD 13.4-STABLE"
freebsd14-4r)
FreeBSD="14.4-RELEASE"
OSNAME="FreeBSD $FreeBSD"
OSv="freebsd14.0"
URLxz="$FREEBSD_REL/$FreeBSD/amd64/Latest/FreeBSD-$FreeBSD-amd64-BASIC-CI.raw.xz"
KSRC="$FREEBSD_REL/../amd64/$FreeBSD/src.txz"
;;
freebsd13-5s)
FreeBSD="13.5-STABLE"
OSNAME="FreeBSD $FreeBSD"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.4-STABLE.qcow2.zst"
BASH="/usr/local/bin/bash"
URLxz="$FREEBSD_SNAP/$FreeBSD/amd64/Latest/FreeBSD-$FreeBSD-amd64-BASIC-CI.raw.xz"
KSRC="$FREEBSD_SNAP/../amd64/$FreeBSD/src.txz"
NIC="rtl8139"
;;
freebsd14r)
OSNAME="FreeBSD 14.1-RELEASE"
freebsd14-4s)
FreeBSD="14.4-STABLE"
OSNAME="FreeBSD $FreeBSD"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.1-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
URLxz="$FREEBSD_SNAP/$FreeBSD/amd64/Latest/FreeBSD-$FreeBSD-amd64-BASIC-CI-ufs.raw.xz"
KSRC="$FREEBSD_SNAP/../amd64/$FreeBSD/src.txz"
;;
freebsd14)
OSNAME="FreeBSD 14.1-STABLE"
freebsd15-0s)
FreeBSD="15.0-STABLE"
OSNAME="FreeBSD $FreeBSD"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.1-STABLE.qcow2.zst"
BASH="/usr/local/bin/bash"
URLxz="$FREEBSD_SNAP/$FreeBSD/amd64/Latest/FreeBSD-$FreeBSD-amd64-BASIC-CI-ufs.raw.xz"
KSRC="$FREEBSD_SNAP/../amd64/$FreeBSD/src.txz"
;;
freebsd15)
OSNAME="FreeBSD 15.0-CURRENT"
freebsd16-0c)
FreeBSD="16.0-CURRENT"
OSNAME="FreeBSD $FreeBSD"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-15.0-CURRENT.qcow2.zst"
BASH="/usr/local/bin/bash"
URLxz="$FREEBSD_SNAP/$FreeBSD/amd64/Latest/FreeBSD-$FreeBSD-amd64-BASIC-CI-ufs.raw.xz"
KSRC="$FREEBSD_SNAP/../amd64/$FreeBSD/src.txz"
;;
tumbleweed)
OSNAME="openSUSE Tumbleweed"
@ -100,11 +138,6 @@ case "$OS" in
MIRROR="http://opensuse-mirror-gce-us.susecloud.net"
URL="$MIRROR/tumbleweed/appliances/openSUSE-MicroOS.x86_64-OpenStack-Cloud.qcow2"
;;
ubuntu20)
OSNAME="Ubuntu 20.04"
OSv="ubuntu20.04"
URL="$UBMIRROR/focal/current/focal-server-cloudimg-amd64.img"
;;
ubuntu22)
OSNAME="Ubuntu 22.04"
OSv="ubuntu22.04"
@ -128,7 +161,7 @@ echo "ENV=$ENV" >> $ENV
# result path
echo 'RESPATH="/var/tmp/test_results"' >> $ENV
# FreeBSD 13 has problems with: e1000+virtio
# FreeBSD 13 has problems with: e1000 and virtio
echo "NIC=$NIC" >> $ENV
# freebsd15 -> used in zfs-qemu.yml
@ -140,49 +173,84 @@ echo "OSv=\"$OSv\"" >> $ENV
# FreeBSD 15 (Current) -> used for summary
echo "OSNAME=\"$OSNAME\"" >> $ENV
# default vm count for testings
VMs=2
echo "VMs=\"$VMs\"" >> $ENV
# default cpu count for testing vm's
CPU=2
echo "CPU=\"$CPU\"" >> $ENV
sudo mkdir -p "/mnt/tests"
sudo chown -R $(whoami) /mnt/tests
DISK="/dev/zvol/zpool/openzfs"
sudo zfs create -ps -b 64k -V 80g zpool/openzfs
while true; do test -b $DISK && break; sleep 1; done
# we are downloading via axel, curl and wget are mostly slower and
# require more return value checking
IMG="/mnt/tests/cloudimg.qcow2"
if [ ! -z "$URLzs" ]; then
echo "Loading image $URLzs ..."
time axel -q -o "$IMG.zst" "$URLzs"
zstd -q -d --rm "$IMG.zst"
IMG="/mnt/tests/cloud-image"
if [ ! -z "$URLxz" ]; then
echo "Loading $URLxz ..."
time axel -q -o "$IMG" "$URLxz"
echo "Loading $KSRC ..."
time axel -q -o ~/src.txz $KSRC
else
echo "Loading image $URL ..."
echo "Loading $URL ..."
time axel -q -o "$IMG" "$URL"
fi
DISK="/dev/zvol/zpool/openzfs"
FORMAT="raw"
sudo zfs create -ps -b 64k -V 80g zpool/openzfs
while true; do test -b $DISK && break; sleep 1; done
echo "Importing VM image to zvol..."
sudo qemu-img dd -f qcow2 -O raw if=$IMG of=$DISK bs=4M
if [ ! -z "$URLxz" ]; then
xzcat -T0 $IMG | sudo dd of=$DISK bs=4M
else
sudo qemu-img dd -f qcow2 -O raw if=$IMG of=$DISK bs=4M
fi
rm -f $IMG
PUBKEY=$(cat ~/.ssh/id_ed25519.pub)
cat <<EOF > /tmp/user-data
if [ ${OS:0:7} != "freebsd" ]; then
cat <<EOF > /tmp/user-data
#cloud-config
fqdn: $OS
hostname: $OS
users:
- name: root
shell: $BASH
- name: zfs
sudo: ALL=(ALL) NOPASSWD:ALL
shell: $BASH
- name: root
shell: /bin/bash
sudo: ['ALL=(ALL) NOPASSWD:ALL']
- name: zfs
shell: /bin/bash
sudo: ['ALL=(ALL) NOPASSWD:ALL']
ssh_authorized_keys:
- $PUBKEY
# Workaround for Alpine Linux.
lock_passwd: false
passwd: '*'
packages:
- sudo
- bash
growpart:
mode: auto
devices: ['/']
ignore_growroot_disabled: false
EOF
else
cat <<EOF > /tmp/user-data
#cloud-config
hostname: $OS
# minimized config without sudo for nuageinit of FreeBSD
growpart:
mode: auto
devices: ['/']
ignore_growroot_disabled: false
EOF
fi
sudo virsh net-update default add ip-dhcp-host \
"<host mac='52:54:00:83:79:00' ip='192.168.122.10'/>" --live --config
@ -198,8 +266,17 @@ sudo virt-install \
--graphics none \
--network bridge=virbr0,model=$NIC,mac='52:54:00:83:79:00' \
--cloud-init user-data=/tmp/user-data \
--disk $DISK,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--import --noautoconsole >/dev/null
--disk $DISK,bus=virtio,cache=none,format=raw,driver.discard=unmap \
--import --noautoconsole ${OPTS[0]} ${OPTS[1]} >/dev/null
# Give the VMs hostnames so we don't have to refer to them with
# hardcoded IP addresses.
#
# vm0: Initial VM we install dependencies and build ZFS on.
# vm1..2 Testing VMs
for ((i=0; i<=VMs; i++)); do
echo "192.168.122.1$i vm$i" | sudo tee -a /etc/hosts
done
# in case the directory isn't there already
mkdir -p $HOME/.ssh
@ -211,3 +288,49 @@ StrictHostKeyChecking no
# small timeout, used in while loops later
ConnectTimeout 1
EOF
if [ ${OS:0:7} != "freebsd" ]; then
# enable KSM on Linux
sudo virsh dommemstat --domain "openzfs" --period 5
sudo virsh node-memory-tune 100 50 1
echo 1 | sudo tee /sys/kernel/mm/ksm/run > /dev/null
else
# on FreeBSD we need some more init stuff, because of nuageinit
BASH="/usr/local/bin/bash"
while pidof /usr/bin/qemu-system-x86_64 >/dev/null; do
ssh 2>/dev/null root@vm0 "uname -a" && break
done
ssh root@vm0 "env IGNORE_OSVERSION=yes pkg install -y bash ca_root_nss git qemu-guest-agent python3 py311-cloud-init"
ssh root@vm0 "chsh -s $BASH root"
ssh root@vm0 'sysrc qemu_guest_agent_enable="YES"'
ssh root@vm0 'sysrc cloudinit_enable="YES"'
ssh root@vm0 "pw add user zfs -w no -s $BASH"
ssh root@vm0 'mkdir -p ~zfs/.ssh'
ssh root@vm0 'echo "zfs ALL=(ALL:ALL) NOPASSWD: ALL" >> /usr/local/etc/sudoers'
ssh root@vm0 'echo "PubkeyAuthentication yes" >> /etc/ssh/sshd_config'
scp ~/.ssh/id_ed25519.pub "root@vm0:~zfs/.ssh/authorized_keys"
ssh root@vm0 'chown -R zfs ~zfs'
ssh root@vm0 'service sshd restart'
scp ~/src.txz "root@vm0:/tmp/src.txz"
ssh root@vm0 'tar -C / -zxf /tmp/src.txz'
fi
#
# Config for Alpine Linux similar to FreeBSD.
#
if [ ${OS:0:6} == "alpine" ]; then
while pidof /usr/bin/qemu-system-x86_64 >/dev/null; do
ssh 2>/dev/null zfs@vm0 "uname -a" && break
done
# Enable community and testing repositories.
ssh zfs@vm0 "sudo rm -rf /etc/apk/repositories"
ssh zfs@vm0 "sudo setup-apkrepos -c1"
ssh zfs@vm0 "echo '@testing http://dl-cdn.alpinelinux.org/alpine/edge/testing' | sudo tee -a /etc/apk/repositories"
# Upgrade to edge or latest-stable.
#ssh zfs@vm0 "sudo sed -i 's#/v[0-9]\+\.[0-9]\+/#/edge/#g' /etc/apk/repositories"
#ssh zfs@vm0 "sudo sed -i 's#/v[0-9]\+\.[0-9]\+/#/latest-stable/#g' /etc/apk/repositories"
# Update and upgrade after repository setup.
ssh zfs@vm0 "sudo apk update"
ssh zfs@vm0 "sudo apk add --upgrade apk-tools"
ssh zfs@vm0 "sudo apk upgrade --available"
fi

326
.github/workflows/scripts/qemu-3-deps-vm.sh vendored Executable file
View File

@ -0,0 +1,326 @@
#!/usr/bin/env bash
######################################################################
# 3) install dependencies for compiling and loading
#
# qemu-3-deps-vm.sh [--poweroff] OS_NAME [FEDORA_VERSION]
#
# --poweroff: Power off the VM after installing dependencies
# OS_NAME: OS name (like 'fedora41')
# FEDORA_VERSION: (optional) Experimental Fedora kernel version, like "6.14" to
# install instead of Fedora defaults.
######################################################################
set -eu
function alpine() {
echo "##[group]Install Development Tools"
sudo apk add \
acl alpine-sdk attr autoconf automake bash build-base clang21 coreutils \
cpio cryptsetup curl curl-dev dhcpcd eudev eudev-dev eudev-libs findutils \
fio gawk gdb gettext-dev git grep jq libaio libaio-dev libcurl \
libtirpc-dev libtool libunwind libunwind-dev linux-headers linux-tools \
linux-virt linux-virt-dev lsscsi m4 make nfs-utils openssl-dev parted \
pax procps py3-cffi py3-distlib py3-packaging py3-setuptools python3 \
python3-dev qemu-guest-agent rng-tools rsync samba samba-server sed \
strace sysstat util-linux util-linux-dev wget words xfsprogs xxhash \
zlib-dev pamtester@testing
echo "##[endgroup]"
echo "##[group]Switch to eudev"
sudo setup-devd udev
echo "##[endgroup]"
echo "##[group]Install ksh93 from Source"
git clone --depth 1 https://github.com/ksh93/ksh.git /tmp/ksh
cd /tmp/ksh
./bin/package make
sudo ./bin/package install /
echo "##[endgroup]"
}
function archlinux() {
echo "##[group]Running pacman -Syu"
sudo btrfs filesystem resize max /
sudo pacman -Syu --noconfirm
echo "##[endgroup]"
echo "##[group]Install Development Tools"
sudo pacman -Sy --noconfirm base-devel bc cpio cryptsetup dhclient dkms \
fakeroot fio gdb inetutils jq less linux linux-headers lsscsi nfs-utils \
parted pax perf python-packaging python-setuptools qemu-guest-agent ksh \
samba strace sysstat rng-tools rsync wget xxhash
echo "##[endgroup]"
}
function debian() {
export DEBIAN_FRONTEND="noninteractive"
echo "##[group]Wait for cloud-init to finish"
cloud-init status --wait
echo "##[endgroup]"
echo "##[group]Running apt-get update+upgrade"
sudo sed -i '/[[:alpha:]]-backports/d' /etc/apt/sources.list
sudo apt-get update -y
sudo apt-get upgrade -y
echo "##[endgroup]"
echo "##[group]Install Development Tools"
sudo apt-get install -y \
acl alien attr autoconf bc cpio cryptsetup curl dbench dh-python dkms \
fakeroot fio gdb gdebi git ksh lcov isc-dhcp-client jq libacl1-dev \
libaio-dev libattr1-dev libblkid-dev libcurl4-openssl-dev libdevmapper-dev \
libelf-dev libffi-dev libmount-dev libpam0g-dev libselinux-dev libssl-dev \
libtool libtool-bin libudev-dev libunwind-dev linux-headers-$(uname -r) \
lsscsi nfs-kernel-server pamtester parted python3 python3-all-dev \
python3-cffi python3-dev python3-distlib python3-packaging libtirpc-dev \
python3-setuptools python3-sphinx qemu-guest-agent rng-tools rpm2cpio \
rsync samba strace sysstat uuid-dev watchdog wget xfslibs-dev xxhash \
zlib1g-dev
echo "##[endgroup]"
}
function freebsd() {
export ASSUME_ALWAYS_YES="YES"
echo "##[group]Install Development Tools"
sudo pkg install -y autoconf automake autotools base64 checkbashisms fio \
gdb gettext gettext-runtime git gmake gsed jq ksh lcov libtool lscpu \
pkgconf python python3 pamtester pamtester qemu-guest-agent rsync xxhash
sudo pkg install -xy \
'^samba4[[:digit:]]+$' \
'^py3[[:digit:]]+-cffi$' \
'^py3[[:digit:]]+-sysctl$' \
'^py3[[:digit:]]+-setuptools$' \
'^py3[[:digit:]]+-packaging$'
echo "##[endgroup]"
}
# common packages for: almalinux, centos, redhat
function rhel() {
echo "##[group]Running dnf update"
echo "max_parallel_downloads=10" | sudo -E tee -a /etc/dnf/dnf.conf
sudo dnf clean all
sudo dnf update -y --setopt=fastestmirror=1 --refresh
echo "##[endgroup]"
echo "##[group]Install Development Tools"
# Alma wants "Development Tools", Fedora 41 wants "development-tools"
if ! sudo dnf group install -y "Development Tools" ; then
echo "Trying 'development-tools' instead of 'Development Tools'"
sudo dnf group install -y development-tools
fi
sudo dnf install -y \
acl attr bc bzip2 cryptsetup curl dbench dkms elfutils-libelf-devel fio \
gdb git jq kernel-rpm-macros ksh libacl-devel libaio-devel \
libargon2-devel libattr-devel libblkid-devel libcurl-devel libffi-devel \
ncompress libselinux-devel libtirpc-devel libtool libudev-devel \
libuuid-devel lsscsi mdadm nfs-utils openssl-devel pam-devel pamtester \
parted perf python3 python3-cffi python3-devel python3-packaging \
kernel-devel python3-setuptools qemu-guest-agent rng-tools rpcgen \
rpm-build rsync samba strace sysstat systemd watchdog wget xfsprogs-devel \
xxhash zlib-devel
# These are needed for building Lustre. We only install these on EL VMs since
# we don't plan to test build Lustre on other platforms.
sudo dnf install -y libnl3-devel libyaml-devel libmount-devel
echo "##[endgroup]"
}
function tumbleweed() {
echo "##[group]Running zypper is TODO!"
sleep 23456
echo "##[endgroup]"
}
# $1: Kernel version to install (like '6.14rc7')
function install_fedora_experimental_kernel {
our_version="$1"
sudo dnf -y copr enable @kernel-vanilla/stable
sudo dnf -y copr enable @kernel-vanilla/mainline
all="$(sudo dnf list --showduplicates kernel-* python3-perf* perf* bpftool*)"
echo "Available versions:"
echo "$all"
# You can have a bunch of minor variants of the version we want '6.14'.
# Pick the newest variant (sorted by version number).
specific_version=$(echo "$all" | grep $our_version | awk '{print $2}' | sort -V | tail -n 1)
list="$(echo "$all" | grep $specific_version | grep -Ev 'kernel-rt|kernel-selftests|kernel-debuginfo' | sed 's/.x86_64//g' | awk '{print $1"-"$2}')"
sudo dnf install -y $list
sudo dnf -y copr disable @kernel-vanilla/stable
sudo dnf -y copr disable @kernel-vanilla/mainline
}
POWEROFF=""
if [ "$1" == "--poweroff" ] ; then
POWEROFF=1
shift
fi
# Install dependencies
case "$1" in
almalinux8)
echo "##[group]Enable epel and powertools repositories"
sudo dnf config-manager -y --set-enabled powertools
sudo dnf install -y epel-release
echo "##[endgroup]"
rhel
echo "##[group]Install kernel-abi-whitelists"
sudo dnf install -y kernel-abi-whitelists
echo "##[endgroup]"
;;
almalinux9|almalinux10|centos-stream9|centos-stream10)
echo "##[group]Enable epel and crb repositories"
sudo dnf config-manager -y --set-enabled crb
sudo dnf install -y epel-release
echo "##[endgroup]"
rhel
echo "##[group]Install kernel-abi-stablelists"
sudo dnf install -y kernel-abi-stablelists
echo "##[endgroup]"
;;
alpine*)
alpine
;;
archlinux)
archlinux
;;
debian*)
echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections
debian
echo "##[group]Install Debian specific"
sudo apt-get install -yq linux-perf dh-sequence-dkms
echo "##[endgroup]"
;;
fedora*)
rhel
sudo dnf install -y libunwind-devel
# Fedora 42+ moves /usr/bin/script from 'util-linux' to 'util-linux-script'
sudo dnf install -y util-linux-script || true
# Optional: Install an experimental kernel ($2 = kernel version)
if [ -n "${2:-}" ] ; then
install_fedora_experimental_kernel "$2"
fi
;;
freebsd*)
freebsd
;;
tumbleweed)
tumbleweed
;;
ubuntu*)
debian
echo "##[group]Install Ubuntu specific"
sudo apt-get install -yq linux-tools-common libtirpc-dev \
linux-modules-extra-$(uname -r)
sudo apt-get install -yq dh-sequence-dkms
# Need 'build-essential' explicitly for ARM builder
# https://github.com/actions/runner-images/issues/9946
sudo apt-get install -yq build-essential
echo "##[endgroup]"
echo "##[group]Delete Ubuntu OpenZFS modules"
for i in $(find /lib/modules -name zfs -type d); do sudo rm -rvf $i; done
echo "##[endgroup]"
;;
esac
# This script is used for checkstyle + zloop deps also.
# Install only the needed packages and exit - when used this way.
test -z "${ONLY_DEPS:-}" || exit 0
# Start services
echo "##[group]Enable services"
case "$1" in
alpine*)
sudo -E rc-update add qemu-guest-agent
sudo -E rc-update add nfs
sudo -E rc-update add samba
sudo -E rc-update add dhcpcd
# Remove services related to cloud-init.
sudo -E rc-update del cloud-init default
sudo -E rc-update del cloud-final default
sudo -E rc-update del cloud-config default
;;
freebsd*)
# add virtio things
echo 'virtio_load="YES"' | sudo -E tee -a /boot/loader.conf
for i in balloon blk console random scsi; do
echo "virtio_${i}_load=\"YES\"" | sudo -E tee -a /boot/loader.conf
done
echo "fdescfs /dev/fd fdescfs rw 0 0" | sudo -E tee -a /etc/fstab
sudo -E mount /dev/fd
sudo -E touch /etc/zfs/exports
sudo -E sysrc mountd_flags="/etc/zfs/exports"
echo '[global]' | sudo -E tee /usr/local/etc/smb4.conf >/dev/null
sudo -E service nfsd enable
sudo -E service qemu-guest-agent enable
sudo -E service samba_server enable
;;
debian*|ubuntu*)
sudo -E systemctl enable nfs-kernel-server
sudo -E systemctl enable qemu-guest-agent
sudo -E systemctl enable smbd
;;
*)
# All other linux distros
sudo -E systemctl enable nfs-server
sudo -E systemctl enable qemu-guest-agent
sudo -E systemctl enable smb
;;
esac
echo "##[endgroup]"
# Setup Kernel cmdline
CMDLINE="console=tty0 console=ttyS0,115200n8"
CMDLINE="$CMDLINE selinux=0"
CMDLINE="$CMDLINE random.trust_cpu=on"
CMDLINE="$CMDLINE no_timer_check"
case "$1" in
almalinux*|centos*|fedora*)
GRUB_CFG="/boot/grub2/grub.cfg"
GRUB_MKCONFIG="grub2-mkconfig"
CMDLINE="$CMDLINE biosdevname=0 net.ifnames=0"
echo 'GRUB_SERIAL_COMMAND="serial --speed=115200"' \
| sudo tee -a /etc/default/grub >/dev/null
;;
ubuntu24)
GRUB_CFG="/boot/grub/grub.cfg"
GRUB_MKCONFIG="grub-mkconfig"
echo 'GRUB_DISABLE_OS_PROBER="false"' \
| sudo tee -a /etc/default/grub >/dev/null
;;
*)
GRUB_CFG="/boot/grub/grub.cfg"
GRUB_MKCONFIG="grub-mkconfig"
;;
esac
case "$1" in
alpine*|archlinux|freebsd*)
true
;;
*)
echo "##[group]Edit kernel cmdline"
sudo sed -i -e '/^GRUB_CMDLINE_LINUX/d' /etc/default/grub || true
echo "GRUB_CMDLINE_LINUX=\"$CMDLINE\"" \
| sudo tee -a /etc/default/grub >/dev/null
sudo $GRUB_MKCONFIG -o $GRUB_CFG
echo "##[endgroup]"
;;
esac
# reset cloud-init configuration and poweroff
sudo cloud-init clean --logs
if [ "$POWEROFF" == "1" ] ; then
sleep 2 && sudo poweroff &
fi
exit 0

View File

@ -1,221 +1,28 @@
#!/usr/bin/env bash
######################################################################
# 3) install dependencies for compiling and loading
# 3) Wait for VM to boot from previous step and launch dependencies
# script on it.
#
# $1: OS name (like 'fedora41')
# $2: (optional) Experimental kernel version to install on fedora,
# like "6.14".
######################################################################
set -eu
.github/workflows/scripts/qemu-wait-for-vm.sh vm0
function archlinux() {
echo "##[group]Running pacman -Syu"
sudo btrfs filesystem resize max /
sudo pacman -Syu --noconfirm
echo "##[endgroup]"
# SPECIAL CASE:
#
# If the user passed in an experimental kernel version to test on Fedora,
# we need to update the kernel version in zfs's META file to allow the
# build to happen. We update our local copy of META here, since we know
# it will be rsync'd up in the next step.
if [ -n "${2:-}" ] ; then
sed -i -E 's/Linux-Maximum: .+/Linux-Maximum: 99.99/g' META
fi
echo "##[group]Install Development Tools"
sudo pacman -Sy --noconfirm base-devel bc cpio dhclient dkms fakeroot \
fio gdb inetutils jq less linux linux-headers lsscsi nfs-utils parted \
pax perf python-packaging python-setuptools qemu-guest-agent ksh samba \
sysstat rng-tools rsync wget xxhash
echo "##[endgroup]"
}
function debian() {
export DEBIAN_FRONTEND="noninteractive"
echo "##[group]Running apt-get update+upgrade"
sudo apt-get update -y
sudo apt-get upgrade -y
echo "##[endgroup]"
echo "##[group]Install Development Tools"
sudo apt-get install -y \
acl alien attr autoconf bc cpio curl dbench dh-python dkms fakeroot \
fio gdb gdebi git ksh lcov isc-dhcp-client jq libacl1-dev libaio-dev \
libattr1-dev libblkid-dev libcurl4-openssl-dev libdevmapper-dev libelf-dev \
libffi-dev libmount-dev libpam0g-dev libselinux-dev libssl-dev libtool \
libtool-bin libudev-dev libunwind-dev linux-headers-$(uname -r) \
lsscsi nfs-kernel-server pamtester parted python3 python3-all-dev \
python3-cffi python3-dev python3-distlib python3-packaging \
python3-setuptools python3-sphinx qemu-guest-agent rng-tools rpm2cpio \
rsync samba sysstat uuid-dev watchdog wget xfslibs-dev xxhash zlib1g-dev
echo "##[endgroup]"
}
function freebsd() {
export ASSUME_ALWAYS_YES="YES"
echo "##[group]Install Development Tools"
sudo pkg install -y autoconf automake autotools base64 checkbashisms fio \
gdb gettext gettext-runtime git gmake gsed jq ksh93 lcov libtool lscpu \
pkgconf python python3 pamtester pamtester qemu-guest-agent rsync xxhash
sudo pkg install -xy \
'^samba4[[:digit:]]+$' \
'^py3[[:digit:]]+-cffi$' \
'^py3[[:digit:]]+-sysctl$' \
'^py3[[:digit:]]+-packaging$'
echo "##[endgroup]"
}
# common packages for: almalinux, centos, redhat
function rhel() {
echo "##[group]Running dnf update"
echo "max_parallel_downloads=10" | sudo -E tee -a /etc/dnf/dnf.conf
sudo dnf clean all
sudo dnf update -y --setopt=fastestmirror=1 --refresh
echo "##[endgroup]"
echo "##[group]Install Development Tools"
sudo dnf group install -y "Development Tools"
sudo dnf install -y \
acl attr bc bzip2 curl dbench dkms elfutils-libelf-devel fio gdb git \
jq kernel-rpm-macros ksh libacl-devel libaio-devel libargon2-devel \
libattr-devel libblkid-devel libcurl-devel libffi-devel ncompress \
libselinux-devel libtirpc-devel libtool libudev-devel libuuid-devel \
lsscsi mdadm nfs-utils openssl-devel pam-devel pamtester parted perf \
python3 python3-cffi python3-devel python3-packaging kernel-devel \
python3-setuptools qemu-guest-agent rng-tools rpcgen rpm-build rsync \
samba sysstat systemd watchdog wget xfsprogs-devel xxhash zlib-devel
echo "##[endgroup]"
}
function tumbleweed() {
echo "##[group]Running zypper is TODO!"
sleep 23456
echo "##[endgroup]"
}
# Install dependencies
case "$1" in
almalinux8)
echo "##[group]Enable epel and powertools repositories"
sudo dnf config-manager -y --set-enabled powertools
sudo dnf install -y epel-release
echo "##[endgroup]"
rhel
echo "##[group]Install kernel-abi-whitelists"
sudo dnf install -y kernel-abi-whitelists
echo "##[endgroup]"
;;
almalinux9|centos-stream9)
echo "##[group]Enable epel and crb repositories"
sudo dnf config-manager -y --set-enabled crb
sudo dnf install -y epel-release
echo "##[endgroup]"
rhel
echo "##[group]Install kernel-abi-stablelists"
sudo dnf install -y kernel-abi-stablelists
echo "##[endgroup]"
;;
archlinux)
archlinux
;;
debian*)
debian
echo "##[group]Install Debian specific"
sudo apt-get install -yq linux-perf dh-sequence-dkms
echo "##[endgroup]"
;;
fedora*)
rhel
;;
freebsd*)
freebsd
;;
tumbleweed)
tumbleweed
;;
ubuntu*)
debian
echo "##[group]Install Ubuntu specific"
sudo apt-get install -yq linux-tools-common libtirpc-dev \
linux-modules-extra-$(uname -r)
if [ "$1" != "ubuntu20" ]; then
sudo apt-get install -yq dh-sequence-dkms
fi
echo "##[endgroup]"
echo "##[group]Delete Ubuntu OpenZFS modules"
for i in $(find /lib/modules -name zfs -type d); do sudo rm -rvf $i; done
echo "##[endgroup]"
;;
esac
# This script is used for checkstyle + zloop deps also.
# Install only the needed packages and exit - when used this way.
test -z "${ONLY_DEPS:-}" || exit 0
# Start services
echo "##[group]Enable services"
case "$1" in
freebsd*)
# add virtio things
echo 'virtio_load="YES"' | sudo -E tee -a /boot/loader.conf
for i in balloon blk console random scsi; do
echo "virtio_${i}_load=\"YES\"" | sudo -E tee -a /boot/loader.conf
done
echo "fdescfs /dev/fd fdescfs rw 0 0" | sudo -E tee -a /etc/fstab
sudo -E mount /dev/fd
sudo -E touch /etc/zfs/exports
sudo -E sysrc mountd_flags="/etc/zfs/exports"
echo '[global]' | sudo -E tee /usr/local/etc/smb4.conf >/dev/null
sudo -E service nfsd enable
sudo -E service qemu-guest-agent enable
sudo -E service samba_server enable
;;
debian*|ubuntu*)
sudo -E systemctl enable nfs-kernel-server
sudo -E systemctl enable qemu-guest-agent
sudo -E systemctl enable smbd
;;
*)
# All other linux distros
sudo -E systemctl enable nfs-server
sudo -E systemctl enable qemu-guest-agent
sudo -E systemctl enable smb
;;
esac
echo "##[endgroup]"
# Setup Kernel cmdline
CMDLINE="console=tty0 console=ttyS0,115200n8"
CMDLINE="$CMDLINE selinux=0"
CMDLINE="$CMDLINE random.trust_cpu=on"
CMDLINE="$CMDLINE no_timer_check"
case "$1" in
almalinux*|centos*|fedora*)
GRUB_CFG="/boot/grub2/grub.cfg"
GRUB_MKCONFIG="grub2-mkconfig"
CMDLINE="$CMDLINE biosdevname=0 net.ifnames=0"
echo 'GRUB_SERIAL_COMMAND="serial --speed=115200"' \
| sudo tee -a /etc/default/grub >/dev/null
;;
ubuntu24)
GRUB_CFG="/boot/grub/grub.cfg"
GRUB_MKCONFIG="grub-mkconfig"
echo 'GRUB_DISABLE_OS_PROBER="false"' \
| sudo tee -a /etc/default/grub >/dev/null
;;
*)
GRUB_CFG="/boot/grub/grub.cfg"
GRUB_MKCONFIG="grub-mkconfig"
;;
esac
case "$1" in
archlinux|freebsd*)
true
;;
*)
echo "##[group]Edit kernel cmdline"
sudo sed -i -e '/^GRUB_CMDLINE_LINUX/d' /etc/default/grub || true
echo "GRUB_CMDLINE_LINUX=\"$CMDLINE\"" \
| sudo tee -a /etc/default/grub >/dev/null
sudo $GRUB_MKCONFIG -o $GRUB_CFG
echo "##[endgroup]"
;;
esac
# reset cloud-init configuration and poweroff
sudo cloud-init clean --logs
sleep 2 && sudo poweroff &
exit 0
scp .github/workflows/scripts/qemu-3-deps-vm.sh zfs@vm0:qemu-3-deps-vm.sh
PID=`pidof /usr/bin/qemu-system-x86_64`
ssh zfs@vm0 '$HOME/qemu-3-deps-vm.sh' "$@"
# wait for poweroff to succeed
tail --pid=$PID -f /dev/null
sleep 5 # avoid this: "error: Domain is already active"
rm -f $HOME/.ssh/known_hosts

405
.github/workflows/scripts/qemu-4-build-vm.sh vendored Executable file
View File

@ -0,0 +1,405 @@
#!/usr/bin/env bash
######################################################################
# 4) configure and build openzfs modules. This is run on the VMs.
#
# Usage:
#
# qemu-4-build-vm.sh OS [--enable-debug][--dkms][--patch-level NUM]
# [--poweroff][--release][--repo][--tarball]
#
# OS: OS name like 'fedora41'
# --enable-debug: Build RPMs with '--enable-debug' (for testing)
# --dkms: Build DKMS RPMs as well
# --patch-level NUM: Use a custom patch level number for packages.
# --poweroff: Power-off the VM after building
# --release Build zfs-release*.rpm as well
# --repo After building everything, copy RPMs into /tmp/repo
# in the ZFS RPM repository file structure. Also
# copy tarballs if they were built.
# --tarball: Also build a tarball of ZFS source
######################################################################
ENABLE_DEBUG=""
DKMS=""
PATCH_LEVEL=""
POWEROFF=""
RELEASE=""
REPO=""
TARBALL=""
while [[ $# -gt 0 ]]; do
case $1 in
--enable-debug)
ENABLE_DEBUG=1
shift
;;
--dkms)
DKMS=1
shift
;;
--patch-level)
PATCH_LEVEL=$2
shift
shift
;;
--poweroff)
POWEROFF=1
shift
;;
--release)
RELEASE=1
shift
;;
--repo)
REPO=1
shift
;;
--tarball)
TARBALL=1
shift
;;
*)
OS=$1
shift
;;
esac
done
set -eu
function run() {
LOG="/var/tmp/build-stderr.txt"
echo "****************************************************"
echo "$(date) ($*)"
echo "****************************************************"
($@ || echo $? > /tmp/rv) 3>&1 1>&2 2>&3 | stdbuf -eL -oL tee -a $LOG
if [ -f /tmp/rv ]; then
RV=$(cat /tmp/rv)
echo "****************************************************"
echo "exit with value=$RV ($*)"
echo "****************************************************"
echo 1 > /var/tmp/build-exitcode.txt
exit $RV
fi
}
# Look at the RPMs in the current directory and copy/move them to
# /tmp/repo, using the directory structure we use for the ZFS RPM repos.
#
# For example:
# /tmp/repo/epel-testing/9.5
# /tmp/repo/epel-testing/9.5/SRPMS
# /tmp/repo/epel-testing/9.5/SRPMS/zfs-2.3.99-1.el9.src.rpm
# /tmp/repo/epel-testing/9.5/SRPMS/zfs-kmod-2.3.99-1.el9.src.rpm
# /tmp/repo/epel-testing/9.5/kmod
# /tmp/repo/epel-testing/9.5/kmod/x86_64
# /tmp/repo/epel-testing/9.5/kmod/x86_64/debug
# /tmp/repo/epel-testing/9.5/kmod/x86_64/debug/kmod-zfs-debuginfo-2.3.99-1.el9.x86_64.rpm
# /tmp/repo/epel-testing/9.5/kmod/x86_64/debug/libnvpair3-debuginfo-2.3.99-1.el9.x86_64.rpm
# /tmp/repo/epel-testing/9.5/kmod/x86_64/debug/libuutil3-debuginfo-2.3.99-1.el9.x86_64.rpm
# ...
function copy_rpms_to_repo {
# Pick a RPM to query. It doesn't matter which one - we just want to extract
# the 'Build Host' value from it.
rpm=$(ls zfs-*.rpm | head -n 1)
# Get zfs version '2.2.99'
zfs_ver=$(rpm -qpi $rpm | awk '/Version/{print $3}')
# Get "2.1" or "2.2"
zfs_major=$(echo $zfs_ver | grep -Eo [0-9]+\.[0-9]+)
# Get 'almalinux9.5' or 'fedora41' type string
build_host=$(rpm -qpi $rpm | awk '/Build Host/{print $4}')
# Get '9.5' or '41' OS version
os_ver=$(echo $build_host | grep -Eo '[0-9\.]+$')
# Our ZFS version and OS name will determine which repo the RPMs
# will go in (regular or testing). Fedora always gets the newest
# releases, and Alma gets the older releases.
case $build_host in
almalinux*)
case $zfs_major in
2.2)
d="epel"
;;
*)
d="epel-testing"
;;
esac
;;
fedora*)
d="fedora"
;;
esac
prefix=/tmp/repo
dst="$prefix/$d/$os_ver"
# Special case: move zfs-release*.rpm out of the way first (if we built them).
# This will make filtering the other RPMs easier.
mkdir -p $dst
mv zfs-release*.rpm $dst || true
# Copy source RPMs
mkdir -p $dst/SRPMS
cp $(ls *.src.rpm) $dst/SRPMS/
if [[ "$build_host" =~ "almalinux" ]] ; then
# Copy kmods+userspace
mkdir -p $dst/kmod/x86_64/debug
cp $(ls *.rpm | grep -Ev 'src.rpm|dkms|debuginfo') $dst/kmod/x86_64
cp *debuginfo*.rpm $dst/kmod/x86_64/debug
fi
if [ -n "$DKMS" ] ; then
# Copy dkms+userspace
mkdir -p $dst/x86_64
cp $(ls *.rpm | grep -Ev 'src.rpm|kmod|debuginfo') $dst/x86_64
fi
# Copy debug
mkdir -p $dst/x86_64/debug
cp $(ls *debuginfo*.rpm | grep -v kmod) $dst/x86_64/debug
}
function freebsd() {
extra="${1:-}"
export MAKE="gmake"
echo "##[group]Autogen.sh"
run ./autogen.sh
echo "##[endgroup]"
echo "##[group]Configure"
run ./configure \
--prefix=/usr/local \
--with-libintl-prefix=/usr/local \
--enable-pyzfs \
--enable-debuginfo $extra
echo "##[endgroup]"
echo "##[group]Build"
run gmake -j$(nproc)
echo "##[endgroup]"
echo "##[group]Install"
run sudo gmake install
echo "##[endgroup]"
}
function linux() {
extra="${1:-}"
echo "##[group]Autogen.sh"
run ./autogen.sh
echo "##[endgroup]"
echo "##[group]Configure"
run ./configure \
--prefix=/usr \
--enable-pyzfs \
--enable-debuginfo $extra
echo "##[endgroup]"
echo "##[group]Build"
run make -j$(nproc)
echo "##[endgroup]"
echo "##[group]Install"
run sudo make install
echo "##[endgroup]"
}
function rpm_build_and_install() {
extra="${1:-}"
# Build RPMs with XZ compression by default (since gzip decompression is slow)
echo "%_binary_payload w7.xzdio" >> ~/.rpmmacros
echo "##[group]Autogen.sh"
run ./autogen.sh
echo "##[endgroup]"
if [ -n "$PATCH_LEVEL" ] ; then
sed -i -E 's/(Release:\s+)1/\1'$PATCH_LEVEL'/g' META
fi
echo "##[group]Configure"
run ./configure --enable-debuginfo $extra
echo "##[endgroup]"
echo "##[group]Build"
run make pkg-kmod pkg-utils
echo "##[endgroup]"
if [ -n "$DKMS" ] ; then
echo "##[group]DKMS"
make rpm-dkms
echo "##[endgroup]"
fi
if [ -n "$REPO" ] ; then
echo "Skipping install since we're only building RPMs and nothing else"
else
echo "##[group]Install"
run sudo dnf -y --nobest install $(ls *.rpm | grep -Ev 'dkms|src.rpm')
echo "##[endgroup]"
fi
# Optionally build the zfs-release.*.rpm
if [ -n "$RELEASE" ] ; then
echo "##[group]Release"
pushd ~
sudo dnf -y install rpm-build || true
# Check out a sparse copy of zfsonlinux.github.com.git so we don't get
# all the binaries. We just need a few kilobytes of files to build RPMs.
git clone --depth 1 --no-checkout \
https://github.com/zfsonlinux/zfsonlinux.github.com.git
cd zfsonlinux.github.com
git sparse-checkout set zfs-release
git checkout
cd zfs-release
mkdir -p ~/rpmbuild/{BUILDROOT,SPECS,RPMS,SRPMS,SOURCES,BUILD}
cp RPM-GPG-KEY-openzfs* *.repo ~/rpmbuild/SOURCES
cp zfs-release.spec ~/rpmbuild/SPECS/
rpmbuild -ba ~/rpmbuild/SPECS/zfs-release.spec
# ZFS release RPMs are built. Copy them to the ~/zfs directory just to
# keep all the RPMs in the same place.
cp ~/rpmbuild/RPMS/noarch/*.rpm ~/zfs
cp ~/rpmbuild/SRPMS/*.rpm ~/zfs
popd
rm -fr ~/rpmbuild
echo "##[endgroup]"
fi
if [ -n "$REPO" ] ; then
echo "##[group]Repo"
copy_rpms_to_repo
echo "##[endgroup]"
fi
}
function deb_build_and_install() {
extra="${1:-}"
echo "##[group]Autogen.sh"
run ./autogen.sh
echo "##[endgroup]"
echo "##[group]Configure"
run ./configure \
--prefix=/usr \
--enable-pyzfs \
--enable-debuginfo $extra
echo "##[endgroup]"
echo "##[group]Build"
run make native-deb-kmod native-deb-utils
echo "##[endgroup]"
echo "##[group]Install"
# Do kmod install. Note that when you build the native debs, the
# packages themselves are placed in parent directory '../' rather than
# in the source directory like the rpms are.
run sudo apt-get -y install $(find ../ | grep -E '\.deb$' \
| grep -Ev 'dkms|dracut')
echo "##[endgroup]"
}
function build_tarball {
if [ -n "$REPO" ] ; then
./autogen.sh
./configure --with-config=srpm
make dist
mkdir -p /tmp/repo/releases
# The tarball name is based off of 'Version' field in the META file.
mv *.tar.gz /tmp/repo/releases/
fi
}
# Debug: show kernel cmdline
if [ -f /proc/cmdline ] ; then
cat /proc/cmdline || true
fi
# Set our hostname to our OS name and version number. Specifically, we set the
# major and minor number so that when we query the Build Host field in the RPMs
# we build, we can see what specific version of Fedora/Almalinux we were using
# to build them. This is helpful for matching up KMOD versions.
#
# Examples:
#
# rhel8.10
# almalinux9.5
# fedora42
source /etc/os-release
if which hostnamectl &> /dev/null ; then
# Fedora 42+ use hostnamectl
sudo hostnamectl set-hostname "$ID$VERSION_ID"
sudo hostnamectl set-hostname --pretty "$ID$VERSION_ID"
else
sudo hostname "$ID$VERSION_ID"
fi
# save some sysinfo
uname -a > /var/tmp/uname.txt
# Check if we're running this script from within a VM or on the runner itself.
# Most of the time we will be running in a VM, but the ARM builder actually
# runs this script on the runner. If we happen to be running on the ARM
# runner, we will start in the ZFS source directory. If we're running on a VM
# then we'll just start in our home directory, and will need to 'cd' into our
# source directory.
if [ ! -e META ] ; then
cd $HOME/zfs
fi
export PATH="$PATH:/sbin:/usr/sbin:/usr/local/sbin"
extra=""
if [ -n "$ENABLE_DEBUG" ] ; then
extra="--enable-debug"
fi
# build
case "$OS" in
freebsd*)
freebsd "$extra"
;;
alma*|centos*)
rpm_build_and_install "--with-spec=redhat $extra"
;;
fedora*)
rpm_build_and_install "$extra"
# Historically, we've always built the release tarballs on Fedora, since
# there was one instance long ago where we built them on CentOS 7, and they
# didn't work correctly for everyone.
if [ -n "$TARBALL" ] ; then
build_tarball
fi
;;
debian*|ubuntu*)
deb_build_and_install "$extra"
;;
*)
linux "$extra"
;;
esac
# building the zfs module was ok
echo 0 > /var/tmp/build-exitcode.txt
# reset cloud-init configuration and poweroff
if [ -n "$POWEROFF" ] ; then
sudo cloud-init clean --logs
sync && sleep 2 && sudo poweroff &
fi
exit 0

View File

@ -3,151 +3,9 @@
######################################################################
# 4) configure and build openzfs modules
######################################################################
echo "Build modules in QEMU machine"
set -eu
# Bring our VM back up and copy over ZFS source
.github/workflows/scripts/qemu-prepare-for-build.sh
function run() {
LOG="/var/tmp/build-stderr.txt"
echo "****************************************************"
echo "$(date) ($*)"
echo "****************************************************"
($@ || echo $? > /tmp/rv) 3>&1 1>&2 2>&3 | stdbuf -eL -oL tee -a $LOG
if [ -f /tmp/rv ]; then
RV=$(cat /tmp/rv)
echo "****************************************************"
echo "exit with value=$RV ($*)"
echo "****************************************************"
echo 1 > /var/tmp/build-exitcode.txt
exit $RV
fi
}
function freebsd() {
export MAKE="gmake"
echo "##[group]Autogen.sh"
run ./autogen.sh
echo "##[endgroup]"
echo "##[group]Configure"
run ./configure \
--prefix=/usr/local \
--with-libintl-prefix=/usr/local \
--enable-pyzfs \
--enable-debug \
--enable-debuginfo
echo "##[endgroup]"
echo "##[group]Build"
run gmake -j$(sysctl -n hw.ncpu)
echo "##[endgroup]"
echo "##[group]Install"
run sudo gmake install
echo "##[endgroup]"
}
function linux() {
echo "##[group]Autogen.sh"
run ./autogen.sh
echo "##[endgroup]"
echo "##[group]Configure"
run ./configure \
--prefix=/usr \
--enable-pyzfs \
--enable-debug \
--enable-debuginfo
echo "##[endgroup]"
echo "##[group]Build"
run make -j$(nproc)
echo "##[endgroup]"
echo "##[group]Install"
run sudo make install
echo "##[endgroup]"
}
function rpm_build_and_install() {
EXTRA_CONFIG="${1:-}"
echo "##[group]Autogen.sh"
run ./autogen.sh
echo "##[endgroup]"
echo "##[group]Configure"
run ./configure --enable-debug --enable-debuginfo $EXTRA_CONFIG
echo "##[endgroup]"
echo "##[group]Build"
run make pkg-kmod pkg-utils
echo "##[endgroup]"
echo "##[group]Install"
run sudo dnf -y --skip-broken localinstall $(ls *.rpm | grep -v src.rpm)
echo "##[endgroup]"
}
function deb_build_and_install() {
echo "##[group]Autogen.sh"
run ./autogen.sh
echo "##[endgroup]"
echo "##[group]Configure"
run ./configure \
--prefix=/usr \
--enable-pyzfs \
--enable-debug \
--enable-debuginfo
echo "##[endgroup]"
echo "##[group]Build"
run make native-deb-kmod native-deb-utils
echo "##[endgroup]"
echo "##[group]Install"
# Do kmod install. Note that when you build the native debs, the
# packages themselves are placed in parent directory '../' rather than
# in the source directory like the rpms are.
run sudo apt-get -y install $(find ../ | grep -E '\.deb$' \
| grep -Ev 'dkms|dracut')
echo "##[endgroup]"
}
# Debug: show kernel cmdline
if [ -f /proc/cmdline ] ; then
cat /proc/cmdline || true
fi
# save some sysinfo
uname -a > /var/tmp/uname.txt
cd $HOME/zfs
export PATH="$PATH:/sbin:/usr/sbin:/usr/local/sbin"
# build
case "$1" in
freebsd*)
freebsd
;;
alma*|centos*)
rpm_build_and_install "--with-spec=redhat"
;;
fedora*)
rpm_build_and_install
;;
debian*|ubuntu*)
deb_build_and_install
;;
*)
linux
;;
esac
# building the zfs module was ok
echo 0 > /var/tmp/build-exitcode.txt
# reset cloud-init configuration and poweroff
sudo cloud-init clean --logs
sync && sleep 2 && sudo poweroff &
exit 0
ssh zfs@vm0 '$HOME/zfs/.github/workflows/scripts/qemu-4-build-vm.sh' $@

View File

@ -12,37 +12,45 @@ source /var/tmp/env.txt
# wait for poweroff to succeed
PID=$(pidof /usr/bin/qemu-system-x86_64)
tail --pid=$PID -f /dev/null
sudo virsh undefine openzfs
sudo virsh undefine --nvram openzfs
# cpu pinning
CPUSET=("0,1" "2,3")
# additional options for virt-install
OPTS[0]=""
OPTS[1]=""
# definitions of per operating system
case "$OS" in
freebsd*)
VMs=2
CPU=3
# FreeBSD needs only 6GiB
RAM=6
;;
debian13)
RAM=8
# Boot Debian 13 with uefi=on and secureboot=off (ZFS Kernel Module not signed)
OPTS[0]="--boot"
OPTS[1]="firmware=efi,firmware.feature0.name=secure-boot,firmware.feature0.enabled=no"
;;
*)
VMs=2
CPU=3
RAM=7
# Linux needs more memory, but can be optimized to share it via KSM
RAM=8
;;
esac
# this can be different for each distro
echo "VMs=$VMs" >> $ENV
# create snapshot we can clone later
sudo zfs snapshot zpool/openzfs@now
# setup the testing vm's
PUBKEY=$(cat ~/.ssh/id_ed25519.pub)
for i in $(seq 1 $VMs); do
# start testing VMs
for ((i=1; i<=VMs; i++)); do
echo "Creating disk for vm$i..."
DISK="/dev/zvol/zpool/vm$i"
FORMAT="raw"
sudo zfs clone zpool/openzfs@now zpool/vm$i
sudo zfs create -ps -b 64k -V 80g zpool/vm$i-2
sudo zfs clone zpool/openzfs@now zpool/vm$i-system
sudo zfs create -ps -b 64k -V 64g zpool/vm$i-tests
cat <<EOF > /tmp/user-data
#cloud-config
@ -50,13 +58,21 @@ for i in $(seq 1 $VMs); do
fqdn: vm$i
users:
- name: root
shell: $BASH
- name: zfs
sudo: ALL=(ALL) NOPASSWD:ALL
shell: $BASH
- name: root
shell: /bin/bash
sudo: ['ALL=(ALL) NOPASSWD:ALL']
- name: zfs
shell: /bin/bash
sudo: ['ALL=(ALL) NOPASSWD:ALL']
ssh_authorized_keys:
- $PUBKEY
# Workaround for Alpine Linux.
lock_passwd: false
passwd: '*'
packages:
- sudo
- bash
growpart:
mode: auto
@ -73,49 +89,57 @@ EOF
--cpu host-passthrough \
--virt-type=kvm --hvm \
--vcpus=$CPU,sockets=1 \
--cpuset=${CPUSET[$((i-1))]} \
--memory $((1024*RAM)) \
--memballoon model=virtio \
--graphics none \
--cloud-init user-data=/tmp/user-data \
--network bridge=virbr0,model=$NIC,mac="52:54:00:83:79:0$i" \
--disk $DISK,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--disk $DISK-2,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--import --noautoconsole >/dev/null
--disk $DISK-system,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--disk $DISK-tests,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--import --noautoconsole ${OPTS[0]} ${OPTS[1]}
done
# check the memory state from time to time
# generate some memory stats
cat <<EOF > cronjob.sh
# $OS
exec 1>>/var/tmp/stats.txt
exec 2>&1
echo "*******************************************************"
date
echo "********************************************************************************"
uptime
free -m
df -h /mnt/tests
zfs list
EOF
sudo chmod +x cronjob.sh
sudo mv -f cronjob.sh /root/cronjob.sh
echo '*/5 * * * * /root/cronjob.sh' > crontab.txt
sudo crontab crontab.txt
rm crontab.txt
# check if the machines are okay
echo "Waiting for vm's to come up... (${VMs}x CPU=$CPU RAM=$RAM)"
for i in $(seq 1 $VMs); do
while true; do
ssh 2>/dev/null zfs@192.168.122.1$i "uname -a" && break
done
done
echo "All $VMs VMs are up now."
# Save the VM's serial output (ttyS0) to /var/tmp/console.txt
# - ttyS0 on the VM corresponds to a local /dev/pty/N entry
# - use 'virsh ttyconsole' to lookup the /dev/pty/N entry
for i in $(seq 1 $VMs); do
for ((i=1; i<=VMs; i++)); do
mkdir -p $RESPATH/vm$i
read "pty" <<< $(sudo virsh ttyconsole vm$i)
# Create the file so we can tail it, even if there's no output.
touch $RESPATH/vm$i/console.txt
sudo nohup bash -c "cat $pty > $RESPATH/vm$i/console.txt" &
# Write all VM boot lines to the console to aid in debugging failed boots.
# The boot lines from all the VMs will be munged together, so prepend each
# line with the vm hostname (like 'vm1:').
(while IFS=$'\n' read -r line; do echo "vm$i: $line" ; done < <(sudo tail -f $RESPATH/vm$i/console.txt)) &
done
echo "Console logging for ${VMs}x $OS started."
# check if the machines are okay
echo "Waiting for vm's to come up... (${VMs}x CPU=$CPU RAM=$RAM)"
for ((i=1; i<=VMs; i++)); do
.github/workflows/scripts/qemu-wait-for-vm.sh vm$i
done
echo "All $VMs VMs are up now."

View File

@ -0,0 +1,51 @@
#!/usr/bin/env bash
######################################################################
# 6) Test if Lustre can still build against ZFS
######################################################################
set -e
# Build from the latest Lustre tag rather than the master branch. We do this
# under the assumption that master is going to have a lot of churn thus will be
# more prone to breaking the build than a point release. We don't want ZFS
# PR's reporting bad test results simply because upstream Lustre accidentally
# broke their build.
#
# Skip any RC tags, or any tags where the last version digit is 50 or more.
# Versions with 50 or more are development versions of Lustre.
repo=https://github.com/lustre/lustre-release.git
tag="$(git ls-remote --refs --exit-code --sort=version:refname --tags $repo | \
awk -F '_' '/-RC/{next}; /refs\/tags\/v/{if ($NF < 50){print}}' | \
tail -n 1 | sed 's/.*\///')"
echo "Cloning Lustre tag $tag"
git clone --depth 1 --branch "$tag" "$repo"
cd lustre-release
# Include Lustre patches to build against master/zfs-2.4.x. Once these
# patches are merged we can remove these lines.
patches=('https://review.whamcloud.com/changes/fs%2Flustre-release~62101/revisions/2/patch?download'
'https://review.whamcloud.com/changes/fs%2Flustre-release~63267/revisions/9/patch?download')
for p in "${patches[@]}" ; do
curl $p | base64 -d > patch
patch -p1 < patch || true
done
echo "Configure Lustre"
./autogen.sh
# EL 9 needs '--disable-gss-keyring'
./configure --with-zfs --disable-gss-keyring
echo "Building Lustre RPMs"
make rpms
ls *.rpm
# There's only a handful of Lustre RPMs we actually need to install
lustrerpms="$(ls *.rpm | grep -E 'kmod-lustre-osd-zfs-[0-9]|kmod-lustre-[0-9]|lustre-osd-zfs-mount-[0-9]')"
echo "Installing: $lustrerpms"
sudo dnf -y install $lustrerpms
sudo modprobe -v lustre
# Should see some Lustre lines in dmesg
sudo dmesg | grep -Ei 'lnet|lustre'

View File

@ -4,7 +4,10 @@
# 6) load openzfs module and run the tests
#
# called on runner: qemu-6-tests.sh
# called on qemu-vm: qemu-6-tests.sh $OS $2/$3
# called on qemu-vm: qemu-6-tests.sh $OS $2 $3 [--lustre|--builtin] [quick|default]
#
# --lustre: Test build lustre in addition to the normal tests
# --builtin: Test build ZFS as a kernel built-in in addition to the normal tests
######################################################################
set -eu
@ -21,11 +24,13 @@ function prefix() {
S=$((DIFF-(M*60)))
CTR=$(cat /tmp/ctr)
echo $LINE| grep -q "^Test[: ]" && CTR=$((CTR+1)) && echo $CTR > /tmp/ctr
echo $LINE| grep -q '^\[.*] Test[: ]' && CTR=$((CTR+1)) && echo $CTR > /tmp/ctr
BASE="$HOME/work/zfs/zfs"
COLOR="$BASE/scripts/zfs-tests-color.sh"
CLINE=$(echo $LINE| grep "^Test[ :]" | sed -e 's|/usr/local|/usr|g' \
CLINE=$(echo $LINE| grep '^\[.*] Test[: ]' \
| sed -e 's|^\[.*] Test|Test|g' \
| sed -e 's|/usr/local|/usr|g' \
| sed -e 's| /usr/share/zfs/zfs-tests/tests/| |g' | $COLOR)
if [ -z "$CLINE" ]; then
printf "vm${ID}: %s\n" "$LINE"
@ -36,6 +41,54 @@ function prefix() {
fi
}
function do_lustre_build() {
local rc=0
$HOME/zfs/.github/workflows/scripts/qemu-6-lustre-tests-vm.sh &> /var/tmp/lustre.txt || rc=$?
echo "$rc" > /var/tmp/lustre-exitcode.txt
if [ "$rc" != "0" ] ; then
echo "$rc" > /var/tmp/tests-exitcode.txt
fi
}
export -f do_lustre_build
# Test build ZFS into the kernel directly
function do_builtin_build() {
local rc=0
# Get currently full kernel version (like '6.18.8')
fullver=$(uname -r | grep -Eo '^[0-9]+\.[0-9]+\.[0-9]+')
# Get just the major ('6')
major=$(echo $fullver | grep -Eo '^[0-9]+')
(
set -e
wget https://cdn.kernel.org/pub/linux/kernel/v${major}.x/linux-$fullver.tar.xz
tar -xf $HOME/linux-$fullver.tar.xz
cd $HOME/linux-$fullver
make tinyconfig
./scripts/config --enable EFI_PARTITON
./scripts/config --enable BLOCK
# BTRFS_FS is easiest config option to enable CONFIG_ZLIB_INFLATE|DEFLATE
./scripts/config --enable BTRFS_FS
yes "" | make oldconfig
make prepare
cd $HOME/zfs
./configure --with-linux=$HOME/linux-$fullver --enable-linux-builtin --enable-debug
./copy-builtin $HOME/linux-$fullver
cd $HOME/linux-$fullver
./scripts/config --enable ZFS
yes "" | make oldconfig
make -j `nproc`
) &> /var/tmp/builtin.txt || rc=$?
echo "$rc" > /var/tmp/builtin-exitcode.txt
if [ "$rc" != "0" ] ; then
echo "$rc" > /var/tmp/tests-exitcode.txt
fi
}
export -f do_builtin_build
# called directly on the runner
if [ -z ${1:-} ]; then
cd "/var/tmp"
@ -45,10 +98,26 @@ if [ -z ${1:-} ]; then
echo 0 > /tmp/ctr
date "+%s" > /tmp/tsstart
for i in $(seq 1 $VMs); do
for ((i=1; i<=VMs; i++)); do
IP="192.168.122.1$i"
# We do an additional test build of Lustre against ZFS if we're vm2
# on almalinux*. At the time of writing, the vm2 tests were
# completing roughly 15min before the vm1 tests, so it makes sense
# to have vm2 do the build.
#
# In addition, we do an additional test build of ZFS as a Linux
# kernel built-in on Fedora. Again, we do it on vm2 to exploit vm2's
# early finish time.
extra=""
if [[ "$OS" == almalinux* ]] && [[ "$i" == "2" ]] ; then
extra="--lustre"
elif [[ "$OS" == fedora* ]] && [[ "$i" == "2" ]] ; then
extra="--builtin"
fi
daemonize -c /var/tmp -p vm${i}.pid -o vm${i}log.txt -- \
$SSH zfs@$IP $TESTS $OS $i $VMs $CI_TYPE
$SSH zfs@$IP $TESTS $OS $i $VMs $extra $CI_TYPE
# handly line by line and add info prefix
stdbuf -oL tail -fq vm${i}log.txt \
| while read -r line; do prefix "$i" "$line"; done &
@ -58,7 +127,7 @@ if [ -z ${1:-} ]; then
done
# wait for all vm's to finish
for i in $(seq 1 $VMs); do
for ((i=1; i<=VMs; i++)); do
tail --pid=$(cat vm${i}.pid) -f /dev/null
pid=$(cat vm${i}log.pid)
rm -f vm${i}log.pid
@ -68,36 +137,93 @@ if [ -z ${1:-} ]; then
exit 0
fi
# this part runs inside qemu vm
#############################################
# Everything from here on runs inside qemu vm
#############################################
# Process cmd line args
OS="$1"
shift
NUM="$1"
shift
DEN="$1"
shift
BUILD_LUSTRE=0
BUILD_BUILTIN=0
if [ "$1" == "--lustre" ] ; then
BUILD_LUSTRE=1
shift
elif [ "$1" == "--builtin" ] ; then
BUILD_BUILTIN=1
shift
fi
if [ "$1" == "quick" ] ; then
export RUNFILES="sanity.run"
fi
export PATH="$PATH:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin"
case "$1" in
case "$OS" in
freebsd*)
TDIR="/usr/local/share/zfs"
sudo kldstat -n zfs 2>/dev/null && sudo kldunload zfs
sudo -E ./zfs/scripts/zfs.sh
TDIR="/usr/local/share/zfs"
sudo mv -f /var/tmp/*.txt /tmp
sudo newfs -U -t -L tmp /dev/vtbd1 >/dev/null
sudo mount -o noatime /dev/vtbd1 /var/tmp
sudo chmod 1777 /var/tmp
sudo mv -f /tmp/*.txt /var/tmp
;;
*)
# use xfs @ /var/tmp for all distros
TDIR="/usr/share/zfs"
sudo -E modprobe zfs
sudo mv -f /var/tmp/*.txt /tmp
sudo mkfs.xfs -fq /dev/vdb
sudo mount -o noatime /dev/vdb /var/tmp
sudo chmod 1777 /var/tmp
sudo mv -f /tmp/*.txt /var/tmp
sudo -E modprobe zfs
TDIR="/usr/share/zfs"
;;
esac
# Distribution-specific settings.
case "$OS" in
almalinux9|almalinux10|centos-stream*)
# Enable io_uring on Enterprise Linux 9 and 10.
sudo sysctl kernel.io_uring_disabled=0 > /dev/null
;;
alpine*)
# Ensure `/etc/zfs/zpool.cache` exists.
sudo mkdir -p /etc/zfs
sudo touch /etc/zfs/zpool.cache
sudo chmod 644 /etc/zfs/zpool.cache
;;
esac
# Lustre calls a number of exported ZFS module symbols. To make sure we don't
# change the symbols and break Lustre, do a quick Lustre build of the latest
# released Lustre against ZFS.
#
# Note that we do the Lustre test build in parallel with ZTS. ZTS isn't very
# CPU intensive, so we can use idle CPU cycles "guilt free" for the build.
# The Lustre build on its own takes ~15min.
if [ "$BUILD_LUSTRE" == "1" ] ; then
do_lustre_build &
elif [ "$BUILD_BUILTIN" == "1" ] ; then
# Try building ZFS directly into the Linux kernel (not as a module)
do_builtin_build &
fi
# run functional testings and save exitcode
cd /var/tmp
TAGS=$2/$3
if [ "$4" == "quick" ]; then
export RUNFILES="sanity.run"
fi
TAGS=$NUM/$DEN
sudo dmesg -c > dmesg-prerun.txt
mount > mount.txt
df -h > df-prerun.txt
$TDIR/zfs-tests.sh -vK -s 3GB -T $TAGS
$TDIR/zfs-tests.sh -vKO -s 3GB -T $TAGS
RV=$?
df -h > df-postrun.txt
echo $RV > tests-exitcode.txt

View File

@ -28,15 +28,16 @@ BASE="$HOME/work/zfs/zfs"
MERGE="$BASE/.github/workflows/scripts/merge_summary.awk"
# catch result files of testings (vm's should be there)
for i in $(seq 1 $VMs); do
rsync -arL zfs@192.168.122.1$i:$RESPATH/current $RESPATH/vm$i || true
scp zfs@192.168.122.1$i:"/var/tmp/*.txt" $RESPATH/vm$i || true
for ((i=1; i<=VMs; i++)); do
rsync -arL zfs@vm$i:$RESPATH/current $RESPATH/vm$i || true
scp zfs@vm$i:"/var/tmp/*.txt" $RESPATH/vm$i || true
scp zfs@vm$i:"/var/tmp/*.rpm" $RESPATH/vm$i || true
done
cp -f /var/tmp/*.txt $RESPATH || true
cd $RESPATH
# prepare result files for summary
for i in $(seq 1 $VMs); do
for ((i=1; i<=VMs; i++)); do
file="vm$i/build-stderr.txt"
test -s $file && mv -f $file build-stderr.txt

View File

@ -31,6 +31,12 @@ EOF
rm -f tmp$$
}
function showfile_tail() {
echo "##[group]$2 (final lines)"
tail -n 80 $1
echo "##[endgroup]"
}
# overview
cat /tmp/summary.txt
echo ""
@ -45,7 +51,33 @@ fi
echo -e "\nFull logs for download:\n $1\n"
for i in $(seq 1 $VMs); do
for ((i=1; i<=VMs; i++)); do
# Print Lustre build test results (the build is only done on vm2)
if [ -f vm$i/lustre-exitcode.txt ] ; then
rv=$(< vm$i/lustre-exitcode.txt)
if [ $rv = 0 ]; then
vm="vm$i"
else
vm="vm$i"
touch /tmp/have_failed_tests
fi
file="vm$i/lustre.txt"
test -s "$file" && showfile_tail "$file" "$vm: Lustre build"
fi
if [ -f vm$i/builtin-exitcode.txt ] ; then
rv=$(< vm$i/builtin-exitcode.txt)
if [ $rv = 0 ]; then
vm="vm$i"
else
vm="vm$i"
touch /tmp/have_failed_tests
fi
file="vm$i/builtin.txt"
test -s "$file" && showfile_tail "$file" "$vm: Linux built-in build"
fi
rv=$(cat vm$i/tests-exitcode.txt)
if [ $rv = 0 ]; then

View File

@ -11,12 +11,10 @@ function output() {
}
function outfile() {
test -s "$1" || return
cat "$1" >> "out-$logfile.md"
}
function outfile_plain() {
test -s "$1" || return
output "<pre>"
cat "$1" >> "out-$logfile.md"
output "</pre>"
@ -45,6 +43,8 @@ if [ ! -f out-1.md ]; then
tar xf "$tarfile"
test -s env.txt || continue
source env.txt
# when uname.txt is there, the other files are also ok
test -s uname.txt || continue
output "\n## Functional Tests: $OSNAME\n"
outfile_plain uname.txt
outfile_plain summary.txt

View File

@ -0,0 +1,8 @@
#!/usr/bin/env bash
# Helper script to run after installing dependencies. This brings the VM back
# up and copies over the zfs source directory.
echo "Build modules in QEMU machine"
sudo virsh start openzfs
.github/workflows/scripts/qemu-wait-for-vm.sh vm0
rsync -ar $HOME/work/zfs/zfs zfs@vm0:./

115
.github/workflows/scripts/qemu-test-repo-vm.sh vendored Executable file
View File

@ -0,0 +1,115 @@
#!/bin/bash
#
# Do a test install of ZFS from an external repository.
#
# USAGE:
#
# ./qemu-test-repo-vm [--install] [URL]
#
# --lookup: When testing a repo, only lookup the latest package versions,
# don't try to install them. Installing all of them takes over
# an hour, so this is much quicker.
#
# URL: URL to use instead of http://download.zfsonlinux.org
# If blank, use the default repo from zfs-release RPM.
set -e
source /etc/os-release
OS="$ID"
VERSION="$VERSION_ID"
LOOKUP=""
if [ -n "$1" ] && [ "$1" == "--lookup" ] ; then
LOOKUP=1
shift
fi
ALTHOST=""
if [ -n "$1" ] ; then
ALTHOST="$1"
fi
# Write summary to /tmp/repo so our artifacts scripts pick it up
mkdir /tmp/repo
SUMMARY=/tmp/repo/$OS-$VERSION-summary.txt
# $1: Repo 'zfs' 'zfs-kmod' 'zfs-testing' 'zfs-testing-kmod'
# $2: (optional) Alternate host than 'http://download.zfsonlinux.org' to
# install from. Blank means use default from zfs-release RPM.
function test_install {
repo=$1
host=""
if [ -n "$2" ] ; then
host=$2
fi
args="--disablerepo=zfs --enablerepo=$repo"
# If we supplied an alternate repo URL, and have not already edited
# zfs.repo, then update the repo file.
if [ -n "$host" ] && ! grep -q $host /etc/yum.repos.d/zfs.repo ; then
sudo sed -i "s;baseurl=http://download.zfsonlinux.org;baseurl=$host;g" /etc/yum.repos.d/zfs.repo
fi
baseurl=$(grep -A 5 "\[$repo\]" /etc/yum.repos.d/zfs.repo | awk -F'=' '/baseurl=/{print $2; exit}')
# Just do a version lookup - don't try to install any RPMs
if [ "$LOOKUP" == "1" ] ; then
package="$(dnf list $args zfs | tail -n 1 | awk '{print $2}')"
echo "$repo ${package} $baseurl" >> $SUMMARY
return
fi
if ! sudo dnf -y install $args zfs zfs-test ; then
echo "$repo ${package}...[FAILED] $baseurl" >> $SUMMARY
return
fi
# Load modules and create a simple pool as a sanity test.
sudo /usr/share/zfs/zfs.sh -r
truncate -s 100M /tmp/file
sudo zpool create tank /tmp/file
sudo zpool status
# Print out repo name, rpm installed (kmod or dkms), and repo URL
package=$(sudo rpm -qa | grep zfs | grep -E 'kmod|dkms')
echo "$repo $package $baseurl" >> $SUMMARY
sudo zpool destroy tank
sudo rm /tmp/file
sudo dnf -y remove zfs
}
echo "##[group]Installing from repo"
# The openzfs docs are the authoritative instructions for the install. Use
# the specific version of zfs-release RPM it recommends.
case $OS in
almalinux*)
url='https://raw.githubusercontent.com/openzfs/openzfs-docs/refs/heads/master/docs/Getting%20Started/RHEL-based%20distro/index.rst'
name=$(curl -Ls $url | grep 'dnf install' | grep -Eo 'zfs-release-[0-9]+-[0-9]+')
sudo dnf -y install https://zfsonlinux.org/epel/$name$(rpm --eval "%{dist}").noarch.rpm 2>&1
sudo rpm -qi zfs-release
for i in zfs zfs-kmod zfs-testing zfs-testing-kmod zfs-latest \
zfs-latest-kmod zfs-legacy zfs-legacy-kmod zfs-2.2 \
zfs-2.2-kmod zfs-2.3 zfs-2.3-kmod zfs-2.4 zfs-2.4-kmod; do
test_install $i $ALTHOST
done
;;
fedora*)
url='https://raw.githubusercontent.com/openzfs/openzfs-docs/refs/heads/master/docs/Getting%20Started/Fedora/index.rst'
name=$(curl -Ls $url | grep 'dnf install' | grep -Eo 'zfs-release-[0-9]+-[0-9]+')
sudo dnf -y install -y https://zfsonlinux.org/fedora/$name$(rpm --eval "%{dist}").noarch.rpm
for i in zfs zfs-latest zfs-legacy zfs-2.2 zfs-2.3 zfs-2.4 ; do
test_install $i $ALTHOST
done
;;
esac
echo "##[endgroup]"
# Write out a simple version of the summary here. Later on we will collate all
# the summaries and put them into a nice table in the workflow Summary page.
echo "Summary: "
cat $SUMMARY

10
.github/workflows/scripts/qemu-wait-for-vm.sh vendored Executable file
View File

@ -0,0 +1,10 @@
#!/bin/bash
#
# Wait for a VM to boot up and become active. This is used in a number of our
# scripts.
#
# $1: VM hostname or IP address
while pidof /usr/bin/qemu-system-x86_64 >/dev/null; do
ssh 2>/dev/null zfs@$1 "uname -a" && break
done

View File

@ -0,0 +1,32 @@
#!/bin/bash
#
# Recursively go though a directory structure and replace duplicate files with
# symlinks. This cuts down our RPM repo size by ~25%.
#
# replace-dupes-with-symlinks.sh [DIR]
#
# DIR: Directory to traverse. Defaults to current directory if not specified.
#
src="$1"
if [ -z "$src" ] ; then
src="."
fi
declare -A db
pushd "$src"
while read line ; do
bn="$(basename $line)"
if [ -z "${db[$bn]}" ] ; then
# First time this file has been seen
db[$bn]="$line"
else
if diff -b "$line" "${db[$bn]}" &>/dev/null ; then
# Files are the same, make a symlink
rm "$line"
ln -sr "${db[$bn]}" "$line"
fi
fi
done <<< "$(find . -type f)"
popd

52
.github/workflows/smatch.yml vendored Normal file
View File

@ -0,0 +1,52 @@
name: smatch
on:
push:
pull_request:
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
smatch:
runs-on: ubuntu-24.04
steps:
- name: Checkout smatch
uses: actions/checkout@v4
with:
repository: error27/smatch
ref: master
path: smatch
- name: Install smatch dependencies
run: |
sudo apt-get install -y llvm gcc make sqlite3 libsqlite3-dev libdbd-sqlite3-perl libssl-dev libtry-tiny-perl
- name: Make smatch
run: |
cd $GITHUB_WORKSPACE/smatch
make -j$(nproc)
- name: Checkout OpenZFS
uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
path: zfs
- name: Install OpenZFS dependencies
run: |
cd $GITHUB_WORKSPACE/zfs
sudo apt-get purge -y snapd google-chrome-stable firefox
ONLY_DEPS=1 .github/workflows/scripts/qemu-3-deps-vm.sh ubuntu24
- name: Autogen.sh OpenZFS
run: |
cd $GITHUB_WORKSPACE/zfs
./autogen.sh
- name: Configure OpenZFS
run: |
cd $GITHUB_WORKSPACE/zfs
./configure --enable-debug
- name: Make OpenZFS
run: |
cd $GITHUB_WORKSPACE/zfs
make -j$(nproc) CHECK="$GITHUB_WORKSPACE/smatch/smatch" CC=$GITHUB_WORKSPACE/smatch/cgcc | tee $GITHUB_WORKSPACE/smatch.log
- name: Smatch results log
run: |
grep -E 'error:|warn:|warning:' $GITHUB_WORKSPACE/smatch.log

40
.github/workflows/zfs-arm.yml vendored Normal file
View File

@ -0,0 +1,40 @@
name: zfs-arm
on:
push:
pull_request:
workflow_dispatch:
jobs:
zfs-arm:
name: ZFS ARM build
runs-on: ubuntu-24.04-arm
steps:
- uses: actions/checkout@v6
with:
fetch-depth: 0
ref: ${{ github.event.pull_request.head.sha }}
- name: Install dependencies
timeout-minutes: 20
run: |
sudo apt-get -y remove firefox || true
.github/workflows/scripts/qemu-3-deps-vm.sh ubuntu24
# We're running the VM scripts locally on the runner, so need to fix
# up hostnames to make it work.
for ((i=0; i<=3; i++)); do
echo "127.0.0.1 vm$i" | sudo tee -a /etc/hosts
done
- name: Build modules
timeout-minutes: 30
run: |
.github/workflows/scripts/qemu-4-build-vm.sh --enable-debug ubuntu24
# Quick sanity test since we're not running the full ZTS
sudo modprobe zfs
sudo dmesg | grep -i zfs
truncate -s 100M file
sudo zpool create tank ./file
zpool status
echo "Built ZFS successfully on ARM"

157
.github/workflows/zfs-qemu-packages.yml vendored Normal file
View File

@ -0,0 +1,157 @@
# This workflow is used to build and test RPM packages. It is a
# 'workflow_dispatch' workflow, which means it gets run manually.
#
# The workflow has a dropdown menu with two options:
#
# Build RPMs - Build release RPMs and tarballs and put them into an artifact
# ZIP file. The directory structure used in the ZIP file mirrors
# the ZFS yum repo.
#
# Test repo - Test install the ZFS RPMs from the ZFS repo. On EL distos, this
# will do a DKMS and KMOD test install from both the regular and
# testing repos. On Fedora, it will do a DKMS install from the
# regular repo. All test install results will be displayed in the
# Summary page. Note that the workflow provides an optional text
# text box where you can specify the full URL to an alternate repo.
# If left blank, it will install from the default repo from the
# zfs-release RPM (http://download.zfsonlinux.org).
#
# Most users will never need to use this workflow. It will be used primary by
# ZFS admins for building and testing releases.
#
name: zfs-qemu-packages
on:
workflow_dispatch:
inputs:
test_type:
type: choice
required: false
default: "Build RPMs"
description: "Build RPMs or test the repo?"
options:
- "Build RPMs"
- "Test repo"
patch_level:
type: string
required: false
default: ""
description: "(optional) patch level number"
repo_url:
type: string
required: false
default: ""
description: "(optional) repo URL (blank: use http://download.zfsonlinux.org)"
lookup:
type: boolean
required: false
default: false
description: "(optional) do version lookup only on repo test"
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
zfs-qemu-packages-jobs:
name: qemu-VMs
strategy:
fail-fast: false
matrix:
os: ['almalinux8', 'almalinux9', 'almalinux10', 'fedora42', 'fedora43']
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Setup QEMU
run: .github/workflows/scripts/qemu-1-setup.sh
- name: Start build machine
run: .github/workflows/scripts/qemu-2-start.sh ${{ matrix.os }}
- name: Install dependencies
run: |
.github/workflows/scripts/qemu-3-deps.sh ${{ matrix.os }}
- name: Build modules or Test repo
run: |
set -e
if [ "${{ github.event.inputs.test_type }}" == "Test repo" ] ; then
# Bring VM back up and copy over zfs source
.github/workflows/scripts/qemu-prepare-for-build.sh
mkdir -p /tmp/repo
EXTRA=""
if [ "${{ github.event.inputs.lookup }}" == 'true' ] ; then
EXTRA="--lookup"
fi
ssh zfs@vm0 '$HOME/zfs/.github/workflows/scripts/qemu-test-repo-vm.sh' $EXTRA ${{ github.event.inputs.repo_url }}
else
EXTRA=""
if [ -n "${{ github.event.inputs.patch_level }}" ] ; then
EXTRA="--patch-level ${{ github.event.inputs.patch_level }}"
fi
.github/workflows/scripts/qemu-4-build.sh $EXTRA \
--repo --release --dkms --tarball ${{ matrix.os }}
fi
- name: Prepare artifacts
if: always()
run: |
rsync -a zfs@vm0:/tmp/repo /tmp || true
.github/workflows/scripts/replace-dupes-with-symlinks.sh /tmp/repo
tar -cf ${{ matrix.os }}-repo.tar -C /tmp repo
- uses: actions/upload-artifact@v4
id: artifact-upload
if: always()
with:
name: ${{ matrix.os }}-repo
path: ${{ matrix.os }}-repo.tar
compression-level: 0
retention-days: 2
if-no-files-found: ignore
combine_repos:
if: always()
needs: [zfs-qemu-packages-jobs]
name: "Results"
runs-on: ubuntu-latest
steps:
- uses: actions/download-artifact@v4
id: artifact-download
if: always()
- name: Test Summary
if: always()
run: |
for i in $(find . -type f -iname "*.tar") ; do
tar -xf $i -C /tmp
done
tar -cf all-repo.tar -C /tmp repo
# If we're installing from a repo, print out the summary of the versions
# that got installed using Markdown.
if [ "${{ github.event.inputs.test_type }}" == "Test repo" ] ; then
cd /tmp/repo
for i in $(ls *.txt) ; do
nicename="$(echo $i | sed 's/.txt//g; s/-/ /g')"
echo "### $nicename" >> $GITHUB_STEP_SUMMARY
echo "|repo|RPM|URL|" >> $GITHUB_STEP_SUMMARY
echo "|:---|:---|:---|" >> $GITHUB_STEP_SUMMARY
awk '{print "|"$1"|"$2"|"$3"|"}' $i >> $GITHUB_STEP_SUMMARY
done
fi
- uses: actions/upload-artifact@v4
id: artifact-upload2
if: always()
with:
name: all-repo
path: all-repo.tar
compression-level: 0
retention-days: 5
if-no-files-found: ignore

View File

@ -3,6 +3,18 @@ name: zfs-qemu
on:
push:
pull_request:
workflow_dispatch:
inputs:
fedora_kernel_ver:
type: string
required: false
default: ""
description: "(optional) Experimental kernel version to install on Fedora (like '6.14' or '6.13.3-0.rc3')"
specific_os:
type: string
required: false
default: ""
description: "(optional) Only run on this specific OS (like 'fedora42' or 'alpine3-23')"
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
@ -22,23 +34,59 @@ jobs:
- name: Generate OS config and CI type
id: os
run: |
FULL_OS='["almalinux8", "almalinux9", "centos-stream9", "debian11", "debian12", "fedora39", "fedora40", "freebsd13", "freebsd13r", "freebsd14", "freebsd14r", "ubuntu20", "ubuntu22", "ubuntu24"]'
QUICK_OS='["almalinux8", "almalinux9", "debian12", "fedora40", "freebsd13", "freebsd14", "ubuntu24"]'
ci_type="default"
ci_source="auto"
# determine CI type when running on PR
ci_type="full"
if ${{ github.event_name == 'pull_request' }}; then
head=${{ github.event.pull_request.head.sha }}
base=${{ github.event.pull_request.base.sha }}
ci_type=$(python3 .github/workflows/scripts/generate-ci-type.py $head $base)
read ci_type ci_source <<< "$(python3 .github/workflows/scripts/generate-ci-type.py $head $base)"
fi
if [ "$ci_type" == "quick" ]; then
os_selection="$QUICK_OS"
case "$ci_type" in
quick)
os_selection='["almalinux8", "almalinux9", "almalinux10", "debian12", "fedora42", "freebsd15-0s", "ubuntu24"]'
;;
linux)
os_selection='["almalinux8", "almalinux9", "almalinux10", "centos-stream9", "centos-stream10", "debian11", "debian12", "debian13", "fedora42", "fedora43", "ubuntu22", "ubuntu24"]'
;;
freebsd)
os_selection='["freebsd13-5r", "freebsd14-4r", "freebsd13-5s", "freebsd14-4s", "freebsd15-0s", "freebsd16-0c"]'
;;
*)
# default list
os_selection='["almalinux8", "almalinux9", "almalinux10", "centos-stream9", "centos-stream10", "debian12", "debian13", "fedora42", "fedora43", "freebsd14-4r", "freebsd15-0s", "freebsd16-0c", "ubuntu22", "ubuntu24"]'
;;
esac
# Repository-level override for OS selection.
# Set vars.ZTS_OS_OVERRIDE in repo settings to restrict targets
# (e.g. '["debian13"]' or '["debian13", "fedora42"]').
# Manual ZFS-CI-Type in commit messages bypasses the override.
if [ -n "${{ vars.ZTS_OS_OVERRIDE }}" ] && [ "$ci_source" != "manual" ]; then
override='${{ vars.ZTS_OS_OVERRIDE }}'
if echo "$override" | jq -e 'type == "array"' >/dev/null 2>&1; then
os_selection="$override"
else
os_selection="$FULL_OS"
echo "::warning::Invalid ZTS_OS_OVERRIDE, using default"
fi
fi
if ${{ github.event.inputs.fedora_kernel_ver != '' }}; then
# They specified a custom kernel version for Fedora.
# Use only Fedora runners.
os_json=$(echo ${os_selection} | jq -c '[.[] | select(startswith("fedora"))]')
elif ${{ github.event.inputs.specific_os != '' }}; then
# Use only the specified runner.
os_json=$(jq -cn --arg os "${{ github.event.inputs.specific_os }}" '[ $os ]')
else
# Normal case
os_json=$(echo ${os_selection} | jq -c)
echo "os=$os_json" >> $GITHUB_OUTPUT
echo "ci_type=$ci_type" >> $GITHUB_OUTPUT
fi
echo "os=$os_json" | tee -a $GITHUB_OUTPUT
echo "ci_type=$ci_type" | tee -a $GITHUB_OUTPUT
qemu-vm:
name: qemu-x86
@ -46,10 +94,13 @@ jobs:
strategy:
fail-fast: false
matrix:
# all:
# os: [almalinux8, almalinux9, archlinux, centos-stream9, fedora39, fedora40, debian11, debian12, freebsd13, freebsd13r, freebsd14, freebsd14r, freebsd15, ubuntu20, ubuntu22, ubuntu24]
# openzfs:
# os: [almalinux8, almalinux9, centos-stream9, debian11, debian12, fedora39, fedora40, freebsd13, freebsd13r, freebsd14, freebsd14r, ubuntu20, ubuntu22, ubuntu24]
# rhl: almalinux8, almalinux9, centos-streamX, fedora4x
# debian: debian12, debian13, ubuntu22, ubuntu24
# misc: archlinux, tumbleweed
# FreeBSD variants of november 2025:
# FreeBSD Release: freebsd13-5r, freebsd14-4r, freebsd15-0r
# FreeBSD Stable: freebsd13-5s, freebsd14-4s, freebsd15-0s
# FreeBSD Current: freebsd16-0c
os: ${{ fromJson(needs.test-config.outputs.test_os) }}
runs-on: ubuntu-24.04
steps:
@ -58,7 +109,7 @@ jobs:
ref: ${{ github.event.pull_request.head.sha }}
- name: Setup QEMU
timeout-minutes: 10
timeout-minutes: 60
run: .github/workflows/scripts/qemu-1-setup.sh
- name: Start build machine
@ -66,32 +117,12 @@ jobs:
run: .github/workflows/scripts/qemu-2-start.sh ${{ matrix.os }}
- name: Install dependencies
timeout-minutes: 20
run: |
echo "Install dependencies in QEMU machine"
IP=192.168.122.10
while pidof /usr/bin/qemu-system-x86_64 >/dev/null; do
ssh 2>/dev/null zfs@$IP "uname -a" && break
done
scp .github/workflows/scripts/qemu-3-deps.sh zfs@$IP:qemu-3-deps.sh
PID=`pidof /usr/bin/qemu-system-x86_64`
ssh zfs@$IP '$HOME/qemu-3-deps.sh' ${{ matrix.os }}
# wait for poweroff to succeed
tail --pid=$PID -f /dev/null
sleep 5 # avoid this: "error: Domain is already active"
rm -f $HOME/.ssh/known_hosts
timeout-minutes: 60
run: .github/workflows/scripts/qemu-3-deps.sh --poweroff ${{ matrix.os }} ${{ github.event.inputs.fedora_kernel_ver }}
- name: Build modules
timeout-minutes: 30
run: |
echo "Build modules in QEMU machine"
sudo virsh start openzfs
IP=192.168.122.10
while pidof /usr/bin/qemu-system-x86_64 >/dev/null; do
ssh 2>/dev/null zfs@$IP "uname -a" && break
done
rsync -ar $HOME/work/zfs/zfs zfs@$IP:./
ssh zfs@$IP '$HOME/zfs/.github/workflows/scripts/qemu-4-build.sh' ${{ matrix.os }}
run: .github/workflows/scripts/qemu-4-build.sh --poweroff --enable-debug ${{ matrix.os }}
- name: Setup testing machines
timeout-minutes: 5

View File

@ -12,7 +12,8 @@ jobs:
zloop:
runs-on: ubuntu-24.04
env:
TEST_DIR: /var/tmp/zloop
WORK_DIR: /mnt/zloop
CORE_DIR: /mnt/zloop/cores
steps:
- uses: actions/checkout@v4
with:
@ -20,7 +21,7 @@ jobs:
- name: Install dependencies
run: |
sudo apt-get purge -y snapd google-chrome-stable firefox
ONLY_DEPS=1 .github/workflows/scripts/qemu-3-deps.sh ubuntu24
ONLY_DEPS=1 .github/workflows/scripts/qemu-3-deps-vm.sh ubuntu24
- name: Autogen.sh
run: |
sed -i '/DEBUG_CFLAGS="-Werror"/s/^/#/' config/zfs-build.m4
@ -40,38 +41,37 @@ jobs:
sudo modprobe zfs
- name: Tests
run: |
sudo mkdir -p $TEST_DIR
# run for 10 minutes or at most 6 iterations for a maximum runner
# time of 60 minutes.
sudo /usr/share/zfs/zloop.sh -t 600 -I 6 -l -m 1 -- -T 120 -P 60
sudo truncate -s 256G /mnt/vdev
sudo zpool create cipool -m $WORK_DIR -O compression=on -o autotrim=on /mnt/vdev
sudo /usr/share/zfs/zloop.sh -t 600 -I 6 -l -m 1 -c $CORE_DIR -f $WORK_DIR -- -T 120 -P 60
- name: Prepare artifacts
if: failure()
run: |
sudo chmod +r -R $TEST_DIR/
sudo chmod +r -R $WORK_DIR/
- name: Ztest log
if: failure()
run: |
grep -B10 -A1000 'ASSERT' $TEST_DIR/*/ztest.out || tail -n 1000 $TEST_DIR/*/ztest.out
grep -B10 -A1000 'ASSERT' $CORE_DIR/*/ztest.out || tail -n 1000 $CORE_DIR/*/ztest.out
- name: Gdb log
if: failure()
run: |
sed -n '/Backtraces (full)/q;p' $TEST_DIR/*/ztest.gdb
sed -n '/Backtraces (full)/q;p' $CORE_DIR/*/ztest.gdb
- name: Zdb log
if: failure()
run: |
cat $TEST_DIR/*/ztest.zdb
cat $CORE_DIR/*/ztest.zdb
- uses: actions/upload-artifact@v4
if: failure()
with:
name: Logs
path: |
/var/tmp/zloop/*/
!/var/tmp/zloop/*/vdev/
/mnt/zloop/*/
!/mnt/zloop/cores/*/vdev/
if-no-files-found: ignore
- uses: actions/upload-artifact@v4
if: failure()
with:
name: Pool files
path: |
/var/tmp/zloop/*/vdev/
/mnt/zloop/cores/*/vdev/
if-no-files-found: ignore

View File

@ -23,6 +23,7 @@
# These maps are making names consistent where they have varied but the email
# address has never changed. In most cases, the full name is in the
# Signed-off-by of a commit with a matching author.
Achill Gilgenast <achill@achill.org>
Ahelenia Ziemiańska <nabijaczleweli@gmail.com>
Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Alex John <alex@stty.io>
@ -37,6 +38,7 @@ Crag Wang <crag0715@gmail.com>
Damian Szuberski <szuberskidamian@gmail.com>
Daniel Kolesa <daniel@octaforge.org>
Debabrata Banerjee <dbavatar@gmail.com>
Diwakar Kristappagari <diwakar-k@hpe.com>
Finix Yan <yanchongwen@hotmail.com>
Gaurav Kumar <gauravk.18@gmail.com>
Gionatan Danti <g.danti@assyoma.it>
@ -51,6 +53,8 @@ Jason Harmening <jason.harmening@gmail.com>
Jeremy Faulkner <gldisater@gmail.com>
Jinshan Xiong <jinshan.xiong@gmail.com>
John Poduska <jpoduska@datto.com>
Joseph Holsten <joseph@josephholsten.com>
Jo Zzsi <jozzsicsataban@gmail.com>
Justin Scholz <git@justinscholz.de>
Ka Ho Ng <khng300@gmail.com>
Kash Pande <github@tripleback.net>
@ -65,11 +69,16 @@ Michael Gmelin <grembo@FreeBSD.org>
Olivier Mazouffre <olivier.mazouffre@ims-bordeaux.fr>
Piotr Kubaj <pkubaj@anongoth.pl>
Quentin Zdanis <zdanisq@gmail.com>
Roberto Ricci <io@r-ricci.it>
Roberto Ricci <ricci@disroot.org>
Rob Norris <robn@despairlabs.com>
Rob Norris <rob.norris@klarasystems.com>
Rob Norris <rob.norris@truenas.com>
Sam Lunt <samuel.j.lunt@gmail.com>
Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
Sebastian Wuerl <s.wuerl@mailbox.org>
SHENGYI HONG <aokblast@FreeBSD.org>
Sivesh Kumar <siveshjami@gmail.com>
Stoiko Ivanov <github@nomore.at>
Tamas TEVESZ <ice@extreme.hu>
WHR <msl0000023508@gmail.com>
@ -77,8 +86,21 @@ Yanping Gao <yanping.gao@xtaotech.com>
Youzhong Yang <youzhong@gmail.com>
# Signed-off-by: overriding Author:
Ryan <errornointernet@envs.net> <error.nointernet@gmail.com>
Alexander Moch <mail@alexmoch.com> <amoch@ernw.de>
Alexander Moch <mail@alexmoch.com> <github@alexanderjulian.de>
Alexander Ziaee <ziaee@FreeBSD.org> <concussious@runbox.com>
delan azabani <dazabani@igalia.com> <delan@azabani.com>
Felix Schmidt <felixschmidt20@aol.com> <f.sch.prototype@gmail.com>
George Shammas <george@shamm.as> <georgyo@gmail.com>
Jean-Sébastien Pédron <dumbbell@FreeBSD.org> <jean-sebastien.pedron@dumbbell.fr>
Konstantin Belousov <kib@FreeBSD.org> <kib@kib.kiev.ua>
Olivier Certner <olce@FreeBSD.org> <olce.freebsd@certner.fr>
Patrick Xia <patrickx@google.com> <octalc0de@aim.com>
Phil Sutter <phil@nwl.cc> <p.github@nwl.cc>
poscat <poscat@poscat.moe> <poscat0x04@outlook.com>
Qiuhao Chen <chenqiuhao1997@gmail.com> <haohao0924@126.com>
Ryan <errornointernet@envs.net> <error.nointernet@gmail.com>
Sietse <sietse@wizdom.nu> <uglymotha@wizdom.nu>
Yuxin Wang <yuxinwang9999@gmail.com> <Bi11gates9999@gmail.com>
Zhenlei Huang <zlei@FreeBSD.org> <zlei.huang@gmail.com>
@ -93,8 +115,10 @@ Ned Bass <bass6@llnl.gov> <bass6@zeno1.(none)>
Tulsi Jain <tulsi.jain@delphix.com> <tulsi.jain@Tulsi-Jains-MacBook-Pro.local>
# Mappings from Github no-reply addresses
Adi Gollamudi <adigollamudi@gmail.com> <68113680+Adi-Goll@users.noreply.github.com>
ajs124 <git@ajs124.de> <ajs124@users.noreply.github.com>
Alek Pinchuk <apinchuk@axcient.com> <alek-p@users.noreply.github.com>
Aleksandr Liber <aleksandr.liber@perforce.com> <61714074+AleksandrLiber@users.noreply.github.com>
Alexander Lobakin <alobakin@pm.me> <solbjorn@users.noreply.github.com>
Alexey Smirnoff <fling@member.fsf.org> <fling-@users.noreply.github.com>
Allen Holl <allen.m.holl@gmail.com> <65494904+allen-4@users.noreply.github.com>
@ -109,11 +133,13 @@ bernie1995 <bernie.pikes@gmail.com> <42413912+bernie1995@users.noreply.github.co
Bojan Novković <bnovkov@FreeBSD.org> <72801811+bnovkov@users.noreply.github.com>
Boris Protopopov <boris.protopopov@actifio.com> <bprotopopov@users.noreply.github.com>
Brad Forschinger <github@bnjf.id.au> <bnjf@users.noreply.github.com>
Brad Spengler <94915855+bspengler-oss@users.noreply.github.com>>
Brandon Thetford <brandon@dodecatec.com> <dodexahedron@users.noreply.github.com>
buzzingwires <buzzingwires@outlook.com> <131118055+buzzingwires@users.noreply.github.com>
Cedric Maunoury <cedric.maunoury@gmail.com> <38213715+cedricmaunoury@users.noreply.github.com>
Charles Suh <charles.suh@gmail.com> <charlessuh@users.noreply.github.com>
Chris Peredun <chris.peredun@ixsystems.com> <126915832+chrisperedun@users.noreply.github.com>
classabbyamp <dev@placeviolette.net> <5366828+classabbyamp@users.noreply.github.com>
Dacian Reece-Stremtan <dacianstremtan@gmail.com> <35844628+dacianstremtan@users.noreply.github.com>
Damian Szuberski <szuberskidamian@gmail.com> <30863496+szubersk@users.noreply.github.com>
Daniel Hiepler <d-git@coderdu.de> <32984777+heeplr@users.noreply.github.com>
@ -121,6 +147,7 @@ Daniel Kobras <d.kobras@science-computing.de> <sckobras@users.noreply.github.com
Daniel Reichelt <hacking@nachtgeist.net> <nachtgeist@users.noreply.github.com>
David Quigley <david.quigley@intel.com> <dpquigl@users.noreply.github.com>
Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com> <31087738+dennisfriedrichsen@users.noreply.github.com>
Dennis Vestergaard Værum <github@varum.dk> <6872940+dvaerum@users.noreply.github.com>
Dex Wood <slash2314@gmail.com> <slash2314@users.noreply.github.com>
DHE <git@dehacked.net> <DeHackEd@users.noreply.github.com>
Dmitri John Ledkov <dimitri.ledkov@canonical.com> <19779+xnox@users.noreply.github.com>
@ -131,10 +158,12 @@ Fedor Uporov <fuporov.vstack@gmail.com> <60701163+fuporovvStack@users.noreply.gi
Felix Dörre <felix@dogcraft.de> <felixdoerre@users.noreply.github.com>
Felix Neumärker <xdch47@posteo.de> <34678034+xdch47@users.noreply.github.com>
Finix Yan <yancw@info2soft.com> <Finix1979@users.noreply.github.com>
Friedrich Weber <f.weber@proxmox.com> <56110206+frwbr@users.noreply.github.com>
Gaurav Kumar <gauravk.18@gmail.com> <gaurkuma@users.noreply.github.com>
George Gaydarov <git@gg7.io> <gg7@users.noreply.github.com>
Georgy Yakovlev <gyakovlev@gentoo.org> <168902+gyakovlev@users.noreply.github.com>
Gerardwx <gerardw@alum.mit.edu> <Gerardwx@users.noreply.github.com>
Germano Massullo <germano.massullo@gmail.com> <Germano0@users.noreply.github.com>
Gian-Carlo DeFazio <defazio1@llnl.gov> <defaziogiancarlo@users.noreply.github.com>
Giuseppe Di Natale <dinatale2@llnl.gov> <dinatale2@users.noreply.github.com>
Hajo Möller <dasjoe@gmail.com> <dasjoe@users.noreply.github.com>
@ -154,6 +183,7 @@ John Ramsden <johnramsden@riseup.net> <johnramsden@users.noreply.github.com>
Jonathon Fernyhough <jonathon@m2x.dev> <559369+jonathonf@users.noreply.github.com>
Jose Luis Duran <jlduran@gmail.com> <jlduran@users.noreply.github.com>
Justin Hibbits <chmeeedalf@gmail.com> <chmeeedalf@users.noreply.github.com>
Kaitlin Hoang <kthoang@amazon.com> <khoang98@users.noreply.github.com>
Kevin Greene <kevin.greene@delphix.com> <104801862+kxgreene@users.noreply.github.com>
Kevin Jin <lostking2008@hotmail.com> <33590050+jxdking@users.noreply.github.com>
Kevin P. Fleming <kevin@km6g.us> <kpfleming@users.noreply.github.com>
@ -171,6 +201,7 @@ Michael Niewöhner <foss@mniewoehner.de> <c0d3z3r0@users.noreply.github.com>
Michael Zhivich <mzhivich@akamai.com> <33133421+mzhivich@users.noreply.github.com>
MigeljanImeri <ImeriMigel@gmail.com> <78048439+MigeljanImeri@users.noreply.github.com>
Mo Zhou <cdluminate@gmail.com> <5723047+cdluminate@users.noreply.github.com>
nav1s <nav1s@proton.me> <42621369+nav1s@users.noreply.github.com>
Nick Mattis <nickm970@gmail.com> <nmattis@users.noreply.github.com>
omni <omni+vagant@hack.org> <79493359+omnivagant@users.noreply.github.com>
Pablo Correa Gómez <ablocorrea@hotmail.com> <32678034+pablofsf@users.noreply.github.com>
@ -192,6 +223,7 @@ Samuel Wycliffe <samuelwycliffe@gmail.com> <50765275+npc203@users.noreply.github
Savyasachee Jha <hi@savyasacheejha.com> <savyajha@users.noreply.github.com>
Scott Colby <scott@scolby.com> <scolby33@users.noreply.github.com>
Sean Eric Fagan <kithrup@mac.com> <kithrup@users.noreply.github.com>
Shreshth Srivastava <shreshthsrivastava2@gmail.com> <66148173+Shreshth3@users.noreply.github.com>
Spencer Kinny <spencerkinny1995@gmail.com> <30333052+Spencer-Kinny@users.noreply.github.com>
Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com> <75025422+nssrikanth@users.noreply.github.com>
Stefan Lendl <s.lendl@proxmox.com> <1321542+stfl@users.noreply.github.com>
@ -205,6 +237,7 @@ Torsten Wörtwein <twoertwein@gmail.com> <twoertwein@users.noreply.github.com>
Tulsi Jain <tulsi.jain@delphix.com> <TulsiJain@users.noreply.github.com>
Václav Skála <skala@vshosting.cz> <33496485+vaclavskala@users.noreply.github.com>
Vaibhav Bhanawat <vaibhav.bhanawat@delphix.com> <88050553+vaibhav-delphix@users.noreply.github.com>
Vandana Rungta <vrungta@amazon.com> <46906819+vandanarungta@users.noreply.github.com>
Violet Purcell <vimproved@inventati.org> <66446404+vimproved@users.noreply.github.com>
Vipin Kumar Verma <vipin.verma@hpe.com> <75025470+vermavipinkumar@users.noreply.github.com>
Wolfgang Bumiller <w.bumiller@proxmox.com> <Blub@users.noreply.github.com>

69
AUTHORS
View File

@ -10,9 +10,11 @@ PAST MAINTAINERS:
CONTRIBUTORS:
Aaron Fineman <abyxcos@gmail.com>
Achill Gilgenast <achill@achill.org>
Adam D. Moss <c@yotes.com>
Adam Leventhal <ahl@delphix.com>
Adam Stevko <adam.stevko@gmail.com>
Adi Gollamudi <adigollamudi@gmail.com>
adisbladis <adis@blad.is>
Adrian Chadd <adrian@freebsd.org>
Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
@ -29,13 +31,17 @@ CONTRIBUTORS:
Alejandro Colomar <Colomar.6.4.3@GMail.com>
Alejandro R. Sedeño <asedeno@mit.edu>
Alek Pinchuk <alek@nexenta.com>
Aleksandr Liber <aleksandr.liber@perforce.com>
Aleksa Sarai <cyphar@cyphar.com>
Alex <simplecodemaster@gmail.com>
Alexander Eremin <a.eremin@nexenta.com>
Alexander Lobakin <alobakin@pm.me>
Alexander Moch <mail@alexmoch.com>
Alexander Motin <mav@freebsd.org>
Alexander Pyhalov <apyhalov@gmail.com>
Alexander Richardson <Alexander.Richardson@cl.cam.ac.uk>
Alexander Stetsenko <ams@nexenta.com>
Alexander Ziaee <ziaee@FreeBSD.org>
Alex Braunegg <alex.braunegg@gmail.com>
Alexey Shvetsov <alexxy@gentoo.org>
Alexey Smirnoff <fling@member.fsf.org>
@ -43,6 +49,7 @@ CONTRIBUTORS:
Alex McWhirter <alexmcwhirter@triadic.us>
Alex Reece <alex@delphix.com>
Alex Wilson <alex.wilson@joyent.com>
Alexx Saver <lzsaver.eth@ethermail.io>
Alex Zhuravlev <alexey.zhuravlev@intel.com>
Allan Jude <allanjude@freebsd.org>
Allen Holl <allen.m.holl@gmail.com>
@ -57,6 +64,7 @@ CONTRIBUTORS:
Andreas Buschmann <andreas.buschmann@tech.net.de>
Andreas Dilger <adilger@intel.com>
Andreas Vögele <andreas@andreasvoegele.com>
Andres <a-d-j-i@users.noreply.github.com>
Andrew Barnes <barnes333@gmail.com>
Andrew Hamilton <ahamilto@tjhsst.edu>
Andrew Innes <andrew.c12@gmail.com>
@ -70,6 +78,7 @@ CONTRIBUTORS:
Andrey Prokopenko <job@terem.fr>
Andrey Vesnovaty <andrey.vesnovaty@gmail.com>
Andriy Gapon <avg@freebsd.org>
Andriy Tkachuk <andriy.tkachuk@seagate.com>
Andy Bakun <github@thwartedefforts.org>
Andy Fiddaman <omnios@citrus-it.co.uk>
Aniruddha Shankar <k@191a.net>
@ -80,9 +89,11 @@ CONTRIBUTORS:
Arne Jansen <arne@die-jansens.de>
Aron Xu <happyaron.xu@gmail.com>
Arshad Hussain <arshad.hussain@aeoncomputing.com>
Artem <artem.vlasenko@ossrevival.org>
Arun KV <arun.kv@datacore.com>
Arvind Sankar <nivedita@alum.mit.edu>
Attila Fülöp <attila@fueloep.org>
Austin Wise <AustinWise@gmail.com>
Avatat <kontakt@avatat.pl>
Bart Coddens <bart.coddens@gmail.com>
Basil Crow <basil.crow@delphix.com>
@ -104,6 +115,7 @@ CONTRIBUTORS:
Boris Protopopov <boris.protopopov@nexenta.com>
Brad Forschinger <github@bnjf.id.au>
Brad Lewis <brad.lewis@delphix.com>
Brad Spengler <bspengler-oss@users.noreply.github.com>
Brandon Thetford <brandon@dodecatec.com>
Brian Atkinson <bwa@g.clemson.edu>
Brian Behlendorf <behlendorf1@llnl.gov>
@ -117,6 +129,7 @@ CONTRIBUTORS:
Caleb James DeLisle <calebdelisle@lavabit.com>
Cameron Harr <harr1@llnl.gov>
Cao Xuewen <cao.xuewen@zte.com.cn>
Carl George <carlwgeorge@gmail.com>
Carlo Landmeter <clandmeter@gmail.com>
Carlos Alberto Lopez Perez <clopez@igalia.com>
Cedric Maunoury <cedric.maunoury@gmail.com>
@ -147,6 +160,7 @@ CONTRIBUTORS:
Chris Zubrzycki <github@mid-earth.net>
Chuck Tuffli <ctuffli@gmail.com>
Chunwei Chen <david.chen@nutanix.com>
classabbyamp <dev@placeviolette.net>
Clemens Fruhwirth <clemens@endorphin.org>
Clemens Lang <cl@clang.name>
Clint Armstrong <clint@clintarmstrong.net>
@ -154,6 +168,7 @@ CONTRIBUTORS:
Colin Ian King <colin.king@canonical.com>
Colin Percival <cperciva@tarsnap.com>
Colm Buckley <colm@tuatha.org>
Cong Zhang <congzhangzh@users.noreply.github.com>
Crag Wang <crag0715@gmail.com>
Craig Loomis <cloomis@astro.princeton.edu>
Craig Sanders <github@taz.net.au>
@ -187,7 +202,9 @@ CONTRIBUTORS:
David Quigley <david.quigley@intel.com>
Debabrata Banerjee <dbanerje@akamai.com>
D. Ebdrup <debdrup@freebsd.org>
delan azabani <dazabani@igalia.com>
Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com>
Dennis Vestergaard Værum <github@varum.dk>
Denys Rtveliashvili <denys@rtveliashvili.name>
Derek Dai <daiderek@gmail.com>
Derek Schrock <dereks@lifeofadishwasher.com>
@ -197,6 +214,7 @@ CONTRIBUTORS:
Dimitri John Ledkov <xnox@ubuntu.com>
Dimitry Andric <dimitry@andric.com>
Dirkjan Bussink <d.bussink@gmail.com>
Diwakar Kristappagari <diwakar-k@hpe.com>
Dmitry Khasanov <pik4ez@gmail.com>
Dominic Pearson <dsp@technoanimal.net>
Dominik Hassler <hadfl@omniosce.org>
@ -209,9 +227,11 @@ CONTRIBUTORS:
Eitan Adler <lists@eitanadler.com>
Eli Rosenthal <eli.rosenthal@delphix.com>
Eli Schwartz <eschwartz93@gmail.com>
Eric A. Borisch <eborisch@gmail.com>
Eric Desrochers <eric.desrochers@canonical.com>
Eric Dillmann <eric@jave.fr>
Eric Schrock <Eric.Schrock@delphix.com>
Erik Larsson <catacombae@gmail.com>
Ethan Coe-Renner <coerenner1@llnl.gov>
Etienne Dechamps <etienne@edechamps.fr>
Evan Allrich <eallrich@gmail.com>
@ -226,10 +246,12 @@ CONTRIBUTORS:
Fedor Uporov <fuporov.vstack@gmail.com>
Felix Dörre <felix@dogcraft.de>
Felix Neumärker <xdch47@posteo.de>
Felix Schmidt <felixschmidt20@aol.com>
Feng Sun <loyou85@gmail.com>
Finix Yan <yancw@info2soft.com>
Francesco Mazzoli <f@mazzo.li>
Frederik Wessels <wessels147@gmail.com>
Friedrich Weber <f.weber@proxmox.com>
Frédéric Vanniere <f.vanniere@planet-work.com>
Gabriel A. Devenyi <gdevenyi@gmail.com>
Garrett D'Amore <garrett@nexenta.com>
@ -242,9 +264,11 @@ CONTRIBUTORS:
George Diamantopoulos <georgediam@gmail.com>
George Gaydarov <git@gg7.io>
George Melikov <mail@gmelikov.ru>
George Shammas <george@shamm.as>
George Wilson <gwilson@delphix.com>
Georgy Yakovlev <ya@sysdump.net>
Gerardwx <gerardw@alum.mit.edu>
Germano Massullo <germano.massullo@gmail.com>
Gian-Carlo DeFazio <defazio1@llnl.gov>
Gionatan Danti <g.danti@assyoma.it>
Giuseppe Di Natale <guss80@gmail.com>
@ -277,17 +301,21 @@ CONTRIBUTORS:
Henrik Riomar <henrik.riomar@gmail.com>
Herb Wartens <wartens2@llnl.gov>
Hiếu Lê <leorize+oss@disroot.org>
hoshinomori <hoshinomorimorimo@gmail.com>
Huang Liu <liu.huang@zte.com.cn>
Håkan Johansson <f96hajo@chalmers.se>
Igor K <igor@dilos.org>
Igor Kozhukhov <ikozhukhov@gmail.com>
Igor Lvovsky <ilvovsky@gmail.com>
Igor Ostapenko <pm@igoro.pro>
ilbsmart <wgqimut@gmail.com>
Ilkka Sovanto <github@ilkka.kapsi.fi>
illiliti <illiliti@protonmail.com>
ilovezfs <ilovezfs@icloud.com>
InsanePrawn <Insane.Prawny@gmail.com>
Isaac Huang <he.huang@intel.com>
Ivan Shapovalov <intelfx@intelfx.name>
Ivan Volosyuk <Ivan.Volosyuk@gmail.com>
Jacek Fefliński <feflik@gmail.com>
Jacob Adams <tookmund@gmail.com>
Jake Howard <git@theorangeone.net>
@ -295,6 +323,7 @@ CONTRIBUTORS:
James H <james@kagisoft.co.uk>
James Lee <jlee@thestaticvoid.com>
James Pan <jiaming.pan@yahoo.com>
James Reilly <jreilly1821@gmail.com>
James Wah <james@laird-wah.net>
Jan Engelhardt <jengelh@inai.de>
Jan Kryl <jan.kryl@nexenta.com>
@ -306,20 +335,25 @@ CONTRIBUTORS:
Jason Lee <jasonlee@lanl.gov>
Jason Zaman <jasonzaman@gmail.com>
Javen Wu <wu.javen@gmail.com>
Jaydeep Kshirsagar <jkshirsagar@maxlinear.com>
Jean-Baptiste Lallement <jean-baptiste@ubuntu.com>
Jean-Sébastien Pédron <dumbbell@FreeBSD.org>
Jeff Dike <jdike@akamai.com>
Jeremy Faulkner <gldisater@gmail.com>
Jeremy Gill <jgill@parallax-innovations.com>
Jeremy Jones <jeremy@delphix.com>
Jeremy Visser <jeremy.visser@gmail.com>
Jerry Jelinek <jerry.jelinek@joyent.com>
Jerzy Kołosowski <jerzy@kolosowscy.pl>
Jessica Clarke <jrtc27@jrtc27.com>
Jinshan Xiong <jinshan.xiong@intel.com>
Jitendra Patidar <jitendra.patidar@nutanix.com>
JK Dingwall <james@dingwall.me.uk>
Joel Low <joel@joelsplace.sg>
Joe Stein <joe.stein@delphix.com>
John-Mark Gurney <jmg@funkthat.com>
John Albietz <inthecloud247@gmail.com>
John Cabaj <john.cabaj@canonical.com>
John Eismeier <john.eismeier@gmail.com>
John Gallagher <john.gallagher@delphix.com>
John Layman <jlayman@sagecloud.com>
@ -335,10 +369,13 @@ CONTRIBUTORS:
Jorgen Lundman <lundman@lundman.net>
Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Jose Luis Duran <jlduran@gmail.com>
Joseph Holsten <joseph@josephholsten.com>
Josh Soref <jsoref@users.noreply.github.com>
Joshua M. Clulow <josh@sysmgr.org>
José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Jo Zzsi <jozzsicsataban@gmail.com>
João Carlos Mendes Luís <jonny@jonny.eng.br>
JT Pennington <jt.pennington@klarasystems.com>
Julian Brunner <julian.brunner@gmail.com>
Julian Heuking <JulianH@beckhoff.com>
jumbi77 <jumbi77@users.noreply.github.com>
@ -365,13 +402,16 @@ CONTRIBUTORS:
Kevin Jin <lostking2008@hotmail.com>
Kevin P. Fleming <kevin@km6g.us>
Kevin Tanguy <kevin.tanguy@ovh.net>
khoang98 <khoang98@users.noreply.github.com>
KireinaHoro <i@jsteward.moe>
Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Kleber Tarcísio <klebertarcisio@yahoo.com.br>
Kody A Kantor <kody.kantor@gmail.com>
Kohsuke Kawaguchi <kk@kohsuke.org>
Konstantin Belousov <kib@FreeBSD.org>
Konstantin Khorenko <khorenko@virtuozzo.com>
KORN Andras <korn@elan.rulez.org>
kotauskas <v.toncharov@gmail.com>
Kristof Provost <github@sigsegv.be>
Krzysztof Piecuch <piecuch@kpiecuch.pl>
Kyle Blatter <kyleblatter@llnl.gov>
@ -397,6 +437,7 @@ CONTRIBUTORS:
luozhengzheng <luo.zhengzheng@zte.com.cn>
Luís Henriques <henrix@camandro.org>
Madhav Suresh <madhav.suresh@delphix.com>
Maksym Shkolnyi <maksym.shkolnyi@workato.com>
manfromafar <jonsonb10@gmail.com>
Manoj Joseph <manoj.joseph@delphix.com>
Manuel Amador (Rudd-O) <rudd-o@rudd-o.com>
@ -423,6 +464,7 @@ CONTRIBUTORS:
Mathieu Velten <matmaul@gmail.com>
Matt Fiddaman <github@m.fiddaman.uk>
Matthew Ahrens <matt@delphix.com>
Matthew Heller <matthew.f.heller@gmail.com>
Matthew Thode <mthode@mthode.org>
Matthias Blankertz <matthias@blankertz.org>
Matt Johnston <matt@fugro-fsi.com.au>
@ -436,6 +478,7 @@ CONTRIBUTORS:
Max Zettlmeißl <max@zettlmeissl.de>
Md Islam <mdnahian@outlook.com>
megari <megari@iki.fi>
Meriel Luna Mittelbach <lunarlambda@gmail.com>
Michael D Labriola <michael.d.labriola@gmail.com>
Michael Franzl <michael@franzl.name>
Michael Gebetsroither <michael@mgeb.org>
@ -451,6 +494,7 @@ CONTRIBUTORS:
Mike Swanson <mikeonthecomputer@gmail.com>
Milan Jurik <milan.jurik@xylab.cz>
Minsoo Choo <minsoochoo0122@proton.me>
mnrx <mnrx@users.noreply.github.com>
Mohamed Tawfik <m_tawfik@aucegypt.edu>
Morgan Jones <mjones@rice.edu>
Moritz Maxeiner <moritz@ucworks.org>
@ -460,6 +504,7 @@ CONTRIBUTORS:
Nathaniel Clark <Nathaniel.Clark@misrule.us>
Nathaniel Wesley Filardo <nwf@cs.jhu.edu>
Nathan Lewis <linux.robotdude@gmail.com>
nav1s <nav1s@proton.me>
Nav Ravindranath <nav@delphix.com>
Neal Gompa (ニール・ゴンパ) <ngompa13@gmail.com>
Ned Bass <bass6@llnl.gov>
@ -476,13 +521,15 @@ CONTRIBUTORS:
Olaf Faaland <faaland1@llnl.gov>
Oleg Drokin <green@linuxhacker.ru>
Oleg Stepura <oleg@stepura.com>
Olivier Certner <olce.freebsd@certner.fr>
Olivier Certner <olce@FreeBSD.org>
Olivier Mazouffre <olivier.mazouffre@ims-bordeaux.fr>
omni <omni+vagant@hack.org>
Orivej Desh <orivej@gmx.fr>
Pablo Correa Gómez <ablocorrea@hotmail.com>
Palash Gandhi <pbg4930@rit.edu>
Patrick Fasano <patrick@patrickfasano.com>
Patrick Mooney <pmooney@pfmooney.com>
Patrick Xia <patrickx@google.com>
Patrik Greco <sikevux@sikevux.se>
Paul B. Henson <henson@acm.org>
Paul Dagnelie <pcd@delphix.com>
@ -493,6 +540,7 @@ CONTRIBUTORS:
Pawel Jakub Dawidek <pjd@FreeBSD.org>
Pedro Giffuni <pfg@freebsd.org>
Peng <peng.hse@xtaotech.com>
Peng Liu <littlenewton6@gmail.com>
Peter Ashford <ashford@accs.com>
Peter Dave Hello <hsu@peterdavehello.org>
Peter Doherty <peterd@acranox.org>
@ -502,15 +550,18 @@ CONTRIBUTORS:
Philip Pokorny <ppokorny@penguincomputing.com>
Philipp Riederer <pt@philipptoelke.de>
Phil Kauffman <philip@kauffman.me>
Phil Sutter <phil@nwl.cc>
Ping Huang <huangping@smartx.com>
Piotr Kubaj <pkubaj@anongoth.pl>
Piotr P. Stefaniak <pstef@freebsd.org>
poscat <poscat@poscat.moe>
Prakash Surya <prakash.surya@delphix.com>
Prasad Joshi <prasadjoshi124@gmail.com>
privb0x23 <privb0x23@users.noreply.github.com>
P.SCH <p88@yahoo.com>
Qiuhao Chen <chenqiuhao1997@gmail.com>
Quartz <yyhran@163.com>
Quentin Thébault <quentin.thebault@defenso.fr>
Quentin Zdanis <zdanisq@gmail.com>
Rafael Kitover <rkitover@gmail.com>
RageLtMan <sempervictus@users.noreply.github.com>
@ -519,6 +570,7 @@ CONTRIBUTORS:
Remy Blank <remy.blank@pobox.com>
renelson <bnelson@nelsonbe.com>
Reno Reckling <e-github@wthack.de>
René Wirnata <rene.wirnata@pandascience.net>
Ricardo M. Correia <ricardo.correia@oracle.com>
Riccardo Schirone <rschirone91@gmail.com>
Richard Allen <belperite@gmail.com>
@ -562,6 +614,8 @@ CONTRIBUTORS:
Scot W. Stevenson <scot.stevenson@gmail.com>
Sean Eric Fagan <sef@ixsystems.com>
Sebastian Gottschall <s.gottschall@dd-wrt.com>
Sebastian Pauka <me@spauka.se>
Sebastian Wuerl <s.wuerl@mailbox.org>
Sebastien Roy <seb@delphix.com>
Sen Haerens <sen@senhaerens.be>
Serapheim Dimitropoulos <serapheim@delphix.com>
@ -573,9 +627,14 @@ CONTRIBUTORS:
Shaun Tancheff <shaun@aeonazure.com>
Shawn Bayern <sbayern@law.fsu.edu>
Shengqi Chen <harry-chen@outlook.com>
SHENGYI HONG <aokblast@FreeBSD.org>
Shen Yan <shenyanxxxy@qq.com>
Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Sietse <sietse@wizdom.nu>
Simon Guest <simon.guest@tesujimath.org>
Simon Howard <fraggle@soulsphere.org>
Simon Klinkert <simon.klinkert@gmail.com>
Sivesh Kumar <siveshjami@gmail.com>
Sowrabha Gopal <sowrabha.gopal@delphix.com>
Spencer Kinny <spencerkinny1995@gmail.com>
Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
@ -596,6 +655,7 @@ CONTRIBUTORS:
Stéphane Lesimple <speed47_github@speed47.net>
Suman Chakravartula <schakrava@gmail.com>
Sydney Vanda <sydney.m.vanda@intel.com>
Syed Shahrukh Hussain <syed.shahrukh@ossrevival.org>
Sören Tempel <soeren+git@soeren-tempel.net>
Tamas TEVESZ <ice@extreme.hu>
Teodor Spæren <teodor_spaeren@riseup.net>
@ -613,9 +673,12 @@ CONTRIBUTORS:
timor <timor.dd@googlemail.com>
Timothy Day <tday141@gmail.com>
Tim Schumacher <timschumi@gmx.de>
Tim Smith <tim@mondoo.com>
Tino Reichardt <milky-zfs@mcmilk.de>
tleydxdy <shironeko.github@tesaguri.club>
Tobin Harding <me@tobin.cc>
Todd Seidelmann <seidelma@users.noreply.github.com>
Todd Zullinger <tmz@pobox.com>
Tom Caputi <tcaputi@datto.com>
Tom Matthews <tom@axiom-partners.com>
Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
@ -628,7 +691,9 @@ CONTRIBUTORS:
Toyam Cox <aviator45003@gmail.com>
Trevor Bautista <trevrb@trevrb.net>
Trey Dockendorf <treydock@gmail.com>
trick2011 <trick2011@users.noreply.github.com>
Troels Nørgaard <tnn@tradeshift.com>
tstabrawa <tstabrawa@users.noreply.github.com>
Tulsi Jain <tulsi.jain@delphix.com>
Turbo Fredriksson <turbo@bayour.com>
Tyler J. Stachecki <stachecki.tyler@gmail.com>
@ -636,6 +701,7 @@ CONTRIBUTORS:
Vaibhav Bhanawat <vaibhav.bhanawat@delphix.com>
Valmiky Arquissandas <kayvlim@gmail.com>
Val Packett <val@packett.cool>
Vandana Rungta <vrungta@amazon.com>
Vince van Oosten <techhazard@codeforyouand.me>
Violet Purcell <vimproved@inventati.org>
Vipin Kumar Verma <vipin.verma@hpe.com>
@ -651,6 +717,7 @@ CONTRIBUTORS:
Windel Bouwman <windel@windel.nl>
Wojciech Małota-Wójcik <outofforest@users.noreply.github.com>
Wolfgang Bumiller <w.bumiller@proxmox.com>
Wolfgang Hoschek <wolfgang.hoschek@mac.com>
XDTG <click1799@163.com>
Xin Li <delphij@FreeBSD.org>
Xinliang Liu <xinliang.liu@linaro.org>

6
META
View File

@ -1,10 +1,10 @@
Meta: 1
Name: zfs
Branch: 1.0
Version: 2.3.0
Release: rc1
Version: 2.4.99
Release: 1
Release-Tags: relext
License: CDDL
Author: OpenZFS
Linux-Maximum: 6.11
Linux-Maximum: 6.19
Linux-Minimum: 4.18

View File

@ -1,6 +1,8 @@
# SPDX-License-Identifier: CDDL-1.0
CLEANFILES =
dist_noinst_DATA =
INSTALL_DATA_HOOKS =
INSTALL_EXEC_HOOKS =
ALL_LOCAL =
CLEAN_LOCAL =
CHECKS = shellcheck checkbashisms
@ -71,6 +73,9 @@ all: gitrev
PHONY += install-data-hook $(INSTALL_DATA_HOOKS)
install-data-hook: $(INSTALL_DATA_HOOKS)
PHONY += install-exec-hook $(INSTALL_EXEC_HOOKS)
install-exec-hook: $(INSTALL_EXEC_HOOKS)
PHONY += maintainer-clean-local
maintainer-clean-local:
-$(RM) $(GITREV)
@ -112,6 +117,10 @@ commitcheck:
${top_srcdir}/scripts/commitcheck.sh; \
fi
CHECKS += spdxcheck
spdxcheck:
$(AM_V_at)$(top_srcdir)/scripts/spdxcheck.pl
if HAVE_PARALLEL
cstyle_line = -print0 | parallel -X0 ${top_srcdir}/scripts/cstyle.pl -cpP {}
else
@ -124,6 +133,7 @@ cstyle:
! -name 'zfs_config.*' ! -name '*.mod.c' \
! -name 'opt_global.h' ! -name '*_if*.h' \
! -name 'zstd_compat_wrapper.h' \
! -path './module/zstd/zstd-in.c' \
! -path './module/zstd/lib/*' \
! -path './include/sys/lua/*' \
! -path './module/lua/l*.[ch]' \

View File

@ -10,7 +10,7 @@ This repository contains the code for running OpenZFS on Linux and FreeBSD.
# Official Resources
* [Documentation](https://openzfs.github.io/openzfs-docs/) - for using and developing this repo
* [ZoL Site](https://zfsonlinux.org) - Linux release info & links
* [ZoL site](https://zfsonlinux.org) - Linux release info & links
* [Mailing lists](https://openzfs.github.io/openzfs-docs/Project%20and%20Community/Mailing%20Lists.html)
* [OpenZFS site](https://openzfs.org/) - for conference videos and info on other platforms (illumos, OSX, Windows, etc)
@ -30,6 +30,42 @@ We have a [Code of Conduct](./CODE_OF_CONDUCT.md).
OpenZFS is released under a CDDL license.
For more details see the NOTICE, LICENSE and COPYRIGHT files; `UCRL-CODE-235197`
# Supported Kernels
* The `META` file contains the officially recognized supported Linux kernel versions.
* Supported FreeBSD versions are any supported branches and releases starting from 13.0-RELEASE.
# Supported Kernels and Distributions
## Linux
Given the wide variety of Linux environments, we prioritize development and testing on stable, supported kernels and distributions.
### Kernel ([kernel.org](https://kernel.org))
All **longterm** kernels from [kernel.org](https://kernel.org) are supported. **stable** kernels are usually supported in the next OpenZFS release.
**Supported longterm kernels**: **6.18**, **6.12**, **6.6**, **6.1**, **5.15**, **5.10**.
### Red Hat Enterprise Linux (RHEL)
All RHEL (and compatible systems: AlmaLinux OS, Rocky Linux, etc) on the **full** or **maintenance** support tracks are supported.
**Supported RHEL releases**: **8.10**, **9.7**, **10.1**.
### Ubuntu
All Ubuntu **LTS** releases are supported.
**Supported Ubuntu releases**: **24.04 “Noble”**, **22.04 “Jammy”**.
### Debian
All Debian **stable** and **LTS** releases are supported.
**Supported Debian releases**: **13 “Trixie”**, **12 “Bookworm”**, **11 “Bullseye”**.
### Other Distributions
Generally, if a distribution is following an LTS kernel, it should work well with OpenZFS.
## FreeBSD
All FreeBSD releases receiving [security support](https://www.freebsd.org/security/#sup) are supported by OpenZFS.
**Supported FreeBSD releases**: **15.0**, **14.4**, **13.5**.

View File

@ -28,7 +28,7 @@ Two release branches are maintained for OpenZFS, they are:
Minor changes to support these distribution kernels will be applied as
needed. New kernel versions released after the OpenZFS LTS release are
not supported. LTS releases will receive patches for at least 2 years.
The current LTS release is OpenZFS 2.1.
The current LTS release is OpenZFS 2.2.
* OpenZFS current - Tracks the newest MAJOR.MINOR release. This branch
includes support for the latest OpenZFS features and recently releases

View File

@ -1,62 +1,4 @@
#!/bin/sh
[ "${0%/*}" = "$0" ] || cd "${0%/*}" || exit
# SPDX-License-Identifier: CDDL-1.0
# %reldir%/%canon_reldir% (%D%/%C%) only appeared in automake 1.14, but RHEL/CentOS 7 has 1.13.4
# This is an (overly) simplistic preprocessor that papers around this for the duration of the generation step,
# and can be removed once support for CentOS 7 is dropped
automake --version | awk '{print $NF; exit}' | (
IFS=. read -r AM_MAJ AM_MIN _
[ "$AM_MAJ" -gt 1 ] || [ "$AM_MIN" -ge 14 ]
) || {
process_root() {
root="$1"; shift
grep -q '%[CD]%' "$root/Makefile.am" || return
find "$root" -name Makefile.am "$@" | while read -r dir; do
dir="${dir%/Makefile.am}"
grep -q '%[CD]%' "$dir/Makefile.am" || continue
reldir="${dir#"$root"}"
reldir="${reldir#/}"
canon_reldir="$(printf '%s' "$reldir" | tr -C 'a-zA-Z0-9@_' '_')"
reldir_slash="$reldir/"
canon_reldir_slash="${canon_reldir}_"
[ -z "$reldir" ] && reldir_slash=
[ -z "$reldir" ] && canon_reldir_slash=
echo "$dir/Makefile.am" >&3
sed -i~ -e "s:%D%/:$reldir_slash:g" -e "s:%D%:$reldir:g" \
-e "s:%C%_:$canon_reldir_slash:g" -e "s:%C%:$canon_reldir:g" "$dir/Makefile.am"
done 3>>"$substituted_files"
}
rollback() {
while read -r f; do
mv "$f~" "$f"
done < "$substituted_files"
rm -f "$substituted_files"
}
echo "Automake <1.14; papering over missing %reldir%/%canon_reldir% support" >&2
substituted_files="$(mktemp)"
trap rollback EXIT
roots="$(sed '/Makefile$/!d;/module/d;s:^\s*:./:;s:/Makefile::;/^\.$/d' configure.ac)"
IFS="
"
for root in $roots; do
root="${root#./}"
process_root "$root"
done
set -f
# shellcheck disable=SC2086,SC2046
process_root . $(printf '!\n-path\n%s/*\n' $roots)
}
autoreconf -fiv && rm -rf autom4te.cache
autoreconf -fiv "$(dirname "$0")" && rm -rf "$(dirname "$0")"/autom4te.cache

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
bin_SCRIPTS =
bin_PROGRAMS =
sbin_SCRIPTS =
@ -35,8 +36,8 @@ zhack_SOURCES = \
zhack_LDADD = \
libzpool.la \
libzfs_core.la \
libnvpair.la
libnvpair.la \
librange_tree.la
ztest_CFLAGS = $(AM_CFLAGS) $(KERNEL_CFLAGS)
ztest_CPPFLAGS = $(AM_CPPFLAGS) $(LIBZPOOL_CPPFLAGS)
@ -98,17 +99,16 @@ endif
if USING_PYTHON
bin_SCRIPTS += arc_summary arcstat dbufstat zilstat
CLEANFILES += arc_summary arcstat dbufstat zilstat
dist_noinst_DATA += %D%/arc_summary %D%/arcstat.in %D%/dbufstat.in %D%/zilstat.in
bin_SCRIPTS += zarcsummary zarcstat dbufstat zilstat
CLEANFILES += zarcsummary zarcstat dbufstat zilstat
dist_noinst_DATA += %D%/zarcsummary %D%/zarcstat.in %D%/dbufstat.in %D%/zilstat.in
$(call SUBST,arcstat,%D%/)
$(call SUBST,zarcstat,%D%/)
$(call SUBST,dbufstat,%D%/)
$(call SUBST,zilstat,%D%/)
arc_summary: %D%/arc_summary
zarcsummary: %D%/zarcsummary
$(AM_V_at)cp $< $@
endif
PHONY += cmd
cmd: $(bin_SCRIPTS) $(bin_PROGRAMS) $(sbin_SCRIPTS) $(sbin_PROGRAMS) $(dist_bin_SCRIPTS) $(zfsexec_PROGRAMS) $(mounthelper_PROGRAMS)

View File

@ -1,4 +1,5 @@
#!/usr/bin/env @PYTHON_SHEBANG@
# SPDX-License-Identifier: CDDL-1.0
#
# Print out statistics for all cached dmu buffers. This information
# is available through the dbufs kstat and may be post-processed as

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
raidz_test_CFLAGS = $(AM_CFLAGS) $(KERNEL_CFLAGS)
raidz_test_CPPFLAGS = $(AM_CPPFLAGS) $(LIBZPOOL_CPPFLAGS)

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*
@ -32,6 +33,7 @@
#include <sys/vdev_raidz_impl.h>
#include <assert.h>
#include <stdio.h>
#include <libzpool.h>
#include "raidz_test.h"
static int *rand_data;
@ -263,9 +265,21 @@ cmp_data(raidz_test_opts_t *opts, raidz_map_t *rm)
static int
init_rand(void *data, size_t size, void *private)
{
size_t *offsetp = (size_t *)private;
size_t offset = *offsetp;
VERIFY3U(offset + size, <=, SPA_MAXBLOCKSIZE);
memcpy(data, (char *)rand_data + offset, size);
*offsetp = offset + size;
return (0);
}
static int
corrupt_rand_fill(void *data, size_t size, void *private)
{
(void) private;
memcpy(data, rand_data, size);
memset(data, 0xAA, size);
return (0);
}
@ -277,7 +291,7 @@ corrupt_colums(raidz_map_t *rm, const int *tgts, const int cnt)
for (int i = 0; i < cnt; i++) {
raidz_col_t *col = &rr->rr_col[tgts[i]];
abd_iterate_func(col->rc_abd, 0, col->rc_size,
init_rand, NULL);
corrupt_rand_fill, NULL);
}
}
}
@ -285,7 +299,8 @@ corrupt_colums(raidz_map_t *rm, const int *tgts, const int cnt)
void
init_zio_abd(zio_t *zio)
{
abd_iterate_func(zio->io_abd, 0, zio->io_size, init_rand, NULL);
size_t offset = 0;
abd_iterate_func(zio->io_abd, 0, zio->io_size, init_rand, &offset);
}
static void
@ -372,7 +387,7 @@ init_raidz_map(raidz_test_opts_t *opts, zio_t **zio, const int parity)
*zio = umem_zalloc(sizeof (zio_t), UMEM_NOFAIL);
(*zio)->io_offset = 0;
(*zio)->io_offset = opts->rto_offset;
(*zio)->io_size = alloc_dsize;
(*zio)->io_abd = raidz_alloc(alloc_dsize);
init_zio_abd(*zio);
@ -833,6 +848,8 @@ main(int argc, char **argv)
err = run_test(NULL);
}
mprotect(rand_data, SPA_MAXBLOCKSIZE, PROT_READ | PROT_WRITE);
umem_free(rand_data, SPA_MAXBLOCKSIZE);
kernel_fini();

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*
@ -71,7 +72,7 @@ typedef struct raidz_test_opts {
static const raidz_test_opts_t rto_opts_defaults = {
.rto_ashift = 9,
.rto_offset = 1ULL << 0,
.rto_offset = 0,
.rto_dcols = 8,
.rto_dsize = 1<<19,
.rto_v = D_ALL,

View File

@ -1,7 +1,8 @@
#!/usr/bin/env @PYTHON_SHEBANG@
# SPDX-License-Identifier: CDDL-1.0
#
# Print out ZFS ARC Statistics exported via kstat(1)
# For a definition of fields, or usage, use arcstat -v
# For a definition of fields, or usage, use zarcstat -v
#
# This script was originally a fork of the original arcstat.pl (0.1)
# by Neelakanth Nadgir, originally published on his Sun blog on
@ -55,6 +56,7 @@ import time
import getopt
import re
import copy
import os
from signal import signal, SIGINT, SIGWINCH, SIG_DFL
@ -170,7 +172,7 @@ cols = {
"zactive": [7, 1000, "zfetch prefetches active per second"],
}
# ARC structural breakdown from arc_summary
# ARC structural breakdown from zarcsummary
structfields = {
"cmp": ["compressed", "Compressed"],
"ovh": ["overhead", "Overhead"],
@ -186,7 +188,7 @@ structstats = { # size stats
"sz": ["_size", "size"],
}
# ARC types breakdown from arc_summary
# ARC types breakdown from zarcsummary
typefields = {
"data": ["data", "ARC data"],
"meta": ["metadata", "ARC metadata"],
@ -197,7 +199,7 @@ typestats = { # size stats
"sz": ["_size", "size"],
}
# ARC states breakdown from arc_summary
# ARC states breakdown from zarcsummary
statefields = {
"ano": ["anon", "Anonymous"],
"mfu": ["mfu", "MFU"],
@ -260,7 +262,7 @@ hdr_intr = 20 # Print header every 20 lines of output
opfile = None
sep = " " # Default separator is 2 spaces
l2exist = False
cmd = ("Usage: arcstat [-havxp] [-f fields] [-o file] [-s string] [interval "
cmd = ("Usage: zarcstat [-havxp] [-f fields] [-o file] [-s string] [interval "
"[count]]\n")
cur = {}
d = {}
@ -347,10 +349,10 @@ def usage():
"character or string\n")
sys.stderr.write("\t -p : Disable auto-scaling of numerical fields\n")
sys.stderr.write("\nExamples:\n")
sys.stderr.write("\tarcstat -o /tmp/a.log 2 10\n")
sys.stderr.write("\tarcstat -s \",\" -o /tmp/a.log 2 10\n")
sys.stderr.write("\tarcstat -v\n")
sys.stderr.write("\tarcstat -f time,hit%,dh%,ph%,mh% 1\n")
sys.stderr.write("\tzarcstat -o /tmp/a.log 2 10\n")
sys.stderr.write("\tzarcstat -s \",\" -o /tmp/a.log 2 10\n")
sys.stderr.write("\tzarcstat -v\n")
sys.stderr.write("\tzarcstat -f time,hit%,dh%,ph%,mh% 1\n")
sys.stderr.write("\n")
sys.exit(1)
@ -365,7 +367,7 @@ def snap_stats():
cur = kstat
# fill in additional values from arc_summary
# fill in additional values from zarcsummary
cur["caches_size"] = caches_size = cur["anon_data"]+cur["anon_metadata"]+\
cur["mfu_data"]+cur["mfu_metadata"]+cur["mru_data"]+cur["mru_metadata"]+\
cur["uncached_data"]+cur["uncached_metadata"]
@ -734,13 +736,14 @@ def calculate():
v[group["percent"]] if v[group["percent"]] > 0 else 0
if l2exist:
l2asize = cur["l2_asize"]
v["l2hits"] = d["l2_hits"] / sint
v["l2miss"] = d["l2_misses"] / sint
v["l2read"] = v["l2hits"] + v["l2miss"]
v["l2hit%"] = 100 * v["l2hits"] / v["l2read"] if v["l2read"] > 0 else 0
v["l2miss%"] = 100 - v["l2hit%"] if v["l2read"] > 0 else 0
v["l2asize"] = cur["l2_asize"]
v["l2asize"] = l2asize
v["l2size"] = cur["l2_size"]
v["l2bytes"] = d["l2_read_bytes"] / sint
v["l2wbytes"] = d["l2_write_bytes"] / sint
@ -750,11 +753,11 @@ def calculate():
v["l2mru"] = cur["l2_mru_asize"]
v["l2data"] = cur["l2_bufc_data_asize"]
v["l2meta"] = cur["l2_bufc_metadata_asize"]
v["l2pref%"] = 100 * v["l2pref"] / v["l2asize"]
v["l2mfu%"] = 100 * v["l2mfu"] / v["l2asize"]
v["l2mru%"] = 100 * v["l2mru"] / v["l2asize"]
v["l2data%"] = 100 * v["l2data"] / v["l2asize"]
v["l2meta%"] = 100 * v["l2meta"] / v["l2asize"]
v["l2pref%"] = 100 * v["l2pref"] / l2asize if l2asize > 0 else 0
v["l2mfu%"] = 100 * v["l2mfu"] / l2asize if l2asize > 0 else 0
v["l2mru%"] = 100 * v["l2mru"] / l2asize if l2asize > 0 else 0
v["l2data%"] = 100 * v["l2data"] / l2asize if l2asize > 0 else 0
v["l2meta%"] = 100 * v["l2meta"] / l2asize if l2asize > 0 else 0
v["grow"] = 0 if cur["arc_no_grow"] else 1
v["need"] = cur["arc_need_free"]
@ -764,6 +767,7 @@ def calculate():
def main():
global sint
global count
global hdr_intr

View File

@ -1,4 +1,5 @@
#!/usr/bin/env python3
# SPDX-License-Identifier: BSD-2-Clause
#
# Copyright (c) 2008 Ben Rockwood <benr@cuddletech.com>,
# Copyright (c) 2010 Martin Matuska <mm@FreeBSD.org>,
@ -33,7 +34,7 @@ Provides basic information on the ARC, its efficiency, the L2ARC (if present),
the Data Management Unit (DMU), Virtual Devices (VDEVs), and tunables. See
the in-source documentation and code at
https://github.com/openzfs/zfs/blob/master/module/zfs/arc.c for details.
The original introduction to arc_summary can be found at
The original introduction to zarcsummary can be found at
http://cuddletech.com/?p=454
"""
@ -160,7 +161,7 @@ elif sys.platform.startswith('linux'):
return get_params(TUNABLES_PATH)
def get_version_impl(request):
# The original arc_summary called /sbin/modinfo/{spl,zfs} to get
# The original zarcsummary called /sbin/modinfo/{spl,zfs} to get
# the version information. We switch to /sys/module/{spl,zfs}/version
# to make sure we get what is really loaded in the kernel
try:
@ -438,7 +439,7 @@ def print_header():
"""
# datetime is now recommended over time but we keep the exact formatting
# from the older version of arc_summary in case there are scripts
# from the older version of zarcsummary in case there are scripts
# that expect it in this way
daydate = time.strftime(DATE_FORMAT)
spc_date = LINE_LENGTH-len(daydate)
@ -558,6 +559,7 @@ def section_arc(kstats_dict):
print()
compressed_size = arc_stats['compressed_size']
uncompressed_size = arc_stats['uncompressed_size']
overhead_size = arc_stats['overhead_size']
bonus_size = arc_stats['bonus_size']
dnode_size = arc_stats['dnode_size']
@ -662,10 +664,7 @@ def section_arc(kstats_dict):
print()
print('ARC hash breakdown:')
prt_i1('Elements max:', f_hits(arc_stats['hash_elements_max']))
prt_i2('Elements current:',
f_perc(arc_stats['hash_elements'], arc_stats['hash_elements_max']),
f_hits(arc_stats['hash_elements']))
prt_i1('Elements:', f_hits(arc_stats['hash_elements']))
prt_i1('Collisions:', f_hits(arc_stats['hash_collisions']))
prt_i1('Chain max:', f_hits(arc_stats['hash_chain_max']))
@ -673,6 +672,8 @@ def section_arc(kstats_dict):
print()
print('ARC misc:')
prt_i2('Uncompressed size:', f_perc(uncompressed_size, compressed_size),
f_bytes(uncompressed_size))
prt_i1('Memory throttles:', arc_stats['memory_throttle_count'])
prt_i1('Memory direct reclaims:', arc_stats['memory_direct_count'])
prt_i1('Memory indirect reclaims:', arc_stats['memory_indirect_count'])

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
zdb_CPPFLAGS = $(AM_CPPFLAGS) $(LIBZPOOL_CPPFLAGS)
zdb_CFLAGS = $(AM_CFLAGS) $(LIBCRYPTO_CFLAGS)
@ -13,6 +14,8 @@ zdb_LDADD = \
libzdb.la \
libzpool.la \
libzfs_core.la \
libnvpair.la
libnvpair.la \
libbtree.la \
librange_tree.la
zdb_LDADD += $(LIBCRYPTO_LIBS)

File diff suppressed because it is too large Load Diff

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*
@ -28,6 +29,6 @@
#define _ZDB_H
void dump_intent_log(zilog_t *);
extern uint8_t dump_opt[256];
extern uint8_t dump_opt[512];
#endif /* _ZDB_H */

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*
@ -47,8 +48,6 @@
#include "zdb.h"
extern uint8_t dump_opt[256];
static char tab_prefix[4] = "\t\t\t";
static void
@ -67,19 +66,19 @@ zil_prt_rec_create(zilog_t *zilog, int txtype, const void *arg)
const lr_create_t *lrc = arg;
const _lr_create_t *lr = &lrc->lr_create;
time_t crtime = lr->lr_crtime[0];
char *name, *link;
const char *name, *link;
lr_attr_t *lrattr;
name = (char *)(lr + 1);
name = (const char *)&lrc->lr_data[0];
if (lr->lr_common.lrc_txtype == TX_CREATE_ATTR ||
lr->lr_common.lrc_txtype == TX_MKDIR_ATTR) {
lrattr = (lr_attr_t *)(lr + 1);
lrattr = (lr_attr_t *)&lrc->lr_data[0];
name += ZIL_XVAT_SIZE(lrattr->lr_attr_masksize);
}
if (txtype == TX_SYMLINK) {
link = name + strlen(name) + 1;
link = (const char *)&lrc->lr_data[strlen(name) + 1];
(void) printf("%s%s -> %s\n", tab_prefix, name, link);
} else if (txtype != TX_MKXATTR) {
(void) printf("%s%s\n", tab_prefix, name);
@ -104,7 +103,7 @@ zil_prt_rec_remove(zilog_t *zilog, int txtype, const void *arg)
const lr_remove_t *lr = arg;
(void) printf("%sdoid %llu, name %s\n", tab_prefix,
(u_longlong_t)lr->lr_doid, (char *)(lr + 1));
(u_longlong_t)lr->lr_doid, (const char *)&lr->lr_data[0]);
}
static void
@ -115,7 +114,7 @@ zil_prt_rec_link(zilog_t *zilog, int txtype, const void *arg)
(void) printf("%sdoid %llu, link_obj %llu, name %s\n", tab_prefix,
(u_longlong_t)lr->lr_doid, (u_longlong_t)lr->lr_link_obj,
(char *)(lr + 1));
(const char *)&lr->lr_data[0]);
}
static void
@ -124,8 +123,8 @@ zil_prt_rec_rename(zilog_t *zilog, int txtype, const void *arg)
(void) zilog, (void) txtype;
const lr_rename_t *lrr = arg;
const _lr_rename_t *lr = &lrr->lr_rename;
char *snm = (char *)(lr + 1);
char *tnm = snm + strlen(snm) + 1;
const char *snm = (const char *)&lrr->lr_data[0];
const char *tnm = (const char *)&lrr->lr_data[strlen(snm) + 1];
(void) printf("%ssdoid %llu, tdoid %llu\n", tab_prefix,
(u_longlong_t)lr->lr_sdoid, (u_longlong_t)lr->lr_tdoid);
@ -175,7 +174,7 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg)
if (lr->lr_common.lrc_reclen == sizeof (lr_write_t)) {
(void) printf("%shas blkptr, %s\n", tab_prefix,
!BP_IS_HOLE(bp) && BP_GET_LOGICAL_BIRTH(bp) >=
!BP_IS_HOLE(bp) && BP_GET_BIRTH(bp) >=
spa_min_claim_txg(zilog->zl_spa) ?
"will claim" : "won't claim");
print_log_bp(bp, tab_prefix);
@ -188,7 +187,7 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg)
(void) printf("%s<hole>\n", tab_prefix);
return;
}
if (BP_GET_LOGICAL_BIRTH(bp) < zilog->zl_header->zh_claim_txg) {
if (BP_GET_BIRTH(bp) < zilog->zl_header->zh_claim_txg) {
(void) printf("%s<block already committed>\n",
tab_prefix);
return;
@ -211,7 +210,7 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg)
/* data is stored after the end of the lr_write record */
data = abd_alloc(lr->lr_length, B_FALSE);
abd_copy_from_buf(data, lr + 1, lr->lr_length);
abd_copy_from_buf(data, &lr->lr_data[0], lr->lr_length);
}
(void) printf("%s", tab_prefix);
@ -239,7 +238,7 @@ zil_prt_rec_write_enc(zilog_t *zilog, int txtype, const void *arg)
if (lr->lr_common.lrc_reclen == sizeof (lr_write_t)) {
(void) printf("%shas blkptr, %s\n", tab_prefix,
!BP_IS_HOLE(bp) && BP_GET_LOGICAL_BIRTH(bp) >=
!BP_IS_HOLE(bp) && BP_GET_BIRTH(bp) >=
spa_min_claim_txg(zilog->zl_spa) ?
"will claim" : "won't claim");
print_log_bp(bp, tab_prefix);
@ -309,7 +308,7 @@ zil_prt_rec_setsaxattr(zilog_t *zilog, int txtype, const void *arg)
(void) zilog, (void) txtype;
const lr_setsaxattr_t *lr = arg;
char *name = (char *)(lr + 1);
const char *name = (const char *)&lr->lr_data[0];
(void) printf("%sfoid %llu\n", tab_prefix,
(u_longlong_t)lr->lr_foid);
@ -318,7 +317,7 @@ zil_prt_rec_setsaxattr(zilog_t *zilog, int txtype, const void *arg)
(void) printf("%sXAT_VALUE NULL\n", tab_prefix);
} else {
(void) printf("%sXAT_VALUE ", tab_prefix);
char *val = name + (strlen(name) + 1);
const char *val = (const char *)&lr->lr_data[strlen(name) + 1];
for (int i = 0; i < lr->lr_size; i++) {
(void) printf("%c", *val);
val++;
@ -475,7 +474,7 @@ print_log_block(zilog_t *zilog, const blkptr_t *bp, void *arg,
if (claim_txg != 0)
claim = "already claimed";
else if (BP_GET_LOGICAL_BIRTH(bp) >= spa_min_claim_txg(zilog->zl_spa))
else if (BP_GET_BIRTH(bp) >= spa_min_claim_txg(zilog->zl_spa))
claim = "will claim";
else
claim = "won't claim";

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
include $(srcdir)/%D%/zed.d/Makefile.am
zed_CFLAGS = $(AM_CFLAGS)
@ -37,8 +38,7 @@ zed_SOURCES = \
zed_LDADD = \
libzfs.la \
libzfs_core.la \
libnvpair.la \
libuutil.la
libnvpair.la
zed_LDADD += -lrt $(LIBATOMIC_LIBS) $(LIBUDEV_LIBS) $(LIBUUID_LIBS)
zed_LDFLAGS = -pthread

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*
@ -133,11 +134,13 @@ zfs_agent_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *arg)
* of blkid cache and L2ARC VDEV does not contain pool guid in its
* blkid, so this is a special case for L2ARC VDEV.
*/
else if (gsp->gs_vdev_guid != 0 && gsp->gs_devid == NULL &&
else if (gsp->gs_vdev_guid != 0 &&
nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_GUID, &vdev_guid) == 0 &&
gsp->gs_vdev_guid == vdev_guid) {
if (gsp->gs_devid == NULL) {
(void) nvlist_lookup_string(nvl, ZPOOL_CONFIG_DEVID,
&gsp->gs_devid);
}
(void) nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_EXPANSION_TIME,
&gsp->gs_vdev_expandtime);
return (B_TRUE);
@ -155,22 +158,28 @@ zfs_agent_iter_pool(zpool_handle_t *zhp, void *arg)
/*
* For each vdev in this pool, look for a match by devid
*/
if ((config = zpool_get_config(zhp, NULL)) != NULL) {
if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE,
&nvl) == 0) {
(void) zfs_agent_iter_vdev(zhp, nvl, gsp);
}
}
/*
* if a match was found then grab the pool guid
*/
if (gsp->gs_vdev_guid && gsp->gs_devid) {
(void) nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_GUID,
&gsp->gs_pool_guid);
}
boolean_t found = B_FALSE;
uint64_t pool_guid;
/* Get pool configuration and extract pool GUID */
if ((config = zpool_get_config(zhp, NULL)) == NULL ||
nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_GUID,
&pool_guid) != 0)
goto out;
/* Skip this pool if we're looking for a specific pool */
if (gsp->gs_pool_guid != 0 && pool_guid != gsp->gs_pool_guid)
goto out;
if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE, &nvl) == 0)
found = zfs_agent_iter_vdev(zhp, nvl, gsp);
if (found && gsp->gs_pool_guid == 0)
gsp->gs_pool_guid = pool_guid;
out:
zpool_close(zhp);
return (gsp->gs_devid != NULL && gsp->gs_vdev_guid != 0);
return (found);
}
void
@ -232,11 +241,9 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
* For multipath, spare and l2arc devices ZFS_EV_VDEV_GUID or
* ZFS_EV_POOL_GUID may be missing so find them.
*/
if (devid == NULL || pool_guid == 0 || vdev_guid == 0) {
if (devid == NULL)
search.gs_vdev_guid = vdev_guid;
else
search.gs_devid = devid;
search.gs_vdev_guid = vdev_guid;
search.gs_pool_guid = pool_guid;
zpool_iter(g_zfs_hdl, zfs_agent_iter_pool, &search);
if (devid == NULL)
devid = search.gs_devid;
@ -245,7 +252,6 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
if (vdev_guid == 0)
vdev_guid = search.gs_vdev_guid;
devtype = search.gs_vdev_type;
}
/*
* We want to avoid reporting "remove" events coming from

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*
@ -28,7 +29,6 @@
#include <stddef.h>
#include <string.h>
#include <libuutil.h>
#include <libzfs.h>
#include <sys/types.h>
#include <sys/time.h>
@ -95,7 +95,7 @@ typedef struct zfs_case {
uint32_t zc_version;
zfs_case_data_t zc_data;
fmd_case_t *zc_case;
uu_list_node_t zc_node;
list_node_t zc_node;
id_t zc_remove_timer;
char *zc_fru;
er_timeval_t zc_when;
@ -125,8 +125,7 @@ zfs_de_stats_t zfs_stats = {
/* wait 15 seconds after a removal */
static hrtime_t zfs_remove_timeout = SEC2NSEC(15);
uu_list_pool_t *zfs_case_pool;
uu_list_t *zfs_cases;
static list_t zfs_cases;
#define ZFS_MAKE_RSRC(type) \
FM_RSRC_CLASS "." ZFS_ERROR_CLASS "." type
@ -173,8 +172,8 @@ zfs_case_unserialize(fmd_hdl_t *hdl, fmd_case_t *cp)
zcp->zc_remove_timer = fmd_timer_install(hdl, zcp,
NULL, zfs_remove_timeout);
uu_list_node_init(zcp, &zcp->zc_node, zfs_case_pool);
(void) uu_list_insert_before(zfs_cases, NULL, zcp);
list_link_init(&zcp->zc_node);
list_insert_head(&zfs_cases, zcp);
fmd_case_setspecific(hdl, cp, zcp);
@ -205,8 +204,8 @@ zfs_other_serd_cases(fmd_hdl_t *hdl, const zfs_case_data_t *zfs_case)
next_check = gethrestime_sec() + CASE_GC_TIMEOUT_SECS;
}
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp)) {
for (zcp = list_head(&zfs_cases); zcp != NULL;
zcp = list_next(&zfs_cases, zcp)) {
zfs_case_data_t *zcd = &zcp->zc_data;
/*
@ -256,8 +255,8 @@ zfs_mark_vdev(uint64_t pool_guid, nvlist_t *vd, er_timeval_t *loaded)
/*
* Mark any cases associated with this (pool, vdev) pair.
*/
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp)) {
for (zcp = list_head(&zfs_cases); zcp != NULL;
zcp = list_next(&zfs_cases, zcp)) {
if (zcp->zc_data.zc_pool_guid == pool_guid &&
zcp->zc_data.zc_vdev_guid == vdev_guid) {
zcp->zc_present = B_TRUE;
@ -303,8 +302,8 @@ zfs_mark_pool(zpool_handle_t *zhp, void *unused)
/*
* Mark any cases associated with just this pool.
*/
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp)) {
for (zcp = list_head(&zfs_cases); zcp != NULL;
zcp = list_next(&zfs_cases, zcp)) {
if (zcp->zc_data.zc_pool_guid == pool_guid &&
zcp->zc_data.zc_vdev_guid == 0)
zcp->zc_present = B_TRUE;
@ -320,8 +319,8 @@ zfs_mark_pool(zpool_handle_t *zhp, void *unused)
if (nelem == 2) {
loaded.ertv_sec = tod[0];
loaded.ertv_nsec = tod[1];
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp)) {
for (zcp = list_head(&zfs_cases); zcp != NULL;
zcp = list_next(&zfs_cases, zcp)) {
if (zcp->zc_data.zc_pool_guid == pool_guid &&
zcp->zc_data.zc_vdev_guid == 0) {
zcp->zc_when = loaded;
@ -388,8 +387,7 @@ zpool_find_load_time(zpool_handle_t *zhp, void *arg)
static void
zfs_purge_cases(fmd_hdl_t *hdl)
{
zfs_case_t *zcp;
uu_list_walk_t *walk;
zfs_case_t *zcp, *next;
libzfs_handle_t *zhdl = fmd_hdl_getspecific(hdl);
/*
@ -409,8 +407,8 @@ zfs_purge_cases(fmd_hdl_t *hdl)
/*
* Mark the cases as not present.
*/
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp))
for (zcp = list_head(&zfs_cases); zcp != NULL;
zcp = list_next(&zfs_cases, zcp))
zcp->zc_present = B_FALSE;
/*
@ -424,12 +422,11 @@ zfs_purge_cases(fmd_hdl_t *hdl)
/*
* Remove those cases which were not found.
*/
walk = uu_list_walk_start(zfs_cases, UU_WALK_ROBUST);
while ((zcp = uu_list_walk_next(walk)) != NULL) {
for (zcp = list_head(&zfs_cases); zcp != NULL; zcp = next) {
next = list_next(&zfs_cases, zcp);
if (!zcp->zc_present)
fmd_case_close(hdl, zcp->zc_case);
}
uu_list_walk_end(walk);
}
/*
@ -659,8 +656,8 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
zfs_ereport_when(hdl, nvl, &er_when);
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp)) {
for (zcp = list_head(&zfs_cases); zcp != NULL;
zcp = list_next(&zfs_cases, zcp)) {
if (zcp->zc_data.zc_pool_guid == pool_guid) {
pool_found = B_TRUE;
pool_load = zcp->zc_when;
@ -866,8 +863,8 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
* Pool level fault. Before solving the case, go through and
* close any open device cases that may be pending.
*/
for (dcp = uu_list_first(zfs_cases); dcp != NULL;
dcp = uu_list_next(zfs_cases, dcp)) {
for (dcp = list_head(&zfs_cases); dcp != NULL;
dcp = list_next(&zfs_cases, dcp)) {
if (dcp->zc_data.zc_pool_guid ==
zcp->zc_data.zc_pool_guid &&
dcp->zc_data.zc_vdev_guid != 0)
@ -1087,8 +1084,7 @@ zfs_fm_close(fmd_hdl_t *hdl, fmd_case_t *cs)
if (zcp->zc_data.zc_has_remove_timer)
fmd_timer_remove(hdl, zcp->zc_remove_timer);
uu_list_remove(zfs_cases, zcp);
uu_list_node_fini(zcp, &zcp->zc_node, zfs_case_pool);
list_remove(&zfs_cases, zcp);
fmd_hdl_free(hdl, zcp, sizeof (zfs_case_t));
}
@ -1116,23 +1112,11 @@ _zfs_diagnosis_init(fmd_hdl_t *hdl)
if ((zhdl = libzfs_init()) == NULL)
return;
if ((zfs_case_pool = uu_list_pool_create("zfs_case_pool",
sizeof (zfs_case_t), offsetof(zfs_case_t, zc_node),
NULL, UU_LIST_POOL_DEBUG)) == NULL) {
libzfs_fini(zhdl);
return;
}
if ((zfs_cases = uu_list_create(zfs_case_pool, NULL,
UU_LIST_DEBUG)) == NULL) {
uu_list_pool_destroy(zfs_case_pool);
libzfs_fini(zhdl);
return;
}
list_create(&zfs_cases,
sizeof (zfs_case_t), offsetof(zfs_case_t, zc_node));
if (fmd_hdl_register(hdl, FMD_API_VERSION, &fmd_info) != 0) {
uu_list_destroy(zfs_cases);
uu_list_pool_destroy(zfs_case_pool);
list_destroy(&zfs_cases);
libzfs_fini(zhdl);
return;
}
@ -1147,24 +1131,18 @@ void
_zfs_diagnosis_fini(fmd_hdl_t *hdl)
{
zfs_case_t *zcp;
uu_list_walk_t *walk;
libzfs_handle_t *zhdl;
/*
* Remove all active cases.
*/
walk = uu_list_walk_start(zfs_cases, UU_WALK_ROBUST);
while ((zcp = uu_list_walk_next(walk)) != NULL) {
while ((zcp = list_remove_head(&zfs_cases)) != NULL) {
fmd_hdl_debug(hdl, "removing case ena %llu",
(long long unsigned)zcp->zc_data.zc_ena);
uu_list_remove(zfs_cases, zcp);
uu_list_node_fini(zcp, &zcp->zc_node, zfs_case_pool);
fmd_hdl_free(hdl, zcp, sizeof (zfs_case_t));
}
uu_list_walk_end(walk);
uu_list_destroy(zfs_cases);
uu_list_pool_destroy(zfs_case_pool);
list_destroy(&zfs_cases);
zhdl = fmd_hdl_getspecific(hdl);
libzfs_fini(zhdl);

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*
@ -81,7 +82,7 @@
#include <sys/sunddi.h>
#include <sys/sysevent/eventdefs.h>
#include <sys/sysevent/dev.h>
#include <thread_pool.h>
#include <sys/taskq.h>
#include <pthread.h>
#include <unistd.h>
#include <errno.h>
@ -97,7 +98,7 @@ typedef void (*zfs_process_func_t)(zpool_handle_t *, nvlist_t *, boolean_t);
libzfs_handle_t *g_zfshdl;
list_t g_pool_list; /* list of unavailable pools at initialization */
list_t g_device_list; /* list of disks with asynchronous label request */
tpool_t *g_tpool;
taskq_t *g_taskq;
boolean_t g_enumeration_done;
pthread_t g_zfs_tid; /* zfs_enum_pools() thread */
@ -214,6 +215,7 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
vdev_stat_t *vs;
char **lines = NULL;
int lines_cnt = 0;
int rc;
/*
* Get the persistent path, typically under the '/dev/disk/by-id' or
@ -405,17 +407,17 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
}
nvlist_lookup_string(vdev, "new_devid", &new_devid);
if (is_mpath_wholedisk) {
/* Don't label device mapper or multipath disks. */
zed_log_msg(LOG_INFO,
" it's a multipath wholedisk, don't label");
if (zpool_prepare_disk(zhp, vdev, "autoreplace", &lines,
&lines_cnt) != 0) {
rc = zpool_prepare_disk(zhp, vdev, "autoreplace", &lines,
&lines_cnt);
if (rc != 0) {
zed_log_msg(LOG_INFO,
" zpool_prepare_disk: could not "
"prepare '%s' (%s)", fullpath,
libzfs_error_description(g_zfshdl));
"prepare '%s' (%s), path '%s', rc = %d", fullpath,
libzfs_error_description(g_zfshdl), path, rc);
if (lines_cnt > 0) {
zed_log_msg(LOG_INFO,
" zfs_prepare_disk output:");
@ -446,12 +448,13 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
* If this is a request to label a whole disk, then attempt to
* write out the label.
*/
if (zpool_prepare_and_label_disk(g_zfshdl, zhp, leafname,
vdev, "autoreplace", &lines, &lines_cnt) != 0) {
rc = zpool_prepare_and_label_disk(g_zfshdl, zhp, leafname,
vdev, "autoreplace", &lines, &lines_cnt);
if (rc != 0) {
zed_log_msg(LOG_WARNING,
" zpool_prepare_and_label_disk: could not "
"label '%s' (%s)", leafname,
libzfs_error_description(g_zfshdl));
"label '%s' (%s), rc = %d", leafname,
libzfs_error_description(g_zfshdl), rc);
if (lines_cnt > 0) {
zed_log_msg(LOG_INFO,
" zfs_prepare_disk output:");
@ -746,8 +749,8 @@ zfs_iter_pool(zpool_handle_t *zhp, void *data)
continue;
if (zfs_toplevel_state(zhp) >= VDEV_STATE_DEGRADED) {
list_remove(&g_pool_list, pool);
(void) tpool_dispatch(g_tpool, zfs_enable_ds,
pool);
(void) taskq_dispatch(g_taskq, zfs_enable_ds,
pool, TQ_SLEEP);
break;
}
}
@ -1344,9 +1347,9 @@ zfs_slm_fini(void)
/* wait for zfs_enum_pools thread to complete */
(void) pthread_join(g_zfs_tid, NULL);
/* destroy the thread pool */
if (g_tpool != NULL) {
tpool_wait(g_tpool);
tpool_destroy(g_tpool);
if (g_taskq != NULL) {
taskq_wait(g_taskq);
taskq_destroy(g_taskq);
}
while ((pool = list_remove_head(&g_pool_list)) != NULL) {

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*
@ -403,6 +404,7 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
(state == VDEV_STATE_REMOVED || state == VDEV_STATE_FAULTED))) {
const char *devtype;
char *devname;
boolean_t skip_removal = B_FALSE;
if (nvlist_lookup_string(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_TYPE,
&devtype) == 0) {
@ -440,18 +442,28 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
nvlist_lookup_uint64_array(vdev, ZPOOL_CONFIG_VDEV_STATS,
(uint64_t **)&vs, &c);
if (vs->vs_state == VDEV_STATE_OFFLINE)
return;
/*
* If state removed is requested for already removed vdev,
* its a loopback event from spa_async_remove(). Just
* ignore it.
*/
if (vs->vs_state == VDEV_STATE_REMOVED &&
state == VDEV_STATE_REMOVED)
if ((vs->vs_state == VDEV_STATE_REMOVED &&
state == VDEV_STATE_REMOVED)) {
if (strcmp(class, "resource.fs.zfs.removed") == 0 &&
nvlist_exists(nvl, "by_kernel")) {
skip_removal = B_TRUE;
} else {
return;
}
}
/* Remove the vdev since device is unplugged */
int remove_status = 0;
if (l2arc || (strcmp(class, "resource.fs.zfs.removed") == 0)) {
if (!skip_removal && (l2arc ||
(strcmp(class, "resource.fs.zfs.removed") == 0))) {
remove_status = zpool_vdev_remove_wanted(zhp, devname);
fmd_hdl_debug(hdl, "zpool_vdev_remove_wanted '%s'"
", err:%d", devname, libzfs_errno(zhdl));

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* This file is part of the ZFS Event Daemon (ZED).
*

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
zedconfdir = $(sysconfdir)/zfs/zed.d
dist_zedconf_DATA = \
%D%/zed-functions.sh \
@ -9,18 +10,18 @@ dist_zedexec_SCRIPTS = \
%D%/all-debug.sh \
%D%/all-syslog.sh \
%D%/data-notify.sh \
%D%/deadman-slot_off.sh \
%D%/deadman-sync-slot_off.sh \
%D%/generic-notify.sh \
%D%/pool_import-led.sh \
%D%/pool_import-sync-led.sh \
%D%/resilver_finish-notify.sh \
%D%/resilver_finish-start-scrub.sh \
%D%/scrub_finish-notify.sh \
%D%/statechange-led.sh \
%D%/statechange-sync-led.sh \
%D%/statechange-notify.sh \
%D%/statechange-slot_off.sh \
%D%/statechange-sync-slot_off.sh \
%D%/trim_finish-notify.sh \
%D%/vdev_attach-led.sh \
%D%/vdev_clear-led.sh
%D%/vdev_attach-sync-led.sh \
%D%/vdev_clear-sync-led.sh
nodist_zedexec_SCRIPTS = \
%D%/history_event-zfs-list-cacher.sh
@ -30,17 +31,17 @@ SUBSTFILES += $(nodist_zedexec_SCRIPTS)
zedconfdefaults = \
all-syslog.sh \
data-notify.sh \
deadman-slot_off.sh \
deadman-sync-slot_off.sh \
history_event-zfs-list-cacher.sh \
pool_import-led.sh \
pool_import-sync-led.sh \
resilver_finish-notify.sh \
resilver_finish-start-scrub.sh \
scrub_finish-notify.sh \
statechange-led.sh \
statechange-sync-led.sh \
statechange-notify.sh \
statechange-slot_off.sh \
vdev_attach-led.sh \
vdev_clear-led.sh
statechange-sync-slot_off.sh \
vdev_attach-sync-led.sh \
vdev_clear-sync-led.sh
dist_noinst_DATA += %D%/README

View File

@ -21,6 +21,7 @@ zed_check_cmd "${ZFS}" sort diff
# We lock the output file to avoid simultaneous writes.
# If we run into trouble, log and drop the lock
# shellcheck disable=SC2329
abort_alter() {
zed_log_msg "Error updating zfs-list.cache for ${ZEVENT_POOL}!"
zed_unlock "${FSLIST}"

View File

@ -1 +0,0 @@
statechange-led.sh

View File

@ -0,0 +1 @@
statechange-sync-led.sh

View File

@ -1,4 +1,5 @@
#!/bin/sh
# SPDX-License-Identifier: CDDL-1.0
# shellcheck disable=SC2154
#
# CDDL HEADER START

View File

@ -1 +0,0 @@
statechange-led.sh

View File

@ -0,0 +1 @@
statechange-sync-led.sh

View File

@ -1 +0,0 @@
statechange-led.sh

View File

@ -0,0 +1 @@
statechange-sync-led.sh

View File

@ -283,6 +283,11 @@ zed_notify_email()
if [ "${ZED_EMAIL_OPTS%@SUBJECT@*}" = "${ZED_EMAIL_OPTS}" ] ; then
# inject subject header
printf "Subject: %s\n" "${subject}"
# The following empty line is needed to separate the header from the
# body of the message. Otherwise programs like sendmail will skip
# everything up to the first empty line (or wont send an email at
# all) and will still exit with exit code 0
printf "\n"
fi
# output message
cat "${pathname}"
@ -436,8 +441,9 @@ zed_notify_slack_webhook()
"${pathname}")"
# Construct the JSON message for posting.
# shellcheck disable=SC2016
#
msg_json="$(printf '{"text": "*%s*\\n%s"}' "${subject}" "${msg_body}" )"
msg_json="$(printf '{"text": "*%s*\\n```%s```"}' "${subject}" "${msg_body}" )"
# Send the POST request and check for errors.
#

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* This file is part of the ZFS Event Daemon (ZED).
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* This file is part of the ZFS Event Daemon (ZED).
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* This file is part of the ZFS Event Daemon (ZED).
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*
@ -139,7 +140,8 @@ dev_event_nvlist(struct udev_device *dev)
* is /dev/sda.
*/
struct udev_device *parent_dev = udev_device_get_parent(dev);
if ((value = udev_device_get_sysattr_value(parent_dev, "size"))
if (parent_dev != NULL &&
(value = udev_device_get_sysattr_value(parent_dev, "size"))
!= NULL) {
uint64_t numval = DEV_BSIZE;

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* This file is part of the ZFS Event Daemon (ZED).
*
@ -109,7 +110,7 @@ zed_event_fini(struct zed_conf *zcp)
static void
_bump_event_queue_length(void)
{
int zzlm = -1, wr;
int zzlm, wr;
char qlen_buf[12] = {0}; /* parameter is int => max "-2147483647\n" */
long int qlen, orig_qlen;

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* This file is part of the ZFS Event Daemon (ZED).
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* This file is part of the ZFS Event Daemon (ZED).
*
@ -44,21 +45,10 @@ struct launched_process_node {
static int
_launched_process_node_compare(const void *x1, const void *x2)
{
pid_t p1;
pid_t p2;
const struct launched_process_node *node1 = x1;
const struct launched_process_node *node2 = x2;
assert(x1 != NULL);
assert(x2 != NULL);
p1 = ((const struct launched_process_node *) x1)->pid;
p2 = ((const struct launched_process_node *) x2)->pid;
if (p1 < p2)
return (-1);
else if (p1 == p2)
return (0);
else
return (1);
return (TREE_CMP(node1->pid, node2->pid));
}
static pthread_t _reap_children_tid = (pthread_t)-1;
@ -195,37 +185,29 @@ _nop(int sig)
(void) sig;
}
static void *
_reap_children(void *arg)
static void
wait_for_children(boolean_t do_pause, boolean_t wait)
{
(void) arg;
struct launched_process_node node, *pnode;
pid_t pid;
int status;
struct rusage usage;
struct sigaction sa = {};
(void) sigfillset(&sa.sa_mask);
(void) sigdelset(&sa.sa_mask, SIGCHLD);
(void) pthread_sigmask(SIG_SETMASK, &sa.sa_mask, NULL);
(void) sigemptyset(&sa.sa_mask);
sa.sa_handler = _nop;
sa.sa_flags = SA_NOCLDSTOP;
(void) sigaction(SIGCHLD, &sa, NULL);
int status;
struct launched_process_node node, *pnode;
for (_reap_children_stop = B_FALSE; !_reap_children_stop; ) {
(void) pthread_mutex_lock(&_launched_processes_lock);
pid = wait4(0, &status, WNOHANG, &usage);
pid = wait4(0, &status, wait ? 0 : WNOHANG, &usage);
if (pid == 0 || pid == (pid_t)-1) {
(void) pthread_mutex_unlock(&_launched_processes_lock);
if (pid == 0 || errno == ECHILD)
if ((pid == 0) || (errno == ECHILD)) {
if (do_pause)
pause();
else if (errno != EINTR)
} else if (errno != EINTR)
zed_log_msg(LOG_WARNING,
"Failed to wait for children: %s",
strerror(errno));
if (!do_pause)
return;
} else {
memset(&node, 0, sizeof (node));
node.pid = pid;
@ -277,6 +259,25 @@ _reap_children(void *arg)
}
}
}
static void *
_reap_children(void *arg)
{
(void) arg;
struct sigaction sa = {};
(void) sigfillset(&sa.sa_mask);
(void) sigdelset(&sa.sa_mask, SIGCHLD);
(void) pthread_sigmask(SIG_SETMASK, &sa.sa_mask, NULL);
(void) sigemptyset(&sa.sa_mask);
sa.sa_handler = _nop;
sa.sa_flags = SA_NOCLDSTOP;
(void) sigaction(SIGCHLD, &sa, NULL);
wait_for_children(B_TRUE, B_FALSE);
return (NULL);
}
@ -305,6 +306,45 @@ zed_exec_fini(void)
_reap_children_tid = (pthread_t)-1;
}
/*
* Check if the zedlet name indicates if it is a synchronous zedlet
*
* Synchronous zedlets have a "-sync-" immediately following the event name in
* their zedlet filename, like:
*
* EVENT_NAME-sync-ZEDLETNAME.sh
*
* For example, if you wanted a synchronous statechange script:
*
* statechange-sync-myzedlet.sh
*
* Synchronous zedlets are guaranteed to be the only zedlet running. No other
* zedlets may run in parallel with a synchronous zedlet. A synchronous
* zedlet will wait for all previously spawned zedlets to finish before running.
* Users should be careful to only use synchronous zedlets when needed, since
* they decrease parallelism.
*/
static boolean_t
zedlet_is_sync(const char *zedlet, const char *event)
{
const char *sync_str = "-sync-";
size_t sync_str_len;
size_t zedlet_len;
size_t event_len;
sync_str_len = strlen(sync_str);
zedlet_len = strlen(zedlet);
event_len = strlen(event);
if (event_len + sync_str_len >= zedlet_len)
return (B_FALSE);
if (strncmp(&zedlet[event_len], sync_str, sync_str_len) == 0)
return (B_TRUE);
return (B_FALSE);
}
/*
* Process the event [eid] by synchronously invoking all zedlets with a
* matching class prefix.
@ -367,9 +407,28 @@ zed_exec_process(uint64_t eid, const char *class, const char *subclass,
z = zed_strings_next(zcp->zedlets)) {
for (csp = class_strings; *csp; csp++) {
n = strlen(*csp);
if ((strncmp(z, *csp, n) == 0) && !isalpha(z[n]))
if ((strncmp(z, *csp, n) == 0) && !isalpha(z[n])) {
boolean_t is_sync = zedlet_is_sync(z, *csp);
if (is_sync) {
/*
* Wait for previous zedlets to
* finish
*/
wait_for_children(B_FALSE, B_TRUE);
}
_zed_exec_fork_child(eid, zcp->zedlet_dir,
z, e, zcp->zevent_fd, zcp->do_foreground);
if (is_sync) {
/*
* Wait for sync zedlet we just launched
* to finish.
*/
wait_for_children(B_FALSE, B_TRUE);
}
}
}
}
free(e);

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* This file is part of the ZFS Event Daemon (ZED).
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* This file is part of the ZFS Event Daemon (ZED).
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* This file is part of the ZFS Event Daemon (ZED).
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* This file is part of the ZFS Event Daemon (ZED).
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* This file is part of the ZFS Event Daemon (ZED).
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* This file is part of the ZFS Event Daemon (ZED).
*
@ -41,26 +42,10 @@ typedef struct zed_strings_node zed_strings_node_t;
static int
_zed_strings_node_compare(const void *x1, const void *x2)
{
const char *s1;
const char *s2;
int rv;
const zed_strings_node_t *n1 = x1;
const zed_strings_node_t *n2 = x2;
assert(x1 != NULL);
assert(x2 != NULL);
s1 = ((const zed_strings_node_t *) x1)->key;
assert(s1 != NULL);
s2 = ((const zed_strings_node_t *) x2)->key;
assert(s2 != NULL);
rv = strcmp(s1, s2);
if (rv < 0)
return (-1);
if (rv > 0)
return (1);
return (0);
return (TREE_ISIGN(strcmp(n1->key, n2->key)));
}
/*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* This file is part of the ZFS Event Daemon (ZED).
*

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
sbin_PROGRAMS += zfs
CPPCHECKTARGETS += zfs
@ -12,8 +13,7 @@ zfs_SOURCES = \
zfs_LDADD = \
libzfs.la \
libzfs_core.la \
libnvpair.la \
libuutil.la
libnvpair.la
zfs_LDADD += $(LTLIBINTL)

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*
@ -27,7 +28,6 @@
*/
#include <libintl.h>
#include <libuutil.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
@ -49,14 +49,16 @@
* When finished, we have an AVL tree of ZFS handles. We go through and execute
* the provided callback for each one, passing whatever data the user supplied.
*/
typedef struct callback_data callback_data_t;
typedef struct zfs_node {
zfs_handle_t *zn_handle;
uu_avl_node_t zn_avlnode;
callback_data_t *zn_callback;
avl_node_t zn_avlnode;
} zfs_node_t;
typedef struct callback_data {
uu_avl_t *cb_avl;
struct callback_data {
avl_tree_t cb_avl;
int cb_flags;
zfs_type_t cb_types;
zfs_sort_column_t *cb_sortcol;
@ -64,9 +66,7 @@ typedef struct callback_data {
int cb_depth_limit;
int cb_depth;
uint8_t cb_props_table[ZFS_NUM_PROPS];
} callback_data_t;
uu_avl_pool_t *avl_pool;
};
/*
* Include snaps if they were requested or if this a zfs list where types
@ -98,13 +98,12 @@ zfs_callback(zfs_handle_t *zhp, void *data)
if ((zfs_get_type(zhp) & cb->cb_types) ||
((zfs_get_type(zhp) == ZFS_TYPE_SNAPSHOT) && include_snaps)) {
uu_avl_index_t idx;
avl_index_t idx;
zfs_node_t *node = safe_malloc(sizeof (zfs_node_t));
node->zn_handle = zhp;
uu_avl_node_init(node, &node->zn_avlnode, avl_pool);
if (uu_avl_find(cb->cb_avl, node, cb->cb_sortcol,
&idx) == NULL) {
node->zn_callback = cb;
if (avl_find(&cb->cb_avl, node, &idx) == NULL) {
if (cb->cb_proplist) {
if ((*cb->cb_proplist) &&
!(*cb->cb_proplist)->pl_all)
@ -119,7 +118,7 @@ zfs_callback(zfs_handle_t *zhp, void *data)
return (-1);
}
}
uu_avl_insert(cb->cb_avl, node, idx);
avl_insert(&cb->cb_avl, node, idx);
should_close = B_FALSE;
} else {
free(node);
@ -285,7 +284,7 @@ zfs_compare(const void *larg, const void *rarg)
if (rat != NULL)
*rat = '\0';
ret = strcmp(lname, rname);
ret = TREE_ISIGN(strcmp(lname, rname));
if (ret == 0 && (lat != NULL || rat != NULL)) {
/*
* If we're comparing a dataset to one of its snapshots, we
@ -339,11 +338,11 @@ zfs_compare(const void *larg, const void *rarg)
* with snapshots grouped under their parents.
*/
static int
zfs_sort(const void *larg, const void *rarg, void *data)
zfs_sort(const void *larg, const void *rarg)
{
zfs_handle_t *l = ((zfs_node_t *)larg)->zn_handle;
zfs_handle_t *r = ((zfs_node_t *)rarg)->zn_handle;
zfs_sort_column_t *sc = (zfs_sort_column_t *)data;
zfs_sort_column_t *sc = ((zfs_node_t *)larg)->zn_callback->cb_sortcol;
zfs_sort_column_t *psc;
for (psc = sc; psc != NULL; psc = psc->sc_next) {
@ -413,15 +412,13 @@ zfs_sort(const void *larg, const void *rarg, void *data)
return (-1);
if (lstr)
ret = strcmp(lstr, rstr);
else if (lnum < rnum)
ret = -1;
else if (lnum > rnum)
ret = 1;
ret = TREE_ISIGN(strcmp(lstr, rstr));
else
ret = TREE_CMP(lnum, rnum);
if (ret != 0) {
if (psc->sc_reverse == B_TRUE)
ret = (ret < 0) ? 1 : -1;
ret = -ret;
return (ret);
}
}
@ -437,13 +434,6 @@ zfs_for_each(int argc, char **argv, int flags, zfs_type_t types,
callback_data_t cb = {0};
int ret = 0;
zfs_node_t *node;
uu_avl_walk_t *walk;
avl_pool = uu_avl_pool_create("zfs_pool", sizeof (zfs_node_t),
offsetof(zfs_node_t, zn_avlnode), zfs_sort, UU_DEFAULT);
if (avl_pool == NULL)
nomem();
cb.cb_sortcol = sortcol;
cb.cb_flags = flags;
@ -488,8 +478,8 @@ zfs_for_each(int argc, char **argv, int flags, zfs_type_t types,
sizeof (cb.cb_props_table));
}
if ((cb.cb_avl = uu_avl_create(avl_pool, NULL, UU_DEFAULT)) == NULL)
nomem();
avl_create(&cb.cb_avl, zfs_sort,
sizeof (zfs_node_t), offsetof(zfs_node_t, zn_avlnode));
if (argc == 0) {
/*
@ -530,25 +520,20 @@ zfs_for_each(int argc, char **argv, int flags, zfs_type_t types,
* At this point we've got our AVL tree full of zfs handles, so iterate
* over each one and execute the real user callback.
*/
for (node = uu_avl_first(cb.cb_avl); node != NULL;
node = uu_avl_next(cb.cb_avl, node))
for (node = avl_first(&cb.cb_avl); node != NULL;
node = AVL_NEXT(&cb.cb_avl, node))
ret |= callback(node->zn_handle, data);
/*
* Finally, clean up the AVL tree.
*/
if ((walk = uu_avl_walk_start(cb.cb_avl, UU_WALK_ROBUST)) == NULL)
nomem();
while ((node = uu_avl_walk_next(walk)) != NULL) {
uu_avl_remove(cb.cb_avl, node);
void *cookie = NULL;
while ((node = avl_destroy_nodes(&cb.cb_avl, &cookie)) != NULL) {
zfs_close(node->zn_handle);
free(node);
}
uu_avl_walk_end(walk);
uu_avl_destroy(cb.cb_avl);
uu_avl_pool_destroy(avl_pool);
avl_destroy(&cb.cb_avl);
return (ret);
}

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*

File diff suppressed because it is too large Load Diff

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*
@ -144,11 +145,11 @@ zfs_project_handle_one(const char *name, zfs_project_control_t *zpc)
switch (zpc->zpc_op) {
case ZFS_PROJECT_OP_LIST:
(void) printf("%5u %c %s\n", fsx.fsx_projid,
(fsx.fsx_xflags & ZFS_PROJINHERIT_FL) ? 'P' : '-', name);
(fsx.fsx_xflags & FS_XFLAG_PROJINHERIT) ? 'P' : '-', name);
goto out;
case ZFS_PROJECT_OP_CHECK:
if (fsx.fsx_projid == zpc->zpc_expected_projid &&
fsx.fsx_xflags & ZFS_PROJINHERIT_FL)
fsx.fsx_xflags & FS_XFLAG_PROJINHERIT)
goto out;
if (!zpc->zpc_newline) {
@ -163,29 +164,30 @@ zfs_project_handle_one(const char *name, zfs_project_control_t *zpc)
"(%u/%u)\n", name, fsx.fsx_projid,
(uint32_t)zpc->zpc_expected_projid);
if (!(fsx.fsx_xflags & ZFS_PROJINHERIT_FL))
if (!(fsx.fsx_xflags & FS_XFLAG_PROJINHERIT))
(void) printf("%s - project inherit flag is not set\n",
name);
goto out;
case ZFS_PROJECT_OP_CLEAR:
if (!(fsx.fsx_xflags & ZFS_PROJINHERIT_FL) &&
if (!(fsx.fsx_xflags & FS_XFLAG_PROJINHERIT) &&
(zpc->zpc_keep_projid ||
fsx.fsx_projid == ZFS_DEFAULT_PROJID))
goto out;
fsx.fsx_xflags &= ~ZFS_PROJINHERIT_FL;
fsx.fsx_xflags &= ~FS_XFLAG_PROJINHERIT;
if (!zpc->zpc_keep_projid)
fsx.fsx_projid = ZFS_DEFAULT_PROJID;
break;
case ZFS_PROJECT_OP_SET:
if (fsx.fsx_projid == zpc->zpc_expected_projid &&
(!zpc->zpc_set_flag || fsx.fsx_xflags & ZFS_PROJINHERIT_FL))
(!zpc->zpc_set_flag ||
fsx.fsx_xflags & FS_XFLAG_PROJINHERIT))
goto out;
fsx.fsx_projid = zpc->zpc_expected_projid;
if (zpc->zpc_set_flag)
fsx.fsx_xflags |= ZFS_PROJINHERIT_FL;
fsx.fsx_xflags |= FS_XFLAG_PROJINHERIT;
break;
default:
ASSERT(0);
@ -193,11 +195,30 @@ zfs_project_handle_one(const char *name, zfs_project_control_t *zpc)
}
ret = ioctl(fd, ZFS_IOC_FSSETXATTR, &fsx);
if (ret)
if (ret) {
(void) fprintf(stderr,
gettext("failed to set xattr for %s: %s\n"),
name, strerror(errno));
if (errno == ENOTSUP) {
char *kver = zfs_version_kernel();
/*
* Special case: a module/userspace version mismatch can
* return ENOTSUP due to us fixing the XFLAGs bits in
* #17884. In that case give a hint to the user that
* they should take action to make the versions match.
*/
if (strcmp(kver, ZFS_META_ALIAS) != 0) {
fprintf(stderr,
gettext("Warning: The zfs module version "
"(%s) and userspace\nversion (%s) do not "
"match up. This may be the\ncause of the "
"\"Operation not supported\" error.\n"),
kver, ZFS_META_ALIAS);
}
}
}
out:
close(fd);
return (ret);

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*

View File

@ -1,3 +1,4 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*
@ -51,12 +52,16 @@
#include <sys/zio_compress.h>
#include <sys/zfeature.h>
#include <sys/dmu_tx.h>
#include <sys/backtrace.h>
#include <zfeature_common.h>
#include <libzutil.h>
#include <sys/metaslab_impl.h>
#include <libzpool.h>
static importargs_t g_importargs;
static char *g_pool;
static boolean_t g_readonly;
static boolean_t g_dump_dbgmsg;
typedef enum {
ZHACK_REPAIR_OP_UNKNOWN = 0,
@ -68,11 +73,23 @@ static __attribute__((noreturn)) void
usage(void)
{
(void) fprintf(stderr,
"Usage: zhack [-c cachefile] [-d dir] <subcommand> <args> ...\n"
"where <subcommand> <args> is one of the following:\n"
"Usage: zhack [-o tunable] [-c cachefile] [-d dir] [-G] "
"<subcommand> <args> ...\n"
" where <subcommand> <args> is one of the following:\n"
"\n");
(void) fprintf(stderr,
" global options:\n"
" -c <cachefile> reads config from the given cachefile\n"
" -d <dir> directory with vdevs for import\n"
" -o var=value... set global variable to an unsigned "
"32-bit integer\n"
" -G dump zfs_dbgmsg buffer before exiting\n"
"\n"
" action idle <pool> [-f] [-t seconds]\n"
" import the pool for a set time then export it\n"
" -t <seconds> sets the time the pool is imported\n"
"\n"
" feature stat <pool>\n"
" print information about enabled features\n"
" feature enable [-r] [-d desc] <pool> <feature>\n"
@ -92,10 +109,46 @@ usage(void)
" -c repair corrupted label checksums\n"
" -u restore the label on a detached device\n"
"\n"
" <device> : path to vdev\n");
" <device> : path to vdev\n"
"\n"
" metaslab leak <pool>\n"
" apply allocation map from zdb to specified pool\n");
exit(1);
}
static void
dump_debug_buffer(void)
{
ssize_t ret __attribute__((unused));
if (!g_dump_dbgmsg)
return;
/*
* We use write() instead of printf() so that this function
* is safe to call from a signal handler.
*/
ret = write(STDERR_FILENO, "\n", 1);
zfs_dbgmsg_print(STDERR_FILENO, "zhack");
}
static void sig_handler(int signo)
{
struct sigaction action;
libspl_backtrace(STDERR_FILENO);
dump_debug_buffer();
/*
* Restore default action and re-raise signal so SIGSEGV and
* SIGABRT can trigger a core dump.
*/
action.sa_handler = SIG_DFL;
sigemptyset(&action.sa_mask);
action.sa_flags = 0;
(void) sigaction(signo, &action, NULL);
raise(signo);
}
static __attribute__((format(printf, 3, 4))) __attribute__((noreturn)) void
fatal(spa_t *spa, const void *tag, const char *fmt, ...)
@ -113,6 +166,8 @@ fatal(spa_t *spa, const void *tag, const char *fmt, ...)
va_end(ap);
(void) fputc('\n', stderr);
dump_debug_buffer();
exit(1);
}
@ -161,9 +216,9 @@ zhack_import(char *target, boolean_t readonly)
props = NULL;
if (readonly) {
VERIFY(nvlist_alloc(&props, NV_UNIQUE_NAME, 0) == 0);
VERIFY(nvlist_add_uint64(props,
zpool_prop_to_name(ZPOOL_PROP_READONLY), 1) == 0);
VERIFY0(nvlist_alloc(&props, NV_UNIQUE_NAME, 0));
VERIFY0(nvlist_add_uint64(props,
zpool_prop_to_name(ZPOOL_PROP_READONLY), 1));
}
zfeature_checks_disable = B_TRUE;
@ -217,8 +272,8 @@ dump_obj(objset_t *os, uint64_t obj, const char *name)
} else {
ASSERT(za->za_integer_length == 1);
char val[1024];
VERIFY(zap_lookup(os, obj, za->za_name,
1, sizeof (val), val) == 0);
VERIFY0(zap_lookup(os, obj, za->za_name,
1, sizeof (val), val));
(void) printf("\t%s = %s\n", za->za_name, val);
}
}
@ -362,10 +417,12 @@ feature_incr_sync(void *arg, dmu_tx_t *tx)
zfeature_info_t *feature = arg;
uint64_t refcount;
mutex_enter(&spa->spa_feat_stats_lock);
VERIFY0(feature_get_refcount_from_disk(spa, feature, &refcount));
feature_sync(spa, feature, refcount + 1, tx);
spa_history_log_internal(spa, "zhack feature incr", tx,
"name=%s", feature->fi_guid);
mutex_exit(&spa->spa_feat_stats_lock);
}
static void
@ -375,10 +432,12 @@ feature_decr_sync(void *arg, dmu_tx_t *tx)
zfeature_info_t *feature = arg;
uint64_t refcount;
mutex_enter(&spa->spa_feat_stats_lock);
VERIFY0(feature_get_refcount_from_disk(spa, feature, &refcount));
feature_sync(spa, feature, refcount - 1, tx);
spa_history_log_internal(spa, "zhack feature decr", tx,
"name=%s", feature->fi_guid);
mutex_exit(&spa->spa_feat_stats_lock);
}
static void
@ -495,6 +554,262 @@ zhack_do_feature(int argc, char **argv)
return (0);
}
static void
zhack_do_action_idle(int argc, char **argv)
{
spa_t *spa;
char *target, *tmp;
int idle_time = 0;
int c;
optind = 1;
while ((c = getopt(argc, argv, "+t:")) != -1) {
switch (c) {
case 't':
idle_time = strtol(optarg, &tmp, 0);
if (*tmp) {
(void) fprintf(stderr, "error: time must "
"be an integer in seconds: %s\n", tmp);
usage();
}
if (idle_time < 0) {
(void) fprintf(stderr, "error: time must "
"not be negative: %d\n", idle_time);
usage();
}
break;
default:
usage();
break;
}
}
argc -= optind;
argv += optind;
if (argc < 1) {
(void) fprintf(stderr, "error: missing pool name\n");
usage();
}
target = argv[0];
zhack_spa_open(target, B_FALSE, FTAG, &spa);
fprintf(stdout, "Imported pool %s, idle for %d seconds\n",
target, idle_time);
sleep(idle_time);
spa_close(spa, FTAG);
}
static int
zhack_do_action(int argc, char **argv)
{
char *subcommand;
argc--;
argv++;
if (argc == 0) {
(void) fprintf(stderr,
"error: no import operation specified\n");
usage();
}
subcommand = argv[0];
if (strcmp(subcommand, "idle") == 0) {
zhack_do_action_idle(argc, argv);
} else {
(void) fprintf(stderr, "error: unknown subcommand: %s\n",
subcommand);
usage();
}
return (0);
}
static boolean_t
strstarts(const char *a, const char *b)
{
return (strncmp(a, b, strlen(b)) == 0);
}
static void
metaslab_force_alloc(metaslab_t *msp, uint64_t start, uint64_t size,
dmu_tx_t *tx)
{
ASSERT(msp->ms_disabled);
ASSERT(MUTEX_HELD(&msp->ms_lock));
uint64_t txg = dmu_tx_get_txg(tx);
uint64_t off = start;
while (off < start + size) {
uint64_t ostart, osize;
boolean_t found = zfs_range_tree_find_in(msp->ms_allocatable,
off, start + size - off, &ostart, &osize);
if (!found)
break;
zfs_range_tree_remove(msp->ms_allocatable, ostart, osize);
if (zfs_range_tree_is_empty(msp->ms_allocating[txg & TXG_MASK]))
vdev_dirty(msp->ms_group->mg_vd, VDD_METASLAB, msp,
txg);
zfs_range_tree_add(msp->ms_allocating[txg & TXG_MASK], ostart,
osize);
msp->ms_allocating_total += osize;
off = ostart + osize;
}
}
static void
zhack_do_metaslab_leak(int argc, char **argv)
{
int c;
char *target;
spa_t *spa;
optind = 1;
boolean_t force = B_FALSE;
while ((c = getopt(argc, argv, "f")) != -1) {
switch (c) {
case 'f':
force = B_TRUE;
break;
default:
usage();
break;
}
}
argc -= optind;
argv += optind;
if (argc < 1) {
(void) fprintf(stderr, "error: missing pool name\n");
usage();
}
target = argv[0];
zhack_spa_open(target, B_FALSE, FTAG, &spa);
spa_config_enter(spa, SCL_VDEV | SCL_ALLOC, FTAG, RW_READER);
char *line = NULL;
size_t cap = 0;
vdev_t *vd = NULL;
metaslab_t *prev = NULL;
dmu_tx_t *tx = NULL;
while (getline(&line, &cap, stdin) > 0) {
if (strstarts(line, "\tvdev ")) {
uint64_t vdev_id, ms_shift;
if (sscanf(line,
"\tvdev %10"PRIu64"\t%*s metaslab shift %4"PRIu64,
&vdev_id, &ms_shift) == 1) {
VERIFY3U(sscanf(line, "\tvdev %"PRIu64
"\t metaslab shift %4"PRIu64,
&vdev_id, &ms_shift), ==, 2);
}
vd = vdev_lookup_top(spa, vdev_id);
if (vd == NULL) {
fprintf(stderr, "error: no such vdev with "
"id %"PRIu64"\n", vdev_id);
break;
}
if (tx) {
dmu_tx_commit(tx);
mutex_exit(&prev->ms_lock);
metaslab_enable(prev, B_FALSE, B_FALSE);
tx = NULL;
prev = NULL;
}
if (vd->vdev_ms_shift != ms_shift) {
fprintf(stderr, "error: ms_shift mismatch: %"
PRIu64" != %"PRIu64"\n", vd->vdev_ms_shift,
ms_shift);
break;
}
} else if (strstarts(line, "\tmetaslabs ")) {
uint64_t ms_count;
VERIFY3U(sscanf(line, "\tmetaslabs %"PRIu64, &ms_count),
==, 1);
ASSERT(vd);
if (!force && vd->vdev_ms_count != ms_count) {
fprintf(stderr, "error: ms_count mismatch: %"
PRIu64" != %"PRIu64"\n", vd->vdev_ms_count,
ms_count);
break;
}
} else if (strstarts(line, "ALLOC:")) {
uint64_t start, size;
VERIFY3U(sscanf(line, "ALLOC: %"PRIu64" %"PRIu64"\n",
&start, &size), ==, 2);
ASSERT(vd);
size_t idx;
idx = start >> vd->vdev_ms_shift;
if (idx >= vd->vdev_ms_count)
continue;
metaslab_t *cur = vd->vdev_ms[idx];
if (prev != cur) {
if (prev) {
dmu_tx_commit(tx);
mutex_exit(&prev->ms_lock);
metaslab_enable(prev, B_FALSE, B_FALSE);
}
ASSERT(cur);
metaslab_disable(cur);
mutex_enter(&cur->ms_lock);
metaslab_load(cur);
prev = cur;
tx = dmu_tx_create_dd(
spa_get_dsl(vd->vdev_spa)->dp_root_dir);
dmu_tx_assign(tx, DMU_TX_WAIT);
}
metaslab_force_alloc(cur, start, size, tx);
} else {
continue;
}
}
if (tx) {
dmu_tx_commit(tx);
mutex_exit(&prev->ms_lock);
metaslab_enable(prev, B_FALSE, B_FALSE);
tx = NULL;
prev = NULL;
}
if (line)
free(line);
spa_config_exit(spa, SCL_VDEV | SCL_ALLOC, FTAG);
spa_close(spa, FTAG);
}
static int
zhack_do_metaslab(int argc, char **argv)
{
char *subcommand;
argc--;
argv++;
if (argc == 0) {
(void) fprintf(stderr,
"error: no metaslab operation specified\n");
usage();
}
subcommand = argv[0];
if (strcmp(subcommand, "leak") == 0) {
zhack_do_metaslab_leak(argc, argv);
} else {
(void) fprintf(stderr, "error: unknown subcommand: %s\n",
subcommand);
usage();
}
return (0);
}
#define ASHIFT_UBERBLOCK_SHIFT(ashift) \
MIN(MAX(ashift, UBERBLOCK_SHIFT), \
MAX_UBERBLOCK_SHIFT)
@ -524,6 +839,23 @@ zhack_repair_read_label(const int fd, vdev_label_t *vl,
return (0);
}
static int
zhack_repair_get_byteswap(const zio_eck_t *vdev_eck, const int l, int *byteswap)
{
if (vdev_eck->zec_magic == ZEC_MAGIC) {
*byteswap = B_FALSE;
} else if (vdev_eck->zec_magic == BSWAP_64((uint64_t)ZEC_MAGIC)) {
*byteswap = B_TRUE;
} else {
(void) fprintf(stderr, "error: label %d: "
"Expected the nvlist checksum magic number but instead got "
"0x%" PRIx64 "\n",
l, vdev_eck->zec_magic);
return (1);
}
return (0);
}
static void
zhack_repair_calc_cksum(const int byteswap, void *data, const uint64_t offset,
const uint64_t abdsize, zio_eck_t *eck, zio_cksum_t *cksum)
@ -550,33 +882,10 @@ zhack_repair_calc_cksum(const int byteswap, void *data, const uint64_t offset,
}
static int
zhack_repair_check_label(uberblock_t *ub, const int l, const char **cfg_keys,
const size_t cfg_keys_len, nvlist_t *cfg, nvlist_t *vdev_tree_cfg,
uint64_t *ashift)
zhack_repair_get_ashift(nvlist_t *cfg, const int l, uint64_t *ashift)
{
int err;
if (ub->ub_txg != 0) {
(void) fprintf(stderr,
"error: label %d: UB TXG of 0 expected, but got %"
PRIu64 "\n",
l, ub->ub_txg);
(void) fprintf(stderr, "It would appear the device was not "
"properly removed.\n");
return (1);
}
for (int i = 0; i < cfg_keys_len; i++) {
uint64_t val;
err = nvlist_lookup_uint64(cfg, cfg_keys[i], &val);
if (err) {
(void) fprintf(stderr,
"error: label %d, %d: "
"cannot find nvlist key %s\n",
l, i, cfg_keys[i]);
return (err);
}
}
nvlist_t *vdev_tree_cfg;
err = nvlist_lookup_nvlist(cfg,
ZPOOL_CONFIG_VDEV_TREE, &vdev_tree_cfg);
@ -600,7 +909,7 @@ zhack_repair_check_label(uberblock_t *ub, const int l, const char **cfg_keys,
(void) fprintf(stderr,
"error: label %d: nvlist key %s is zero\n",
l, ZPOOL_CONFIG_ASHIFT);
return (err);
return (1);
}
return (0);
@ -615,30 +924,35 @@ zhack_repair_undetach(uberblock_t *ub, nvlist_t *cfg, const int l)
*/
if (BP_GET_LOGICAL_BIRTH(&ub->ub_rootbp) != 0) {
const uint64_t txg = BP_GET_LOGICAL_BIRTH(&ub->ub_rootbp);
int err;
ub->ub_txg = txg;
if (nvlist_remove_all(cfg, ZPOOL_CONFIG_CREATE_TXG) != 0) {
err = nvlist_remove_all(cfg, ZPOOL_CONFIG_CREATE_TXG);
if (err) {
(void) fprintf(stderr,
"error: label %d: "
"Failed to remove pool creation TXG\n",
l);
return (1);
return (err);
}
if (nvlist_remove_all(cfg, ZPOOL_CONFIG_POOL_TXG) != 0) {
err = nvlist_remove_all(cfg, ZPOOL_CONFIG_POOL_TXG);
if (err) {
(void) fprintf(stderr,
"error: label %d: Failed to remove pool TXG to "
"be replaced.\n",
l);
return (1);
return (err);
}
if (nvlist_add_uint64(cfg, ZPOOL_CONFIG_POOL_TXG, txg) != 0) {
err = nvlist_add_uint64(cfg, ZPOOL_CONFIG_POOL_TXG, txg);
if (err) {
(void) fprintf(stderr,
"error: label %d: "
"Failed to add pool TXG of %" PRIu64 "\n",
l, txg);
return (1);
return (err);
}
}
@ -732,6 +1046,7 @@ zhack_repair_test_cksum(const int byteswap, void *vdev_data,
BSWAP_64(ZEC_MAGIC) : ZEC_MAGIC;
const uint64_t actual_magic = vdev_eck->zec_magic;
int err = 0;
if (actual_magic != expected_magic) {
(void) fprintf(stderr, "error: label %d: "
"Expected "
@ -753,6 +1068,36 @@ zhack_repair_test_cksum(const int byteswap, void *vdev_data,
return (err);
}
static int
zhack_repair_unpack_cfg(vdev_label_t *vl, const int l, nvlist_t **cfg)
{
const char *cfg_keys[] = { ZPOOL_CONFIG_VERSION,
ZPOOL_CONFIG_POOL_STATE, ZPOOL_CONFIG_GUID };
int err;
err = nvlist_unpack(vl->vl_vdev_phys.vp_nvlist,
VDEV_PHYS_SIZE - sizeof (zio_eck_t), cfg, 0);
if (err) {
(void) fprintf(stderr,
"error: cannot unpack nvlist label %d\n", l);
return (err);
}
for (int i = 0; i < ARRAY_SIZE(cfg_keys); i++) {
uint64_t val;
err = nvlist_lookup_uint64(*cfg, cfg_keys[i], &val);
if (err) {
(void) fprintf(stderr,
"error: label %d, %d: "
"cannot find nvlist key %s\n",
l, i, cfg_keys[i]);
return (err);
}
}
return (0);
}
static void
zhack_repair_one_label(const zhack_repair_op_t op, const int fd,
vdev_label_t *vl, const uint64_t label_offset, const int l,
@ -766,10 +1111,7 @@ zhack_repair_one_label(const zhack_repair_op_t op, const int fd,
(zio_eck_t *)((char *)(vdev_data) + VDEV_PHYS_SIZE) - 1;
const uint64_t vdev_phys_offset =
label_offset + offsetof(vdev_label_t, vl_vdev_phys);
const char *cfg_keys[] = { ZPOOL_CONFIG_VERSION,
ZPOOL_CONFIG_POOL_STATE, ZPOOL_CONFIG_GUID };
nvlist_t *cfg;
nvlist_t *vdev_tree_cfg = NULL;
uint64_t ashift;
int byteswap;
@ -777,18 +1119,9 @@ zhack_repair_one_label(const zhack_repair_op_t op, const int fd,
if (err)
return;
if (vdev_eck->zec_magic == 0) {
(void) fprintf(stderr, "error: label %d: "
"Expected the nvlist checksum magic number to not be zero"
"\n",
l);
(void) fprintf(stderr, "There should already be a checksum "
"for the label.\n");
err = zhack_repair_get_byteswap(vdev_eck, l, &byteswap);
if (err)
return;
}
byteswap =
(vdev_eck->zec_magic == BSWAP_64((uint64_t)ZEC_MAGIC));
if (byteswap) {
byteswap_uint64_array(&vdev_eck->zec_cksum,
@ -804,16 +1137,7 @@ zhack_repair_one_label(const zhack_repair_op_t op, const int fd,
return;
}
err = nvlist_unpack(vl->vl_vdev_phys.vp_nvlist,
VDEV_PHYS_SIZE - sizeof (zio_eck_t), &cfg, 0);
if (err) {
(void) fprintf(stderr,
"error: cannot unpack nvlist label %d\n", l);
return;
}
err = zhack_repair_check_label(ub,
l, cfg_keys, ARRAY_SIZE(cfg_keys), cfg, vdev_tree_cfg, &ashift);
err = zhack_repair_unpack_cfg(vl, l, &cfg);
if (err)
return;
@ -821,6 +1145,19 @@ zhack_repair_one_label(const zhack_repair_op_t op, const int fd,
char *buf;
size_t buflen;
if (ub->ub_txg != 0) {
(void) fprintf(stderr,
"error: label %d: UB TXG of 0 expected, but got %"
PRIu64 "\n", l, ub->ub_txg);
(void) fprintf(stderr, "It would appear the device was "
"not properly detached.\n");
return;
}
err = zhack_repair_get_ashift(cfg, l, &ashift);
if (err)
return;
err = zhack_repair_undetach(ub, cfg, l);
if (err)
return;
@ -970,17 +1307,35 @@ zhack_do_label(int argc, char **argv)
int
main(int argc, char **argv)
{
struct sigaction action;
char *path[MAX_NUM_PATHS];
const char *subcommand;
int rv = 0;
int c;
/*
* Set up signal handlers, so if we crash due to bad on-disk data we
* can get more info. Unlike ztest, we don't bail out if we can't set
* up signal handlers, because zhack is very useful without them.
*/
action.sa_handler = sig_handler;
sigemptyset(&action.sa_mask);
action.sa_flags = 0;
if (sigaction(SIGSEGV, &action, NULL) < 0) {
(void) fprintf(stderr, "zhack: cannot catch SIGSEGV: %s\n",
strerror(errno));
}
if (sigaction(SIGABRT, &action, NULL) < 0) {
(void) fprintf(stderr, "zhack: cannot catch SIGABRT: %s\n",
strerror(errno));
}
g_importargs.path = path;
dprintf_setup(&argc, argv);
zfs_prop_init();
while ((c = getopt(argc, argv, "+c:d:")) != -1) {
while ((c = getopt(argc, argv, "+c:d:Go:")) != -1) {
switch (c) {
case 'c':
g_importargs.cachefile = optarg;
@ -989,6 +1344,13 @@ main(int argc, char **argv)
assert(g_importargs.paths < MAX_NUM_PATHS);
g_importargs.path[g_importargs.paths++] = optarg;
break;
case 'G':
g_dump_dbgmsg = B_TRUE;
break;
case 'o':
if (handle_tunable_option(optarg, B_FALSE) != 0)
exit(1);
break;
default:
usage();
break;
@ -1006,10 +1368,14 @@ main(int argc, char **argv)
subcommand = argv[0];
if (strcmp(subcommand, "feature") == 0) {
if (strcmp(subcommand, "action") == 0) {
rv = zhack_do_action(argc, argv);
} else if (strcmp(subcommand, "feature") == 0) {
rv = zhack_do_feature(argc, argv);
} else if (strcmp(subcommand, "label") == 0) {
return (zhack_do_label(argc, argv));
} else if (strcmp(subcommand, "metaslab") == 0) {
rv = zhack_do_metaslab(argc, argv);
} else {
(void) fprintf(stderr, "error: unknown subcommand: %s\n",
subcommand);
@ -1021,6 +1387,9 @@ main(int argc, char **argv)
"changes may not be committed to disk\n");
}
if (g_dump_dbgmsg)
dump_debug_buffer();
kernel_fini();
return (rv);

View File

@ -1,4 +1,5 @@
#!/usr/bin/env @PYTHON_SHEBANG@
# SPDX-License-Identifier: CDDL-1.0
#
# Print out statistics for all zil stats. This information is
# available through the zil kstat.
@ -46,6 +47,7 @@ cols = {
"cec": [5, 1000, "zil_commit_error_count"],
"csc": [5, 1000, "zil_commit_stall_count"],
"cSc": [5, 1000, "zil_commit_suspend_count"],
"cCc": [5, 1000, "zil_commit_crash_count"],
"ic": [5, 1000, "zil_itx_count"],
"iic": [5, 1000, "zil_itx_indirect_count"],
"iib": [5, 1024, "zil_itx_indirect_bytes"],

Some files were not shown because too many files have changed in this diff Show More