Compare commits

...

105 Commits

Author SHA1 Message Date
Brian Behlendorf 1af41fd203 Tag zfs-2.3.3
META file and changelog updated.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-06-19 08:21:04 -07:00
Tony Hutter faefa5ffc3 Linux 6.15 compat: META
Update the META file to reflect compatibility with the 6.15
kernel.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17393
2025-06-17 10:50:27 -07:00
Germano Massullo 777d8ee345 Fix mixed-use-of-spaces-and-tabs rpmlint warning
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Germano Massullo <germano.massullo@gmail.com>
Closes #17461
2025-06-17 10:50:27 -07:00
Rob Norris b00bc81b05 ioctl: remove FICLONE/FICLONERANGE/FIDEDUPERANGE compat
These are only required to support these ioctls on Linux <4.5. Since
4.18 is our cutoff, we don't need this code anymore.

Also removing related test things that will never match again.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17308
2025-06-17 10:50:27 -07:00
Alexander Motin f7e6dcc68d Relax zfs_vnops_read_chunk_size limitations
It makes no sense to limit read size below the block size, since
DMU will any way consume resources for the whole block, while the
current zfs_vnops_read_chunk_size is only 1MB, which is smaller
that maximum block size of 16MB.  Plus in case of misaligned
Uncached I/O the buffer may get evicted between the chunks,
requiring repeating I/Os.

On 64-bit platforms increase zfs_vnops_read_chunk_size to 32MB.
It allows to less depend on speculative prefetcher if application
requests specific size, first not waiting for prefetcher to start
and later not prefetching more than needed.

Also while there, we don't need to align reads to the chunk size,
but only to a block size, which is smaller and so more forgiving.

My profiles show ~4% of CPU time saving when reading 16MB blocks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17415
2025-06-17 10:50:27 -07:00
Rob Norris e2de00ca44 dmu_traverse: remove 'ignore_hole_birth' tunable alias
It's been many years, we can probably do without.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17376
2025-06-17 10:50:27 -07:00
Allan Jude 8e9ffe1b4f ARC: parallel eviction
On systems with enormous amounts of memory, the single arc_evict thread
can become a bottleneck if reads and writes are stuck behind it, waiting
for old data to be evicted before new data can take its place.

This commit adds support for evicting from multiple ARC lists in
parallel, by farming the evict work out to some number of threads and
then accumulating their results.

A new tuneable, zfs_arc_evict_threads, sets the number of threads. By
default, it will scale based on the number of CPUs.

Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Youzhong Yang <youzhong@gmail.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Signed-off-by: Alexander Stetsenko <alex.stetsenko@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Co-authored-by: Rob Norris <rob.norris@klarasystems.com>
Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Co-authored-by: Alexander Stetsenko <alex.stetsenko@klarasystems.com>
Closes #16486
2025-06-17 10:50:26 -07:00
Don Brady 6b67a5bdd3 During pool export flush the ARC asynchronously
This also includes removing L2 vdevs asynchronously.

This commit also guarantees that spa_load_guid is unique.

The zpool reguid feature introduced the spa_load_guid, which is a
transient value used for runtime identification purposes in the ARC.
This value is not the same as the spa's persistent pool guid.

However, the value is seeded from spa_generate_load_guid() which
does not check for uniqueness against the spa_load_guid from other
pools.  Although extremely rare, you can end up with two different
pools sharing the same spa_load_guid value! So we guarantee that
the value is always unique and additionally not still in use by an
async arc flush task.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16215
2025-06-17 10:50:26 -07:00
Alexander Motin 4f1b91e343 CI: Automate some GitHub PR status labels manipulations
- Set/remove "Work in Progress"/"Code Review Needed" for drafts.
 - Remove "Accepted", "Inactive", "Revision Needed" and "Stale" on
pushes and reopens.

I hope this reduce chances of PRs being forgotten after requested
modifications done due to stale labels.  It is better to have no
labels than incorrect ones saying there is nothing to look at.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16721
2025-06-17 10:50:26 -07:00
Rob Norris a65225ec7e FreeBSD: zfs_putpages: don't undirty pages until after write completes
zfs_putpages() would put the entire range of pages onto the ZIL, then
return VM_PAGER_OK for each page to the kernel. However, an associated
zil_commit() or txg sync had not happened at this point, so the write
may not actually be on disk.

So, we rework it to use a ZIL commit callback, and do the post-write
work of undirtying the page and signaling completion there. We return
VM_PAGER_PEND to the kernel instead so it knows that we will take care
of it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17445
2025-06-17 10:50:26 -07:00
Rob Norris 9c0f5bc183 zfs_log_write: only put the callback on the last itx
If a write is split across mutliple itxs, we only want the callback on
the last one, otherwise it will be called for every itx associated with
this single write, which makes it very hard to know what to clean up.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17445
2025-06-17 10:50:26 -07:00
Rob Norris e1dd433a44 zpl_sync_fs: work around kernels that ignore sync_fs errors
If the kernel will honour our error returns, use them. If not, fool it
by setting a writeback error on the superblock, if available.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-17 10:50:26 -07:00
Rob Norris 08cec6532e zfs_sync: return error when pool suspends
If the pool is suspended, we'll just block in zil_commit(). If the
system is shutting down, blocking wouldn't help anyone. So, we should
keep this test for now, but at least return an error for anyone who is
actually interested.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-17 10:50:26 -07:00
Rob Norris d944641502 zfs_sync: remove support for impossible scenarios
The superblock pointer will always be set, as will z_log, so remove code
supporting cases that can't occur (on Linux at least).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-17 10:50:26 -07:00
Rob Norris c758072b2f zts: test syncfs() behaviour when pool suspends
Fairly coarse, but if it returns while the pool suspends, it must be
with an error.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-17 10:50:26 -07:00
Alexander Motin 0c9cdd1606 Improve block cloning transactions accounting
Previous dmu_tx_count_clone() was broken, stating that cloning is
similar to free.  While they might be from some points, cloning
is not net-free.  It will likely consume space and memory, and
unlike free it will do it no matter whether the destination has
the blocks or not (usually not, so previous code did nothing).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17431
2025-06-17 10:50:26 -07:00
Alexander Motin 23fad19818 Reduce zfs_dmu_offset_next_sync penalty
Looking on txg_wait_synced(, 0) I've noticed that it always syncs
5 TXGs: 3 TXG_CONCURRENT_STATES + 2 TXG_DEFER_SIZE.  But in case
of dmu_offset_next() we do not care about deferred frees. And even
concurrent TXGs we might not need sync all 3 if the dnode was not
dirtied in last few TXGs.

This patch makes dmu_offset_next() to sync one TXG at a time until
the dnode is clean, but no more than 3 TXG_CONCURRENT_STATES times.
My tests with random simultaneous writes and seeks over many files
on HDD pool show 7-14% performance increase.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17434
2025-06-17 10:50:26 -07:00
Alexander Motin 3897e86bd1 Make TX abort after assign safer
It is not right, but there are few examples when TX is aborted
after being assigned in case of error.  To handle it better on
production systems add extra cleanup steps.

While here, replace couple dmu_tx_abort() in simple cases.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17438
2025-06-17 10:50:26 -07:00
Alexander Motin 4c8d0471fa Allow zero compression if dedup is enabled
Having high-refcount dedup entries for zero blocks is inefficient
when they could be recorded as a holes instead.  Normally, zero
compression is not done if compression is disabled to not confuse
naive benchmarks.  But with dedup enabled, it is expected that the
write will be skipped anyway, so we are just optimizing the way it
is skipped.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17435
2025-06-17 10:50:26 -07:00
Tino Reichardt ea3a600bba ZTS: Enable io_uring on CentOS Stream 9 and 10 also
The io_uring interface is available as a Technology Preview.
Details: https://access.redhat.com/solutions/4723221

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17447
2025-06-17 10:50:26 -07:00
Attila Fülöp e9c1e08e07 Linux build: silence objtool warnings
After #17401 the Linux build produces some stack related warnings.

Silence them with the `STACK_FRAME_NON_STANDARD` macro.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17410
2025-06-17 10:50:26 -07:00
Brian Behlendorf 1688d9991d CI: Retire Fedora 40 builder
Fedora 40 has gone EOL as of May 2025, retire the CI builder.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17408
2025-06-17 10:50:26 -07:00
Tino Reichardt e0ad633c64 ZTS: Enable io_uring support on el9/el10
The io_uring interface is available as a Technology Preview.
Details: https://access.redhat.com/solutions/4723221

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17397
2025-06-17 10:50:26 -07:00
Tino Reichardt da4dfa85eb ZTS: Add AlmaLinux 10
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17397
2025-06-17 10:50:26 -07:00
Rob Norris 7be33d2d40 abd_os: move headers from libzpool to libspl
5b9e695 added specific userspace versions of abd_os.h and abd_impl_os.h
for libzpool. However, abd.h and abd_impl.h, which include them, are
packaged with libzfs, so other programs building against libzfs can
fail to build, either because the headers aren't installed, or because
they aren't on any standard include path.

So, move abd_os.h and abd_impl_os.h to libspl, where they we will be
installed alongside abd.h and abd_impl.h in a known path.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16940
Closes #17390
Closes #17394
2025-06-17 10:50:26 -07:00
Alexander Motin bf4baee81e Set spa_final_txg in spa_unload()
I've noticed that after some dedup tests system reboot ends up in
assertion about ms_defer tree not free.  It seems to be caused by
DDT flushing still freeing some blocks while ZFS trying to reach
a final steady state due to spa_final_txg, while being set by
spa_export_common() on pool export, is not set when spa_unload()
is called by spa_evict_all() on system reboot/shutdown.  Setting
spa_final_txg in spa_unload() fixes this issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17395
2025-06-17 10:50:26 -07:00
Ameer Hamza e93d15f112 zpool: clarify ZPOOL_STATUS_REMOVED_DEV status message
Disks can be removed either by the administrator via hotplug or by the
kernel when a disk failure occurs. The previous message implied that
removal was always manual, which could be confusing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17400
2025-06-17 10:50:26 -07:00
Ameer Hamza f292b0f146 vdev: skip faulting disks pending removal
This patch fixes a race where vdev_remove_wanted may be set after probe
initiation, which could otherwise trigger redundant fault and removal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17400
2025-06-17 10:50:26 -07:00
Brian Behlendorf bda0bc6304 CI: Retire Ubuntu 20.04 builder
Ubuntu 20.04 has gone EOL as of April 2025, retire the CI builder.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17403
2025-06-17 10:50:26 -07:00
Rob Norris 04493ca819 linux/zvol_os: don't try to set disk ops if alloc fails
If the kernel fails to allocate the gendisk, zvo_disk will be NULL, and
derefencing it will explode. So don't do that.

Sponsored-by: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17396
2025-06-17 10:50:26 -07:00
Attila Fülöp 2c53fe7764 Linux build: always use objtool
We silence `objtool` warnings on some object files using
`OBJECT_FILES_NON_STANDARD_some_file.o`. Nowadays `objtool` is
needed for CPU vulnerability mitigations and a lot more
functionality so its use is desirable.

Just remove the `OBJECT_FILES_NON_STANDARD` definitions. A follow-up
commit is needed to make the offending files standard and address
the compile time warnings.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #17401
Closes #17364
2025-06-17 10:50:26 -07:00
Rob Norris d7bb6bbf13 tunables: fix spelling
Three occurences with an 'e', and all of them mine. Maybe it's an
British thing?

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris b8f80812a3 tunables: remove __check_old_set_param workaround
This was fully removed from Linux in 4.15, so we won't be seeing it
again.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris 97696962b5 tunables: remove unused param get/set aliases
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris 06fd6dc6f7 tunables: use Linux ullong param ops for u64
Since 3.17 Linux has provided param ops for 64-bit ints, so we don't
need to use our own anymore.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris 28ff5ff1c6 tunables: remove support for s64 tunables
Nothing uses them now.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris e9002887e2 tunables: remove direct use of module_param_cb
The use for spl_taskq_kick was the only use, and the comment that
module_param_call is obsolete is no longer true - it's still very much
used even in recent kernels.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris 840b070ec7 tunables: remove FreeBSD compat macros for Linux module params
Nothing in any FreeBSD code uses them.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris d02d3add0d tunables: ensure tunable and variable have same define gate
If a variable is only available in the kernel, then the tunable should
also only be available there.

This matters very little so long as we don't have userspace tunables,
but its still good hygeine.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris cc5724f38d tunables: don't assert initialisation in impl getters
It actually doesn't matter if it's not initialised when we first query
the current value; it just returns empty-string. A crash is quite
obnoxious even if it is a rare case.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris 8317244270 zfs_log: make zfs_immediate_write_sz uint
Likely it's only int64 for comparison with ssize_t, which is signed.
However, it would make no sense for it to be less than 0 or greater than
4G, so making it a regular uint will make it safe for comparison and
remove the only S64 tunable in core.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Paul Dagnelie 65cf521353 Only interrupt active disk I/Os in failmode=continue
failmode=continue is in a sorry state. Originally designed to fix a very
specific problem, it causes crashes and panics for most people who end
up trying to use it. At this point, we should either remove it entirely,
or try to make it more usable.

With this patch, I choose the latter. While the feature is fundamentally
unpredictable and prone to race conditions, it should be possible to get
it to the point where it can at least sometimes be useful for some
users. This patch fixes one of the major issues with failmode=continue:
it interrupts even ZIOs that are patiently waiting in line behind stuck
IOs.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17372
2025-06-17 10:49:40 -07:00
Pavel Snajdr 08caad8257 zcp: get_prop: fix encryptionroot and encryption
It was reported that channel programs' zfs.get_prop doesn't work for
dataset properties encryption and encryptionroot.

They are handled in get_special_prop due to the need to call
dsl_dataset_crypt_stats to load those dsl props.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Co-authored-by: Graham Christensen <graham@grahamc.com>
Closes #17280
2025-06-17 10:49:40 -07:00
Fedor Uporov d187e3e1a7 ZVOL: Comment platform-specific empty functions bodies on FreeBSD side
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #17383
2025-06-17 10:49:40 -07:00
Ameer Hamza 1215c3b609 Expose dataset encryption status via fast stat path
In truenas_pylibzfs, we query list of encrypted datasets several times,
which is expensive. This commit exposes a public API zfs_is_encrypted()
to get encryption status from fast stat path without having to refresh
the properties.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17368
2025-06-17 10:49:40 -07:00
Alexander Motin 2fe0d5df94 ZIL: Improve write log size accounting
Before this change write log size TXG throttling mechanism was
accounting only user payload bytes.  But the actual ZIL both on
disk and especially in memory include headers of hundred(s) of
bytes.  Not accouting those may allow applications like
bonnie++, in their wisdom writing one byte at a time, to consume
excessive amount of memory and ZIL/SLOG in one TXG.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17373
2025-06-17 10:49:40 -07:00
George Amanakis fa545db846 ZTS: testing for leaked key mappings in encrypted non-raw send
This test covers a bug fixed by commit ea74cde: performing an
incremental non-raw send from an encrypted filesystem followed by
exporting the pool. Before that commit, exporting the sending pool
in this scenario would trigger a panic:

VERIFY(avl_is_empty(&sk->sk_dsl_keys)) failed
PANIC at dsl_crypt.c:353:spa_keystore_fini()
Call Trace:
 spl_dumpstack+0x29/0x2f [spl]
 spl_panic+0xd1/0xe9 [spl]
 spl_assert.constprop.0+0x1a/0x30 [zfs]
 spa_keystore_fini+0xc2/0xf0 [zfs]
 spa_deactivate+0x25f/0x610 [zfs]
 spa_evict_all+0xf4/0x200 [zfs]
 spa_fini+0x13/0x140 [zfs]
 zfs_kmod_fini+0x72/0xc0 [zfs]
 openzfs_fini_os+0x13/0x3a [zfs]
 openzfs_fini+0x9/0x6b8 [zfs]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #17366
2025-06-17 10:49:40 -07:00
Cameron Harr cfb9cba51c Refactor man page and CLI help output per mandoc
The man page and the usage statement from the CLI have been refactored
to abide by the ManDoc standard. Style changes include:
 * Upper-case letters before lower-case
 * List short options w/o arguments first
 * Then list short options w/ arguments
 * Then list long arguments

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #17357
2025-06-17 10:49:40 -07:00
Cameron Harr b647336cc4 Reformat cli help and man page to be in sync
The man page and CLI usage statements were both a little out
of sync and neither fully alphabetized correctly. That has
been fixed. One outstanding question is whether to get rid of
the ellipses on the CLI usage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #16004
Closes #17357
2025-06-17 10:49:40 -07:00
Paul Dagnelie b9324a1e75 Fix off-by-one bug in range tree code
Without this fix, zfs_range_tree_find_in could return an overlap when
the found range starts immediately after the searched range, with no
actual overlap.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17363
2025-06-17 10:49:40 -07:00
Alexander Motin 64e77fdf3b Fix null dereference in spa_vdev_remove_cancel_sync()
We don't really need to access space map to know where the metaslab
ends, while msp->ms_sm might be NULL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Fixes #17164
Fixes #17359
Closes #17361
(cherry picked from commit 5c30b24381)
2025-05-28 16:00:28 -07:00
Andres fd13ad0503 Update 69-vdev.rules.in
Add support to alias md-type devices in udev rules.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andres <a-d-j-i@users.noreply.github.com>
Closes #17345
(cherry picked from commit a6f20250de)
2025-05-28 16:00:28 -07:00
Rob Norris b9f5227d2b lzc_ioctl_fd: add ZFS_IOC_TRACE envvar to enable ioctl tracing
When set, dumps all ZFS ioctl calls and returns and their nvlists to
STDERR, to make debugging and understanding a lot easier.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17344
(cherry picked from commit a387b7599c)
2025-05-28 16:00:28 -07:00
Rob Norris b7d14266f1 lzc: move lzc_ioctl_fd() into lzc proper
Name the OS-specific call lzc_ioctl_fd_os(), and make lzc_ioctl_fd()
wrap it, so we can do more in the wrapper.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17344
(cherry picked from commit c4c3917b2a)
2025-05-28 16:00:28 -07:00
Rob Norris 13768f46e6 libzfs: ensure all ioctl calls go through lzc_ioctl_fd()
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17344
(cherry picked from commit f454cc1723)
2025-05-28 16:00:28 -07:00
Richard Yao 999717c5ac Add Quality Assurance to pull request template
PRs like #17352 have no applicable checkbox, so let us add one.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Richard Yao <richard@ryao.dev>
Closes #17354
(cherry picked from commit 2e5e4bb0f8)
2025-05-28 16:00:28 -07:00
Richard Yao 25ad9ce692 dmu_objset_hold_flags() should call dsl_dataset_rele_flags() on error
This was caught when doing a manual check to see if #17352 needed to be
improved to catch mismatches across stack frames of the kind that were
first found in #17340.

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard@ryao.dev>
Closes #17353
(cherry picked from commit 83fa80a550)
2025-05-28 16:00:28 -07:00
Ameer Hamza 6b70ca665d arcstat: prevent ZeroDivisionError when L2ARC becomes empty
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard@ryao.dev>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17348
(cherry picked from commit f0baaa329a)
2025-05-28 16:00:28 -07:00
Rob Norris 8c0f7619b2 Linux 6.2/6.15: del_timer_sync() renamed to timer_delete_sync()
Renamed in 6.2, and the compat wrapper removed in 6.15. No signature or
functional change apart from that, so a very minimal update for us.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17229
(cherry picked from commit 841be1d049)
2025-05-28 16:00:28 -07:00
Rob Norris e64d4718a7 Linux 6.15: mkdir now returns struct dentry *
The intent is that the filesystem may have a reference to an "old"
version of the new directory, eg if it was keeping it alive because a
remote NFS client still had it open.

We don't need anything like that, so this really just changes things so
we return error codes encoded in pointers.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17229
(cherry picked from commit bb740d66de)
2025-05-28 16:00:28 -07:00
Richard Yao a4de1d38da icp: Use explicit_memset() exclusively in gcm_clear_ctx()
d634d20d1b had been intended to fix a
potential information leak issue where the compiler's optimization
passes appeared to remove `memset()` operations that sanitize sensitive
data before memory is freed for use by the rest of the kernel.

When I wrote it, I had assumed that the compiler would not remove the
other `memset()` operations, but upon reflection, I have realized that
this was a bad assumption to make. I would rather have a very slight
amount of additional overhead when calling `gcm_clear_ctx()` than risk a
future compiler remove `memset()` calls. This is likely to happen if
someone decides to try doing link time optimization and the person will
not think to audit the assembly output for issues like this, so it is
best to preempt the possibility before it happens.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Richard Yao <richard@ryao.dev>
Closes #17343
(cherry picked from commit d8a33bc0a5)
2025-05-28 16:00:28 -07:00
George Amanakis f28c685a84 Fix 2 bugs in non-raw send with encryption
Bisecting identified the redacted send/receive as the source of the bug
for issue #12014. Specifically the call to
dsl_dataset_hold_obj(&fromds) has been replaced by
dsl_dataset_hold_obj_flags() which passes a DECRYPT flag and creates
a key mapping. A subsequent dsl_dataset_rele_flag(&fromds) is missing
and the key mapping is not cleared. This may be inadvertedly used, which
results in arc_untransform failing with ECKSUM in:
 arc_untransform+0x96/0xb0 [zfs]
 dbuf_read_verify_dnode_crypt+0x196/0x350 [zfs]
 dbuf_read+0x56/0x770 [zfs]
 dmu_buf_hold_by_dnode+0x4a/0x80 [zfs]
 zap_lockdir+0x87/0xf0 [zfs]
 zap_lookup_norm+0x5c/0xd0 [zfs]
 zap_lookup+0x16/0x20 [zfs]
 zfs_get_zplprop+0x8d/0x1d0 [zfs]
 setup_featureflags+0x267/0x2e0 [zfs]
 dmu_send_impl+0xe7/0xcb0 [zfs]
 dmu_send_obj+0x265/0x360 [zfs]
 zfs_ioc_send+0x10c/0x280 [zfs]

Fix this by restoring the call to dsl_dataset_hold_obj().

The same applies for to_ds: here replace dsl_dataset_rele(&to_ds) with
dsl_dataset_rele_flags().

Both leaked key mappings will cause a panic when exporting the
sending pool or unloading the zfs module after a non-raw send from
an encrypted filesystem.

Contributions-by: Hank Barta <hbarta@gmail.com>
Contributions-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12014
Closes #17340
(cherry picked from commit ea74cdedda)
2025-05-28 16:00:28 -07:00
Paul Dagnelie 6517ebf4da Cause zpool scan resume commands to get logged in history
Currently, commands that resume a scrub/errorscrub from a paused state
don't get logged in the pool history. This is because resumes actually
return ECANCELED, instead of 0. This causes the tsd code in the common
ioctl logic to not think the ioctl succeeded, which causes the
log_history ioctl to fail with EPERM. However, for resuming a scrub from
a paused state, ECANCELED is success.

There are two options for how to deal with this. The first is the one
that I implemented here; I can't find a good reason for dmu_scan to
return ECANCELED on resume instead of 0, so let's just not. The only
place we check for the ECANCELED value is in zpool_scan, where we just
convert it back to zero.  However, I am aware that this is changing an
ioctl interface, which I believe is a breaking change. I don't think
it's an important change, but maybe there is someone who relies on it.

The other option that could be implemented is to either allow ECANCELED
specifically from dsl_scan in the common ioctl code, or add a generic
facility to the common ioctl code that allows each command to specify
whether or not success happened, regardless of the return values. I am
open to feedback on which option people think would be better.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17301
(cherry picked from commit 086105f4c4)
2025-05-28 16:00:28 -07:00
Alexander Motin 97c1fb6ad5 ARC: Notify dbuf cache about target size reduction
ARC target size might drop significantly under memory pressure,
especially if current ARC size was much smaller than the target.
Since dbuf cache size is a fraction of the target ARC size, it
might need eviction too.  Aside of memory from the dbuf eviction
itself, it might help ARC by making more buffers evictable.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17314
(cherry picked from commit 89a8a91582)
2025-05-28 16:00:28 -07:00
Alexander Motin db290fd48b Linux: Stop using NR_FILE_PAGES for ARC scaling
I've found that QEMU/KVM guest memory accounted as shared also
included into NR_FILE_PAGES.  But it is actually a non-evictable
anonymous memory.  Using it as a base for zfs_arc_pc_percent
parameter makes ARC to ignore shrinker requests while page cache
does not really have anything to evict, ending up in OOM killer
killing the QEMU process.

Instead use of NR_ACTIVE_FILE + NR_INACTIVE_FILE should represent
the part of a page cache that is actually evictable, which should
be safer to use as a reference for ARC scaling.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17334
(cherry picked from commit 0aa83dce99)
2025-05-28 16:00:28 -07:00
Tony Hutter a2bcf194f8 runners: Add option to install custom kernel on Fedora
Allow installing a custom kernel version from the Fedora experimental
kernel repos onto the github runners.  This is useful for testing if
ZFS works against a newer kernel.

Fedora has a number of repos with experimental kernel packages. This
PR allows installs from kernels in these repos:

@kernel-vanilla/stable
@kernel-vanilla/mainline
(https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories)

You will need to manually kick of a github runner to test with a custom
kernel version.  To do that, go to the github actions tab under
'zfs-qemu' and click the drop-down for 'run workflow'.  In there you
will see a text box to specify the version (like '6.14').  The scripts
will do their best to match the version to the newest matching version
that the repos support (since they're may be multiple nightly versions
of, say, '6.14').  A full list of kernel versions can be seen in the
dependency stage output if you kick off a manual run.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17156
(cherry picked from commit b55256e5bb)
2025-05-28 16:00:28 -07:00
diwakar-kristappagari 280c3c3ace vdev_id: symlinks creation for multipath disk partitions (#17331)
It has been observed that the symlinks are not being created
for the disk partitions on multipath enabled systems.
This fix addresses the issue.

Signed-off-by: Diwakar Kristappagari <diwakar-k@hpe.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
(cherry picked from commit e2ba0f7643)
2025-05-28 16:00:28 -07:00
Rob Norris 2e1f95b6bc AUTHORS/mailmap: update with new contributors
Thanks everyone!

Signed-off-by: Rob Norris <robn@despairlabs.com>
(cherry picked from commit 485a2d0112)
2025-05-28 16:00:28 -07:00
Rob Norris 0dc3656a55 update_authors: output possible mailmap additions
Once we've selected a best ident for the AUTHORS file, it makes sense to
set up a corresponding mailmap entry for any other ident for that
committer, to ensure the git history also reflects this into the future.

So, here we output potential mailmap updates for a human to consider.

For the moment, this needs to be done by a human, because update_authors
uses git to get the author names, and thus is reliant on the mailmap
contents to generate its output, so having it update mailmap directly
would introduce a circular dependency that I'm not totally sure about.
It's definitely better than having to go back through the history and
check each commit by hand though.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
(cherry picked from commit ae2caf9cb0)
2025-05-28 16:00:28 -07:00
Rob Norris f8bad9b1f9 update_authors: consider Signed-off-by trailers for committer idents
I've increasingly found that commits from new contributors have the
author set in the "Github noreply" obfuscated style. If they do have a
better canonical choice, it's usually in the Signed-off-by: trailer in
the commit message.

I had avoided using these in the first version of this program because
they aren't always present, aren't always correct, and some commits have
multiple signoffs. It seems however that requiring either the name or
the email address to match the commit author sufficiently narrows the
scope to be useful for the "Github noreply" situation, which is really
the main sticking point. And of course, if it gets it wrong, overriding
in .mailmap or AUTHORS is always an option.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
(cherry picked from commit 8e318fda80)
2025-05-28 16:00:28 -07:00
Rob Norris 56a61d54f4 test-runner: rework output dir construction
The old code would compare all the test group names to work out some
sort of common path, but it didn't appear to work consistently,
sometimes placing output in a top-level dir, other times in one or more
subdirs. (I confess, I do not quite understand what it's supposed to
do).

This is a very simple rework that simply looks at all the test group
paths, removes common leading components, and uses the remainder as the
output directory. This should work because groups paths are unique, and
means we get a output dir tree of roughly the same shape as the test
groups in the runfiles and the test source dirs themselves.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: @ImAwsumm
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17167
(cherry picked from commit 9aae14a14a)
2025-05-28 16:00:28 -07:00
Mariusz Zaborski 1c11d3a54f spa: clear checkpoint information during retry
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mariusz Zaborski <oshogbo@FreeBSD.org>
Closes #17319
(cherry picked from commit 8b9c4e643b)
2025-05-28 16:00:28 -07:00
Rob Norris db988fabfb linux/uio: remove "skip" offset for UIO_ITER
For UIO_ITER, we are just wrapping a kernel iterator. It will take care
of its own offsets if necessary. We don't need to do anything, and if we
do try to do anything with it (like advancing the iterator by the skip
in zfs_uio_advance) we're just confusing the kernel iterator, ending up
at the wrong position or worse, off the end of the memory region.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17298
(cherry picked from commit 2ee5b51a57)
2025-05-28 16:00:28 -07:00
Alan Somers fa2cdaa604 More aggressively assert that db_mtx protects db.db_data
db.db_mtx must be held any time that db.db_data is accessed.  All of
these functions do have the lock held by a parent; add assertions to
ensure that it stays that way.

See https://github.com/openzfs/zfs/discussions/17118

* Refactor dbuf_read_bonus to make it obvious why db_rwlock isn't
required.

* Refactor dbuf_hold_copy to eliminate the db_rwlock
Copy data into the newly allocated buffer before assigning it to the db.
That way, there will be no need to take db->db_rwlock.

* Refactor dbuf_read_hole
In the case of an indirect hole, initialize the newly allocated buffer
before assigning it to the dmu_buf_impl_t.

Sponsored by:	ConnectWise
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #17209
(cherry picked from commit c17bdc4914)
2025-05-28 16:00:28 -07:00
Olivier Certner 51ed9640e9 FreeBSD: Use new SYSCTL_SIZEOF()
SYSCTL_SIZEOF() has been introduced in FreeBSD by commit "sysctl(9):
Ease exporting struct sizes; Discourage doing that" (713abc9880aa) in
branch 'main'.  It will soon be backported to 'stable/14'.  We will thus
be able to remove the old, alternate version left in the '#else' branch
as soon as 'stable/13' goes out of support (April 30, 2026).

Sponsored-by:   The FreeBSD Foundation
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Olivier Certner <olce@FreeBSD.org>
Closes #17309
(cherry picked from commit 78628a5c15)
2025-05-28 16:00:28 -07:00
Alexander Motin 9b446fbb60 ARC: Avoid overflows in arc_evict_adj() (#17255)
With certain combinations of target ARC states balance and ghost
hit rates it was possible to get the fractions outside of allowed
range.  This patch limits maximum balance adjustment speed, which
should make it impossible, and also asserts it.

Fixes #17210
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit b1ccab1721)
2025-05-28 16:00:28 -07:00
Rob Norris ce9cd12c97 txg: generalise txg_wait_synced_sig() to txg_wait_synced_flags() (#17284)
txg_wait_synced_sig() is "wait for txg, unless a signal arrives". We
expect that future development will require similar "wait unless X"
behaviour.

This generalises the API as txg_wait_synced_flags(), where the provided
flags describe the events that should cause the call to return.

Instead of a boolean, the return is now an error code, which the caller
can use to know which event caused the call to return.

The existing call to txg_wait_synced_sig() is now
txg_wait_synced_flags(TXG_WAIT_SIGNAL).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
(cherry picked from commit a7de203c86)
2025-05-28 16:00:28 -07:00
Alexander Motin 52749ebb49 ZTS: Restore some delays in online_offline tests
After more CI runs and code reading after #17259 I've found that
online starts resilver via async mechanism, which does not provide
wait primitives at this time.  Restore some delays to restore CI
until this is properly fixed.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
(cherry picked from commit f85c96edf7)
2025-05-28 16:00:28 -07:00
Alexander Motin 101edf7ed9 Fix race between resilver wait and offline/detach
We should not clear scn_state and notify waiters until we call
vdev_dtl_reassess(), otherwise following offline/detach request
may fail with "no valid replicas".

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
(cherry picked from commit f86d9af16b)
2025-05-28 16:00:28 -07:00
José Luis Salvador Rufo a2593c1610 tests: fix S_IFMT undeclared at statx.c
`S_IFMT` is declared in `sys/stat.h`, but we cannot include this header
because it redeclares the `statx` function with different argument
types. Therefore, we define `S_IFMT` ourselves, in the same way as the
other definitions.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes #17293
Closes #17294
(cherry picked from commit 634c172ee8)
2025-05-28 16:00:28 -07:00
Tony Hutter fcc7259789 ZTS: Stop zpool_status tests from spamming stdout (#17292)
zpool_status_003 and zpool_status_004_pos use 'dd' to trigger a read of
a file without specifying 'of=/dev/null'.  This spams the ZTS logs
with ~20MB of garbage data. This commit adds 'of=/dev/null'.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
(cherry picked from commit a6cca8a7da)
2025-05-28 16:00:28 -07:00
Tony Hutter 4b014840ea Fix double spares for failed vdev
It's possible for two spares to get attached to a single failed vdev.
This happens when you have a failed disk that is spared, and then you
replace the failed disk with a new disk, but during the resilver
the new disk fails, and ZED kicks in a spare for the failed new
disk.  This commit checks for that condition and disallows it.

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes: #16547
Closes: #17231
(cherry picked from commit f40ab9e399)
2025-05-28 16:00:28 -07:00
Tino Reichardt cd777ba5ad ZTS: Fix replacement/resilver_restart_001 on FreeBSD
Decrease the RESILVER_MIN_TIME_MS variable from 50 to 20.
So the test, which expects two 2 resilver starts will see them.

Logfile of the seen failures before this fix:
log: NOTE: expected 2 resilver start(s) after offline/online, found 1
log: expected 2 resilver start(s) after offline/online, found 1

The test time decreases also from around 00:42 to 00:24 seconds.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16822
Closes #17279
(cherry picked from commit 3b18877269)
2025-05-28 16:00:28 -07:00
Artem ad63ab2d90 Sort the blocking snapshots list #12751 (#17264)
When multiple snapshots prevent the destruction/rollback of the
respective dataset/snapshot/volume via zfs destroy or zfs rollback,
the error message does not list the blocking snapshots sorted
according to their order of creation. This causes inconvenience and can
lead to confusion, and also creates a contrast with a returned message
from zfs list -t snap function.

Closes: #12751

Signed-off-by: Artem-OSSRevival <artem.vlasenko@ossrevival.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit 27f3d94940)
2025-05-28 16:00:28 -07:00
Aleksandr Liber edae295af9 Double quote variables to prevent globbing and word splitting
This change goes through and quotes variables where appropriate to
avoid issues with incorrect splitting. The performance tests ran into
an issue with $SUDO_COMMAND splitting incorrectly because it was not
quoted. This change fixes that issue and hopefully gets ahead of any
other similar problems.

Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Aleksandr Liber <aleksandr.liber@perforce.com>
Closes #17235
(cherry picked from commit aa46cc9812)
2025-05-28 16:00:28 -07:00
Rob Norris c85f2fd531 cred: properly pass and test creds on other threads (#17273)
### Background

Various admin operations will be invoked by some userspace task, but the
work will be done on a separate kernel thread at a later time. Snapshots
are an example, which are triggered through zfs_ioc_snapshot() ->
dsl_dataset_snapshot(), but the actual work is from a task dispatched to
dp_sync_taskq.

Many such tasks end up in dsl_enforce_ds_ss_limits(), where various
limits and permissions are enforced. Among other things, it is necessary
to ensure that the invoking task (that is, the user) has permission to
do things. We can't simply check if the running task has permission; it
is a privileged kernel thread, which can do anything.

However, in the general case it's not safe to simply query the task for
its permissions at the check time, as the task may not exist any more,
or its permissions may have changed since it was first invoked. So
instead, we capture the permissions by saving CRED() in the user task,
and then using it for the check through the secpolicy_* functions.

### Current implementation

The current code calls CRED() to get the credential, which gets a
pointer to the cred_t inside the current task and passes it to the
worker task. However, it doesn't take a reference to the cred_t, and so
expects that it won't change, and that the task continues to exist. In
practice that is always the case, because we don't let the calling task
return from the kernel until the work is done.

For Linux, we also take a reference to the current task, because the
Linux credential APIs for the most part do not check an arbitrary
credential, but rather, query what a task can do. See
secpolicy_zfs_proc(). Again, we don't take a reference on the task, just
a pointer to it.

### Changes

We change to calling crhold() on the task credential, and crfree() when
we're done with it. This ensures it stays alive and unchanged for the
duration of the call.

On the Linux side, we change the main policy checking function
priv_policy_ns() to use override_creds()/revert_creds() if necessary to
make the provided credential active in the current task, allowing the
standard task-permission APIs to do the needed check. Since the task
pointer is no longer required, this lets us entirely remove
secpolicy_zfs_proc() and the need to carry a task pointer around as
well.

Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kyle Evans <kevans@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit c8fa39b46c)
2025-05-28 16:00:28 -07:00
Tino Reichardt aa9335bbbc ZTS: Optimize KSM on Linux and remove it for FreeBSD
Don't use KSM on the FreeBSD VMs and optimize KSM settings for
Linux to have faster run times.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17247
(cherry picked from commit ba17cedf65)
2025-05-28 16:00:28 -07:00
Quentin Thébault 4f34e8dcf6 zfs-rollback.8: fix typo in example number
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alexander Ziaee <ziaee@FreeBSD.org>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Quentin Thébault <quentin.thebault@defenso.fr>
Closes #17282
(cherry picked from commit 63de2d2dbd)
2025-05-28 16:00:28 -07:00
Tino Reichardt a33e8b05ee ZTS: Use Ubuntu default url for cloud-image
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17278
(cherry picked from commit 88ec6c4f40)
2025-05-28 16:00:28 -07:00
Alexander Motin fbff1ae9f6 ZTS: Make zvol_stress write some more
Sometimes it fails unable to see any injected write errors.
I guess writing 25KB of zeroes might be not enough to trigger
errors with probability set to 10%.  Lets try to write more.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17270
(cherry picked from commit d947b9aedd)
2025-05-28 16:00:28 -07:00
Alexander Motin 273db246a4 ZTS: Reduce extra caching in pool_checkpoint (#17268)
Those tests are write-mostly at the nested pool.  Considering we have
3 more layers of caching underneath, we can hint ZFS how to use the
memory better by setting primarycache=metadata.

While there, add missing zpool sync after rm in checkpoint_capacity
before we could potentially see the freed space, would not there be
a pool checkpoint.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit 1ef706c4ad)
2025-05-28 16:00:28 -07:00
Sebastian Pauka 28f0c5cfdc Support using llvm-libunwind
This commit adds support for using llvm-libunwind for kernels built
using llvm and clang. The two differences are that the largest register
index is given by _LIBUNWIND_HIGHEST_DWARF_REGISTER, we need to check
whether the register is a floating point register and the prototype
for unw_regname takes the unwind cursor as the first argument.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Sebastian Pauka <me@spauka.se>
Closes #17230
(cherry picked from commit 1b4826b9a2)
2025-05-28 16:00:28 -07:00
Brian Atkinson a77d641f01 Export correct symbols for Lustre Direct I/O
Originally the Lustre ZFS OSD code was going to use zfs_uio_t structs
for supporting Direct I/O with ZFS. However, this has changed to using
abd_t structs instead. This exports the proper symbols that will be used
by the Lustre ZFS OSD code.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #17256
(cherry picked from commit 7031a48c70)
2025-05-28 16:00:28 -07:00
Artem-OSSRevival 0956fd736c Add more descriptive destroy error message
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Artem-OSSRevival <artem.vlasenko@ossrevival.org>
Fixes: #14538
Closes: #17234
(cherry picked from commit 37a3e26552)
2025-05-28 16:00:28 -07:00
Alexander Motin 7bb7ff7b49 ZTS: Fix 256MB file leak in zed_cksum_reported
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes: #17267
(cherry picked from commit 38c3a8be83)
2025-05-28 16:00:28 -07:00
Tino Reichardt 658526db99 ZTS: Update FreeBSD version numbers
All defined variants:
- freebsd13-4r, freebsd13-5r, freebsd14-1r, freebsd14-2r (RELEASE)
- freebsd13-5s, freebsd14-2s (STABLE)
- freebsd15-0c (CURRENT)

Used for testing:
- freebsd13-4r (RELEASE)
- freebsd14-2s (STABLE)
- freebsd15-0c (CURRENT)

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes: #17260
(cherry picked from commit 6afb405d96)
2025-05-28 16:00:28 -07:00
Alexander Motin b590bfc6c8 ZTS: Remove fixed sleeps from slog_006_pos
Replace `sleep 15` with `zpool wait`, which should take much less
than the 15 seconds.  And considering it is called 16 times, this
should save us up to 4 minutes total.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes: #17257
(cherry picked from commit 8f2c2dea3c)
2025-05-28 16:00:28 -07:00
Alexander Motin 03ac770008 ZTS: Polish online_offline tests
- Kill workload first for faster cleanup.
 - Use `zpool wait` for resilver instead of `sleep`.
 - Remove irrelevant workload from `online_offline_003_neg`.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes: #17259
(cherry picked from commit cb49e7701f)
2025-05-28 16:00:28 -07:00
Alexander Motin 95df01020d ZTS: Remove ashift setting from dedup_quota test (#17250)
The test writes 1M of 1KB blocks, which may produce up to 1GB of
dirty data.  On top of that ashift=12 likely produces additional
4GB of ZIO buffers during sync process.  On top of that we likely
need some page cache since the pool reside on files.  And finally
we need to cache the DDT.  Not surprising that the test regularly
ends up in OOMs, possibly depending on TXG size variations.

Also replace fio with pretty strange parameter set with a set of
dd writes and TXG commits, just as we neeed here.

While here, remove compression.  It has nothing to do here, but
waste CI CPU time.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit 1d8f625233)
2025-05-28 16:00:28 -07:00
Alexander Motin 243a46f28d Cleanup VERIFY() macros (#17163)
- Fix VERIFY3B() when given non-boolean values.
 - Map EQUIV() into VERIFY3B(,==,) as equivalent.
 - Tune messages for better readability and to closer match source
code for easier search.  Unify user-space messages with kernel.
 - Tune printed types and remove %px outside of Linux kernel.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: @ImAwsumm
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit 4866c2fabf)
2025-05-28 16:00:28 -07:00
Rob Norris 7fde3933fb vdev_to_nvlist_iter: ignore draid parameters when matching names (#17228)
Various tools will display draid vdev names with parameters embedded in
them, but would not accept them as valid vdev names when looking them
up, making it difficult to build pipelines involving draid vdevs.

This commit makes it so that if a full draid name is offered for match,
it gets truncated at the first ':' character.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit 131df3bbf2)
2025-05-28 16:00:28 -07:00
Alexander Motin c2424f8d1a Improve L2 caching control for prefetched indirects
dbuf_prefetch_impl() should look on level of current indirect, not
the target prefetch level.  dbuf_prefetch_indirect_done() should
call dnode_level_is_l2cacheable() if we have dpa_dnode to pass it.
It should fix some both false positive and negative L2ARC caching.

While there, fix redacted feature activation assertions.  One was
always true, while another could give false positive if dpa_dnode
is NULL.

George Amanakis <gamanakis@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17204
(cherry picked from commit a497c5fc8b)
2025-05-28 16:00:28 -07:00
Alexander Motin 40b9ad19cc ZTS: Remove TXG_TIMEOUT from dedup_quota test (#17150)
It seems `fio` in `ddt_dedup_vdev_limit` overwhelms the system
with the amount of dirty data caused by DDT updates within one
TXG due to tiny 1KB records used, while I see no reason for this
test to extend the TXGs beyond default.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: @ImAwsumm
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
(cherry picked from commit 21850f519b)
2025-05-28 16:00:28 -07:00
Alexander Motin 602fecc316 Prefer embedded blocks to dedup
Since embedded blocks introduction 11 years ago, their writing was
blocked if dedup is enabled.  After searching through the modern
code I see no reason for this restriction to exist.  Same time
embedded blocks are dramatically cheaper.  Even regular write of
so small blocks would likely be cheaper than deduplication, even
if the last is successful, not mentioning otherwise.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17113
(cherry picked from commit 09f4dd06c3)
2025-05-28 16:00:28 -07:00
Alexander Motin 588fa16830 ZAP: Reduce leaf array and free chunks fragmentation
Previous implementation of zap_leaf_array_free() put chunks on the
free list in reverse order.  Also zap_leaf_transfer_entry() and
zap_entry_remove() were freeing name and value arrays in reverse
order.  Together this created a mess in the free list, making
following allocations much more fragmented than necessary.

This patch re-implements zap_leaf_array_free() to keep existing
chunks order, and implements non-destructive zap_leaf_array_copy()
to be used in zap_leaf_transfer_entry() to allow properly ordered
freeing name and value arrays there and in zap_entry_remove().

With this change test of some writes and deletes shows percent of
non-contiguous chunks in DDT reducing from 61% and 47% to 0% and
17% for arrays and frees respectively.  Sure some explicit sorting
could do even better, especially for ZAPs with variable-size arrays,
but it would also cost much more, while this should be very cheap.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16766
(cherry picked from commit 9a81484e35)
2025-05-28 16:00:28 -07:00
196 changed files with 2893 additions and 1327 deletions
+1
View File
@@ -27,6 +27,7 @@ https://openzfs.github.io/openzfs-docs/Developer%20Resources/Buildbot%20Options.
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Performance enhancement (non-breaking change which improves efficiency)
- [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Quality assurance (non-breaking change which makes the code more robust against bugs)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs\_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation)
+49
View File
@@ -0,0 +1,49 @@
name: labels
on:
pull_request_target:
types: [ opened, synchronize, reopened, converted_to_draft, ready_for_review ]
permissions:
pull-requests: write
jobs:
open:
runs-on: ubuntu-latest
if: ${{ github.event.action == 'opened' && github.event.pull_request.draft }}
steps:
- env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
ISSUE: ${{ github.event.pull_request.html_url }}
run: |
gh pr edit $ISSUE --add-label "Status: Work in Progress"
push:
runs-on: ubuntu-latest
if: ${{ github.event.action == 'synchronize' || github.event.action == 'reopened' }}
steps:
- env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
ISSUE: ${{ github.event.pull_request.html_url }}
run: |
gh pr edit $ISSUE --remove-label "Status: Accepted,Status: Inactive,Status: Revision Needed,Status: Stale"
draft:
runs-on: ubuntu-latest
if: ${{ github.event.action == 'converted_to_draft' }}
steps:
- env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
ISSUE: ${{ github.event.pull_request.html_url }}
run: |
gh pr edit $ISSUE --remove-label "Status: Accepted,Status: Code Review Needed,Status: Inactive,Status: Revision Needed,Status: Stale" --add-label "Status: Work in Progress"
rfr:
runs-on: ubuntu-latest
if: ${{ github.event.action == 'ready_for_review' }}
steps:
- env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
ISSUE: ${{ github.event.pull_request.html_url }}
run: |
gh pr edit $ISSUE --remove-label "Status: Accepted,Status: Inactive,Status: Revision Needed,Status: Stale,Status: Work in Progress" --add-label "Status: Code Review Needed"
@@ -29,6 +29,7 @@ FULL_RUN_IGNORE_REGEX = list(map(re.compile, [
Patterns of files that are considered to trigger full CI.
"""
FULL_RUN_REGEX = list(map(re.compile, [
r'\.github/workflows/scripts/.*',
r'cmd.*',
r'configs/.*',
r'META',
+9 -35
View File
@@ -10,36 +10,12 @@ set -eu
export DEBIAN_FRONTEND="noninteractive"
sudo apt-get -y update
sudo apt-get install -y axel cloud-image-utils daemonize guestfs-tools \
ksmtuned virt-manager linux-modules-extra-$(uname -r) zfsutils-linux
virt-manager linux-modules-extra-$(uname -r) zfsutils-linux
# generate ssh keys
rm -f ~/.ssh/id_ed25519
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -q -N ""
# we expect RAM shortage
cat << EOF | sudo tee /etc/ksmtuned.conf > /dev/null
# /etc/ksmtuned.conf - Configuration file for ksmtuned
# https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-ksm
KSM_MONITOR_INTERVAL=60
# Millisecond sleep between ksm scans for 16Gb server.
# Smaller servers sleep more, bigger sleep less.
KSM_SLEEP_MSEC=30
KSM_NPAGES_BOOST=0
KSM_NPAGES_DECAY=0
KSM_NPAGES_MIN=1000
KSM_NPAGES_MAX=25000
KSM_THRES_COEF=80
KSM_THRES_CONST=8192
LOGFILE=/var/log/ksmtuned.log
DEBUG=1
EOF
sudo systemctl restart ksm
sudo systemctl restart ksmtuned
# not needed
sudo systemctl stop docker.socket
sudo systemctl stop multipathd.socket
@@ -65,16 +41,14 @@ $DISK
sync
sleep 1
# swap with same size as RAM
# swap with same size as RAM (16GiB)
sudo mkswap $DISK-part1
sudo swapon $DISK-part1
# 60GB data disk
# JBOD 2xdisk for OpenZFS storage (test vm's)
SSD1="$DISK-part2"
# 10GB data disk on ext4
sudo fallocate -l 10G /test.ssd1
SSD2=$(sudo losetup -b 4096 -f /test.ssd1 --show)
sudo fallocate -l 12G /test.ssd2
SSD2=$(sudo losetup -b 4096 -f /test.ssd2 --show)
# adjust zfs module parameter and create pool
exec 1>/dev/null
@@ -83,11 +57,11 @@ ARC_MAX=$((1024*1024*512))
echo $ARC_MIN | sudo tee /sys/module/zfs/parameters/zfs_arc_min
echo $ARC_MAX | sudo tee /sys/module/zfs/parameters/zfs_arc_max
echo 1 | sudo tee /sys/module/zfs/parameters/zvol_use_blk_mq
sudo zpool create -f -o ashift=12 zpool $SSD1 $SSD2 \
-O relatime=off -O atime=off -O xattr=sa -O compression=lz4 \
-O mountpoint=/mnt/tests
sudo zpool create -f -o ashift=12 zpool $SSD1 $SSD2 -O relatime=off \
-O atime=off -O xattr=sa -O compression=lz4 -O sync=disabled \
-O redundant_metadata=none -O mountpoint=/mnt/tests
# no need for some scheduler
for i in /sys/block/s*/queue/scheduler; do
echo "none" | sudo tee $i > /dev/null
echo "none" | sudo tee $i
done
+41 -20
View File
@@ -14,13 +14,13 @@ OSv=$OS
# compressed with .zst extension
REPO="https://github.com/mcmilk/openzfs-freebsd-images"
FREEBSD="$REPO/releases/download/v2024-12-14"
FREEBSD="$REPO/releases/download/v2025-04-13"
URLzs=""
# Ubuntu mirrors
#UBMIRROR="https://cloud-images.ubuntu.com"
UBMIRROR="https://cloud-images.ubuntu.com"
#UBMIRROR="https://mirrors.cloud.tencent.com/ubuntu-cloud-images"
UBMIRROR="https://mirror.citrahost.com/ubuntu-cloud-images"
#UBMIRROR="https://mirror.citrahost.com/ubuntu-cloud-images"
# default nic model for vm's
NIC="virtio"
@@ -34,11 +34,14 @@ case "$OS" in
OSNAME="AlmaLinux 9"
URL="https://repo.almalinux.org/almalinux/9/cloud/x86_64/images/AlmaLinux-9-GenericCloud-latest.x86_64.qcow2"
;;
almalinux10)
OSNAME="AlmaLinux 10"
OSv="almalinux9"
URL="https://repo.almalinux.org/almalinux/10/cloud/x86_64/images/AlmaLinux-10-GenericCloud-latest.x86_64.qcow2"
;;
archlinux)
OSNAME="Archlinux"
URL="https://geo.mirror.pkgbuild.com/images/latest/Arch-Linux-x86_64-cloudimg.qcow2"
# dns sometimes fails with that url :/
echo "89.187.191.12 geo.mirror.pkgbuild.com" | sudo tee /etc/hosts > /dev/null
;;
centos-stream10)
OSNAME="CentOS Stream 10"
@@ -58,11 +61,6 @@ case "$OS" in
OSNAME="Debian 12"
URL="https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-generic-amd64.qcow2"
;;
fedora40)
OSNAME="Fedora 40"
OSv="fedora-unknown"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/40/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-1.14.qcow2"
;;
fedora41)
OSNAME="Fedora 41"
OSv="fedora-unknown"
@@ -80,16 +78,29 @@ case "$OS" in
BASH="/usr/local/bin/bash"
NIC="rtl8139"
;;
freebsd13-5r)
OSNAME="FreeBSD 13.5-RELEASE"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.5-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
NIC="rtl8139"
;;
freebsd14-1r)
OSNAME="FreeBSD 14.1-RELEASE"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.1-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
;;
freebsd14-2r)
OSNAME="FreeBSD 14.2-RELEASE"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.2-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
;;
freebsd13-4s)
OSNAME="FreeBSD 13.4-STABLE"
freebsd13-5s)
OSNAME="FreeBSD 13.5-STABLE"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.4-STABLE.qcow2.zst"
URLzs="$FREEBSD/amd64-freebsd-13.5-STABLE.qcow2.zst"
BASH="/usr/local/bin/bash"
NIC="rtl8139"
;;
@@ -111,11 +122,6 @@ case "$OS" in
MIRROR="http://opensuse-mirror-gce-us.susecloud.net"
URL="$MIRROR/tumbleweed/appliances/openSUSE-MicroOS.x86_64-OpenStack-Cloud.qcow2"
;;
ubuntu20)
OSNAME="Ubuntu 20.04"
OSv="ubuntu20.04"
URL="$UBMIRROR/focal/current/focal-server-cloudimg-amd64.img"
;;
ubuntu22)
OSNAME="Ubuntu 22.04"
OSv="ubuntu22.04"
@@ -139,7 +145,7 @@ echo "ENV=$ENV" >> $ENV
# result path
echo 'RESPATH="/var/tmp/test_results"' >> $ENV
# FreeBSD 13 has problems with: e1000+virtio
# FreeBSD 13 has problems with: e1000 and virtio
echo "NIC=$NIC" >> $ENV
# freebsd15 -> used in zfs-qemu.yml
@@ -151,6 +157,14 @@ echo "OSv=\"$OSv\"" >> $ENV
# FreeBSD 15 (Current) -> used for summary
echo "OSNAME=\"$OSNAME\"" >> $ENV
# default vm count for testings
VMs=2
echo "VMs=\"$VMs\"" >> $ENV
# default cpu count for testing vm's
CPU=2
echo "CPU=\"$CPU\"" >> $ENV
sudo mkdir -p "/mnt/tests"
sudo chown -R $(whoami) /mnt/tests
@@ -212,12 +226,19 @@ sudo virt-install \
--disk $DISK,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--import --noautoconsole >/dev/null
# enable KSM on Linux
if [ ${OS:0:7} != "freebsd" ]; then
sudo virsh dommemstat --domain "openzfs" --period 5
sudo virsh node-memory-tune 100 50 1
echo 1 | sudo tee /sys/kernel/mm/ksm/run > /dev/null
fi
# Give the VMs hostnames so we don't have to refer to them with
# hardcoded IP addresses.
#
# vm0: Initial VM we install dependencies and build ZFS on.
# vm1..2 Testing VMs
for i in {0..9} ; do
for ((i=0; i<=VMs; i++)); do
echo "192.168.122.1$i vm$i" | sudo tee -a /etc/hosts
done
+29 -4
View File
@@ -4,6 +4,8 @@
# 3) install dependencies for compiling and loading
#
# $1: OS name (like 'fedora41')
# $2: (optional) Experimental Fedora kernel version, like "6.14" to
# install instead of Fedora defaults.
######################################################################
set -eu
@@ -55,6 +57,7 @@ function freebsd() {
'^samba4[[:digit:]]+$' \
'^py3[[:digit:]]+-cffi$' \
'^py3[[:digit:]]+-sysctl$' \
'^py3[[:digit:]]+-setuptools$' \
'^py3[[:digit:]]+-packaging$'
echo "##[endgroup]"
}
@@ -94,6 +97,25 @@ function tumbleweed() {
echo "##[endgroup]"
}
# $1: Kernel version to install (like '6.14rc7')
function install_fedora_experimental_kernel {
our_version="$1"
sudo dnf -y copr enable @kernel-vanilla/stable
sudo dnf -y copr enable @kernel-vanilla/mainline
all="$(sudo dnf list --showduplicates kernel-*)"
echo "Available versions:"
echo "$all"
# You can have a bunch of minor variants of the version we want '6.14'.
# Pick the newest variant (sorted by version number).
specific_version=$(echo "$all" | grep $our_version | awk '{print $2}' | sort -V | tail -n 1)
list="$(echo "$all" | grep $specific_version | grep -Ev 'kernel-rt|kernel-selftests|kernel-debuginfo' | sed 's/.x86_64//g' | awk '{print $1"-"$2}')"
sudo dnf install -y $list
sudo dnf -y copr disable @kernel-vanilla/stable
sudo dnf -y copr disable @kernel-vanilla/mainline
}
# Install dependencies
case "$1" in
almalinux8)
@@ -106,7 +128,7 @@ case "$1" in
sudo dnf install -y kernel-abi-whitelists
echo "##[endgroup]"
;;
almalinux9|centos-stream9|centos-stream10)
almalinux9|almalinux10|centos-stream9|centos-stream10)
echo "##[group]Enable epel and crb repositories"
sudo dnf config-manager -y --set-enabled crb
sudo dnf install -y epel-release
@@ -132,6 +154,11 @@ case "$1" in
# Fedora 42+ moves /usr/bin/script from 'util-linux' to 'util-linux-script'
sudo dnf install -y util-linux-script || true
# Optional: Install an experimental kernel ($2 = kernel version)
if [ -n "${2:-}" ] ; then
install_fedora_experimental_kernel "$2"
fi
;;
freebsd*)
freebsd
@@ -144,9 +171,7 @@ case "$1" in
echo "##[group]Install Ubuntu specific"
sudo apt-get install -yq linux-tools-common libtirpc-dev \
linux-modules-extra-$(uname -r)
if [ "$1" != "ubuntu20" ]; then
sudo apt-get install -yq dh-sequence-dkms
fi
sudo apt-get install -yq dh-sequence-dkms
echo "##[endgroup]"
echo "##[group]Delete Ubuntu OpenZFS modules"
for i in $(find /lib/modules -name zfs -type d); do sudo rm -rvf $i; done
+14 -1
View File
@@ -3,12 +3,25 @@
# script on it.
#
# $1: OS name (like 'fedora41')
# $2: (optional) Experimental kernel version to install on fedora,
# like "6.14".
######################################################################
.github/workflows/scripts/qemu-wait-for-vm.sh vm0
# SPECIAL CASE:
#
# If the user passed in an experimental kernel version to test on Fedora,
# we need to update the kernel version in zfs's META file to allow the
# build to happen. We update our local copy of META here, since we know
# it will be rsync'd up in the next step.
if [ -n "${2:-}" ] ; then
sed -i -E 's/Linux-Maximum: .+/Linux-Maximum: 99.99/g' META
fi
scp .github/workflows/scripts/qemu-3-deps-vm.sh zfs@vm0:qemu-3-deps-vm.sh
PID=`pidof /usr/bin/qemu-system-x86_64`
ssh zfs@vm0 '$HOME/qemu-3-deps-vm.sh' $1
ssh zfs@vm0 '$HOME/qemu-3-deps-vm.sh' "$@"
# wait for poweroff to succeed
tail --pid=$PID -f /dev/null
sleep 5 # avoid this: "error: Domain is already active"
+1 -1
View File
@@ -326,7 +326,7 @@ fi
#
# rhel8.10
# almalinux9.5
# fedora40
# fedora42
source /etc/os-release
sudo hostname "$ID$VERSION_ID"
+13 -21
View File
@@ -14,39 +14,33 @@ PID=$(pidof /usr/bin/qemu-system-x86_64)
tail --pid=$PID -f /dev/null
sudo virsh undefine openzfs
# default values per test vm:
VMs=2
CPU=2
# cpu pinning
CPUSET=("0,1" "2,3")
case "$OS" in
freebsd*)
# FreeBSD can't be optimized via ksmtuned
# FreeBSD needs only 6GiB
RAM=6
;;
*)
# Linux can be optimized via ksmtuned
# Linux needs more memory, but can be optimized to share it via KSM
RAM=8
;;
esac
# this can be different for each distro
echo "VMs=$VMs" >> $ENV
# create snapshot we can clone later
sudo zfs snapshot zpool/openzfs@now
# setup the testing vm's
PUBKEY=$(cat ~/.ssh/id_ed25519.pub)
for i in $(seq 1 $VMs); do
# start testing VMs
for ((i=1; i<=VMs; i++)); do
echo "Creating disk for vm$i..."
DISK="/dev/zvol/zpool/vm$i"
FORMAT="raw"
sudo zfs clone zpool/openzfs@now zpool/vm$i
sudo zfs create -ps -b 64k -V 80g zpool/vm$i-2
sudo zfs clone zpool/openzfs@now zpool/vm$i-system
sudo zfs create -ps -b 64k -V 64g zpool/vm$i-tests
cat <<EOF > /tmp/user-data
#cloud-config
@@ -83,23 +77,21 @@ EOF
--graphics none \
--cloud-init user-data=/tmp/user-data \
--network bridge=virbr0,model=$NIC,mac="52:54:00:83:79:0$i" \
--disk $DISK,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--disk $DISK-2,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--disk $DISK-system,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--disk $DISK-tests,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--import --noautoconsole >/dev/null
done
# check the memory state from time to time
# generate some memory stats
cat <<EOF > cronjob.sh
# $OS
exec 1>>/var/tmp/stats.txt
exec 2>&1
echo "*******************************************************"
date
echo "********************************************************************************"
uptime
free -m
df -h /mnt/tests
zfs list
EOF
sudo chmod +x cronjob.sh
sudo mv -f cronjob.sh /root/cronjob.sh
echo '*/5 * * * * /root/cronjob.sh' > crontab.txt
@@ -108,7 +100,7 @@ rm crontab.txt
# check if the machines are okay
echo "Waiting for vm's to come up... (${VMs}x CPU=$CPU RAM=$RAM)"
for i in $(seq 1 $VMs); do
for ((i=1; i<=VMs; i++)); do
.github/workflows/scripts/qemu-wait-for-vm.sh vm$i
done
echo "All $VMs VMs are up now."
@@ -116,7 +108,7 @@ echo "All $VMs VMs are up now."
# Save the VM's serial output (ttyS0) to /var/tmp/console.txt
# - ttyS0 on the VM corresponds to a local /dev/pty/N entry
# - use 'virsh ttyconsole' to lookup the /dev/pty/N entry
for i in $(seq 1 $VMs); do
for ((i=1; i<=VMs; i++)); do
mkdir -p $RESPATH/vm$i
read "pty" <<< $(sudo virsh ttyconsole vm$i)
sudo nohup bash -c "cat $pty > $RESPATH/vm$i/console.txt" &
+17 -5
View File
@@ -45,7 +45,7 @@ if [ -z ${1:-} ]; then
echo 0 > /tmp/ctr
date "+%s" > /tmp/tsstart
for i in $(seq 1 $VMs); do
for ((i=1; i<=VMs; i++)); do
IP="192.168.122.1$i"
daemonize -c /var/tmp -p vm${i}.pid -o vm${i}log.txt -- \
$SSH zfs@$IP $TESTS $OS $i $VMs $CI_TYPE
@@ -58,7 +58,7 @@ if [ -z ${1:-} ]; then
done
# wait for all vm's to finish
for i in $(seq 1 $VMs); do
for ((i=1; i<=VMs; i++)); do
tail --pid=$(cat vm${i}.pid) -f /dev/null
pid=$(cat vm${i}log.pid)
rm -f vm${i}log.pid
@@ -72,19 +72,31 @@ fi
export PATH="$PATH:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin"
case "$1" in
freebsd*)
TDIR="/usr/local/share/zfs"
sudo kldstat -n zfs 2>/dev/null && sudo kldunload zfs
sudo -E ./zfs/scripts/zfs.sh
TDIR="/usr/local/share/zfs"
sudo mv -f /var/tmp/*.txt /tmp
sudo newfs -U -t -L tmp /dev/vtbd1 >/dev/null
sudo mount -o noatime /dev/vtbd1 /var/tmp
sudo chmod 1777 /var/tmp
sudo mv -f /tmp/*.txt /var/tmp
;;
*)
# use xfs @ /var/tmp for all distros
TDIR="/usr/share/zfs"
sudo -E modprobe zfs
sudo mv -f /var/tmp/*.txt /tmp
sudo mkfs.xfs -fq /dev/vdb
sudo mount -o noatime /dev/vdb /var/tmp
sudo chmod 1777 /var/tmp
sudo mv -f /tmp/*.txt /var/tmp
sudo -E modprobe zfs
TDIR="/usr/share/zfs"
;;
esac
# enable io_uring on el9/el10
case "$1" in
almalinux9|almalinux10|centos-stream*)
sudo sysctl kernel.io_uring_disabled=0 > /dev/null
;;
esac
+2 -2
View File
@@ -28,7 +28,7 @@ BASE="$HOME/work/zfs/zfs"
MERGE="$BASE/.github/workflows/scripts/merge_summary.awk"
# catch result files of testings (vm's should be there)
for i in $(seq 1 $VMs); do
for ((i=1; i<=VMs; i++)); do
rsync -arL zfs@vm$i:$RESPATH/current $RESPATH/vm$i || true
scp zfs@vm$i:"/var/tmp/*.txt" $RESPATH/vm$i || true
scp zfs@vm$i:"/var/tmp/*.rpm" $RESPATH/vm$i || true
@@ -37,7 +37,7 @@ cp -f /var/tmp/*.txt $RESPATH || true
cd $RESPATH
# prepare result files for summary
for i in $(seq 1 $VMs); do
for ((i=1; i<=VMs; i++)); do
file="vm$i/build-stderr.txt"
test -s $file && mv -f $file build-stderr.txt
+1 -1
View File
@@ -45,7 +45,7 @@ fi
echo -e "\nFull logs for download:\n $1\n"
for i in $(seq 1 $VMs); do
for ((i=1; i<=VMs; i++)); do
rv=$(cat vm$i/tests-exitcode.txt)
if [ $rv = 0 ]; then
+1 -1
View File
@@ -47,7 +47,7 @@ jobs:
strategy:
fail-fast: false
matrix:
os: ['almalinux8', 'almalinux9', 'fedora40', 'fedora41', 'fedora42']
os: ['almalinux8', 'almalinux9', 'almalinux10', 'fedora41', 'fedora42']
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
+19 -6
View File
@@ -15,6 +15,11 @@ on:
required: false
default: false
description: 'Test on CentOS 10 stream'
fedora_kernel_ver:
type: string
required: false
default: ""
description: "(optional) Experimental kernel version to install on Fedora (like '6.14' or '6.13.3-0.rc3')"
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
@@ -34,8 +39,8 @@ jobs:
- name: Generate OS config and CI type
id: os
run: |
FULL_OS='["almalinux8", "almalinux9", "debian11", "debian12", "fedora40", "fedora41", "fedora42", "freebsd13-4r", "freebsd14-2r", "freebsd15-0c", "ubuntu20", "ubuntu22", "ubuntu24"]'
QUICK_OS='["almalinux8", "almalinux9", "debian12", "fedora42", "freebsd14-2r", "ubuntu24"]'
FULL_OS='["almalinux8", "almalinux9", "almalinux10", "debian11", "debian12", "fedora41", "fedora42", "freebsd13-4r", "freebsd14-2s", "freebsd15-0c", "ubuntu22", "ubuntu24"]'
QUICK_OS='["almalinux8", "almalinux9", "almalinux10", "debian12", "fedora42", "freebsd14-2r", "ubuntu24"]'
# determine CI type when running on PR
ci_type="full"
if ${{ github.event_name == 'pull_request' }}; then
@@ -48,7 +53,15 @@ jobs:
else
os_selection="$FULL_OS"
fi
os_json=$(echo ${os_selection} | jq -c)
if [ ${{ github.event.inputs.fedora_kernel_ver }} != "" ] ; then
# They specified a custom kernel version for Fedora. Use only
# Fedora runners.
os_json=$(echo ${os_selection} | jq -c '[.[] | select(startswith("fedora"))]')
else
# Normal case
os_json=$(echo ${os_selection} | jq -c)
fi
# Add optional runners
if [ "${{ github.event.inputs.include_stream9 }}" == 'true' ]; then
@@ -68,8 +81,8 @@ jobs:
strategy:
fail-fast: false
matrix:
# rhl: almalinux8, almalinux9, centos-stream9, fedora40, fedora41
# debian: debian11, debian12, ubuntu20, ubuntu22, ubuntu24
# rhl: almalinux8, almalinux9, centos-stream9, fedora41
# debian: debian11, debian12, ubuntu22, ubuntu24
# misc: archlinux, tumbleweed
# FreeBSD variants of 2024-12:
# FreeBSD Release: freebsd13-4r, freebsd14-2r
@@ -92,7 +105,7 @@ jobs:
- name: Install dependencies
timeout-minutes: 20
run: .github/workflows/scripts/qemu-3-deps.sh ${{ matrix.os }}
run: .github/workflows/scripts/qemu-3-deps.sh ${{ matrix.os }} ${{ github.event.inputs.fedora_kernel_ver }}
- name: Build modules
timeout-minutes: 30
+6 -2
View File
@@ -80,11 +80,13 @@ Youzhong Yang <youzhong@gmail.com>
# Signed-off-by: overriding Author:
Alexander Ziaee <ziaee@FreeBSD.org> <concussious@runbox.com>
Ryan <errornointernet@envs.net> <error.nointernet@gmail.com>
Sietse <sietse@wizdom.nu> <uglymotha@wizdom.nu>
Felix Schmidt <felixschmidt20@aol.com> <f.sch.prototype@gmail.com>
Olivier Certner <olce@FreeBSD.org> <olce.freebsd@certner.fr>
Phil Sutter <phil@nwl.cc> <p.github@nwl.cc>
poscat <poscat@poscat.moe> <poscat0x04@outlook.com>
Qiuhao Chen <chenqiuhao1997@gmail.com> <haohao0924@126.com>
Ryan <errornointernet@envs.net> <error.nointernet@gmail.com>
Sietse <sietse@wizdom.nu> <uglymotha@wizdom.nu>
Yuxin Wang <yuxinwang9999@gmail.com> <Bi11gates9999@gmail.com>
Zhenlei Huang <zlei@FreeBSD.org> <zlei.huang@gmail.com>
@@ -101,6 +103,7 @@ Tulsi Jain <tulsi.jain@delphix.com> <tulsi.jain@Tulsi-Jains-MacBook-Pro.local>
# Mappings from Github no-reply addresses
ajs124 <git@ajs124.de> <ajs124@users.noreply.github.com>
Alek Pinchuk <apinchuk@axcient.com> <alek-p@users.noreply.github.com>
Aleksandr Liber <aleksandr.liber@perforce.com> <61714074+AleksandrLiber@users.noreply.github.com>
Alexander Lobakin <alobakin@pm.me> <solbjorn@users.noreply.github.com>
Alexey Smirnoff <fling@member.fsf.org> <fling-@users.noreply.github.com>
Allen Holl <allen.m.holl@gmail.com> <65494904+allen-4@users.noreply.github.com>
@@ -137,6 +140,7 @@ Fedor Uporov <fuporov.vstack@gmail.com> <60701163+fuporovvStack@users.noreply.gi
Felix Dörre <felix@dogcraft.de> <felixdoerre@users.noreply.github.com>
Felix Neumärker <xdch47@posteo.de> <34678034+xdch47@users.noreply.github.com>
Finix Yan <yancw@info2soft.com> <Finix1979@users.noreply.github.com>
Friedrich Weber <f.weber@proxmox.com> <56110206+frwbr@users.noreply.github.com>
Gaurav Kumar <gauravk.18@gmail.com> <gaurkuma@users.noreply.github.com>
George Gaydarov <git@gg7.io> <gg7@users.noreply.github.com>
Georgy Yakovlev <gyakovlev@gentoo.org> <168902+gyakovlev@users.noreply.github.com>
+9 -1
View File
@@ -29,6 +29,7 @@ CONTRIBUTORS:
Alejandro Colomar <Colomar.6.4.3@GMail.com>
Alejandro R. Sedeño <asedeno@mit.edu>
Alek Pinchuk <alek@nexenta.com>
Aleksandr Liber <aleksandr.liber@perforce.com>
Aleksa Sarai <cyphar@cyphar.com>
Alexander Eremin <a.eremin@nexenta.com>
Alexander Lobakin <alobakin@pm.me>
@@ -81,6 +82,7 @@ CONTRIBUTORS:
Arne Jansen <arne@die-jansens.de>
Aron Xu <happyaron.xu@gmail.com>
Arshad Hussain <arshad.hussain@aeoncomputing.com>
Artem <artem.vlasenko@ossrevival.org>
Arun KV <arun.kv@datacore.com>
Arvind Sankar <nivedita@alum.mit.edu>
Attila Fülöp <attila@fueloep.org>
@@ -227,10 +229,12 @@ CONTRIBUTORS:
Fedor Uporov <fuporov.vstack@gmail.com>
Felix Dörre <felix@dogcraft.de>
Felix Neumärker <xdch47@posteo.de>
Felix Schmidt <felixschmidt20@aol.com>
Feng Sun <loyou85@gmail.com>
Finix Yan <yancw@info2soft.com>
Francesco Mazzoli <f@mazzo.li>
Frederik Wessels <wessels147@gmail.com>
Friedrich Weber <f.weber@proxmox.com>
Frédéric Vanniere <f.vanniere@planet-work.com>
Gabriel A. Devenyi <gdevenyi@gmail.com>
Garrett D'Amore <garrett@nexenta.com>
@@ -484,7 +488,7 @@ CONTRIBUTORS:
Olaf Faaland <faaland1@llnl.gov>
Oleg Drokin <green@linuxhacker.ru>
Oleg Stepura <oleg@stepura.com>
Olivier Certner <olce.freebsd@certner.fr>
Olivier Certner <olce@FreeBSD.org>
Olivier Mazouffre <olivier.mazouffre@ims-bordeaux.fr>
omni <omni+vagant@hack.org>
Orivej Desh <orivej@gmx.fr>
@@ -522,6 +526,7 @@ CONTRIBUTORS:
P.SCH <p88@yahoo.com>
Qiuhao Chen <chenqiuhao1997@gmail.com>
Quartz <yyhran@163.com>
Quentin Thébault <quentin.thebault@defenso.fr>
Quentin Zdanis <zdanisq@gmail.com>
Rafael Kitover <rkitover@gmail.com>
RageLtMan <sempervictus@users.noreply.github.com>
@@ -573,6 +578,7 @@ CONTRIBUTORS:
Scot W. Stevenson <scot.stevenson@gmail.com>
Sean Eric Fagan <sef@ixsystems.com>
Sebastian Gottschall <s.gottschall@dd-wrt.com>
Sebastian Pauka <me@spauka.se>
Sebastian Wuerl <s.wuerl@mailbox.org>
Sebastien Roy <seb@delphix.com>
Sen Haerens <sen@senhaerens.be>
@@ -589,6 +595,7 @@ CONTRIBUTORS:
Shen Yan <shenyanxxxy@qq.com>
Sietse <sietse@wizdom.nu>
Simon Guest <simon.guest@tesujimath.org>
Simon Howard <fraggle@soulsphere.org>
Simon Klinkert <simon.klinkert@gmail.com>
Sowrabha Gopal <sowrabha.gopal@delphix.com>
Spencer Kinny <spencerkinny1995@gmail.com>
@@ -610,6 +617,7 @@ CONTRIBUTORS:
Stéphane Lesimple <speed47_github@speed47.net>
Suman Chakravartula <schakrava@gmail.com>
Sydney Vanda <sydney.m.vanda@intel.com>
Syed Shahrukh Hussain <syed.shahrukh@ossrevival.org>
Sören Tempel <soeren+git@soeren-tempel.net>
Tamas TEVESZ <ice@extreme.hu>
Teodor Spæren <teodor_spaeren@riseup.net>
+2 -2
View File
@@ -1,10 +1,10 @@
Meta: 1
Name: zfs
Branch: 1.0
Version: 2.3.2
Version: 2.3.3
Release: 1
Release-Tags: relext
License: CDDL
Author: OpenZFS
Linux-Maximum: 6.14
Linux-Maximum: 6.15
Linux-Minimum: 4.18
+7 -6
View File
@@ -735,13 +735,14 @@ def calculate():
v[group["percent"]] if v[group["percent"]] > 0 else 0
if l2exist:
l2asize = cur["l2_asize"]
v["l2hits"] = d["l2_hits"] / sint
v["l2miss"] = d["l2_misses"] / sint
v["l2read"] = v["l2hits"] + v["l2miss"]
v["l2hit%"] = 100 * v["l2hits"] / v["l2read"] if v["l2read"] > 0 else 0
v["l2miss%"] = 100 - v["l2hit%"] if v["l2read"] > 0 else 0
v["l2asize"] = cur["l2_asize"]
v["l2asize"] = l2asize
v["l2size"] = cur["l2_size"]
v["l2bytes"] = d["l2_read_bytes"] / sint
v["l2wbytes"] = d["l2_write_bytes"] / sint
@@ -751,11 +752,11 @@ def calculate():
v["l2mru"] = cur["l2_mru_asize"]
v["l2data"] = cur["l2_bufc_data_asize"]
v["l2meta"] = cur["l2_bufc_metadata_asize"]
v["l2pref%"] = 100 * v["l2pref"] / v["l2asize"]
v["l2mfu%"] = 100 * v["l2mfu"] / v["l2asize"]
v["l2mru%"] = 100 * v["l2mru"] / v["l2asize"]
v["l2data%"] = 100 * v["l2data"] / v["l2asize"]
v["l2meta%"] = 100 * v["l2meta"] / v["l2asize"]
v["l2pref%"] = 100 * v["l2pref"] / l2asize if l2asize > 0 else 0
v["l2mfu%"] = 100 * v["l2mfu"] / l2asize if l2asize > 0 else 0
v["l2mru%"] = 100 * v["l2mru"] / l2asize if l2asize > 0 else 0
v["l2data%"] = 100 * v["l2data"] / l2asize if l2asize > 0 else 0
v["l2meta%"] = 100 * v["l2meta"] / l2asize if l2asize > 0 else 0
v["grow"] = 0 if cur["arc_no_grow"] else 1
v["need"] = cur["arc_need_free"]
+1 -1
View File
@@ -4440,7 +4440,7 @@ zfs_do_rollback(int argc, char **argv)
if (cb.cb_create > 0)
min_txg = cb.cb_create;
if ((ret = zfs_iter_snapshots_v2(zhp, 0, rollback_check, &cb,
if ((ret = zfs_iter_snapshots_sorted_v2(zhp, 0, rollback_check, &cb,
min_txg, 0)) != 0)
goto out;
if ((ret = zfs_iter_bookmarks_v2(zhp, 0, rollback_check, &cb)) != 0)
+20 -19
View File
@@ -521,11 +521,11 @@ get_usage(zpool_help_t idx)
return (gettext("\ttrim [-dw] [-r <rate>] [-c | -s] <pool> "
"[<device> ...]\n"));
case HELP_STATUS:
return (gettext("\tstatus [--power] [-j [--json-int, "
"--json-flat-vdevs, ...\n"
"\t --json-pool-key-guid]] [-c [script1,script2,...]] "
"[-dDegiLpPstvx] ...\n"
"\t [-T d|u] [pool] [interval [count]]\n"));
return (gettext("\tstatus [-DdegiLPpstvx] "
"[-c script1[,script2,...]] ...\n"
"\t [-j|--json [--json-flat-vdevs] [--json-int] "
"[--json-pool-key-guid]] ...\n"
"\t [-T d|u] [--power] [pool] [interval [count]]\n"));
case HELP_UPGRADE:
return (gettext("\tupgrade\n"
"\tupgrade -v\n"
@@ -10432,10 +10432,9 @@ print_status_reason(zpool_handle_t *zhp, status_cbdata_t *cbp,
break;
case ZPOOL_STATUS_REMOVED_DEV:
snprintf(status, ST_SIZE, gettext("One or more devices has "
"been removed by the administrator.\n\tSufficient "
"replicas exist for the pool to continue functioning in "
"a\n\tdegraded state.\n"));
snprintf(status, ST_SIZE, gettext("One or more devices have "
"been removed.\n\tSufficient replicas exist for the pool "
"to continue functioning in a\n\tdegraded state.\n"));
snprintf(action, AC_SIZE, gettext("Online the device "
"using zpool online' or replace the device with\n\t'zpool "
"replace'.\n"));
@@ -10980,28 +10979,30 @@ status_callback(zpool_handle_t *zhp, void *data)
}
/*
* zpool status [-c [script1,script2,...]] [-dDegiLpPstvx] [--power] ...
* [-T d|u] [pool] [interval [count]]
* zpool status [-dDegiLpPstvx] [-c [script1,script2,...]] ...
* [-j|--json [--json-flat-vdevs] [--json-int] ...
* [--json-pool-key-guid]] [--power] [-T d|u] ...
* [pool] [interval [count]]
*
* -c CMD For each vdev, run command CMD
* -d Display Direct I/O write verify errors
* -D Display dedup status (undocumented)
* -d Display Direct I/O write verify errors
* -e Display only unhealthy vdevs
* -g Display guid for individual vdev name.
* -i Display vdev initialization status.
* -j [...] Display output in JSON format
* --json-flat-vdevs Display vdevs in flat hierarchy
* --json-int Display numbers in integer format instead of string
* --json-pool-key-guid Use pool GUID as key for pool objects
* -L Follow links when resolving vdev path name.
* -p Display values in parsable (exact) format.
* -P Display full path for vdev name.
* -p Display values in parsable (exact) format.
* --power Display vdev enclosure slot power status
* -s Display slow IOs column.
* -t Display vdev TRIM status.
* -T Display a timestamp in date(1) or Unix format
* -t Display vdev TRIM status.
* -v Display complete error logs
* -x Display only pools with potential problems
* -j Display output in JSON format
* --power Display vdev enclosure slot power status
* --json-int Display numbers in inteeger format instead of string
* --json-flat-vdevs Display vdevs in flat hierarchy
* --json-pool-key-guid Use pool GUID as key for pool objects
*
* Describes the health status of all pools or some subset.
*/
+1 -2
View File
@@ -10,8 +10,7 @@ AM_CPPFLAGS = \
-I$(top_srcdir)/include \
-I$(top_srcdir)/module/icp/include \
-I$(top_srcdir)/lib/libspl/include \
-I$(top_srcdir)/lib/libspl/include/os/@ac_system_l@ \
-I$(top_srcdir)/lib/libzpool/include
-I$(top_srcdir)/lib/libspl/include/os/@ac_system_l@
AM_LIBTOOLFLAGS = --silent
+42 -15
View File
@@ -2,6 +2,22 @@ dnl #
dnl # Supported mkdir() interfaces checked newest to oldest.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_MKDIR], [
dnl #
dnl # 6.15 API change
dnl # mkdir() returns struct dentry *
dnl #
ZFS_LINUX_TEST_SRC([mkdir_return_dentry], [
#include <linux/fs.h>
static struct dentry *mkdir(struct mnt_idmap *idmap,
struct inode *inode, struct dentry *dentry,
umode_t umode) { return dentry; }
static const struct inode_operations
iops __attribute__ ((unused)) = {
.mkdir = mkdir,
};
],[])
dnl #
dnl # 6.3 API change
dnl # mkdir() takes struct mnt_idmap * as the first arg
@@ -59,29 +75,40 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_MKDIR], [
AC_DEFUN([ZFS_AC_KERNEL_MKDIR], [
dnl #
dnl # 6.3 API change
dnl # mkdir() takes struct mnt_idmap * as the first arg
dnl # 6.15 API change
dnl # mkdir() returns struct dentry *
dnl #
AC_MSG_CHECKING([whether iops->mkdir() takes struct mnt_idmap*])
ZFS_LINUX_TEST_RESULT([mkdir_mnt_idmap], [
AC_MSG_CHECKING([whether iops->mkdir() returns struct dentry*])
ZFS_LINUX_TEST_RESULT([mkdir_return_dentry], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_IOPS_MKDIR_IDMAP, 1,
[iops->mkdir() takes struct mnt_idmap*])
AC_DEFINE(HAVE_IOPS_MKDIR_DENTRY, 1,
[iops->mkdir() returns struct dentry*])
],[
AC_MSG_RESULT(no)
dnl #
dnl # 5.12 API change
dnl # The struct user_namespace arg was added as the first argument to
dnl # mkdir() of the iops structure.
dnl # 6.3 API change
dnl # mkdir() takes struct mnt_idmap * as the first arg
dnl #
AC_MSG_CHECKING([whether iops->mkdir() takes struct user_namespace*])
ZFS_LINUX_TEST_RESULT([mkdir_user_namespace], [
AC_MSG_CHECKING([whether iops->mkdir() takes struct mnt_idmap*])
ZFS_LINUX_TEST_RESULT([mkdir_mnt_idmap], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_IOPS_MKDIR_USERNS, 1,
[iops->mkdir() takes struct user_namespace*])
AC_DEFINE(HAVE_IOPS_MKDIR_IDMAP, 1,
[iops->mkdir() takes struct mnt_idmap*])
],[
AC_MSG_RESULT(no)
dnl #
dnl # 5.12 API change
dnl # The struct user_namespace arg was added as the first argument to
dnl # mkdir() of the iops structure.
dnl #
AC_MSG_CHECKING([whether iops->mkdir() takes struct user_namespace*])
ZFS_LINUX_TEST_RESULT([mkdir_user_namespace], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_IOPS_MKDIR_USERNS, 1,
[iops->mkdir() takes struct user_namespace*])
],[
AC_MSG_RESULT(no)
])
])
])
])
+19
View File
@@ -11,10 +11,12 @@ AC_DEFUN([ZFS_AC_KERNEL_OBJTOOL_HEADER], [
#include <linux/objtool.h>
],[
],[
objtool_header=$LINUX/include/linux/objtool.h
AC_DEFINE(HAVE_KERNEL_OBJTOOL_HEADER, 1,
[kernel has linux/objtool.h])
AC_MSG_RESULT(linux/objtool.h)
],[
objtool_header=$LINUX/include/linux/frame.h
AC_MSG_RESULT(linux/frame.h)
])
])
@@ -62,6 +64,23 @@ AC_DEFUN([ZFS_AC_KERNEL_OBJTOOL], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_STACK_FRAME_NON_STANDARD, 1,
[STACK_FRAME_NON_STANDARD is defined])
dnl # Needed for kernels missing the asm macro. We grep
dnl # for it in the header file since there is currently
dnl # no test to check the result of assembling a file.
AC_MSG_CHECKING(
[whether STACK_FRAME_NON_STANDARD asm macro is defined])
dnl # Escape square brackets.
sp='@<:@@<:@:space:@:>@@:>@'
dotmacro='@<:@.@:>@macro'
regexp="^$sp*$dotmacro$sp+STACK_FRAME_NON_STANDARD$sp"
AS_IF([$EGREP -s -q "$regexp" $objtool_header],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_STACK_FRAME_NON_STANDARD_ASM, 1,
[STACK_FRAME_NON_STANDARD asm macro is defined])
],[
AC_MSG_RESULT(no)
])
],[
AC_MSG_RESULT(no)
])
+27
View File
@@ -0,0 +1,27 @@
# dnl
# dnl 5.8 (735e4ae5ba28) introduced a superblock scoped errseq_t to use to
# dnl record writeback errors for syncfs() to return. Up until 5.17, when
# dnl sync_fs errors were returned directly, this is the only way for us to
# dnl report an error from syncfs().
# dnl
AC_DEFUN([ZFS_AC_KERNEL_SRC_SUPER_BLOCK_S_WB_ERR], [
ZFS_LINUX_TEST_SRC([super_block_s_wb_err], [
#include <linux/fs.h>
static const struct super_block
sb __attribute__ ((unused)) = {
.s_wb_err = 0,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_SUPER_BLOCK_S_WB_ERR], [
AC_MSG_CHECKING([whether super_block has s_wb_err])
ZFS_LINUX_TEST_RESULT([super_block_s_wb_err], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SUPER_BLOCK_S_WB_ERR, 1,
[have super_block s_wb_err])
],[
AC_MSG_RESULT(no)
])
])
+32
View File
@@ -0,0 +1,32 @@
dnl #
dnl # 6.2: timer_delete_sync introduced, del_timer_sync deprecated and made
dnl # into a simple wrapper
dnl # 6.15: del_timer_sync removed
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_TIMER_DELETE_SYNC], [
ZFS_LINUX_TEST_SRC([timer_delete_sync], [
#include <linux/timer.h>
],[
struct timer_list *timer __attribute__((unused)) = NULL;
timer_delete_sync(timer);
])
])
AC_DEFUN([ZFS_AC_KERNEL_TIMER_DELETE_SYNC], [
AC_MSG_CHECKING([whether timer_delete_sync() is available])
ZFS_LINUX_TEST_RESULT([timer_delete_sync], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_TIMER_DELETE_SYNC, 1,
[timer_delete_sync is available])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_TIMER], [
ZFS_AC_KERNEL_SRC_TIMER_DELETE_SYNC
])
AC_DEFUN([ZFS_AC_KERNEL_TIMER], [
ZFS_AC_KERNEL_TIMER_DELETE_SYNC
])
+4
View File
@@ -130,6 +130,8 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_MM_PAGE_MAPPING
ZFS_AC_KERNEL_SRC_FILE
ZFS_AC_KERNEL_SRC_PIN_USER_PAGES
ZFS_AC_KERNEL_SRC_TIMER
ZFS_AC_KERNEL_SRC_SUPER_BLOCK_S_WB_ERR
case "$host_cpu" in
powerpc*)
ZFS_AC_KERNEL_SRC_CPU_HAS_FEATURE
@@ -244,6 +246,8 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_1ARG_ASSIGN_STR
ZFS_AC_KERNEL_FILE
ZFS_AC_KERNEL_PIN_USER_PAGES
ZFS_AC_KERNEL_TIMER
ZFS_AC_KERNEL_SUPER_BLOCK_S_WB_ERR
case "$host_cpu" in
powerpc*)
ZFS_AC_KERNEL_CPU_HAS_FEATURE
+16
View File
@@ -34,6 +34,22 @@ AC_DEFUN([ZFS_AC_CONFIG_USER_LIBUNWIND], [
], [
AC_MSG_RESULT([no])
])
dnl LLVM includes it's own libunwind library, which
dnl defines the highest numbered register in a different
dnl way, and has an incompatible unw_resname function.
AC_MSG_CHECKING([whether libunwind is llvm libunwind])
AC_COMPILE_IFELSE([
AC_LANG_PROGRAM([
#include <libunwind.h>
#if !defined(_LIBUNWIND_HIGHEST_DWARF_REGISTER)
#error "_LIBUNWIND_HIGHEST_DWARF_REGISTER is not defined"
#endif
], [])], [
AC_MSG_RESULT([yes])
AC_DEFINE(IS_LIBUNWIND_LLVM, 1, [libunwind is llvm libunwind])
], [
AC_MSG_RESULT([no])
])
AX_RESTORE_FLAGS
], [
AS_IF([test "x$with_libunwind" = "xyes"], [
+1
View File
@@ -591,6 +591,7 @@ _LIBZFS_H int zfs_crypto_attempt_load_keys(libzfs_handle_t *, const char *);
_LIBZFS_H int zfs_crypto_load_key(zfs_handle_t *, boolean_t, const char *);
_LIBZFS_H int zfs_crypto_unload_key(zfs_handle_t *);
_LIBZFS_H int zfs_crypto_rewrap(zfs_handle_t *, nvlist_t *, boolean_t);
_LIBZFS_H boolean_t zfs_is_encrypted(zfs_handle_t *);
typedef struct zprop_list {
int pl_prop;
+1 -4
View File
@@ -43,10 +43,7 @@ extern "C" {
#endif
#define EXPORT_SYMBOL(x)
#define module_param(a, b, c)
#define module_param_call(a, b, c, d, e)
#define module_param_named(a, b, c, d)
#define MODULE_PARM_DESC(a, b)
#define asm __asm
#ifdef ZFS_DEBUG
#undef NDEBUG
+29 -42
View File
@@ -112,14 +112,13 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
} while (0)
#define VERIFY3B(LEFT, OP, RIGHT) do { \
const boolean_t _verify3_left = (boolean_t)(LEFT); \
const boolean_t _verify3_right = (boolean_t)(RIGHT); \
const boolean_t _verify3_left = (boolean_t)!!(LEFT); \
const boolean_t _verify3_right = (boolean_t)!!(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"VERIFY3B(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%d " #OP " %d)\n", \
(boolean_t)_verify3_left, \
(boolean_t)_verify3_right); \
_verify3_left, _verify3_right); \
} while (0)
#define VERIFY3S(LEFT, OP, RIGHT) do { \
@@ -127,7 +126,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const int64_t _verify3_right = (int64_t)(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"VERIFY3S(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%lld " #OP " %lld)\n", \
(long long)_verify3_left, \
(long long)_verify3_right); \
@@ -138,7 +137,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const uint64_t _verify3_right = (uint64_t)(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"VERIFY3U(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%llu " #OP " %llu)\n", \
(unsigned long long)_verify3_left, \
(unsigned long long)_verify3_right); \
@@ -149,8 +148,8 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const uintptr_t _verify3_right = (uintptr_t)(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"failed (%px " #OP " %px)\n", \
"VERIFY3P(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%p " #OP " %p)\n", \
(void *)_verify3_left, \
(void *)_verify3_right); \
} while (0)
@@ -159,8 +158,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const int64_t _verify0_right = (int64_t)(RIGHT); \
if (unlikely(!(0 == _verify0_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY0(" #RIGHT ") " \
"failed (0 == %lld)\n", \
"VERIFY0(" #RIGHT ") failed (%lld)\n", \
(long long)_verify0_right); \
} while (0)
@@ -168,8 +166,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const uintptr_t _verify0_right = (uintptr_t)(RIGHT); \
if (unlikely(!(0 == _verify0_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY0P(" #RIGHT ") " \
"failed (NULL == %px)\n", \
"VERIFY0P(" #RIGHT ") failed (%p)\n", \
(void *)_verify0_right); \
} while (0)
@@ -182,14 +179,13 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
*/
#define VERIFY3BF(LEFT, OP, RIGHT, STR, ...) do { \
const boolean_t _verify3_left = (boolean_t)(LEFT); \
const boolean_t _verify3_right = (boolean_t)(RIGHT); \
const boolean_t _verify3_left = (boolean_t)!!(LEFT); \
const boolean_t _verify3_right = (boolean_t)!!(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"VERIFY3B(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%d " #OP " %d) " STR "\n", \
(boolean_t)(_verify3_left), \
(boolean_t)(_verify3_right), \
_verify3_left, _verify3_right, \
__VA_ARGS__); \
} while (0)
@@ -198,10 +194,9 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const int64_t _verify3_right = (int64_t)(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"VERIFY3S(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%lld " #OP " %lld) " STR "\n", \
(long long)(_verify3_left), \
(long long)(_verify3_right), \
(long long)_verify3_left, (long long)_verify3_right,\
__VA_ARGS__); \
} while (0)
@@ -210,10 +205,10 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const uint64_t _verify3_right = (uint64_t)(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"VERIFY3U(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%llu " #OP " %llu) " STR "\n", \
(unsigned long long)(_verify3_left), \
(unsigned long long)(_verify3_right), \
(unsigned long long)_verify3_left, \
(unsigned long long)_verify3_right, \
__VA_ARGS__); \
} while (0)
@@ -222,32 +217,27 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const uintptr_t _verify3_right = (uintptr_t)(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"failed (%px " #OP " %px) " STR "\n", \
(void *) (_verify3_left), \
(void *) (_verify3_right), \
"VERIFY3P(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%p " #OP " %p) " STR "\n", \
(void *)_verify3_left, (void *)_verify3_right, \
__VA_ARGS__); \
} while (0)
#define VERIFY0PF(RIGHT, STR, ...) do { \
const uintptr_t _verify3_left = (uintptr_t)(0); \
const uintptr_t _verify3_right = (uintptr_t)(RIGHT); \
if (unlikely(!(_verify3_left == _verify3_right))) \
if (unlikely(!(0 == _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY0(0 == " #RIGHT ") " \
"failed (0 == %px) " STR "\n", \
(long long) (_verify3_right), \
"VERIFY0P(" #RIGHT ") failed (%p) " STR "\n", \
(void *)_verify3_right, \
__VA_ARGS__); \
} while (0)
#define VERIFY0F(RIGHT, STR, ...) do { \
const int64_t _verify3_left = (int64_t)(0); \
const int64_t _verify3_right = (int64_t)(RIGHT); \
if (unlikely(!(_verify3_left == _verify3_right))) \
if (unlikely(!(0 == _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY0(0 == " #RIGHT ") " \
"failed (0 == %lld) " STR "\n", \
(long long) (_verify3_right), \
"VERIFY0(" #RIGHT ") failed (%lld) " STR "\n", \
(long long)_verify3_right, \
__VA_ARGS__); \
} while (0)
@@ -256,10 +246,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
spl_assert("(" #A ") implies (" #B ")", \
__FILE__, __FUNCTION__, __LINE__)))
#define VERIFY_EQUIV(A, B) \
((void)(likely(!!(A) == !!(B)) || \
spl_assert("(" #A ") is equivalent to (" #B ")", \
__FILE__, __FUNCTION__, __LINE__)))
#define VERIFY_EQUIV(A, B) VERIFY3B(A, ==, B)
/*
* Debugging disabled (--disable-debug)
-1
View File
@@ -39,7 +39,6 @@ struct znode;
int secpolicy_nfs(cred_t *cr);
int secpolicy_zfs(cred_t *crd);
int secpolicy_zfs_proc(cred_t *cr, proc_t *proc);
int secpolicy_sys_config(cred_t *cr, int checkonly);
int secpolicy_zinject(cred_t *cr);
int secpolicy_fs_unmount(cred_t *cr, struct mount *vfsp);
+1
View File
@@ -35,6 +35,7 @@
extern const int zfs_vm_pagerret_bad;
extern const int zfs_vm_pagerret_error;
extern const int zfs_vm_pagerret_ok;
extern const int zfs_vm_pagerret_pend;
extern const int zfs_vm_pagerput_sync;
extern const int zfs_vm_pagerput_inval;
+15 -49
View File
@@ -31,15 +31,6 @@
#include <linux/module.h>
#include <linux/moduleparam.h>
/*
* Despite constifying struct kernel_param_ops, some older kernels define a
* `__check_old_set_param()` function in their headers that checks for a
* non-constified `->set()`. This has long been fixed in Linux mainline, but
* since we support older kernels, we workaround it by using a preprocessor
* definition to disable it.
*/
#define __check_old_set_param(_) (0)
typedef const struct kernel_param zfs_kernel_param_t;
#define ZMOD_RW 0644
@@ -79,48 +70,23 @@ enum scope_prefix_types {
};
/*
* While we define our own s64/u64 types, there is no reason to reimplement the
* existing Linux kernel types, so we use the preprocessor to remap our
* "custom" implementations to the kernel ones. This is done because the CPP
* does not allow us to write conditional definitions. The fourth definition
* exists because the CPP will not allow us to replace things like INT with int
* before string concatenation.
* Our uint64 params are called U64 in part because we had them before Linux
* provided ULLONG param ops. Now it does, and we use them, but we retain the
* U64 name to keep many existing tunables working without issue.
*/
#define spl_param_set_u64 param_set_ullong
#define spl_param_get_u64 param_get_ullong
#define spl_param_ops_U64 param_ops_ullong
#define spl_param_set_int param_set_int
#define spl_param_get_int param_get_int
#define spl_param_ops_int param_ops_int
#define spl_param_ops_INT param_ops_int
#define spl_param_set_long param_set_long
#define spl_param_get_long param_get_long
#define spl_param_ops_long param_ops_long
#define spl_param_ops_LONG param_ops_long
#define spl_param_set_uint param_set_uint
#define spl_param_get_uint param_get_uint
#define spl_param_ops_uint param_ops_uint
#define spl_param_ops_UINT param_ops_uint
#define spl_param_set_ulong param_set_ulong
#define spl_param_get_ulong param_get_ulong
#define spl_param_ops_ulong param_ops_ulong
#define spl_param_ops_ULONG param_ops_ulong
#define spl_param_set_charp param_set_charp
#define spl_param_get_charp param_get_charp
#define spl_param_ops_charp param_ops_charp
#define spl_param_ops_STRING param_ops_charp
int spl_param_set_s64(const char *val, zfs_kernel_param_t *kp);
extern int spl_param_get_s64(char *buffer, zfs_kernel_param_t *kp);
extern const struct kernel_param_ops spl_param_ops_s64;
#define spl_param_ops_S64 spl_param_ops_s64
extern int spl_param_set_u64(const char *val, zfs_kernel_param_t *kp);
extern int spl_param_get_u64(char *buffer, zfs_kernel_param_t *kp);
extern const struct kernel_param_ops spl_param_ops_u64;
#define spl_param_ops_U64 spl_param_ops_u64
/*
* We keep our own names for param ops to make expanding them in
* ZFS_MODULE_PARAM easy.
*/
#define spl_param_ops_INT param_ops_int
#define spl_param_ops_LONG param_ops_long
#define spl_param_ops_UINT param_ops_uint
#define spl_param_ops_ULONG param_ops_ulong
#define spl_param_ops_STRING param_ops_charp
/*
* Declare a module parameter / sysctl node
+2 -2
View File
@@ -4,8 +4,8 @@
/*
* Create our own accessor functions to follow the Linux API changes
*/
#define nr_file_pages() global_node_page_state(NR_FILE_PAGES)
#define nr_inactive_anon_pages() global_node_page_state(NR_INACTIVE_ANON)
#define nr_file_pages() (global_node_page_state(NR_ACTIVE_FILE) + \
global_node_page_state(NR_INACTIVE_FILE))
#define nr_inactive_file_pages() global_node_page_state(NR_INACTIVE_FILE)
#endif /* _ZFS_PAGE_COMPAT_H */
+27 -40
View File
@@ -116,14 +116,13 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
} while (0)
#define VERIFY3B(LEFT, OP, RIGHT) do { \
const boolean_t _verify3_left = (boolean_t)(LEFT); \
const boolean_t _verify3_right = (boolean_t)(RIGHT); \
const boolean_t _verify3_left = (boolean_t)!!(LEFT); \
const boolean_t _verify3_right = (boolean_t)!!(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"VERIFY3B(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%d " #OP " %d)\n", \
(boolean_t)_verify3_left, \
(boolean_t)_verify3_right); \
_verify3_left, _verify3_right); \
} while (0)
#define VERIFY3S(LEFT, OP, RIGHT) do { \
@@ -131,7 +130,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const int64_t _verify3_right = (int64_t)(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"VERIFY3S(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%lld " #OP " %lld)\n", \
(long long)_verify3_left, \
(long long)_verify3_right); \
@@ -142,7 +141,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const uint64_t _verify3_right = (uint64_t)(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"VERIFY3U(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%llu " #OP " %llu)\n", \
(unsigned long long)_verify3_left, \
(unsigned long long)_verify3_right); \
@@ -153,7 +152,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const uintptr_t _verify3_right = (uintptr_t)(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"VERIFY3P(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%px " #OP " %px)\n", \
(void *)_verify3_left, \
(void *)_verify3_right); \
@@ -163,8 +162,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const int64_t _verify0_right = (int64_t)(RIGHT); \
if (unlikely(!(0 == _verify0_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY0(" #RIGHT ") " \
"failed (0 == %lld)\n", \
"VERIFY0(" #RIGHT ") failed (%lld)\n", \
(long long)_verify0_right); \
} while (0)
@@ -172,8 +170,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const uintptr_t _verify0_right = (uintptr_t)(RIGHT); \
if (unlikely(!(0 == _verify0_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY0P(" #RIGHT ") " \
"failed (NULL == %px)\n", \
"VERIFY0P(" #RIGHT ") failed (%px)\n", \
(void *)_verify0_right); \
} while (0)
@@ -186,14 +183,13 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
*/
#define VERIFY3BF(LEFT, OP, RIGHT, STR, ...) do { \
const boolean_t _verify3_left = (boolean_t)(LEFT); \
const boolean_t _verify3_right = (boolean_t)(RIGHT); \
const boolean_t _verify3_left = (boolean_t)!!(LEFT); \
const boolean_t _verify3_right = (boolean_t)!!(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"VERIFY3B(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%d " #OP " %d) " STR "\n", \
(boolean_t)(_verify3_left), \
(boolean_t)(_verify3_right), \
_verify3_left, _verify3_right, \
__VA_ARGS__); \
} while (0)
@@ -202,10 +198,9 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const int64_t _verify3_right = (int64_t)(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"VERIFY3S(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%lld " #OP " %lld) " STR "\n", \
(long long)(_verify3_left), \
(long long)(_verify3_right), \
(long long)_verify3_left, (long long)_verify3_right,\
__VA_ARGS__); \
} while (0)
@@ -214,10 +209,10 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const uint64_t _verify3_right = (uint64_t)(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"VERIFY3U(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%llu " #OP " %llu) " STR "\n", \
(unsigned long long)(_verify3_left), \
(unsigned long long)(_verify3_right), \
(unsigned long long)_verify3_left, \
(unsigned long long)_verify3_right, \
__VA_ARGS__); \
} while (0)
@@ -226,32 +221,27 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
const uintptr_t _verify3_right = (uintptr_t)(RIGHT); \
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"VERIFY3P(" #LEFT ", " #OP ", " #RIGHT ") " \
"failed (%px " #OP " %px) " STR "\n", \
(void *) (_verify3_left), \
(void *) (_verify3_right), \
(void *)_verify3_left, (void *)_verify3_right, \
__VA_ARGS__); \
} while (0)
#define VERIFY0PF(RIGHT, STR, ...) do { \
const uintptr_t _verify3_left = (uintptr_t)(0); \
const uintptr_t _verify3_right = (uintptr_t)(RIGHT); \
if (unlikely(!(_verify3_left == _verify3_right))) \
if (unlikely(!(0 == _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY0(0 == " #RIGHT ") " \
"failed (0 == %px) " STR "\n", \
(long long) (_verify3_right), \
"VERIFY0P(" #RIGHT ") failed (%px) " STR "\n", \
(void *)_verify3_right, \
__VA_ARGS__); \
} while (0)
#define VERIFY0F(RIGHT, STR, ...) do { \
const int64_t _verify3_left = (int64_t)(0); \
const int64_t _verify3_right = (int64_t)(RIGHT); \
if (unlikely(!(_verify3_left == _verify3_right))) \
if (unlikely(!(0 == _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY0(0 == " #RIGHT ") " \
"failed (0 == %lld) " STR "\n", \
(long long) (_verify3_right), \
"VERIFY0(" #RIGHT ") failed (%lld) " STR "\n", \
(long long)_verify3_right, \
__VA_ARGS__); \
} while (0)
@@ -260,10 +250,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
spl_assert("(" #A ") implies (" #B ")", \
__FILE__, __FUNCTION__, __LINE__)))
#define VERIFY_EQUIV(A, B) \
((void)(likely(!!(A) == !!(B)) || \
spl_assert("(" #A ") is equivalent to (" #B ")", \
__FILE__, __FUNCTION__, __LINE__)))
#define VERIFY_EQUIV(A, B) VERIFY3B(A, ==, B)
/*
* Debugging disabled (--disable-debug)
+2 -2
View File
@@ -174,7 +174,7 @@ zfs_uio_bvec_init(zfs_uio_t *uio, struct bio *bio, struct request *rq)
static inline void
zfs_uio_iov_iter_init(zfs_uio_t *uio, struct iov_iter *iter, offset_t offset,
ssize_t resid, size_t skip)
ssize_t resid)
{
uio->uio_iter = iter;
uio->uio_iovcnt = iter->nr_segs;
@@ -184,7 +184,7 @@ zfs_uio_iov_iter_init(zfs_uio_t *uio, struct iov_iter *iter, offset_t offset,
uio->uio_fmode = 0;
uio->uio_extflg = 0;
uio->uio_resid = resid;
uio->uio_skip = skip;
uio->uio_skip = 0;
uio->uio_soffset = uio->uio_loffset;
memset(&uio->uio_dio, 0, sizeof (zfs_uio_dio_t));
}
-1
View File
@@ -52,7 +52,6 @@ int secpolicy_vnode_setids_setgids(const cred_t *, gid_t, zidmap_t *,
struct user_namespace *);
int secpolicy_zinject(const cred_t *);
int secpolicy_zfs(const cred_t *);
int secpolicy_zfs_proc(const cred_t *, proc_t *);
void secpolicy_setid_clear(vattr_t *, cred_t *);
int secpolicy_setid_setsticky_clear(struct inode *, vattr_t *,
const vattr_t *, cred_t *, zidmap_t *, struct user_namespace *);
-35
View File
@@ -123,41 +123,6 @@ extern int zpl_clone_file_range(struct file *src_file, loff_t src_off,
extern int zpl_dedupe_file_range(struct file *src_file, loff_t src_off,
struct file *dst_file, loff_t dst_off, uint64_t len);
/* compat for FICLONE/FICLONERANGE/FIDEDUPERANGE ioctls */
typedef struct {
int64_t fcr_src_fd;
uint64_t fcr_src_offset;
uint64_t fcr_src_length;
uint64_t fcr_dest_offset;
} zfs_ioc_compat_file_clone_range_t;
typedef struct {
int64_t fdri_dest_fd;
uint64_t fdri_dest_offset;
uint64_t fdri_bytes_deduped;
int32_t fdri_status;
uint32_t fdri_reserved;
} zfs_ioc_compat_dedupe_range_info_t;
typedef struct {
uint64_t fdr_src_offset;
uint64_t fdr_src_length;
uint16_t fdr_dest_count;
uint16_t fdr_reserved1;
uint32_t fdr_reserved2;
zfs_ioc_compat_dedupe_range_info_t fdr_info[];
} zfs_ioc_compat_dedupe_range_t;
#define ZFS_IOC_COMPAT_FICLONE _IOW(0x94, 9, int)
#define ZFS_IOC_COMPAT_FICLONERANGE \
_IOW(0x94, 13, zfs_ioc_compat_file_clone_range_t)
#define ZFS_IOC_COMPAT_FIDEDUPERANGE \
_IOWR(0x94, 54, zfs_ioc_compat_dedupe_range_t)
extern long zpl_ioctl_ficlone(struct file *filp, void *arg);
extern long zpl_ioctl_ficlonerange(struct file *filp, void *arg);
extern long zpl_ioctl_fideduperange(struct file *filp, void *arg);
#if defined(HAVE_INODE_TIMESTAMP_TRUNCATE)
#define zpl_inode_timestamp_truncate(ts, ip) timestamp_truncate(ts, ip)
+9
View File
@@ -64,8 +64,15 @@ extern "C" {
(hdr)->b_psize = ((x) >> SPA_MINBLOCKSHIFT); \
} while (0)
/* The l2size in the header is only used by L2 cache */
#define HDR_SET_L2SIZE(hdr, x) do { \
ASSERT(IS_P2ALIGNED((x), 1U << SPA_MINBLOCKSHIFT)); \
(hdr)->b_l2size = ((x) >> SPA_MINBLOCKSHIFT); \
} while (0)
#define HDR_GET_LSIZE(hdr) ((hdr)->b_lsize << SPA_MINBLOCKSHIFT)
#define HDR_GET_PSIZE(hdr) ((hdr)->b_psize << SPA_MINBLOCKSHIFT)
#define HDR_GET_L2SIZE(hdr) ((hdr)->b_l2size << SPA_MINBLOCKSHIFT)
typedef struct arc_buf_hdr arc_buf_hdr_t;
typedef struct arc_buf arc_buf_t;
@@ -323,8 +330,10 @@ void arc_freed(spa_t *spa, const blkptr_t *bp);
int arc_cached(spa_t *spa, const blkptr_t *bp);
void arc_flush(spa_t *spa, boolean_t retry);
void arc_flush_async(spa_t *spa);
void arc_tempreserve_clear(uint64_t reserve);
int arc_tempreserve_space(spa_t *spa, uint64_t reserve, uint64_t txg);
boolean_t arc_async_flush_guid_inuse(uint64_t load_guid);
uint64_t arc_all_memory(void);
uint64_t arc_default_max(uint64_t min, uint64_t allmem);
+4 -4
View File
@@ -379,8 +379,8 @@ typedef struct l2arc_lb_ptr_buf {
* L2ARC Internals
*/
typedef struct l2arc_dev {
vdev_t *l2ad_vdev; /* vdev */
spa_t *l2ad_spa; /* spa */
vdev_t *l2ad_vdev; /* can be NULL during remove */
spa_t *l2ad_spa; /* can be NULL during remove */
uint64_t l2ad_hand; /* next write location */
uint64_t l2ad_start; /* first addr on device */
uint64_t l2ad_end; /* last addr on device */
@@ -476,8 +476,8 @@ struct arc_buf_hdr {
arc_buf_contents_t b_type;
uint8_t b_complevel;
uint8_t b_reserved1; /* used for 4 byte alignment */
uint16_t b_reserved2; /* used for 4 byte alignment */
uint8_t b_reserved1; /* used for 4 byte alignment */
uint16_t b_l2size; /* alignment or L2-only size */
arc_buf_hdr_t *b_hash_next;
arc_flags_t b_flags;
+1
View File
@@ -446,6 +446,7 @@ int dbuf_dnode_findbp(dnode_t *dn, uint64_t level, uint64_t blkid,
void dbuf_init(void);
void dbuf_fini(void);
void dbuf_cache_reduce_target_size(void);
boolean_t dbuf_is_metadata(dmu_buf_impl_t *db);
+7 -1
View File
@@ -814,7 +814,7 @@ void dmu_tx_hold_append(dmu_tx_t *tx, uint64_t object, uint64_t off, int len);
void dmu_tx_hold_append_by_dnode(dmu_tx_t *tx, dnode_t *dn, uint64_t off,
int len);
void dmu_tx_hold_clone_by_dnode(dmu_tx_t *tx, dnode_t *dn, uint64_t off,
int len);
uint64_t len, uint_t blksz);
void dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object, uint64_t off,
uint64_t len);
void dmu_tx_hold_free_by_dnode(dmu_tx_t *tx, dnode_t *dn, uint64_t off,
@@ -980,6 +980,11 @@ void dmu_object_size_from_db(dmu_buf_t *db, uint32_t *blksize,
void dmu_object_dnsize_from_db(dmu_buf_t *db, int *dnsize);
typedef enum {
DDS_FLAG_ENCRYPTED = (1<<0),
DDS_FLAG_HAS_ENCRYPTED = (1<<7),
} dmu_objset_flag_t;
typedef struct dmu_objset_stats {
uint64_t dds_num_clones; /* number of clones of this */
uint64_t dds_creation_txg;
@@ -989,6 +994,7 @@ typedef struct dmu_objset_stats {
uint8_t dds_inconsistent;
uint8_t dds_redacted;
char dds_origin[ZFS_MAX_DATASET_NAME_LEN];
uint8_t dds_flags; /* dmu_objset_flag_t */
} dmu_objset_stats_t;
/*
-1
View File
@@ -60,7 +60,6 @@ typedef struct dmu_recv_cookie {
uint64_t drc_ivset_guid;
void *drc_owner;
cred_t *drc_cred;
proc_t *drc_proc;
nvlist_t *drc_begin_nvl;
objset_t *drc_os;
+1 -3
View File
@@ -284,7 +284,6 @@ typedef struct dsl_dataset_promote_arg {
uint64_t used, comp, uncomp, unique, cloneusedsnap, originusedsnap;
nvlist_t *err_ds;
cred_t *cr;
proc_t *proc;
} dsl_dataset_promote_arg_t;
typedef struct dsl_dataset_rollback_arg {
@@ -299,7 +298,6 @@ typedef struct dsl_dataset_snapshot_arg {
nvlist_t *ddsa_props;
nvlist_t *ddsa_errors;
cred_t *ddsa_cr;
proc_t *ddsa_proc;
} dsl_dataset_snapshot_arg_t;
typedef struct dsl_dataset_rename_snapshot_arg {
@@ -459,7 +457,7 @@ int dsl_dataset_clone_swap_check_impl(dsl_dataset_t *clone,
void dsl_dataset_clone_swap_sync_impl(dsl_dataset_t *clone,
dsl_dataset_t *origin_head, dmu_tx_t *tx);
int dsl_dataset_snapshot_check_impl(dsl_dataset_t *ds, const char *snapname,
dmu_tx_t *tx, boolean_t recv, uint64_t cnt, cred_t *cr, proc_t *proc);
dmu_tx_t *tx, boolean_t recv, uint64_t cnt, cred_t *cr);
void dsl_dataset_snapshot_sync_impl(dsl_dataset_t *ds, const char *snapname,
dmu_tx_t *tx);
+2 -2
View File
@@ -185,11 +185,11 @@ int dsl_dir_set_reservation(const char *ddname, zprop_source_t source,
uint64_t reservation);
int dsl_dir_activate_fs_ss_limit(const char *);
int dsl_fs_ss_limit_check(dsl_dir_t *, uint64_t, zfs_prop_t, dsl_dir_t *,
cred_t *, proc_t *);
cred_t *);
void dsl_fs_ss_count_adjust(dsl_dir_t *, int64_t, const char *, dmu_tx_t *);
int dsl_dir_rename(const char *oldname, const char *newname);
int dsl_dir_transfer_possible(dsl_dir_t *sdd, dsl_dir_t *tdd,
uint64_t fs_cnt, uint64_t ss_cnt, uint64_t space, cred_t *, proc_t *);
uint64_t fs_cnt, uint64_t ss_cnt, uint64_t space, cred_t *);
boolean_t dsl_dir_is_clone(dsl_dir_t *dd);
void dsl_dir_new_refreservation(dsl_dir_t *dd, struct dsl_dataset *ds,
uint64_t reservation, cred_t *cr, dmu_tx_t *tx);
+8
View File
@@ -31,8 +31,16 @@ extern "C" {
#else
#include <linux/frame.h>
#endif
#if defined(_ASM) && ! defined(HAVE_STACK_FRAME_NON_STANDARD_ASM)
.macro STACK_FRAME_NON_STANDARD func:req
.endm
#endif
#else
#define STACK_FRAME_NON_STANDARD(func)
#if defined(_ASM)
.macro STACK_FRAME_NON_STANDARD func:req
.endm
#endif
#endif
#ifdef __cplusplus
+1
View File
@@ -1103,6 +1103,7 @@ extern boolean_t spa_guid_exists(uint64_t pool_guid, uint64_t device_guid);
extern char *spa_strdup(const char *);
extern void spa_strfree(char *);
extern uint64_t spa_generate_guid(spa_t *spa);
extern uint64_t spa_generate_load_guid(void);
extern void snprintf_blkptr(char *buf, size_t buflen, const blkptr_t *bp);
extern void spa_freeze(spa_t *spa);
extern int spa_change_guid(spa_t *spa, const uint64_t *guidp);
+20 -3
View File
@@ -66,6 +66,20 @@ typedef struct txg_list {
txg_node_t *tl_head[TXG_SIZE];
} txg_list_t;
/*
* Wait flags for txg_wait_synced_flags(). By default (TXG_WAIT_NONE), it will
* wait until the wanted txg is reached, or block forever. Additional flags
* indicate other conditions that the caller is interested in, that will cause
* the wait to break and return an error code describing the condition.
*/
typedef enum {
/* No special flags. Guaranteed to block forever or return 0 */
TXG_WAIT_NONE = 0,
/* If a signal arrives while waiting, abort and return EINTR */
TXG_WAIT_SIGNAL = (1 << 0),
} txg_wait_flag_t;
struct dsl_pool;
extern void txg_init(struct dsl_pool *dp, uint64_t txg);
@@ -86,13 +100,16 @@ extern void txg_kick(struct dsl_pool *dp, uint64_t txg);
* Try to make this happen as soon as possible (eg. kick off any
* necessary syncs immediately). If txg==0, wait for the currently open
* txg to finish syncing.
* See txg_wait_flag_t above for a description of how the flags affect the wait.
*/
extern void txg_wait_synced(struct dsl_pool *dp, uint64_t txg);
extern int txg_wait_synced_flags(struct dsl_pool *dp, uint64_t txg,
txg_wait_flag_t flags);
/*
* Wait as above. Returns true if the thread was signaled while waiting.
* Traditional form of txg_wait_synced_flags, waits forever.
* Shorthand for VERIFY0(txg_wait_synced_flags(dp, TXG_WAIT_NONE))
*/
extern boolean_t txg_wait_synced_sig(struct dsl_pool *dp, uint64_t txg);
extern void txg_wait_synced(struct dsl_pool *dp, uint64_t txg);
/*
* Wait until the given transaction group, or one after it, is
-1
View File
@@ -76,7 +76,6 @@ typedef struct zcp_run_info {
* rather than the 'current' thread's.
*/
cred_t *zri_cred;
proc_t *zri_proc;
/*
* The tx in which this channel program is running.
+3 -1
View File
@@ -632,6 +632,9 @@ extern void delay(clock_t ticks);
#define kcred NULL
#define CRED() NULL
#define crhold(cr) ((void)cr)
#define crfree(cr) ((void)cr)
#define ptob(x) ((x) * PAGESIZE)
#define NN_DIVISOR_1000 (1U << 0)
@@ -744,7 +747,6 @@ extern int zfs_secpolicy_rename_perms(const char *from, const char *to,
cred_t *cr);
extern int zfs_secpolicy_destroy_perms(const char *name, cred_t *cr);
extern int secpolicy_zfs(const cred_t *cr);
extern int secpolicy_zfs_proc(const cred_t *cr, proc_t *proc);
extern zoneid_t getzoneid(void);
/* SID stuff */
+49 -17
View File
@@ -41,9 +41,26 @@
do { ssize_t r __maybe_unused = write(fd, s, n); } while (0)
#define spl_bt_write(fd, s) spl_bt_write_n(fd, s, sizeof (s)-1)
#if defined(HAVE_LIBUNWIND)
#ifdef HAVE_LIBUNWIND
/*
* libunwind-gcc and libunwind-llvm both list registers using an enum,
* unw_regnum_t, however they indicate the highest numbered register for
* a given architecture in different ways. We can check which one is defined
* and mark which libunwind is in use
*/
#ifdef IS_LIBUNWIND_LLVM
#include <libunwind.h>
#define LAST_REG_INDEX _LIBUNWIND_HIGHEST_DWARF_REGISTER
#else
/*
* Need to define UNW_LOCAL_ONLY before importing libunwind.h
* if using libgcc libunwind.
*/
#define UNW_LOCAL_ONLY
#include <libunwind.h>
#define LAST_REG_INDEX UNW_TDEP_LAST_REG
#endif
/*
* Convert `v` to ASCII hex characters. The bottom `n` nybbles (4-bits ie one
@@ -102,14 +119,13 @@ libspl_backtrace(int fd)
unw_init_local(&cp, &uc);
/*
* libunwind's list of possible registers for this architecture is an
* enum, unw_regnum_t. UNW_TDEP_LAST_REG is the highest-numbered
* register in that list, however, not all register numbers in this
* range are defined by the architecture, and not all defined registers
* will be present on every implementation of that architecture.
* Moreover, libunwind provides nice names for most, but not all
* registers, but these are hardcoded; a name being available does not
* mean that register is available.
* Iterate over all registers for the architecture. We've figured
* out the highest number above, however, not all register numbers in
* this range are defined by the architecture, and not all defined
* registers will be present on every implementation of that
* architecture. Moreover, libunwind provides nice names for most, but
* not all registers, but these are hardcoded; a name being available
* does not mean that register is available.
*
* So, we have to pull this all together here. We try to get the value
* of every possible register. If we get a value for it, then the
@@ -120,26 +136,42 @@ libspl_backtrace(int fd)
* thing.
*/
uint_t cols = 0;
for (uint_t regnum = 0; regnum <= UNW_TDEP_LAST_REG; regnum++) {
for (uint_t regnum = 0; regnum <= LAST_REG_INDEX; regnum++) {
/*
* Get the value. Any error probably means the register
* doesn't exist, and we skip it.
* doesn't exist, and we skip it. LLVM libunwind iterates over
* fp registers in the same list, however they have to be
* accessed using unw_get_fpreg instead. Here, we just ignore
* them.
*/
#ifdef IS_LIBUNWIND_LLVM
if (unw_is_fpreg(&cp, regnum) ||
unw_get_reg(&cp, regnum, &v) < 0)
continue;
#else
if (unw_get_reg(&cp, regnum, &v) < 0)
continue;
#endif
/*
* Register name. If libunwind doesn't have a name for it,
* Register name. If GCC libunwind doesn't have a name for it,
* it will return "???". As a shortcut, we just treat '?'
* is an alternate end-of-string character.
* is an alternate end-of-string character. LLVM libunwind will
* return the string 'unknown register', which we detect by
* checking if the register name is longer than 5 characters.
*/
#ifdef IS_LIBUNWIND_LLVM
const char *name = unw_regname(&cp, regnum);
#else
const char *name = unw_regname(regnum);
#endif
for (n = 0; name[n] != '\0' && name[n] != '?'; n++) {}
if (n == 0) {
if (n == 0 || n > 5) {
/*
* No valid name, so make one of the form "?xx", where
* "xx" is the two-char hex of libunwind's register
* number.
* No valid name, or likely llvm_libunwind returned
* unknown_register, so make one of the form "?xx",
* where "xx" is the two-char hex of libunwind's
* register number.
*/
buf[0] = '?';
n = spl_bt_u64_to_hex_str(regnum, 2,
+2
View File
@@ -24,6 +24,8 @@ libspl_rpc_HEADERS = \
libspl_sysdir = $(libspldir)/sys
libspl_sys_HEADERS = \
%D%/sys/abd_os.h \
%D%/sys/abd_impl_os.h \
%D%/sys/acl.h \
%D%/sys/acl_impl.h \
%D%/sys/asm_linkage.h \
+33 -32
View File
@@ -86,12 +86,13 @@ do { \
#define VERIFY3B(LEFT, OP, RIGHT) \
do { \
const boolean_t __left = (boolean_t)(LEFT); \
const boolean_t __right = (boolean_t)(RIGHT); \
const boolean_t __left = (boolean_t)!!(LEFT); \
const boolean_t __right = (boolean_t)!!(RIGHT); \
if (!(__left OP __right)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s %s %s (0x%llx %s 0x%llx)", #LEFT, #OP, #RIGHT, \
(u_longlong_t)__left, #OP, (u_longlong_t)__right); \
"VERIFY3B(%s, %s, %s) failed " \
"(%d %s %d)", #LEFT, #OP, #RIGHT, \
__left, #OP, __right); \
} while (0)
#define VERIFY3S(LEFT, OP, RIGHT) \
@@ -100,8 +101,9 @@ do { \
const int64_t __right = (int64_t)(RIGHT); \
if (!(__left OP __right)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s %s %s (0x%llx %s 0x%llx)", #LEFT, #OP, #RIGHT, \
(u_longlong_t)__left, #OP, (u_longlong_t)__right); \
"VERIFY3S(%s, %s, %s) failed " \
"(%lld %s 0x%lld)", #LEFT, #OP, #RIGHT, \
(longlong_t)__left, #OP, (longlong_t)__right); \
} while (0)
#define VERIFY3U(LEFT, OP, RIGHT) \
@@ -110,7 +112,8 @@ do { \
const uint64_t __right = (uint64_t)(RIGHT); \
if (!(__left OP __right)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s %s %s (0x%llx %s 0x%llx)", #LEFT, #OP, #RIGHT, \
"VERIFY3U(%s, %s, %s) failed " \
"(%llu %s %llu)", #LEFT, #OP, #RIGHT, \
(u_longlong_t)__left, #OP, (u_longlong_t)__right); \
} while (0)
@@ -120,7 +123,8 @@ do { \
const uintptr_t __right = (uintptr_t)(RIGHT); \
if (!(__left OP __right)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s %s %s (%p %s %p)", #LEFT, #OP, #RIGHT, \
"VERIFY3P(%s, %s, %s) failed " \
"(%p %s %p)", #LEFT, #OP, #RIGHT, \
(void *)__left, #OP, (void *)__right); \
} while (0)
@@ -129,7 +133,7 @@ do { \
const uint64_t __left = (uint64_t)(LEFT); \
if (!(__left == 0)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s == 0 (0x%llx == 0)", #LEFT, \
"VERIFY0(%s) failed (%lld)", #LEFT, \
(u_longlong_t)__left); \
} while (0)
@@ -138,7 +142,7 @@ do { \
const uintptr_t __left = (uintptr_t)(LEFT); \
if (!(__left == 0)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s == 0 (%p == 0)", #LEFT, \
"VERIFY0P(%s) failed (%p)", #LEFT, \
(void *)__left); \
} while (0)
@@ -150,13 +154,13 @@ do { \
/* BEGIN CSTYLED */
#define VERIFY3BF(LEFT, OP, RIGHT, STR, ...) \
do { \
const boolean_t __left = (boolean_t)(LEFT); \
const boolean_t __right = (boolean_t)(RIGHT); \
const boolean_t __left = (boolean_t)!!(LEFT); \
const boolean_t __right = (boolean_t)!!(RIGHT); \
if (!(__left OP __right)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s %s %s (0x%llx %s 0x%llx) " STR, \
#LEFT, #OP, #RIGHT, \
(u_longlong_t)__left, #OP, (u_longlong_t)__right, \
"VERIFY3B(%s, %s, %s) failed " \
"(%d %s %d) " STR, #LEFT, #OP, #RIGHT, \
__left, #OP, __right, \
__VA_ARGS__); \
} while (0)
@@ -166,9 +170,9 @@ do { \
const int64_t __right = (int64_t)(RIGHT); \
if (!(__left OP __right)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s %s %s (0x%llx %s 0x%llx) " STR, \
#LEFT, #OP, #RIGHT, \
(u_longlong_t)__left, #OP, (u_longlong_t)__right, \
"VERIFY3S(%s, %s, %s) failed " \
"(%lld %s %lld) " STR, #LEFT, #OP, #RIGHT, \
(longlong_t)__left, #OP, (longlong_t)__right, \
__VA_ARGS__); \
} while (0)
@@ -178,8 +182,8 @@ do { \
const uint64_t __right = (uint64_t)(RIGHT); \
if (!(__left OP __right)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s %s %s (0x%llx %s 0x%llx) " STR, \
#LEFT, #OP, #RIGHT, \
"VERIFY3U(%s, %s, %s) failed " \
"(%llu %s %llu) " STR, #LEFT, #OP, #RIGHT, \
(u_longlong_t)__left, #OP, (u_longlong_t)__right, \
__VA_ARGS__); \
} while (0)
@@ -190,20 +194,20 @@ do { \
const uintptr_t __right = (uintptr_t)(RIGHT); \
if (!(__left OP __right)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s %s %s (0x%llx %s 0x%llx) " STR, \
#LEFT, #OP, #RIGHT, \
(u_longlong_t)__left, #OP, (u_longlong_t)__right, \
"VERIFY3P(%s, %s, %s) failed " \
"(%p %s %p) " STR, #LEFT, #OP, #RIGHT, \
(void *)__left, #OP, (void *)__right, \
__VA_ARGS__); \
} while (0)
/* END CSTYLED */
#define VERIFY0F(LEFT, STR, ...) \
do { \
const uint64_t __left = (uint64_t)(LEFT); \
const int64_t __left = (int64_t)(LEFT); \
if (!(__left == 0)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s == 0 (0x%llx == 0) " STR, #LEFT, \
(u_longlong_t)__left, __VA_ARGS__); \
"VERIFY0(%s) failed (%lld) " STR, #LEFT, \
(longlong_t)__left, __VA_ARGS__); \
} while (0)
#define VERIFY0PF(LEFT, STR, ...) \
@@ -211,8 +215,8 @@ do { \
const uintptr_t __left = (uintptr_t)(LEFT); \
if (!(__left == 0)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s == 0 (%p == 0) " STR, #LEFT, \
(u_longlong_t)__left, __VA_ARGS__); \
"VERIFY0P(%s) failed (%p) " STR, #LEFT, \
(void *)__left, __VA_ARGS__); \
} while (0)
#ifdef assert
@@ -264,10 +268,7 @@ do { \
((void)(((!(A)) || (B)) || \
libspl_assert("(" #A ") implies (" #B ")", \
__FILE__, __FUNCTION__, __LINE__)))
#define EQUIV(A, B) \
((void)((!!(A) == !!(B)) || \
libspl_assert("(" #A ") is equivalent to (" #B ")", \
__FILE__, __FUNCTION__, __LINE__)))
#define EQUIV(A, B) VERIFY3B(A, ==, B)
#endif /* NDEBUG */
+8
View File
@@ -345,6 +345,7 @@
<elf-symbol name='zfs_hold' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_hold_nvl' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_ioctl' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_is_encrypted' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_is_mounted' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_is_shared' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
<elf-symbol name='zfs_isnumber' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/>
@@ -1920,6 +1921,9 @@
<data-member access='public' layout-offset-in-bits='248'>
<var-decl name='dds_origin' type-id='d1617432' visibility='default'/>
</data-member>
<data-member access='public' layout-offset-in-bits='2296'>
<var-decl name='dds_flags' type-id='b96825af' visibility='default'/>
</data-member>
</class-decl>
<typedef-decl name='dmu_objset_stats_t' type-id='098f0221' id='b2c14f17'/>
<enum-decl name='zfs_type_t' naming-typedef-id='2e45de5d' id='5d8f7321'>
@@ -3815,6 +3819,10 @@
<parameter type-id='c19b74c3' name='inheritkey'/>
<return type-id='95e97e5e'/>
</function-decl>
<function-decl name='zfs_is_encrypted' mangled-name='zfs_is_encrypted' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='zfs_is_encrypted'>
<parameter type-id='9200a744' name='zhp'/>
<return type-id='c19b74c3'/>
</function-decl>
<function-decl name='zfs_error_aux' visibility='default' binding='global' size-in-bits='64'>
<parameter type-id='b0382bb3'/>
<parameter type-id='80f4b756'/>
+11
View File
@@ -1818,3 +1818,14 @@ error:
zfs_error(zhp->zfs_hdl, EZFS_CRYPTOFAILED, errbuf);
return (ret);
}
boolean_t
zfs_is_encrypted(zfs_handle_t *zhp)
{
uint8_t flags = zhp->zfs_dmustats.dds_flags;
if (flags & DDS_FLAG_HAS_ENCRYPTED)
return ((flags & DDS_FLAG_ENCRYPTED) != 0);
return (zfs_prop_get_int(zhp, ZFS_PROP_ENCRYPTION) != ZIO_CRYPT_OFF);
}
+21 -1
View File
@@ -4022,6 +4022,26 @@ zfs_destroy_snaps_nvl(libzfs_handle_t *hdl, nvlist_t *snaps, boolean_t defer)
dgettext(TEXT_DOMAIN, "snapshot is cloned"));
ret = zfs_error(hdl, EZFS_EXISTS, errbuf);
break;
case EBUSY: {
nvlist_t *existing_holds;
int err = lzc_get_holds(nvpair_name(pair),
&existing_holds);
/* check the presence of holders */
if (err == 0 && !nvlist_empty(existing_holds)) {
zfs_error_aux(hdl,
dgettext(TEXT_DOMAIN, "it's being held. "
"Run 'zfs holds -r %s' to see holders."),
nvpair_name(pair));
ret = zfs_error(hdl, EBUSY, errbuf);
} else {
ret = zfs_standard_error(hdl, errno, errbuf);
}
if (err == 0)
nvlist_free(existing_holds);
break;
}
default:
ret = zfs_standard_error(hdl, errno, errbuf);
break;
@@ -4885,7 +4905,7 @@ zfs_smb_acl_mgmt(libzfs_handle_t *hdl, char *dataset, char *path,
default:
return (-1);
}
error = ioctl(hdl->libzfs_fd, ZFS_IOC_SMB_ACL, &zc);
error = lzc_ioctl_fd(hdl->libzfs_fd, ZFS_IOC_SMB_ACL, &zc);
nvlist_free(nvlist);
return (error);
}
+1 -1
View File
@@ -570,7 +570,7 @@ iter_dependents_cb(zfs_handle_t *zhp, void *arg)
err = zfs_iter_filesystems_v2(zhp, ida->flags,
iter_dependents_cb, ida);
if (err == 0)
err = zfs_iter_snapshots_v2(zhp, ida->flags,
err = zfs_iter_snapshots_sorted_v2(zhp, ida->flags,
iter_dependents_cb, ida, 0, 0);
ida->stack = isf.next;
}
+17
View File
@@ -2761,6 +2761,11 @@ zpool_scan(zpool_handle_t *zhp, pool_scan_func_t func, pool_scrub_cmd_t cmd)
* 1. we resumed a paused scrub.
* 2. we resumed a paused error scrub.
* 3. Error scrub is not run because of no error log.
*
* Note that we no longer return ECANCELED in case 1 or 2. However, in
* order to prevent problems where we have a newer userland than
* kernel, we keep this check in place. That prevents erroneous
* failures when an older kernel returns ECANCELED in those cases.
*/
if (err == ECANCELED && (func == POOL_SCAN_SCRUB ||
func == POOL_SCAN_ERRORSCRUB) && cmd == POOL_SCRUB_NORMAL)
@@ -2961,6 +2966,18 @@ vdev_to_nvlist_iter(nvlist_t *nv, nvlist_t *search, boolean_t *avail_spare,
idx = p + 1;
*p = '\0';
/*
* draid names are presented like: draid2:4d:6c:0s
* We match them up to the first ':' so we can still
* do the parity check below, but the other params
* are ignored.
*/
if ((p = strchr(type, ':')) != NULL) {
if (strncmp(type, VDEV_TYPE_DRAID,
strlen(VDEV_TYPE_DRAID)) == 0)
*p = '\0';
}
/*
* If the types don't match then keep looking.
*/
+6
View File
@@ -785,6 +785,12 @@ zpool_standard_error_fmt(libzfs_handle_t *hdl, int error, const char *fmt, ...)
return (-1);
}
int
zfs_ioctl(libzfs_handle_t *hdl, int request, zfs_cmd_t *zc)
{
return (lzc_ioctl_fd(hdl->libzfs_fd, request, zc));
}
/*
* Display an out of memory error message and abort the current program.
*/
-6
View File
@@ -218,12 +218,6 @@ libzfs_error_init(int error)
return (errbuf);
}
int
zfs_ioctl(libzfs_handle_t *hdl, int request, zfs_cmd_t *zc)
{
return (lzc_ioctl_fd(hdl->libzfs_fd, request, zc));
}
/*
* Verify the required ZFS_DEV device is available and optionally attempt
* to load the ZFS modules. Under normal circumstances the modules
-6
View File
@@ -53,12 +53,6 @@
#define ZDIFF_SHARESDIR "/.zfs/shares/"
int
zfs_ioctl(libzfs_handle_t *hdl, int request, zfs_cmd_t *zc)
{
return (ioctl(hdl->libzfs_fd, request, zc));
}
const char *
libzfs_error_init(int error)
{
+5 -2
View File
@@ -1,18 +1,21 @@
libzfs_core_la_CFLAGS = $(AM_CFLAGS) $(LIBRARY_CFLAGS)
libzfs_core_la_CFLAGS += -fvisibility=hidden
libzfs_core_la_CPPFLAGS = $(AM_CPPFLAGS)
libzfs_core_la_CPPFLAGS += -I$(srcdir)/%D%
lib_LTLIBRARIES += libzfs_core.la
CPPCHECKTARGETS += libzfs_core.la
libzfs_core_la_SOURCES = \
%D%/libzfs_core.c
%D%/libzfs_core.c \
%D%/libzfs_core_impl.h
if BUILD_LINUX
libzfs_core_la_SOURCES += \
%D%/os/linux/libzfs_core_ioctl.c
endif
libzfs_core_la_CPPFLAGS = $(AM_CPPFLAGS)
if BUILD_FREEBSD
libzfs_core_la_CPPFLAGS += -Iinclude/os/freebsd/zfs
+3
View File
@@ -1494,6 +1494,9 @@
<data-member access='public' layout-offset-in-bits='248'>
<var-decl name='dds_origin' type-id='d1617432' visibility='default'/>
</data-member>
<data-member access='public' layout-offset-in-bits='2296'>
<var-decl name='dds_flags' type-id='b96825af' visibility='default'/>
</data-member>
</class-decl>
<typedef-decl name='dmu_objset_stats_t' type-id='098f0221' id='b2c14f17'/>
<enum-decl name='dmu_objset_type' id='6b1b19f9'>
+44
View File
@@ -96,10 +96,14 @@
#define BIG_PIPE_SIZE (64 * 1024) /* From sys/pipe.h */
#endif
#include "libzfs_core_impl.h"
static int g_fd = -1;
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static int g_refcount;
static int g_ioc_trace = 0;
#ifdef ZFS_DEBUG
static zfs_ioc_t fail_ioc_cmd = ZFS_IOC_LAST;
static zfs_errno_t fail_ioc_err;
@@ -152,6 +156,10 @@ libzfs_core_init(void)
#ifdef ZFS_DEBUG
libzfs_core_debug_ioc();
#endif
if (getenv("ZFS_IOC_TRACE"))
g_ioc_trace = 1;
(void) pthread_mutex_unlock(&g_lock);
return (0);
}
@@ -171,6 +179,42 @@ libzfs_core_fini(void)
(void) pthread_mutex_unlock(&g_lock);
}
int
lzc_ioctl_fd(int fd, unsigned long ioc, zfs_cmd_t *zc)
{
if (!g_ioc_trace)
return (lzc_ioctl_fd_os(fd, ioc, zc));
nvlist_t *nvl;
fprintf(stderr, "=== lzc_ioctl: call: ioc=0x%lx name=%s\n",
ioc, zc->zc_name[0] ? zc->zc_name : "[none]");
if (zc->zc_nvlist_src) {
nvl = fnvlist_unpack(
(void *)(uintptr_t)zc->zc_nvlist_src,
zc->zc_nvlist_src_size);
nvlist_print(stderr, nvl);
fnvlist_free(nvl);
}
int rc = lzc_ioctl_fd_os(fd, ioc, zc);
int err = errno;
fprintf(stderr, "=== lzc_ioctl: result: ioc=0x%lx name=%s "
"rc=%d errno=%d\n", ioc, zc->zc_name[0] ? zc->zc_name : "[none]",
rc, (rc < 0 ? err : 0));
if (rc >= 0 && zc->zc_nvlist_dst) {
nvl = fnvlist_unpack(
(void *)(uintptr_t)zc->zc_nvlist_dst,
zc->zc_nvlist_dst_size);
nvlist_print(stderr, nvl);
fnvlist_free(nvl);
}
errno = err;
return (rc);
}
static int
lzc_ioctl(zfs_ioc_t ioc, const char *name,
nvlist_t *source, nvlist_t **resultp)
+36
View File
@@ -0,0 +1,36 @@
// SPDX-License-Identifier: CDDL-1.0
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright (c) 2012, 2020 by Delphix. All rights reserved.
* Copyright 2017 RackTop Systems.
* Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
* Copyright (c) 2019 Datto Inc.
*/
#ifndef _LIBZFS_CORE_IMPL_H
#define _LIBZFS_CORE_IMPL_H
struct zfs_cmd;
int lzc_ioctl_fd_os(int, unsigned long, struct zfs_cmd *);
#endif /* _LIBZFS_CORE_IMPL_H */
@@ -26,6 +26,7 @@
#include <os/freebsd/zfs/sys/zfs_ioctl_compat.h>
#include <err.h>
#include <libzfs_core.h>
#include "libzfs_core_impl.h"
int zfs_ioctl_version = ZFS_IOCVER_UNDEF;
@@ -101,7 +102,7 @@ zcmd_ioctl_compat(int fd, int request, zfs_cmd_t *zc, const int cflag)
* error is returned zc_nvlist_dst_size won't be updated.
*/
int
lzc_ioctl_fd(int fd, unsigned long request, zfs_cmd_t *zc)
lzc_ioctl_fd_os(int fd, unsigned long request, zfs_cmd_t *zc)
{
size_t oldsize;
int ret, cflag = ZFS_CMD_COMPAT_NONE;
+2 -1
View File
@@ -23,9 +23,10 @@
#include <sys/param.h>
#include <sys/zfs_ioctl.h>
#include <libzfs_core.h>
#include "libzfs_core_impl.h"
int
lzc_ioctl_fd(int fd, unsigned long request, zfs_cmd_t *zc)
lzc_ioctl_fd_os(int fd, unsigned long request, zfs_cmd_t *zc)
{
return (ioctl(fd, request, zc));
}
-2
View File
@@ -1,5 +1,3 @@
include $(srcdir)/%D%/include/Makefile.am
libzpool_la_CFLAGS = $(AM_CFLAGS) $(KERNEL_CFLAGS) $(LIBRARY_CFLAGS)
libzpool_la_CFLAGS += $(ZLIB_CFLAGS)
-4
View File
@@ -1,4 +0,0 @@
libzpooldir = $(includedir)/libzpool
libzpool_HEADERS = \
%D%/sys/abd_os.h \
%D%/sys/abd_impl_os.h
-7
View File
@@ -918,13 +918,6 @@ secpolicy_zfs(const cred_t *cr)
return (0);
}
int
secpolicy_zfs_proc(const cred_t *cr, proc_t *proc)
{
(void) cr, (void) proc;
return (0);
}
ksiddomain_t *
ksid_lookupdomain(const char *dom)
{
+40 -10
View File
@@ -3,7 +3,7 @@
.\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
.\" Copyright (c) 2019, 2021 by Delphix. All rights reserved.
.\" Copyright (c) 2019 Datto Inc.
.\" Copyright (c) 2023, 2024 Klara, Inc.
.\" Copyright (c) 2023, 2024, 2025, Klara, Inc.
.\" The contents of this file are subject to the terms of the Common Development
.\" and Distribution License (the "License"). You may not use this file except
.\" in compliance with the License. You can obtain a copy of the license at
@@ -17,9 +17,7 @@
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" Copyright (c) 2024, Klara, Inc.
.\"
.Dd November 1, 2024
.Dd May 24, 2025
.Dt ZFS 4
.Os
.
@@ -95,10 +93,6 @@ when the DDT can not report the correct reference count.
Limit the amount we can prefetch with one call to this amount in bytes.
This helps to limit the amount of memory that can be used by prefetching.
.
.It Sy ignore_hole_birth Pq int
Alias for
.Sy send_holes_without_birth_time .
.
.It Sy l2arc_feed_again Ns = Ns Sy 1 Ns | Ns 0 Pq int
Turbo L2ARC warm-up.
When the L2ARC is cold the fill interval will be set as fast as possible.
@@ -725,6 +719,40 @@ Number ARC headers to evict per sub-list before proceeding to another sub-list.
This batch-style operation prevents entire sub-lists from being evicted at once
but comes at a cost of additional unlocking and locking.
.
.It Sy zfs_arc_evict_threads Ns = Ns Sy 0 Pq int
Sets the number of ARC eviction threads to be used.
.Pp
If set greater than 0, ZFS will dedicate up to that many threads to ARC
eviction.
Each thread will process one sub-list at a time,
until the eviction target is reached or all sub-lists have been processed.
When set to 0, ZFS will compute a reasonable number of eviction threads based
on the number of CPUs.
.TS
box;
lb l l .
CPUs Threads
_
1-4 1
5-8 2
9-15 3
16-31 4
32-63 6
64-95 8
96-127 9
128-160 11
160-191 12
192-223 13
224-255 14
256+ 16
.TE
.Pp
More threads may improve the responsiveness of ZFS to memory pressure.
This can be important for performance when eviction from the ARC becomes
a bottleneck for reads and writes.
.Pp
This parameter can only be set at module load time.
.
.It Sy zfs_arc_grow_retry Ns = Ns Sy 0 Ns s Pq uint
If set to a non zero value, it will replace the
.Sy arc_grow_retry
@@ -862,7 +890,9 @@ pressure on the pagecache, yet still allows the ARC to be reclaimed down to
.Sy zfs_arc_min
if necessary.
This value is specified as percent of pagecache size (as measured by
.Sy NR_FILE_PAGES ) ,
.Sy NR_ACTIVE_FILE
+
.Sy NR_INACTIVE_FILE ) ,
where that percent may exceed
.Sy 100 .
This
@@ -1949,7 +1979,7 @@ Disable QAT hardware acceleration for AES-GCM encryption.
May be unset after the ZFS modules have been loaded to initialize the QAT
hardware as long as support is compiled in and the QAT driver is present.
.
.It Sy zfs_vnops_read_chunk_size Ns = Ns Sy 1048576 Ns B Po 1 MiB Pc Pq u64
.It Sy zfs_vnops_read_chunk_size Ns = Ns Sy 33554432 Ns B Po 32 MiB Pc Pq u64
Bytes to read per chunk.
.
.It Sy zfs_read_history Ns = Ns Sy 0 Pq uint
+2 -2
View File
@@ -30,7 +30,7 @@
.\" Copyright 2018 Nexenta Systems, Inc.
.\" Copyright 2019 Joyent, Inc.
.\"
.Dd March 16, 2022
.Dd April 28, 2025
.Dt ZFS-ROLLBACK 8
.Os
.
@@ -75,7 +75,7 @@ Destroy any snapshots and bookmarks more recent than the one specified.
.Sh EXAMPLES
.\" These are, respectively, examples 8 from zfs.8
.\" Make sure to update them bidirectionally
.Ss Example 8 : No Rolling Back a ZFS File System
.Ss Example 1 : No Rolling Back a ZFS File System
The following command reverts the contents of
.Ar pool/home/anne
to the snapshot named
+43 -37
View File
@@ -37,12 +37,17 @@
.Sh SYNOPSIS
.Nm zpool
.Cm status
.Op Fl dDegiLpPstvx
.Op Fl T Sy u Ns | Ns Sy d
.Op Fl c Op Ar SCRIPT1 Ns Oo , Ns Ar SCRIPT2 Oc Ns …
.Oo Ar pool Oc Ns …
.Op Fl DdegiLPpstvx
.Op Fl c Ar script1 Ns Oo , Ns Ar script2 Ns ,… Oc
.Oo Fl j|--json
.Oo Ns Fl -json-flat-vdevs Oc
.Oo Ns Fl -json-int Oc
.Oo Ns Fl -json-pool-key-guid Oc
.Oc
.Op Fl T Ar d|u
.Op Fl -power
.Op Ar pool
.Op Ar interval Op Ar count
.Op Fl j Op Ar --json-int, --json-flat-vdevs, --json-pool-key-guid
.
.Sh DESCRIPTION
Displays the detailed health status for the given pools.
@@ -59,9 +64,7 @@ and the estimated time to completion.
Both of these are only approximate, because the amount of data in the pool and
the other workloads on the system can change.
.Bl -tag -width Ds
.It Fl -power
Display vdev enclosure slot power status (on or off).
.It Fl c Op Ar SCRIPT1 Ns Oo , Ns Ar SCRIPT2 Oc Ns …
.It Fl c Ar script1 Ns Oo , Ns Ar script2 Ns ,… Oc
Run a script (or scripts) on each vdev and include the output as a new column
in the
.Nm zpool Cm status
@@ -71,17 +74,14 @@ See the
option of
.Nm zpool Cm iostat
for complete details.
.It Fl j , -json Op Ar --json-int, --json-flat-vdevs, --json-pool-key-guid
Display the status for ZFS pools in JSON format.
Specify
.Sy --json-int
to display numbers in integer format instead of strings.
Specify
.Sy --json-flat-vdevs
to display vdevs in flat hierarchy instead of nested vdev objects.
Specify
.Sy --json-pool-key-guid
to set pool GUID as key for pool objects instead of pool names.
.It Fl D
Display a histogram of deduplication statistics, showing the allocated
.Pq physically present on disk
and referenced
.Pq logically referenced in the pool
block counts and sizes by reference count.
If repeated, (-DD), also shows statistics on how much of the DDT is resident
in the ARC.
.It Fl d
Display the number of Direct I/O read/write checksum verify errors that have
occurred on a top-level VDEV.
@@ -95,14 +95,6 @@ Direct I/O reads checksum verify errors can also occur if the contents of the
buffer are being manipulated after the I/O has been issued and is in flight.
In the case of Direct I/O read checksum verify errors, the I/O will be reissued
through the ARC.
.It Fl D
Display a histogram of deduplication statistics, showing the allocated
.Pq physically present on disk
and referenced
.Pq logically referenced in the pool
block counts and sizes by reference count.
If repeated, (-DD), also shows statistics on how much of the DDT is resident
in the ARC.
.It Fl e
Only show unhealthy vdevs (not-ONLINE or with errors).
.It Fl g
@@ -111,19 +103,33 @@ These GUIDs can be used in place of device names for the zpool
detach/offline/remove/replace commands.
.It Fl i
Display vdev initialization status.
.It Fl j , -json Oo Ns Fl -json-flat-vdevs Oc Oo Ns Fl -json-int Oc \
Oo Ns Fl -json-pool-key-guid Oc
Display the status for ZFS pools in JSON format.
Specify
.Sy --json-flat-vdevs
to display vdevs in flat hierarchy instead of nested vdev objects.
Specify
.Sy --json-int
to display numbers in integer format instead of strings.
Specify
.Sy --json-pool-key-guid
to set pool GUID as key for pool objects instead of pool names.
.It Fl L
Display real paths for vdevs resolving all symbolic links.
This can be used to look up the current block device name regardless of the
.Pa /dev/disk/
path used to open it.
.It Fl p
Display numbers in parsable (exact) values.
.It Fl P
Display full paths for vdevs instead of only the last component of
the path.
This can be used in conjunction with the
.Fl L
flag.
.It Fl p
Display numbers in parsable (exact) values.
.It Fl -power
Display vdev enclosure slot power status (on or off).
.It Fl s
Display the number of leaf vdev slow I/O operations.
This is the number of I/O operations that didn't complete in
@@ -134,20 +140,20 @@ This does not necessarily mean the I/O operations failed to complete, just took
an
unreasonably long amount of time.
This may indicate a problem with the underlying storage.
.It Fl t
Display vdev TRIM status.
.It Fl T Sy u Ns | Ns Sy d
.It Fl T Sy d Ns | Ns Sy u
Display a time stamp.
Specify
.Sy u
for a printed representation of the internal representation of time.
See
.Xr time 1 .
Specify
.Sy d
for standard date format.
See
.Xr date 1 .
Specify
.Sy u
for a printed representation of the internal representation of time.
See
.Xr time 1 .
.It Fl t
Display vdev TRIM status.
.It Fl v
Displays verbose data error information, printing out a complete list of all
data errors since the last complete pool scrub.
-13
View File
@@ -176,14 +176,6 @@ $(addprefix $(obj)/icp/,$(ICP_OBJS) $(ICP_OBJS_X86) $(ICP_OBJS_X86_64) \
$(addprefix $(obj)/icp/,$(ICP_OBJS) $(ICP_OBJS_X86) $(ICP_OBJS_X86_64) \
$(ICP_OBJS_ARM64) $(ICP_OBJS_PPC_PPC64)) : ccflags-y += -I$(icp_include) -I$(zfs_include)/os/linux/spl -I$(zfs_include)
# Suppress objtool "return with modified stack frame" warnings.
OBJECT_FILES_NON_STANDARD_aesni-gcm-x86_64.o := y
# Suppress objtool "unsupported stack pointer realignment" warnings.
# See #6950 for the reasoning.
OBJECT_FILES_NON_STANDARD_sha256-x86_64.o := y
OBJECT_FILES_NON_STANDARD_sha512-x86_64.o := y
LUA_OBJS := \
lapi.o \
lauxlib.o \
@@ -499,11 +491,6 @@ UBSAN_SANITIZE_sa.o := n
UBSAN_SANITIZE_zfs/zap_micro.o := n
UBSAN_SANITIZE_zfs/sa.o := n
# Suppress incorrect warnings from versions of objtool which are not
# aware of x86 EVEX prefix instructions used for AVX512.
OBJECT_FILES_NON_STANDARD_vdev_raidz_math_avx512bw.o := y
OBJECT_FILES_NON_STANDARD_vdev_raidz_math_avx512f.o := y
ifeq ($(CONFIG_ALTIVEC),y)
$(obj)/zfs/vdev_raidz_math_powerpc_altivec.o : c_flags += -maltivec
endif
-2
View File
@@ -420,8 +420,6 @@ icp_aes_impl_get(char *buffer, zfs_kernel_param_t *kp)
char *fmt;
const uint32_t impl = AES_IMPL_READ(icp_aes_impl);
ASSERT(aes_impl_initialized);
/* list mandatory options */
for (i = 0; i < ARRAY_SIZE(aes_impl_opts); i++) {
fmt = (impl == aes_impl_opts[i].sel) ? "[%s] " : "%s ";
-2
View File
@@ -948,8 +948,6 @@ icp_gcm_impl_get(char *buffer, zfs_kernel_param_t *kp)
char *fmt;
const uint32_t impl = GCM_IMPL_READ(icp_gcm_impl);
ASSERT(gcm_impl_initialized);
/* list mandatory options */
for (i = 0; i < ARRAY_SIZE(gcm_impl_opts); i++) {
#ifdef CAN_USE_GCM_ASM
+2 -2
View File
@@ -173,12 +173,12 @@ gcm_clear_ctx(gcm_ctx_t *ctx)
#if defined(CAN_USE_GCM_ASM)
if (ctx->gcm_use_avx == B_TRUE) {
ASSERT3P(ctx->gcm_Htable, !=, NULL);
memset(ctx->gcm_Htable, 0, ctx->gcm_htab_len);
explicit_memset(ctx->gcm_Htable, 0, ctx->gcm_htab_len);
kmem_free(ctx->gcm_Htable, ctx->gcm_htab_len);
}
#endif
if (ctx->gcm_pt_buf != NULL) {
memset(ctx->gcm_pt_buf, 0, ctx->gcm_pt_buf_len);
explicit_memset(ctx->gcm_pt_buf, 0, ctx->gcm_pt_buf_len);
vmem_free(ctx->gcm_pt_buf, ctx->gcm_pt_buf_len);
}
/* Optional */
@@ -50,6 +50,7 @@
#define _ASM
#include <sys/asm_linkage.h>
#include <sys/frame.h>
/* Windows userland links with OpenSSL */
#if !defined (_WIN32) || defined (_KERNEL)
@@ -378,6 +379,7 @@ FUNCTION(_aesni_ctr32_ghash_6x)
RET
.cfi_endproc
SET_SIZE(_aesni_ctr32_ghash_6x)
STACK_FRAME_NON_STANDARD _aesni_ctr32_ghash_6x
#endif /* ifdef HAVE_MOVBE */
.balign 32
@@ -706,6 +708,7 @@ FUNCTION(_aesni_ctr32_ghash_no_movbe_6x)
RET
.cfi_endproc
SET_SIZE(_aesni_ctr32_ghash_no_movbe_6x)
STACK_FRAME_NON_STANDARD _aesni_ctr32_ghash_no_movbe_6x
ENTRY_ALIGN(aesni_gcm_decrypt, 32)
.cfi_startproc
@@ -823,6 +826,7 @@ ENTRY_ALIGN(aesni_gcm_decrypt, 32)
RET
.cfi_endproc
SET_SIZE(aesni_gcm_decrypt)
STACK_FRAME_NON_STANDARD aesni_gcm_decrypt
.balign 32
FUNCTION(_aesni_ctr32_6x)
@@ -1198,6 +1202,7 @@ ENTRY_ALIGN(aesni_gcm_encrypt, 32)
RET
.cfi_endproc
SET_SIZE(aesni_gcm_encrypt)
STACK_FRAME_NON_STANDARD aesni_gcm_encrypt
#endif /* !_WIN32 || _KERNEL */
@@ -1257,6 +1262,18 @@ SECTION_STATIC
.byte 65,69,83,45,78,73,32,71,67,77,32,109,111,100,117,108,101,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
.balign 64
/* Workaround for missing asm macro in RHEL 8. */
#if defined(__linux__) && defined(HAVE_STACK_FRAME_NON_STANDARD) && \
! defined(HAVE_STACK_FRAME_NON_STANDARD_ASM)
.section .discard.func_stack_frame_non_standard, "aw"
#ifdef HAVE_MOVBE
.long _aesni_ctr32_ghash_6x - .
#endif
.long _aesni_ctr32_ghash_no_movbe_6x - .
.long aesni_gcm_decrypt - .
.long aesni_gcm_encrypt - .
#endif
/* Mark the stack non-executable. */
#if defined(__linux__) && defined(__ELF__)
.section .note.GNU-stack,"",%progbits
@@ -24,6 +24,7 @@
#define _ASM
#include <sys/asm_linkage.h>
#include <sys/frame.h>
SECTION_STATIC
@@ -1420,6 +1421,7 @@ ENTRY_ALIGN(zfs_sha256_transform_x64, 16)
RET
.cfi_endproc
SET_SIZE(zfs_sha256_transform_x64)
STACK_FRAME_NON_STANDARD zfs_sha256_transform_x64
ENTRY_ALIGN(zfs_sha256_transform_shani, 64)
.cfi_startproc
@@ -1628,6 +1630,7 @@ ENTRY_ALIGN(zfs_sha256_transform_shani, 64)
RET
.cfi_endproc
SET_SIZE(zfs_sha256_transform_shani)
STACK_FRAME_NON_STANDARD zfs_sha256_transform_shani
ENTRY_ALIGN(zfs_sha256_transform_ssse3, 64)
.cfi_startproc
@@ -2739,6 +2742,7 @@ ENTRY_ALIGN(zfs_sha256_transform_ssse3, 64)
RET
.cfi_endproc
SET_SIZE(zfs_sha256_transform_ssse3)
STACK_FRAME_NON_STANDARD zfs_sha256_transform_ssse3
ENTRY_ALIGN(zfs_sha256_transform_avx, 64)
.cfi_startproc
@@ -3813,6 +3817,7 @@ ENTRY_ALIGN(zfs_sha256_transform_avx, 64)
RET
.cfi_endproc
SET_SIZE(zfs_sha256_transform_avx)
STACK_FRAME_NON_STANDARD zfs_sha256_transform_avx
ENTRY_ALIGN(zfs_sha256_transform_avx2, 64)
.cfi_startproc
@@ -5098,6 +5103,18 @@ ENTRY_ALIGN(zfs_sha256_transform_avx2, 64)
RET
.cfi_endproc
SET_SIZE(zfs_sha256_transform_avx2)
STACK_FRAME_NON_STANDARD zfs_sha256_transform_avx2
/* Workaround for missing asm macro in RHEL 8. */
#if defined(__linux__) && defined(HAVE_STACK_FRAME_NON_STANDARD) && \
! defined(HAVE_STACK_FRAME_NON_STANDARD_ASM)
.section .discard.func_stack_frame_non_standard, "aw"
.long zfs_sha256_transform_x64 - .
.long zfs_sha256_transform_shani - .
.long zfs_sha256_transform_ssse3 - .
.long zfs_sha256_transform_avx - .
.long zfs_sha256_transform_avx2 - .
#endif
#if defined(__ELF__)
.section .note.GNU-stack,"",%progbits
@@ -24,6 +24,7 @@
#define _ASM
#include <sys/asm_linkage.h>
#include <sys/frame.h>
SECTION_STATIC
@@ -1463,6 +1464,7 @@ ENTRY_ALIGN(zfs_sha512_transform_x64, 16)
RET
.cfi_endproc
SET_SIZE(zfs_sha512_transform_x64)
STACK_FRAME_NON_STANDARD zfs_sha512_transform_x64
ENTRY_ALIGN(zfs_sha512_transform_avx, 64)
.cfi_startproc
@@ -2627,6 +2629,7 @@ ENTRY_ALIGN(zfs_sha512_transform_avx, 64)
RET
.cfi_endproc
SET_SIZE(zfs_sha512_transform_avx)
STACK_FRAME_NON_STANDARD zfs_sha512_transform_avx
ENTRY_ALIGN(zfs_sha512_transform_avx2, 64)
.cfi_startproc
@@ -4005,6 +4008,16 @@ ENTRY_ALIGN(zfs_sha512_transform_avx2, 64)
RET
.cfi_endproc
SET_SIZE(zfs_sha512_transform_avx2)
STACK_FRAME_NON_STANDARD zfs_sha512_transform_avx2
/* Workaround for missing asm macro in RHEL 8. */
#if defined(__linux__) && defined(HAVE_STACK_FRAME_NON_STANDARD) && \
! defined(HAVE_STACK_FRAME_NON_STANDARD_ASM)
.section .discard.func_stack_frame_non_standard, "aw"
.long zfs_sha512_transform_x64 - .
.long zfs_sha512_transform_avx - .
.long zfs_sha512_transform_avx2 - .
#endif
#if defined(__ELF__)
.section .note.GNU-stack,"",%progbits
-7
View File
@@ -52,13 +52,6 @@ secpolicy_zfs(cred_t *cr)
return (priv_check_cred(cr, PRIV_VFS_MOUNT));
}
int
secpolicy_zfs_proc(cred_t *cr, proc_t *proc)
{
return (priv_check_cred(cr, PRIV_VFS_MOUNT));
}
int
secpolicy_sys_config(cred_t *cr, int checkonly __unused)
{
+1
View File
@@ -43,6 +43,7 @@
const int zfs_vm_pagerret_bad = VM_PAGER_BAD;
const int zfs_vm_pagerret_error = VM_PAGER_ERROR;
const int zfs_vm_pagerret_ok = VM_PAGER_OK;
const int zfs_vm_pagerret_pend = VM_PAGER_PEND;
const int zfs_vm_pagerput_sync = VM_PAGER_PUT_SYNC;
const int zfs_vm_pagerput_inval = VM_PAGER_PUT_INVAL;
+45 -15
View File
@@ -25,6 +25,7 @@
* Copyright (c) 2012, 2015 by Delphix. All rights reserved.
* Copyright (c) 2014 Integros [integros.com]
* Copyright 2017 Nexenta Systems, Inc.
* Copyright (c) 2025, Klara, Inc.
*/
/* Portions Copyright 2007 Jeremy Teo */
@@ -4072,6 +4073,33 @@ zfs_freebsd_getpages(struct vop_getpages_args *ap)
ap->a_rahead));
}
typedef struct {
uint_t pca_npages;
vm_page_t pca_pages[];
} putpage_commit_arg_t;
static void
zfs_putpage_commit_cb(void *arg)
{
putpage_commit_arg_t *pca = arg;
vm_object_t object = pca->pca_pages[0]->object;
zfs_vmobject_wlock(object);
for (uint_t i = 0; i < pca->pca_npages; i++) {
vm_page_t pp = pca->pca_pages[i];
vm_page_undirty(pp);
vm_page_sunbusy(pp);
}
vm_object_pip_wakeupn(object, pca->pca_npages);
zfs_vmobject_wunlock(object);
kmem_free(pca,
offsetof(putpage_commit_arg_t, pca_pages[pca->pca_npages]));
}
static int
zfs_putpages(struct vnode *vp, vm_page_t *ma, size_t len, int flags,
int *rtvals)
@@ -4173,10 +4201,12 @@ zfs_putpages(struct vnode *vp, vm_page_t *ma, size_t len, int flags,
}
if (zp->z_blksz < PAGE_SIZE) {
for (i = 0; len > 0; off += tocopy, len -= tocopy, i++) {
tocopy = len > PAGE_SIZE ? PAGE_SIZE : len;
vm_ooffset_t woff = off;
size_t wlen = len;
for (i = 0; wlen > 0; woff += tocopy, wlen -= tocopy, i++) {
tocopy = MIN(PAGE_SIZE, wlen);
va = zfs_map_page(ma[i], &sf);
dmu_write(zfsvfs->z_os, zp->z_id, off, tocopy, va, tx);
dmu_write(zfsvfs->z_os, zp->z_id, woff, tocopy, va, tx);
zfs_unmap_page(sf);
}
} else {
@@ -4197,19 +4227,19 @@ zfs_putpages(struct vnode *vp, vm_page_t *ma, size_t len, int flags,
zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime);
err = sa_bulk_update(zp->z_sa_hdl, bulk, count, tx);
ASSERT0(err);
/*
* XXX we should be passing a callback to undirty
* but that would make the locking messier
*/
zfs_log_write(zfsvfs->z_log, tx, TX_WRITE, zp, off,
len, commit, B_FALSE, NULL, NULL);
zfs_vmobject_wlock(object);
for (i = 0; i < ncount; i++) {
rtvals[i] = zfs_vm_pagerret_ok;
vm_page_undirty(ma[i]);
}
zfs_vmobject_wunlock(object);
putpage_commit_arg_t *pca = kmem_alloc(
offsetof(putpage_commit_arg_t, pca_pages[ncount]),
KM_SLEEP);
pca->pca_npages = ncount;
memcpy(pca->pca_pages, ma, sizeof (vm_page_t) * ncount);
zfs_log_write(zfsvfs->z_log, tx, TX_WRITE, zp,
off, len, commit, B_FALSE, zfs_putpage_commit_cb, pca);
for (i = 0; i < ncount; i++)
rtvals[i] = zfs_vm_pagerret_pend;
VM_CNT_INC(v_vnodeout);
VM_CNT_ADD(v_vnodepgsout, ncount);
}
+4
View File
@@ -67,8 +67,12 @@
#include "zfs_comutil.h"
/* Used by fstat(1). */
#ifdef SYSCTL_SIZEOF
SYSCTL_SIZEOF(znode, znode_t);
#else
SYSCTL_INT(_debug_sizeof, OID_AUTO, znode, CTLFLAG_RD,
SYSCTL_NULL_INT_PTR, sizeof (znode_t), "sizeof(znode_t)");
#endif
/*
* Define ZNODE_STATS to turn on statistic gathering. By default, it is only
-6
View File
@@ -1822,9 +1822,3 @@ error:
return (SET_ERROR(ret));
}
#if defined(_KERNEL) && defined(HAVE_SPL)
module_param(zfs_key_max_salt_uses, ulong, 0644);
MODULE_PARM_DESC(zfs_key_max_salt_uses, "Max number of times a salt value "
"can be used for generating encryption keys before it is rotated");
#endif
+10 -2
View File
@@ -1584,13 +1584,21 @@ zvol_os_update_volsize(zvol_state_t *zv, uint64_t volsize)
void
zvol_os_set_disk_ro(zvol_state_t *zv, int flags)
{
// XXX? set_disk_ro(zv->zv_zso->zvo_disk, flags);
/*
* The ro/rw ZVOL mode is switched using zvol_set_ro() function by
* enabling/disabling ZVOL_RDONLY flag. No additional FreeBSD-specific
* actions are required for readonly zfs property switching.
*/
}
void
zvol_os_set_capacity(zvol_state_t *zv, uint64_t capacity)
{
// XXX? set_capacity(zv->zv_zso->zvo_disk, capacity);
/*
* The ZVOL size/capacity is changed by zvol_set_volsize() function.
* Leave this method empty, because all required job is doing by
* zvol_os_update_volsize() platform-specific function.
*/
}
/*
-23
View File
@@ -555,29 +555,6 @@ ddi_copyin(const void *from, void *to, size_t len, int flags)
}
EXPORT_SYMBOL(ddi_copyin);
#define define_spl_param(type, fmt) \
int \
spl_param_get_##type(char *buf, zfs_kernel_param_t *kp) \
{ \
return (scnprintf(buf, PAGE_SIZE, fmt "\n", \
*(type *)kp->arg)); \
} \
int \
spl_param_set_##type(const char *buf, zfs_kernel_param_t *kp) \
{ \
return (kstrto##type(buf, 0, (type *)kp->arg)); \
} \
const struct kernel_param_ops spl_param_ops_##type = { \
.set = spl_param_set_##type, \
.get = spl_param_get_##type, \
}; \
EXPORT_SYMBOL(spl_param_get_##type); \
EXPORT_SYMBOL(spl_param_set_##type); \
EXPORT_SYMBOL(spl_param_ops_##type);
define_spl_param(s64, "%lld")
define_spl_param(u64, "%llu")
/*
* Post a uevent to userspace whenever a new vdev adds to the pool. It is
* necessary to sync blkid information with udev, which zed daemon uses
+8 -20
View File
@@ -37,6 +37,12 @@
#include <sys/atomic.h>
#include <sys/kstat.h>
#include <linux/cpuhotplug.h>
#include <linux/mod_compat.h>
/* Linux 6.2 renamed timer_delete_sync(); point it at its old name for those. */
#ifndef HAVE_TIMER_DELETE_SYNC
#define timer_delete_sync(t) del_timer_sync(t)
#endif
typedef struct taskq_kstats {
/* static values, for completeness */
@@ -633,7 +639,7 @@ taskq_cancel_id(taskq_t *tq, taskqid_t id)
*/
if (timer_pending(&t->tqent_timer)) {
spin_unlock_irqrestore(&tq->tq_lock, flags);
del_timer_sync(&t->tqent_timer);
timer_delete_sync(&t->tqent_timer);
spin_lock_irqsave_nested(&tq->tq_lock, flags,
tq->tq_lock_class);
}
@@ -1646,18 +1652,8 @@ spl_taskq_kstat_fini(void)
static unsigned int spl_taskq_kick = 0;
/*
* 2.6.36 API Change
* module_param_cb is introduced to take kernel_param_ops and
* module_param_call is marked as obsolete. Also set and get operations
* were changed to take a 'const struct kernel_param *'.
*/
static int
#ifdef module_param_cb
param_set_taskq_kick(const char *val, const struct kernel_param *kp)
#else
param_set_taskq_kick(const char *val, struct kernel_param *kp)
#endif
param_set_taskq_kick(const char *val, zfs_kernel_param_t *kp)
{
int ret;
taskq_t *tq = NULL;
@@ -1687,16 +1683,8 @@ param_set_taskq_kick(const char *val, struct kernel_param *kp)
return (ret);
}
#ifdef module_param_cb
static const struct kernel_param_ops param_ops_taskq_kick = {
.set = param_set_taskq_kick,
.get = param_get_uint,
};
module_param_cb(spl_taskq_kick, &param_ops_taskq_kick, &spl_taskq_kick, 0644);
#else
module_param_call(spl_taskq_kick, param_set_taskq_kick, param_get_uint,
&spl_taskq_kick, 0644);
#endif
MODULE_PARM_DESC(spl_taskq_kick,
"Write nonzero to kick stuck taskqs to spawn more threads");
+2
View File
@@ -1340,6 +1340,8 @@ abd_bio_map_off(struct bio *bio, abd_t *abd,
return (io_size);
}
EXPORT_SYMBOL(abd_alloc_from_pages);
/* Tunable Parameters */
module_param(zfs_abd_scatter_enabled, int, 0644);
MODULE_PARM_DESC(zfs_abd_scatter_enabled,
+18 -26
View File
@@ -24,6 +24,7 @@
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright 2013, Joyent, Inc. All rights reserved.
* Copyright (C) 2016 Lawrence Livermore National Security, LLC.
* Copyright (c) 2025, Rob Norris <robn@despairlabs.com>
*
* For Linux the vast majority of this enforcement is already handled via
* the standard Linux VFS permission checks. However certain administrative
@@ -35,28 +36,32 @@
#include <linux/security.h>
#include <linux/vfs_compat.h>
/*
* The passed credentials cannot be directly verified because Linux only
* provides and interface to check the *current* process credentials. In
* order to handle this the capable() test is only run when the passed
* credentials match the current process credentials or the kcred. In
* all other cases this function must fail and return the passed err.
*/
static int
priv_policy_ns(const cred_t *cr, int capability, int err,
struct user_namespace *ns)
{
if (cr != CRED() && (cr != kcred))
return (err);
/*
* The passed credentials cannot be directly verified because Linux
* only provides an interface to check the *current* process
* credentials. In order to handle this we check if the passed in
* creds match the current process credentials or the kcred. If not,
* we swap the passed credentials into the current task, perform the
* check, and then revert it before returning.
*/
const cred_t *old =
(cr != CRED() && cr != kcred) ? override_creds(cr) : NULL;
#if defined(CONFIG_USER_NS)
if (!(ns ? ns_capable(ns, capability) : capable(capability)))
if (ns ? ns_capable(ns, capability) : capable(capability))
#else
if (!capable(capability))
if (capable(capability))
#endif
return (err);
err = 0;
return (0);
if (old)
revert_creds(old);
return (err);
}
static int
@@ -249,19 +254,6 @@ secpolicy_zfs(const cred_t *cr)
return (priv_policy(cr, CAP_SYS_ADMIN, EACCES));
}
/*
* Equivalent to secpolicy_zfs(), but works even if the cred_t is not that of
* the current process. Takes both cred_t and proc_t so that this can work
* easily on all platforms.
*/
int
secpolicy_zfs_proc(const cred_t *cr, proc_t *proc)
{
if (!has_capability(proc, CAP_SYS_ADMIN))
return (EACCES);
return (0);
}
void
secpolicy_setid_clear(vattr_t *vap, cred_t *cr)
{
+1 -1
View File
@@ -614,7 +614,7 @@ static inline uint_t
vdev_bio_max_segs(struct block_device *bdev)
{
/*
* Smallest of the device max segs and the tuneable max segs. Minimum
* Smallest of the device max segs and the tunable max segs. Minimum
* 4, so there's room to finish split pages if they come up.
*/
const uint_t dev_max_segs = queue_max_segments(bdev_get_queue(bdev));
+5 -6
View File
@@ -233,9 +233,6 @@ zfs_uiomove_iter(void *p, size_t n, zfs_uio_rw_t rw, zfs_uio_t *uio,
{
size_t cnt = MIN(n, uio->uio_resid);
if (uio->uio_skip)
iov_iter_advance(uio->uio_iter, uio->uio_skip);
if (rw == UIO_READ)
cnt = copy_to_iter(p, cnt, uio->uio_iter);
else
@@ -507,12 +504,14 @@ static int
zfs_uio_pin_user_pages(zfs_uio_t *uio, zfs_uio_rw_t rw)
{
long res;
size_t skip = uio->uio_skip;
size_t skip = uio->uio_iter->iov_offset;
size_t len = uio->uio_resid - skip;
unsigned int gup_flags = 0;
unsigned long addr;
unsigned long nr_pages;
ASSERT3U(uio->uio_segflg, ==, UIO_ITER);
/*
* Kernel 6.2 introduced the FOLL_PCI_P2PDMA flag. This flag could
* possibly be used here in the future to allow for P2P operations with
@@ -577,7 +576,7 @@ static int
zfs_uio_get_dio_pages_iov_iter(zfs_uio_t *uio, zfs_uio_rw_t rw)
{
size_t start;
size_t wanted = uio->uio_resid - uio->uio_skip;
size_t wanted = uio->uio_resid;
ssize_t rollback = 0;
ssize_t cnt;
unsigned maxpages = DIV_ROUND_UP(wanted, PAGE_SIZE);
@@ -611,7 +610,7 @@ zfs_uio_get_dio_pages_iov_iter(zfs_uio_t *uio, zfs_uio_rw_t rw)
#endif
}
ASSERT3U(rollback, ==, uio->uio_resid - uio->uio_skip);
ASSERT3U(rollback, ==, uio->uio_resid);
iov_iter_revert(uio->uio_iter, rollback);
return (0);
+14 -29
View File
@@ -265,6 +265,7 @@ zfs_sync(struct super_block *sb, int wait, cred_t *cr)
{
(void) cr;
zfsvfs_t *zfsvfs = sb->s_fs_info;
ASSERT3P(zfsvfs, !=, NULL);
/*
* Semantically, the only requirement is that the sync be initiated.
@@ -273,39 +274,23 @@ zfs_sync(struct super_block *sb, int wait, cred_t *cr)
if (!wait)
return (0);
if (zfsvfs != NULL) {
/*
* Sync a specific filesystem.
*/
dsl_pool_t *dp;
int error;
if ((error = zfs_enter(zfsvfs, FTAG)) != 0)
return (error);
dp = dmu_objset_pool(zfsvfs->z_os);
/*
* If the system is shutting down, then skip any
* filesystems which may exist on a suspended pool.
*/
if (spa_suspended(dp->dp_spa)) {
zfs_exit(zfsvfs, FTAG);
return (0);
}
if (zfsvfs->z_log != NULL)
zil_commit(zfsvfs->z_log, 0);
int err = zfs_enter(zfsvfs, FTAG);
if (err != 0)
return (err);
/*
* If the pool is suspended, just return an error. This is to help
* with shutting down with pools suspended, as we don't want to block
* in that case.
*/
if (spa_suspended(zfsvfs->z_os->os_spa)) {
zfs_exit(zfsvfs, FTAG);
} else {
/*
* Sync all ZFS filesystems. This is what happens when you
* run sync(1). Unlike other filesystems, ZFS honors the
* request by waiting for all pools to commit all dirty data.
*/
spa_sync_allpools();
return (SET_ERROR(EIO));
}
zil_commit(zfsvfs->z_log, 0);
zfs_exit(zfsvfs, FTAG);
return (0);
}
+11 -1
View File
@@ -341,14 +341,20 @@ zpl_snapdir_rmdir(struct inode *dip, struct dentry *dentry)
return (error);
}
#if defined(HAVE_IOPS_MKDIR_USERNS)
static int
#ifdef HAVE_IOPS_MKDIR_USERNS
zpl_snapdir_mkdir(struct user_namespace *user_ns, struct inode *dip,
struct dentry *dentry, umode_t mode)
#elif defined(HAVE_IOPS_MKDIR_IDMAP)
static int
zpl_snapdir_mkdir(struct mnt_idmap *user_ns, struct inode *dip,
struct dentry *dentry, umode_t mode)
#elif defined(HAVE_IOPS_MKDIR_DENTRY)
static struct dentry *
zpl_snapdir_mkdir(struct mnt_idmap *user_ns, struct inode *dip,
struct dentry *dentry, umode_t mode)
#else
static int
zpl_snapdir_mkdir(struct inode *dip, struct dentry *dentry, umode_t mode)
#endif
{
@@ -376,7 +382,11 @@ zpl_snapdir_mkdir(struct inode *dip, struct dentry *dentry, umode_t mode)
ASSERT3S(error, <=, 0);
crfree(cr);
#if defined(HAVE_IOPS_MKDIR_DENTRY)
return (ERR_PTR(error));
#else
return (error);
#endif
}
/*
+2 -9
View File
@@ -226,7 +226,7 @@ zpl_iter_read(struct kiocb *kiocb, struct iov_iter *to)
ssize_t count = iov_iter_count(to);
zfs_uio_t uio;
zfs_uio_iov_iter_init(&uio, to, kiocb->ki_pos, count, 0);
zfs_uio_iov_iter_init(&uio, to, kiocb->ki_pos, count);
crhold(cr);
cookie = spl_fstrans_mark();
@@ -276,8 +276,7 @@ zpl_iter_write(struct kiocb *kiocb, struct iov_iter *from)
if (ret)
return (ret);
zfs_uio_iov_iter_init(&uio, from, kiocb->ki_pos, count,
from->iov_offset);
zfs_uio_iov_iter_init(&uio, from, kiocb->ki_pos, count);
crhold(cr);
cookie = spl_fstrans_mark();
@@ -1004,12 +1003,6 @@ zpl_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return (zpl_ioctl_getdosflags(filp, (void *)arg));
case ZFS_IOC_SETDOSFLAGS:
return (zpl_ioctl_setdosflags(filp, (void *)arg));
case ZFS_IOC_COMPAT_FICLONE:
return (zpl_ioctl_ficlone(filp, (void *)arg));
case ZFS_IOC_COMPAT_FICLONERANGE:
return (zpl_ioctl_ficlonerange(filp, (void *)arg));
case ZFS_IOC_COMPAT_FIDEDUPERANGE:
return (zpl_ioctl_fideduperange(filp, (void *)arg));
default:
return (-ENOTTY);
}
-82
View File
@@ -212,85 +212,3 @@ zpl_dedupe_file_range(struct file *src_file, loff_t src_off,
return (-EOPNOTSUPP);
}
#endif /* HAVE_VFS_DEDUPE_FILE_RANGE */
/* Entry point for FICLONE, before Linux 4.5. */
long
zpl_ioctl_ficlone(struct file *dst_file, void *arg)
{
unsigned long sfd = (unsigned long)arg;
struct file *src_file = fget(sfd);
if (src_file == NULL)
return (-EBADF);
if (dst_file->f_op != src_file->f_op) {
fput(src_file);
return (-EXDEV);
}
size_t len = i_size_read(file_inode(src_file));
ssize_t ret = zpl_clone_file_range_impl(src_file, 0, dst_file, 0, len);
fput(src_file);
if (ret < 0) {
if (ret == -EOPNOTSUPP)
return (-ENOTTY);
return (ret);
}
if (ret != len)
return (-EINVAL);
return (0);
}
/* Entry point for FICLONERANGE, before Linux 4.5. */
long
zpl_ioctl_ficlonerange(struct file *dst_file, void __user *arg)
{
zfs_ioc_compat_file_clone_range_t fcr;
if (copy_from_user(&fcr, arg, sizeof (fcr)))
return (-EFAULT);
struct file *src_file = fget(fcr.fcr_src_fd);
if (src_file == NULL)
return (-EBADF);
if (dst_file->f_op != src_file->f_op) {
fput(src_file);
return (-EXDEV);
}
size_t len = fcr.fcr_src_length;
if (len == 0)
len = i_size_read(file_inode(src_file)) - fcr.fcr_src_offset;
ssize_t ret = zpl_clone_file_range_impl(src_file, fcr.fcr_src_offset,
dst_file, fcr.fcr_dest_offset, len);
fput(src_file);
if (ret < 0) {
if (ret == -EOPNOTSUPP)
return (-ENOTTY);
return (ret);
}
if (ret != len)
return (-EINVAL);
return (0);
}
/* Entry point for FIDEDUPERANGE, before Linux 4.5. */
long
zpl_ioctl_fideduperange(struct file *filp, void *arg)
{
(void) arg;
/* No support for dedup yet */
return (-ENOTTY);
}
+17 -4
View File
@@ -374,14 +374,20 @@ zpl_unlink(struct inode *dir, struct dentry *dentry)
return (error);
}
#if defined(HAVE_IOPS_MKDIR_USERNS)
static int
#ifdef HAVE_IOPS_MKDIR_USERNS
zpl_mkdir(struct user_namespace *user_ns, struct inode *dir,
struct dentry *dentry, umode_t mode)
#elif defined(HAVE_IOPS_MKDIR_IDMAP)
static int
zpl_mkdir(struct mnt_idmap *user_ns, struct inode *dir,
struct dentry *dentry, umode_t mode)
#elif defined(HAVE_IOPS_MKDIR_DENTRY)
static struct dentry *
zpl_mkdir(struct mnt_idmap *user_ns, struct inode *dir,
struct dentry *dentry, umode_t mode)
#else
static int
zpl_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
#endif
{
@@ -390,12 +396,14 @@ zpl_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
znode_t *zp;
int error;
fstrans_cookie_t cookie;
#if !(defined(HAVE_IOPS_MKDIR_USERNS) || defined(HAVE_IOPS_MKDIR_IDMAP))
#if !(defined(HAVE_IOPS_MKDIR_USERNS) || \
defined(HAVE_IOPS_MKDIR_IDMAP) || defined(HAVE_IOPS_MKDIR_DENTRY))
zidmap_t *user_ns = kcred->user_ns;
#endif
if (is_nametoolong(dentry)) {
return (-ENAMETOOLONG);
error = -ENAMETOOLONG;
goto err;
}
crhold(cr);
@@ -422,9 +430,14 @@ zpl_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
spl_fstrans_unmark(cookie);
kmem_free(vap, sizeof (vattr_t));
crfree(cr);
ASSERT3S(error, <=, 0);
err:
ASSERT3S(error, <=, 0);
#if defined(HAVE_IOPS_MKDIR_DENTRY)
return (error != 0 ? ERR_PTR(error) : NULL);
#else
return (error);
#endif
}
static int
+56 -1
View File
@@ -31,6 +31,7 @@
#include <sys/zfs_ctldir.h>
#include <sys/zpl.h>
#include <linux/iversion.h>
#include <linux/version.h>
static struct inode *
@@ -105,6 +106,42 @@ zpl_put_super(struct super_block *sb)
ASSERT3S(error, <=, 0);
}
/*
* zfs_sync() is the underlying implementation for the sync(2) and syncfs(2)
* syscalls, via sb->s_op->sync_fs().
*
* Before kernel 5.17 (torvalds/linux@5679897eb104), syncfs() ->
* sync_filesystem() would ignore the return from sync_fs(), instead only
* considing the error from syncing the underlying block device (sb->s_dev).
* Since OpenZFS doesn't _have_ an underlying block device, there's no way for
* us to report a sync directly.
*
* However, in 5.8 (torvalds/linux@735e4ae5ba28) the superblock gained an extra
* error store `s_wb_err`, to carry errors seen on page writeback since the
* last call to syncfs(). If sync_filesystem() does not return an error, any
* existing writeback error on the superblock will be used instead (and cleared
* either way). We don't use this (page writeback is a different thing for us),
* so for 5.8-5.17 we can use that instead to get syncfs() to return the error.
*
* Before 5.8, we have no other good options - no matter what happens, the
* userspace program will be told the call has succeeded, and so we must make
* it so, Therefore, when we are asked to wait for sync to complete (wait ==
* 1), if zfs_sync() has returned an error we have no choice but to block,
* regardless of the reason.
*
* The 5.17 change was backported to the 5.10, 5.15 and 5.16 series, and likely
* to some vendor kernels. Meanwhile, s_wb_err is still in use in 6.15 (the
* mainline Linux series at time of writing), and has likely been backported to
* vendor kernels before 5.8. We don't really want to use a workaround when we
* don't have to, but we can't really detect whether or not sync_filesystem()
* will return our errors (without a difficult runtime test anyway). So, we use
* a static version check: any kernel reporting its version as 5.17+ will use a
* direct error return, otherwise, we'll either use s_wb_err if it was detected
* at configure (5.8-5.16 + vendor backports). If it's unavailable, we will
* block to ensure the correct semantics.
*
* See https://github.com/openzfs/zfs/issues/17416 for further discussion.
*/
static int
zpl_sync_fs(struct super_block *sb, int wait)
{
@@ -115,10 +152,28 @@ zpl_sync_fs(struct super_block *sb, int wait)
crhold(cr);
cookie = spl_fstrans_mark();
error = -zfs_sync(sb, wait, cr);
#if LINUX_VERSION_CODE < KERNEL_VERSION(5, 17, 0)
#ifdef HAVE_SUPER_BLOCK_S_WB_ERR
if (error && wait)
errseq_set(&sb->s_wb_err, error);
#else
if (error && wait) {
zfsvfs_t *zfsvfs = sb->s_fs_info;
ASSERT3P(zfsvfs, !=, NULL);
if (zfs_enter(zfsvfs, FTAG) == 0) {
txg_wait_synced(dmu_objset_pool(zfsvfs->z_os), 0);
zfs_exit(zfsvfs, FTAG);
error = 0;
}
}
#endif
#endif /* < 5.17.0 */
spl_fstrans_unmark(cookie);
crfree(cr);
ASSERT3S(error, <=, 0);
ASSERT3S(error, <=, 0);
return (error);
}

Some files were not shown because too many files have changed in this diff Show More